Video description
Apache Hadoop is a freely available open source tool-set that
enables big data analysis. This Hadoop Fundamentals LiveLessons
tutorial demonstrates the core components of Hadoop including
Hadoop Distributed File System (HDFS) and MapReduce. In addition,
the tutorial demonstrates how to use Hadoop at several levels
including the native Java interface, C++ pipes, and the universal
streaming program interface. Examples of how to use high-level
tools include the Pig scripting language and the Hive 'SQL like'
interface. Finally, the steps for installing Hadoop on a desktop
virtual machine, in a Cloud environment, and on a local stand-alone
cluster are presented. Topics covered in this tutorial apply to
Hadoop version 2 (i.e., MR2 or YARN).
The source code repository for this LiveLesson can be found at
www.clustermonkey.net/download/LiveLessons/Hadoop_Fundamentals/.
About the Author:
Douglas Eadline, PhD, began his career as a practitioner and a
chronicler of the Linux Cluster HPC revolution and now documents
big data analytics. Starting with the first Beowulf How To
document, Dr. Eadline has written hundreds of articles, white
papers, and instructional documents covering virtually all aspects
of HPC. Prior to starting and editing the popular
ClusterMonkey.net web site in 2005, he served as Editor-in-Chief
for ClusterWorld Magazine, and was Senior HPC Editor for Linux
Magazine. Currently, he is a consultant to the HPC industry and
writes a monthly column in HPC Admin Magazine. Both clients and
readers have recognized Dr. Eadline's ability to present a
"technological value proposition" in a clear and accurate style. He
has practical hands-on experience in many aspects of HPC, including
hardware and software design, benchmarking, storage, GPU, cloud,
and parallel computing.
Table of Contents
Introduction
Introduction to Hadoop Fundamentals LiveLessons
00:02:30
Lesson 1: Background Concepts
Learning objectives
00:00:35
1.1 Understand the problem Hadoop solves
00:10:37
1.2 Understand the Hadoop approach
00:03:36
1.3 Understand the Hadoop Project
00:06:42
Lesson 2: Running Hadoop on a Desktop or Laptop
Learning objectives
00:00:56
2.1 Install Hortonworks HDP Sandbox
00:09:53
Lesson 3: The Hadoop Distributed File System
Learning objectives
00:00:45
3.1 Understand HDFS basics
00:24:12
3.2a Use HDFS tools
00:17:53
3.2b Do HDFS administration
00:26:18
3.3 Use HDFS in programs
00:17:33
Lesson 4: Hadoop MapReduce
Learning objectives
00:00:46
4.1 Understand the MapReduce paradigm
00:07:45
4.2 Develop and run a Java MapReduce application
00:15:51
4.3 Understand how MapReduce works
00:17:45
Lesson 5: Hadoop Examples
Learning objectives
00:00:35
5.1 Use the Streaming Interface
00:10:37
5.2 Use the Pipes Interface
00:07:24
5.3 Run the Hadoop grep example
00:06:28
5.4 Debug MapReduce
00:11:04
Lesson 6: Higher Level Tools
Learning objectives
00:00:41
6.1 Use Pig
00:07:59
6.2 Use Hive
00:06:37
Lesson 7: Setting Up Hadoop in the Cloud
Learning objectives
00:00:35
7.1 Use Whirr to launch Hadoop in the Cloud
00:12:08
Lesson 8: Set Up Hadoop on a Local Cluster
Learning objectives
00:00:41
8.1 Specify and prepare servers
8.2 Install and configure Hadoop Core
8.3 Install and configure Pig and Hive
8.4 Install and configure Ganglia
8.5 Perform simple administration and monitoring