Hadoop, 2nd Edition

Video description

Hadoop, the standard for large-scale data processing, makes your data truly accessible. This course offers an in-depth tour of the Hadoop ecosystem, with detailed instruction on setting up and running a Hadoop cluster; batch processing data with Pig, Hive's SQL dialect (HiveQL), and MapReduce; and everything else you need to parse, access, and analyze your data.
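The MapReduce model covered in the course is traditionally introduced with word count. The following is a minimal local sketch in plain Python, assuming only the standard library. It runs on a single machine and merely imitates the map, shuffle/sort, and reduce phases that Hadoop distributes across a cluster. The function names are illustrative and are not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle/sort phase: order pairs by key so equal keys are adjacent,
    mimicking the sort Hadoop performs before data reaches a reducer."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain the three phases locally and return {word: count}."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    sample = ["the quick brown fox", "The lazy dog", "the end"]
    # Print tab-separated output, the same shape Hadoop Streaming reducers emit.
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

On a real cluster the mapper and reducer would run as separate processes on many nodes, and the framework, not `sorted`, would perform the shuffle between them.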

Introduction

What Is Big Data?

About The Author

Historical Approaches

Big Data In The Modern World

The Hadoop Approach

Hadoop Hardware Requirements

Hadoop Core Vs. Ecosystem

Hadoopable Problems

Hadoop Support Companies

Hadoop Basics

HDFS And MapReduce

Hadoop Run Modes And Job Types

Hadoop Software Requirements And Recommendations

Hadoop in the Cloud - Amazon Web Services

Lab - Installing Hadoop From CDH With Cloudera Manager - Part 1

Lab - Installing Hadoop From CDH With Cloudera Manager - Part 2

Lab - Installing Hadoop From CDH With Cloudera Manager - Part 3

Lab - Installing Hadoop From CDH With Cloudera Manager - Part 4

Introduction To Hive And Pig Interface

Installing Cloudera Quickstart VM

Hadoop Distributed File System (HDFS)

HDFS Architecture

HDFS File Write Walkthrough

Secondary Name Node

Lab - Using HDFS - Part 1

Lab - Using HDFS - Part 2

HA And Federation Basics

HDFS Access Controls

MapReduce

MapReduce Explained

MapReduce Architecture

MapReduce Code Walkthrough - Part 1

MapReduce Code Walkthrough - Part 2

MapReduce Job Walkthrough

Rack Awareness

Advanced MapReduce - Partitioners, Combiners, Comparators And More

Partitioner Code Walkthrough

Java Concerns

Logging And Debugging

Debugging Basics

Benchmarking With Teragen And Terasort

Hive, Pig, And Impala

Comparing Hive, Pig And Impala

Hive Basics

Hive Patterns And Anti-Patterns

Lab - Hive Basic Usage

Pig Basics

Pig Patterns And Anti-Patterns

Lab - Pig Basic Usage

Impala Fundamentals

Data Import And Export

Import And Export Options

Flume Introduction

Lab - Using Flume

HDFS Interaction Tools

Sqoop Introduction

Lab - Using Sqoop

Oozie Introduction

Conclusion

Wrap-Up

Introduction

Course Agenda And Instructor

Core Hadoop Components

Basic Overview Of Hadoop Core Components: HDFS

Hadoop Core Components Overview

What Is MapReduce?

YARN: Components And Architecture

Pre-YARN Architecture

YARN Architecture And Daemons

Scheduling, Running And Monitoring Applications In YARN

Running Jobs In YARN

YARN Parameters

YARN Cluster Resource Allocation

Failure Handling

YARN Logs

Hands On With YARN

Conclusion

Summary

Introduction

What Is Apache Hive And Who Uses It?

About The Author

What You Should Expect From This Video

Connecting To Hive

Hive CLI

Beeline

HUE

JDBC

Creating Tables And Loading Data

Creating A Table

Loading Data

Hive Record Structure

Hive Data Types

Manipulating Tables With HiveQL

Select Statement - Part 1

Select Statement - Part 2

Inserting Data Into A Hive Table Using HiveQL

Creating A Table Using HiveQL

Views And Partitions

Creating And Using Views

Creating And Using Partitions

Functions And Using Transform

Built In Functions

User Defined Functions

Transforming Data With Custom Scripts

Hive Execution Engines

MapReduce

Tez

Spark

Conclusion

Wrap Up

Overview of the Video Course

A Distributed Computing Environment

The Motivation for Hadoop

A Brief History of Hadoop

Understanding the Hadoop Architecture

Setting Up A Pseudo-Distributed Environment

The Distributed File System (HDFS)

Distributed Computing with MapReduce

Word Count - the “Hello, World” of Hadoop!

Computing with Hadoop

How a MapReduce Job Works

Mappers and Reducers in Detail

Working with Hadoop via the Command Line: Starting HDFS and YARN

Working with Hadoop via the Command Line: Loading Data into HDFS

Working with Hadoop via the Command Line: Running a MapReduce Job

How To Use Our GitHub Goodies

Working in Python with Hadoop Streaming

Common MapReduce Tasks

Spark on Hadoop 2

Creating a Spark Application with Python

The Hadoop Ecosystem

Data Warehousing with Hadoop

Higher Order Data Flows

Other Notable Projects

Working with Data on Hive

Introduction to Hive

Interacting with Data via the Hive Console

Creating Databases, Tables, and Schemas for Hive

Loading Data into Hive from HDFS

Querying Data and Performing Aggregations With Hive

Towards Last Mile Computing

Decomposing Large Data Sets to a Computational Space

Linear Regressions

Summarizing Documents with TF-IDF

Classification of Text

Parallel Canopy Clustering

Computing Recommendations via Linear Log-Likelihoods

Introduction to Clickstream Case Study

Requirements

Data Modeling

Data Ingest

Data Processing Engines - Part 1

Data Processing Engines - Part 2

Data Processing Patterns

Orchestration

Putting It All Together

Demo
