Hadoop, the standard for large-scale data processing, makes your data truly accessible. This course offers an in-depth tour of the Hadoop ecosystem, with detailed instruction on setting up and running a Hadoop cluster, batch processing data with Pig, querying with Hive's SQL dialect, working with MapReduce, and everything else you need to parse, access, and analyze your data.
Lab - Installing Hadoop From CDH With Cloudera Manager - Part 1
Lab - Installing Hadoop From CDH With Cloudera Manager - Part 2
Lab - Installing Hadoop From CDH With Cloudera Manager - Part 3
Lab - Installing Hadoop From CDH With Cloudera Manager - Part 4
Introduction To Hive And Pig Interface
Installing Cloudera Quickstart VM
Hadoop Distributed File System (HDFS)
HDFS Architecture
HDFS File Write Walkthrough
Secondary Name Node
Lab - Using HDFS - Part 1
Lab - Using HDFS - Part 2
HA And Federation Basics
HDFS Access Controls
MapReduce
MapReduce Explained
MapReduce Architecture
MapReduce Code Walkthrough - Part 1
MapReduce Code Walkthrough - Part 2
MapReduce Job Walkthrough
Rack Awareness
Advanced MapReduce - Partitioners, Combiners, Comparators And More
Partitioner Code Walkthrough
Java Concerns
Logging And Debugging
Debugging Basics
Benchmarking With Teragen And Terasort
Hive, Pig, And Impala
Comparing Hive, Pig And Impala
Hive Basics
Hive Patterns And Anti-Patterns
Lab - Hive Basic Usage
Pig Basics
Pig Patterns And Anti-Patterns
Lab - Pig Basic Usage
Impala Fundamentals
Data Import And Export
Import And Export Options
Flume Introduction
Lab - Using Flume
HDFS Interaction Tools
Sqoop Introduction
Lab - Using Sqoop
Oozie Introduction
Conclusion
Wrap-Up
Introduction
Course Agenda And Instructor
Core Hadoop Components
Basic Overview Of Hadoop Core Components: HDFS
Hadoop Core Components Overview
What Is Map/Reduce?
YARN: Components And Architecture
Pre-YARN Architecture
YARN Architecture And Daemons
Scheduling, Running And Monitoring Applications In YARN
Running Jobs In YARN
YARN Parameters
YARN Cluster Resource Allocation
Failure Handling
YARN Logs
Hands On With YARN
Conclusion
Summary
Introduction
What Is Apache Hive And Who Uses It?
About The Author
What You Should Expect From This Video
Connecting To Hive
Hive CLI
Beeline
HUE
JDBC
Creating Tables And Loading Data
Creating A Table
Loading Data
Hive Record Structure
Hive Data Types
Manipulating Tables With HiveQL
Select Statement - Part 1
Select Statement - Part 2
Inserting Data Into A Hive Table Using HiveQL
Creating A Table Using HiveQL
Views And Partitions
Creating And Using Views
Creating And Using Partitions
Functions And Using Transform
Built In Functions
User Defined Functions
Transforming Data With Custom Scripts
Hive Execution Engines
Map Reduce
Tez
Spark
Conclusion
Wrap Up
Overview of the Video Course
A Distributed Computing Environment
The Motivation for Hadoop
A Brief History of Hadoop
Understanding the Hadoop Architecture
Setting Up A Pseudo-Distributed Environment
The Distributed File System (HDFS)
Distributed Computing with MapReduce
Word Count - the “Hello, World” of Hadoop!
Computing with Hadoop
How a MapReduce Job Works
Mappers and Reducers in Detail
Working with Hadoop via the Command Line: Starting HDFS and YARN
Working with Hadoop via the Command Line: Loading Data into HDFS
Working with Hadoop via the Command Line: Running a MapReduce Job
How To Use Our Github Goodies
Working in Python with Hadoop Streaming
Common MapReduce Tasks
Spark on Hadoop 2
Creating a Spark Application with Python
The Hadoop Ecosystem
The Hadoop Ecosystem
Data Warehousing with Hadoop
Higher Order Data Flows
Other Notable Projects
Working with Data on Hive
Introduction to Hive
Interacting with Data via the Hive Console
Creating Databases, Tables, and Schemas for Hive
Loading Data into Hive from HDFS
Querying Data and Performing Aggregations With Hive
Towards Last Mile Computing
Decomposing Large Data Sets to a Computational Space
Linear Regressions
Summarizing Documents with TF-IDF
Classification of Text
Parallel Canopy Clustering
Computing Recommendations via Linear Log-Likelihoods
Introduction to Clickstream Case Study
Requirements
Data Modeling
Data Ingest
Data Processing Engines - Part 1
Data Processing Engines - Part 2
Data Processing Patterns
Orchestration
Putting It All Together
Demo
Q&A