Video description
Working with Big Data: Infrastructure, Algorithms, and Visualizations LiveLessons presents a high-level overview of big data and how to use key tools to solve your data challenges. This introduction to the three areas of big data includes:
- Infrastructure - how to store and process big data
- Algorithms - how to integrate algorithms into your big data stack and an introduction to classification
- Visualizations - an introduction to creating visualizations in JavaScript using D3.js
The goal is not to be exhaustive but to provide a higher-level view of how all the pieces of a big data architecture work together.
About the Author:
Paul Dix is the author of “Service-Oriented Design with Ruby and Rails.” He is a frequent speaker at conferences and user groups including Web 2.0, RubyConf, RailsConf, The Gotham Ruby Conference, and Scotland on Rails. Paul is the founder and organizer of the NYC Machine Learning Meetup, which has over 2,900 members. He has worked at startups and at larger companies such as Google, Microsoft, and McAfee. Currently, Paul is a co-founder at Errplane, a cloud-based service for monitoring and alerting on application performance and metrics. He lives in New York City.
Table of Contents
Introduction
Introduction to Working with Big Data LiveLessons
00:03:17
What is Big Data?
00:05:26
Lesson 1: Unstructured Storage and Hadoop
Learning objectives
00:00:49
1.1 Set up a basic Hadoop installation
00:16:14
1.2 Write data into the Hadoop file system
00:07:41
1.3 Write a Hadoop streaming job to process text files
00:17:55
Lesson 2: Structured Storage and Cassandra
Learning objectives
00:01:00
2.1 Set up a basic Cassandra installation
00:10:16
2.2 Create a Cassandra schema for storing data
00:17:04
2.3 Store and retrieve data from Cassandra using the Ruby library
00:07:39
2.4 Write data into Cassandra from a Hadoop streaming job
00:20:14
2.5 Use the Hadoop reduce phase to parallelize writes
00:15:09
Lesson 3: Real Time Processing and Messaging
Learning objectives
00:01:07
3.1 Set up the Kafka messaging system
00:08:02
3.2 Publish and consume data from Kafka in Ruby
00:11:05
3.3 Aggregate log files into Hadoop using Kafka and a Ruby consumer
00:13:55
3.4 Create horizontally scalable message consumers
00:11:35
3.5 Sample messages using Kafka’s partitioning
00:10:47
3.6 Create redundant message consumers for high availability
00:27:50
Lesson 4: Working with Machine Learning Algorithms
Learning objectives
00:00:57
4.1 Grasp the concepts of machine learning and implement the k-nearest neighbors algorithm
00:25:47
4.2 Understand the basics of distance metrics and implement Euclidean distance and cosine similarity
00:26:44
4.3 Transform raw data into a matrix and convert a text document into the vector space model
00:22:42
4.4 Use k-nearest neighbors to make predictions
00:18:41
4.5 Improve execution time by reducing the search space
00:11:08
Lesson 5: Experimentation and Running Algorithms in Production
Learning objectives
00:00:58
5.1 Use cross validation to test a predictive model
00:17:37
5.2 Integrate a trained model into production
00:09:06
5.3 Version a model and track feedback data
00:03:35
5.4 Write a test harness to compare versioned models
00:09:22
5.5 Test new predictive models in production
00:02:41
Lesson 6: Basic Visualizations
Learning objectives
00:00:53
6.1 Prepare raw data for use in visualizations
00:13:10
6.2 Use core functions of the D3 JavaScript visualization toolkit
00:13:17
6.3 Use D3 to create a bar chart
00:07:56
6.4 Use D3 to create a time series
00:15:29