Streaming data enables you to rapidly assess and respond to events, but only if you have the right methods for processing it. In this unique O’Reilly video collection—taken from live sessions at Strata + Hadoop World 2015 in San Jose, California—you’ll learn about several analytics tools and event mining techniques from experts in the field.
Learn how to capture, process, and respond to high-velocity data quickly. This video collection includes:
Going Real-time: Data Collection and Stream Processing with Apache Kafka
Jay Kreps (Confluent)
Discover what happens when every click, impression, database change, and application log is available as a real-time stream of well-structured data—based on real-world examples from LinkedIn and other organizations.
Stream Processing Everywhere—What to Use?
Jim Scott (MapR Technologies, Inc.)
To help you decide which solution to use for processing data from social media streams and sensor devices in real time, Jim compares three Apache projects—Storm, Spark, and Samza.
From Source to Solution: Building a System for Machine and Event-Oriented Data
Eric Sammer (Rocana)
Follow the flow of data through an end-to-end system built to handle tens of terabytes an hour of event-oriented data, providing real-time streaming, in-memory, SQL, and batch access to this data. You’ll learn how Hadoop, Kafka, Solr, and Impala/Hive were stitched together to build this system.
Spark Streaming—The State of the Union, and Beyond
Tathagata Das (Databricks)
Spark Streaming extends the core Apache Spark API to perform large-scale stream processing. In this session, you’ll learn about real-world use cases of Spark Streaming, as well as recent developments such as the brand-new Python API.
Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation
Kirk Borne (George Mason University)
Big data stream analytics and massive event mining techniques are critical in several domains, including astrophysics (the Large Synoptic Survey Telescope), social uprisings, health epidemics, seismology, cybersecurity, and more. Kirk addresses these parallels, their big data applications, and some anticipated analytics solutions, including Decision Science-as-a-Service.
TSAR (the TimeSeries AggregatoR)—How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies
Anirudh Todi (Twitter Inc.)
Find out how Twitter built TSAR from the ground up with Python and Scala on technologies such as Storm and Kafka, and learn the challenges they faced in scaling it to process tens of billions of events per day.
Streaming Analytics: It’s Not The Same Game
Subutai Ahmad (Numenta, Inc.)
The existing big data paradigm that requires storing data for batch analysis and extensive modeling by a human expert is incredibly inefficient. In this session, you’ll explore streaming data algorithms that are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. The open source project NuPIC uses many of the core ideas.
Realtime Data Analysis Patterns
Mikio Braun (TU Berlin)
Examine the use of realtime data analysis patterns, from data acquisition and processing to storage of historic data. You’ll learn about an architecture built around approximation algorithms, suited to use cases such as social media data analysis and real-time user profiling and recommendation.
The IoT P2P Backbone
Bruno Fernandez-Ruiz (Yahoo)
Under current constraints, many sensor devices only send inferred metrics rather than store or broadcast raw datasets. And devices that can send raw data only do so when there’s a good connection, leading to latency in generating predictions. This insightful talk looks into these issues.
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Robert Grossman (University of Chicago)
Three case studies yielded several lessons on how to build anomaly detection systems for different operational systems. You’ll learn eight useful techniques that researchers identified from these case studies, including how best to deploy these techniques.