Mastering Spark for Structured Streaming
Video description
Spark is one of today’s most popular distributed computation engines for processing and analyzing big data. This course gives data engineers, data scientists, and data analysts who want to explore data streaming practical experience with Spark. You’ll learn about the Spark Structured Streaming API, the Catalyst query optimizer, the Tungsten execution engine, and more in this hands-on course, where you’ll build several small applications that put these Spark 2.0 features to work. Scala experience is not required, but the course works best for those who have some.
Understand the main features of Spark and its advantages over existing systems
Learn the basics of parallelism, streaming computation, and Spark streaming
Explore the distinctions between Spark Structured Streaming and legacy DStream APIs
Understand how to write applications that use the Spark Structured Streaming API
Learn about the new Catalyst query optimizer and the Tungsten execution engine
Discover how Scala and Spark Structured Streaming simplify distributed streaming tasks
Gain hands-on experience building applications using Spark 2.0
Michael Li is the founder of The Data Incubator, which provides big data corporate training and a selective eight-week fellowship for PhDs transitioning into industry. Previously, he worked as a data scientist, software engineer, and researcher at Foursquare, Google, Andreessen Horowitz, J.P. Morgan, and NASA. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review. Michael earned his Ph.D. at Princeton and was a Marshall Scholar in Cambridge.
Selecting and Filtering Columns Using Structured Streaming
GroupBy and Aggregation in Structured Streaming
Joining Structured Stream with Datasets
SQL Queries in Spark Structured Streaming
DStream Comparison
Comparing Structured Streaming with DStream
Custom Receivers in Spark DStream
Iterative Wordcount Using Spark DStream
Cumulative Wordcount Using Spark DStream
Benefits of Spark Tungsten
Tungsten Performance Benefit Demonstration
Benefits of Spark Catalyst
Viewing Query Plans in Spark Shell
Visualizing Query Stages in Spark UI Viewer
Viewing Spark Catalyst-Optimized Physical Plans
Standalone Spark Streaming Applications
Writing Standalone Spark Streaming Applications
Two Environments for Running Spark
Spark Streaming Standalone Code - Meetup Events Example
Scala Build Tool (SBT) and Spark
Compiling and Building a Standalone Spark Application
Spark Twitter Streaming Example