Spark Programming in Scala for Beginners with Apache Spark 3
Video description
A carefully designed, fully tested course on Spark programming in Scala for beginners, using Apache Spark 3. Hands-on, practical content throughout reinforces what you learn.
About This Video
A comprehensive beginner-level course on Spark programming in Scala
Deep dive into Spark 3 architecture and data engineering
Complete source code and examples, tested by the author on the Apache Spark 3.0.0 open-source distribution
In Detail
Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at a massive scale. It has quickly become the largest open-source community in big data. So, mastering Apache Spark opens a wide range of professional opportunities.
This course starts with a brief introduction to what Apache Spark is. You will then install and use Apache Spark, and look at the Spark execution model and architecture in detail. Next, you will learn the Spark programming model and developer experience. Following that, you will look at the Spark Structured API foundation, and Spark data sources and sinks. Then, you will explore Spark DataFrame and Dataset transformations, along with aggregations in Apache Spark. Finally, you will look at Spark DataFrame joins in detail.
By the end of this course, you will understand Spark programming and be able to apply that knowledge to build data engineering solutions.
Audience
This course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark. It is also for data architects and data engineers who are responsible for designing and building an organization's data-centric infrastructure. Managers and architects who do not work directly on Spark implementations, but who work with the people who do, will also benefit.
Before proceeding with the course, you will need basic knowledge of the Scala programming language.
Apache Spark in Cloud - Databricks Community and Notebooks
Apache Spark in Hadoop Ecosystem - Zeppelin Notebooks
Chapter 3 : Spark Execution Model and Architecture
Execution Methods - How to Run Spark Programs?
Spark Distributed Processing Model - How Your Program Runs?
Spark Execution Modes and Cluster Managers
Summarizing Spark Execution Models - When to Use What?
Working with Spark Shell - Demo
Installing Multi-Node Spark Cluster - Demo
Working with Notebooks in Cluster - Demo
Working with Spark Submit - Demo
Section Summary
Chapter 4 : Spark Programming Model and Developer Experience
Creating Spark Project Build Configuration
Configuring Spark Project Application Logs
Creating Spark Session
Configuring Spark Session
Data Frame Introduction
Data Frame Partitions and Executors
Spark Transformations and Actions
Spark Jobs Stages and Tasks
Understanding Your Execution Plan
Unit Testing Spark Application
Debugging Spark Driver and Executor
Spark Application Logs in a Cluster
Rounding Off Summary
Chapter 5 : Spark Structured API Foundation
Introduction to Spark APIs
Introduction to Spark RDD API
Dataset Versus Data Frame
Working with Spark Dataset
Working with Spark SQL
Spark SQL Engine and Catalyst Optimizer
Section Summary
Chapter 6 : Spark Data Sources and Sinks
Introduction to Spark Sources and Sinks
Spark DataFrameReader API
Reading CSV, JSON, and Parquet files
Creating Spark DataFrame Schema
Spark DataFrameWriter API
Writing Your Data and Managing Layout
Spark Databases and Tables
Working with Spark SQL Tables
Chapter 7 : Spark DataFrame and Dataset Transformations
Introduction to Data Transformation
Working with DataFrame Rows
DataFrame Rows and Unit Testing
DataFrame Rows and Unstructured data
Working with DataFrame Columns
Creating and Using UDF
Miscellaneous Transformations
Chapter 8 : Aggregations in Apache Spark
Aggregating DataFrames
Grouping Aggregations
Windowing Aggregations
Chapter 9 : Spark DataFrame Joins
DataFrame Joins and Column Name Ambiguity
Outer Joins in DataFrame
Internals of Spark Join and Shuffle
Optimizing Your Joins
Implementing Bucket Joins