Spark Programming in Scala for Beginners with Apache Spark 3

Video description

A carefully designed, fully tested course on Spark programming in Scala for beginners, using Apache Spark 3. Hands-on, practical content throughout reinforces what you learn.

About This Video

A comprehensive beginner-level course on Spark programming in Scala
Deep dive into Spark 3 architecture and data engineering
Complete source code and examples, tested by the author on the Apache Spark 3.0.0 open-source distribution

In Detail

Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. Since its release, it has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, and its community has quickly become the largest open-source community in big data. Mastering Apache Spark therefore opens a wide range of professional opportunities.

This course starts with a brief introduction to Apache Spark. You will then install Spark and start using it. After that, you will study the Spark execution model and architecture in detail, followed by the Spark programming model and developer experience. Next come the foundations of the Spark Structured API and Spark data sources and sinks. You will then explore DataFrame and Dataset transformations, along with aggregations in Apache Spark. Finally, you will look at DataFrame joins in detail.

By the end of this course, you will understand Spark programming and be able to apply that knowledge to build data engineering solutions.

Audience

This course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark. It is also for data architects and data engineers who are responsible for designing and building an organization’s data-centric infrastructure. Managers and architects who do not work directly on Spark implementations, but who work with the people who do, will also benefit.

Before proceeding with the course, you will need basic knowledge of the Scala programming language.

Chapter 1 : Apache Spark Introduction

Big Data History and Primer

Understanding the Data Lake Landscape

What Is Apache Spark?

Chapter 2 : Installing and Using Apache Spark

Spark Development Environments

Apache Spark in Local Mode Command Line REPL

Apache Spark in the IDE - IntelliJ IDEA

Apache Spark in Cloud - Databricks Community and Notebooks

Apache Spark in Hadoop Ecosystem - Zeppelin Notebooks

Chapter 3 : Spark Execution Model and Architecture

Execution Methods - How to Run Spark Programs?

Spark Distributed Processing Model - How Your Program Runs?

Spark Execution Modes and Cluster Managers

Summarizing Spark Execution Models - When to Use What?

Working with Spark Shell - Demo

Installing Multi-Node Spark Cluster - Demo

Working with Notebooks in Cluster - Demo

Working with Spark Submit - Demo

Section Summary

Chapter 4 : Spark Programming Model and Developer Experience

Creating Spark Project Build Configuration

Configuring Spark Project Application Logs

Creating Spark Session

Configuring Spark Session

Data Frame Introduction

Data Frame Partitions and Executors

Spark Transformations and Actions

Spark Jobs Stages and Tasks

Understanding Your Execution Plan

Unit Testing Spark Application

Debugging Spark Driver and Executor

Spark Application Logs in a Cluster

Rounding Off Summary

Chapter 5 : Spark Structured API Foundation

Introduction to Spark APIs

Introduction to Spark RDD API

Dataset Versus Data Frame

Working with Spark Dataset

Working with Spark SQL

Spark SQL Engine and Catalyst Optimizer

Section Summary

Chapter 6 : Spark Data Sources and Sinks

Introduction to Spark Sources and Sinks

Spark DataFrameReader API

Reading CSV, JSON, and Parquet files

Creating Spark DataFrame Schema

Spark DataFrameWriter API

Writing Your Data and Managing Layout

Spark Databases and Tables

Working with Spark SQL Tables

Chapter 7 : Spark DataFrame and Dataset Transformations

Introduction to Data Transformation

Working with DataFrame Rows

DataFrame Rows and Unit Testing

DataFrame Rows and Unstructured Data

Working with DataFrame Columns

Creating and Using UDF

Miscellaneous Transformations

Chapter 8 : Aggregations in Apache Spark

Aggregating DataFrames

Grouping Aggregations

Windowing Aggregations

Chapter 9 : Spark DataFrame Joins

DataFrame Joins and Column Name Ambiguity

Outer Joins in DataFrame

Internals of Spark Join and Shuffle

Optimizing Your Joins

Implementing Bucket Joins
