Spark Programming in Python for Beginners with Apache Spark 3
Video description
Build data engineering solutions with Spark programming in Python
About This Video
Build your own data engineering solutions using the Spark structured API in Python
Gain an in-depth understanding of the Apache Hadoop architecture, ecosystem, and practices
Learn to apply Spark programming basics
In Detail
If you are looking to expand your knowledge of data engineering or want to level up your portfolio by adding Spark programming to your skill set, you are in the right place. This course will help you understand Spark programming and apply that knowledge to build data engineering solutions. The course is example-driven and runs like a working session: we take a live-coding approach and explain each concept as it comes up.
In this course, we will start with a quick introduction to Apache Spark, then set up our environment by installing and using Apache Spark. Next, we will learn about the Spark execution model and architecture, and about the Spark programming model and developer experience. After that, we will cover the foundation of the Spark structured API and then move on to Spark data sources and sinks.
Then we will cover Spark DataFrame and Dataset transformations, followed by aggregations in Apache Spark, and finally Spark DataFrame joins.
By the end of this course, you will be able to build data engineering solutions using the Spark structured API in Python.
Audience
This course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark; for data architects and data engineers who are responsible for designing and building their organization’s data-centric infrastructure; and for managers and architects who do not work directly on Spark implementation but work with the people who implement Apache Spark at the ground level.
This course does not require any prior knowledge of Apache Spark or Hadoop; only programming knowledge of Python is required.
What is Apache Spark - An Introduction and Overview
Chapter 2 : Installing and Using Apache Spark
Spark Development Environments
Mac Users - Apache Spark in Local Mode Command Line REPL
Windows Users - Apache Spark in Local Mode Command Line REPL
Mac Users - Apache Spark in the IDE - PyCharm
Windows Users - Apache Spark in the IDE - PyCharm
Apache Spark in Cloud - Databricks Community and Notebooks
Apache Spark in Anaconda - Jupyter Notebook
Chapter 3 : Spark Execution Model and Architecture
Execution Methods - How to Run Spark Programs?
Spark Distributed Processing Model - How Your Program Runs?
Spark Execution Modes and Cluster Managers
Summarizing Spark Execution Models - When to Use What?
Working with PySpark Shell - Demo
Installing Multi-Node Spark Cluster - Demo
Working with Notebooks in Cluster - Demo
Working with Spark Submit - Demo
Section Summary
Chapter 4 : Spark Programming Model and Developer Experience
Creating Spark Project Build Configuration
Configuring Spark Project Application Logs
Creating Spark Session
Configuring Spark Session
Data Frame Introduction
Data Frame Partitions and Executors
Spark Transformations and Actions
Spark Jobs Stages and Task
Understanding your Execution Plan
Unit Testing Spark Application
Rounding off Summary
Chapter 5 : Spark Structured API Foundation
Introduction to Spark APIs
Introduction to Spark RDD API
Working with Spark SQL
Spark SQL Engine and Catalyst Optimizer
Section Summary
Chapter 6 : Spark Data Sources and Sinks
Spark Data Sources and Sinks
Spark DataFrameReader API
Reading CSV, JSON and Parquet files
Creating Spark DataFrame Schema
Spark DataFrameWriter API
Writing Your Data and Managing Layout
Spark Databases and Tables
Working with Spark SQL Tables
Chapter 7 : Spark Dataframe and Dataset Transformations
Introduction to Data Transformation
Working with Dataframe Rows
DataFrame Rows and Unit Testing
Dataframe Rows and Unstructured data
Working with Dataframe Columns
Creating and Using UDF
Misc Transformations
Chapter 8 : Aggregations in Apache Spark
Aggregating Dataframes
Grouping Aggregations
Windowing Aggregations
Chapter 9 : Spark Dataframe Joins
Dataframe Joins and Column Name Ambiguity
Outer Joins in Dataframe
Internals of Spark Join and shuffle
Optimizing Your Joins
Implementing Bucket Joins
Chapter 10 : Keep Learning
Final Word