Building Batch Data Pipelines on Google Cloud
Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT) or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm to use for batch data, and when. It also covers several Google Cloud technologies for data transformation: BigQuery, Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
Review different methods of data loading (EL, ELT and ETL) and when to use each
Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs
Use Dataflow to build your data processing pipelines
Manage data pipelines with Cloud Data Fusion and Cloud Composer
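The three loading paradigms differ mainly in where the transformation happens relative to the load step. The following minimal Python sketch is a conceptual illustration only. It assumes that an in-memory list stands in for the source system and a dict stands in for the warehouse. The function names and the cleanup rule are hypothetical and are not Google Cloud APIs.

```python
from typing import Dict, List

Row = Dict[str, object]
Warehouse = Dict[str, List[Row]]


def extract(source: List[Row]) -> List[Row]:
    """Read rows from a hypothetical source system (copied, not mutated)."""
    return [dict(row) for row in source]


def transform(rows: List[Row]) -> List[Row]:
    """Hypothetical cleanup: drop rows without an id, cast amount to float."""
    return [
        {"id": row["id"], "amount": float(row["amount"])}
        for row in rows
        if row.get("id") is not None
    ]


def el(source: List[Row], warehouse: Warehouse) -> None:
    """EL: load the data as-is, because it is already usable downstream."""
    warehouse["raw"] = extract(source)


def elt(source: List[Row], warehouse: Warehouse) -> None:
    """ELT: load the raw data first, then transform it inside the warehouse."""
    warehouse["raw"] = extract(source)
    warehouse["clean"] = transform(warehouse["raw"])


def etl(source: List[Row], warehouse: Warehouse) -> None:
    """ETL: transform in a separate processing step, then load only the result."""
    warehouse["clean"] = transform(extract(source))


if __name__ == "__main__":
    src = [{"id": 1, "amount": "9.5"}, {"id": None, "amount": "3"}]
    wh: Warehouse = {}
    etl(src, wh)
    print(wh)  # {'clean': [{'id': 1, 'amount': 9.5}]}
```

In ELT the raw copy remains in the warehouse, so you can rerun transformations later without re-extracting. In ETL only the transformed result is stored.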
Syllabus - What you will learn from this course
Week 1
Introduction
Week 2
Introduction to Building Batch Data Pipelines
Week 3
Executing Spark on Dataproc
Week 4
Serverless Data Processing with Dataflow
Week 5
Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Week 6
Course Summary
FAQ
Can I preview a course before enrolling?
Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.
What will I get when I enroll?
Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.
When will I receive my Course Certificate?
If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page. From there, you can print your Course Certificate or add it to your LinkedIn profile.
Why can’t I audit this course?
This course is one of a few on Coursera that are currently available only to learners who have paid or who have received financial aid (when available).
Reviews
Good course covering Dataproc, Dataflow, Dataprep and, of course, the labs.
Great way to get introduced to batch data pipelines in GCP.
There were too many labs with services that take 30-40 minutes just to spin up. I wouldn't have a problem with all the labs if the services took 2-5 minutes to spin up.
It takes time to understand, and the videos are a little boring, but the hands-on practice is enjoyable. Please mention the required execution time, or how long to wait for a task to finish.
Informative on various features, but Cloud Data Fusion and Dataflow are not explained in enough detail. I was expecting more on these and would like to learn more about pipelines.