These practice tests for the Databricks Certified Data Engineer Associate exam are designed to cover all of the exam topics, so they will not only help you pass the exam but also help you learn the concepts of the Databricks platform. It is very important to finish all five practice exams, because each one focuses on certain features of Databricks, and you don't want to miss any of them before taking the real exam. My goal is not just to help you pass the exam but to help you learn the Databricks platform along the way.
Here is a summary of the Databricks Certified Data Engineer Associate certification exam:
The exam assesses an individual’s ability to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This includes an understanding of the Lakehouse Platform and its workspace, its architecture, and its capabilities. It also assesses the ability to perform multi-hop architecture ETL tasks using Apache Spark SQL and Python in both batch and incrementally processed paradigms. Finally, the exam assesses the tester’s ability to put basic ETL pipelines and Databricks SQL queries and dashboards into production while maintaining entity permissions. Individuals who pass this certification exam can be expected to complete basic data engineering tasks using Databricks and its associated tools.
The minimally qualified candidate should be able to:
Understand how to use and the benefits of using the Databricks Lakehouse Platform and its tools, including: 24% (11/45)
Data Lakehouse (architecture, descriptions, benefits)
Data Science and Engineering workspace (clusters, notebooks, data storage)
Delta Lake (general concepts, table management and manipulation, optimizations); a short code sketch follows this section
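To make the Delta Lake bullet concrete, here is a minimal sketch of table management, time travel, and optimization. It assumes a Databricks notebook where a `spark` session already exists; the table name `demo_events` and its columns are hypothetical, chosen only for illustration.

```python
# Create a managed table (Delta is the default format on Databricks).
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_events (
        id INT,
        event STRING,
        ts TIMESTAMP
    )
""")

# Write and update data; Delta records each change in its transaction log.
spark.sql("INSERT INTO demo_events VALUES (1, 'click', current_timestamp())")
spark.sql("UPDATE demo_events SET event = 'view' WHERE id = 1")

# Inspect the table history and query an earlier version (time travel).
spark.sql("DESCRIBE HISTORY demo_events").show()
spark.sql("SELECT * FROM demo_events VERSION AS OF 0").show()

# Compact small files, one of the optimizations covered on the exam.
spark.sql("OPTIMIZE demo_events")
```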
Build ETL pipelines using Apache Spark SQL and Python, including: 29% (13/45)
Relational entities (databases, tables, views)
ELT (creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs)
Python (facilitating Spark SQL with string manipulation and control flow, passing data between PySpark and Spark SQL); see the sketch after this section
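As a quick illustration of these ELT patterns, the sketch below combines a CTAS statement, a SQL UDF, and Python string manipulation to move data between PySpark and Spark SQL. It assumes a notebook with an active `spark` session, and the source table `raw_orders` is a hypothetical placeholder.

```python
# CTAS: create a cleaned table from a query (creating tables, cleaning data).
spark.sql("""
    CREATE OR REPLACE TABLE clean_orders AS
    SELECT id, CAST(amount AS DOUBLE) AS amount
    FROM raw_orders
    WHERE amount IS NOT NULL
""")

# A SQL UDF, reusable from any SQL query in the workspace.
spark.sql("""
    CREATE OR REPLACE FUNCTION taxed(amount DOUBLE)
    RETURNS DOUBLE
    RETURN amount * 1.08
""")

# Python string manipulation to build a query dynamically, then pass the
# result back and forth between PySpark and Spark SQL.
table = "clean_orders"
df = spark.sql(f"SELECT id, taxed(amount) AS total FROM {table}")
df.createOrReplaceTempView("orders_view")   # expose the DataFrame to SQL
spark.sql("SELECT count(*) FROM orders_view").show()
```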
Incrementally process data, including: 22% (10/45)
Structured Streaming (general concepts, triggers, watermarks)
Auto Loader (streaming reads)
Multi-hop Architecture (bronze-silver-gold, streaming applications)
Delta Live Tables (benefits and features); a streaming example follows this section
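The following simplified sketch shows an incremental bronze-to-silver flow using Auto Loader and Structured Streaming, assuming a Databricks notebook with `spark` available; all paths and table names are hypothetical placeholders. Delta Live Tables lets you express the same flow declaratively, but plain Structured Streaming keeps the example short.

```python
# Bronze: ingest raw files incrementally with Auto Loader (cloudFiles).
bronze = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
    .load("/tmp/landing/events"))

q1 = (bronze.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/bronze")
    .trigger(availableNow=True)          # process all available data, then stop
    .toTable("bronze_events"))
q1.awaitTermination()

# Silver: read the bronze table as a stream and write a cleaned table.
silver = (spark.readStream.table("bronze_events")
    .where("event IS NOT NULL"))

q2 = (silver.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/silver")
    .trigger(availableNow=True)
    .toTable("silver_events"))
q2.awaitTermination()
```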
Build production pipelines for data engineering applications and Databricks SQL queries and dashboards, including: 16% (7/45)
Jobs (scheduling, task orchestration, UI)
Dashboards (endpoints, scheduling, alerting, refreshing); a job-configuration sketch follows this section
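Jobs are usually configured through the UI, but the same settings map onto the Jobs API. Below is a hedged sketch of a multi-task job definition written as a Python dict in the Jobs API 2.1 shape; the job name, notebook paths, and cron expression are hypothetical, and cluster settings are omitted for brevity.

```python
# Hypothetical Jobs API 2.1 payload: a scheduled job with two ordered tasks.
job_settings = {
    "name": "nightly-etl",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",   # 2:00 AM daily
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "bronze",
            "notebook_task": {"notebook_path": "/Repos/etl/bronze"},
        },
        {
            "task_key": "silver",
            "depends_on": [{"task_key": "bronze"}],  # task orchestration
            "notebook_task": {"notebook_path": "/Repos/etl/silver"},
        },
    ],
}
```

A payload like this could be sent to the `POST /api/2.1/jobs/create` REST endpoint, which is what the Jobs UI does behind the scenes.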
Understand and follow best security practices, including: 9% (4/45)
Unity Catalog (benefits and features)
Entity Permissions (team-based permissions, user-based permissions); a GRANT example follows this section
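For a taste of entity permissions, the sketch below grants a group read access to a Unity Catalog table and then inspects the result. It assumes a notebook with a `spark` session and sufficient privileges; the catalog, schema, and group names are hypothetical, and the same GRANT syntax works for individual users (user-based permissions).

```python
# Grant a group the privileges needed to read one table in Unity Catalog.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_engineers`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_engineers`")

# Verify which principals can access the table.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show()
```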
Duration
Testers will have 90 minutes to complete the certification exam.
Questions
There are 45 multiple-choice questions on the certification exam. The questions will be distributed by high-level topic in the following way:
Databricks Lakehouse Platform – 24% (11/45)
ELT with Spark SQL and Python – 29% (13/45)
Incremental Data Processing – 22% (10/45)
Production Pipelines – 16% (7/45)
Data Governance – 9% (4/45)