Apache Kudu is an entirely new storage manager for the Hadoop ecosystem. It addresses many of the most difficult architectural issues in big data, including the Hadoop "storage gap" problem that commonly arises when building near real-time analytical applications. This vexing issue has prevented many applications from transitioning to Hadoop-based architectures.
In this course, you'll learn why Kudu exists, when to use it, the key concepts of Kudu's design, and how it enables simple, real-time analytics without the need for separate batch and speed layers. Designed for developers, architects, and engineers with limited experience using Hadoop ecosystem components such as HDFS, Hive, Spark, or Impala, the course describes how to architect Kudu applications that are low-risk, fast, scalable, and reliable.
- Explore skills critical to the "big data" toolbox of any developer, architect, or engineer
- Learn how Kudu solves the Hadoop storage gap problem
- Understand Kudu's design goals, strengths, and weaknesses
- Discover how Kudu reads and writes data
- Master the concepts that make Kudu-based applications low-risk, scalable, and fast
Ryan Bosshart is a Principal Systems Engineer at Cloudera, where he leads a specialized team focused on Hadoop ecosystem storage technologies such as HDFS, HBase, and Kudu. An architect and builder of large-scale distributed systems since 2006, Ryan is co-chair of the Twin Cities Spark and Hadoop User Group. He speaks about Hadoop technologies at conferences throughout North America and holds a degree in computer science from Augsburg College.