DENG-254: Preparing with Cloudera Data Engineering

Duration: 4 Days (32 Hours)

DENG-254: Preparing with Cloudera Data Engineering Course Overview:

This four-day, hands-on training course teaches developers the key concepts and skills needed to build high-performance, parallel applications with Apache Spark on the Cloudera Data Platform (CDP).

The course combines theory with hands-on practice, so participants learn to develop Spark applications that integrate with core CDP components. Through practical exercises, students write Spark applications and use Spark SQL to query structured data. The curriculum also covers using Hive for data ingestion and denormalization, and processing large volumes of data stored in a distributed file system.

Intended Audience:

  • This course is designed for developers and data engineers. All students are expected to have basic Linux experience and basic proficiency in either Python or Scala.

Learning Objectives of DENG-254: Preparing with Cloudera Data Engineering:

During this course, you will learn how to:

  • Distribute, store, and process data in a CDP cluster
  • Write, configure, and deploy Apache Spark applications
  • Use the Spark interpreters and Spark applications to explore, process, and analyze distributed data
  • Query data using Spark SQL, DataFrames, and Hive tables
  • Deploy a Spark application on the Data Engineering Service

HDFS Introduction

  • HDFS Overview
  • HDFS Components and Interactions
  • Additional HDFS Interactions
  • Ozone Overview
  • Exercise: Working with HDFS
  • YARN Overview
  • YARN Components and Interaction
  • Working with YARN
  • Exercise: Working with YARN
  • Resilient Distributed Datasets (RDDs)
  • Exercise: Working with RDDs
  • Introduction to DataFrames
  • Exercise: Introducing DataFrames
  • Exercise: Reading and Writing DataFrames
  • Exercise: Working with Columns
  • Exercise: Working with Complex Types
  • Exercise: Combining and Splitting DataFrames
  • Exercise: Summarizing and Grouping DataFrames
  • Exercise: Working with UDFs
  • Exercise: Working with Windows
  • About Hive
  • Transforming Data with HiveQL
  • Exercise: Working with Partitions
  • Exercise: Working with Buckets
  • Exercise: Working with Skew
  • Exercise: Using SerDes to Ingest Text Data
  • Exercise: Using Complex Types to Denormalize Data
  • Hive and Spark Integration
  • Exercise: Spark Integration with Hive
  • Shuffle
  • Skew
  • Order
  • Spark Distributed Processing
  • Exercise: Explore Query Execution Order
  • DataFrame and Dataset Persistence
  • Persistence Storage Levels
  • Viewing Persisted RDDs
  • Exercise: Persisting DataFrames
  • Create and Trigger Ad-Hoc Spark Jobs
  • Orchestrate a Set of Jobs Using Airflow
  • Data Lineage using Atlas
  • Auto-scaling in Data Engineering Service
  • Optimize Workloads, Performance, Capacity
  • Identify Suboptimal Spark Jobs

DENG-254: Preparing with Cloudera Data Engineering Course Prerequisites

  • Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

Discover the perfect fit for your learning journey

Choose Learning Modality

Live Online

  • Convenience
  • Cost-effective
  • Self-paced learning
  • Scalability

Classroom

  • Interaction and collaboration
  • Networking opportunities
  • Real-time feedback
  • Personal attention

Onsite

  • Familiar environment
  • Confidentiality
  • Team building
  • Immediate application

Training Exclusives

This course comes with the following benefits:

  • Practice labs
  • Training by certified trainers
  • Access to the recordings of your class sessions for 90 days
  • Digital courseware
  • 24/7 learner support
