ARCH-492: Architecting Cloudera Edge to AI

Duration: 4 Days (32 Hours)

ARCH-492: Architecting Cloudera Edge to AI Course Overview:

“Designing Edge to AI Applications” is an intensive four-day workshop on advanced big data architecture for building edge-to-AI applications. Covering streaming, operational data processing, analytics, and machine learning, it brings technical experts together to work collaboratively on complex solutions to demanding business challenges.

The workshop first addresses overarching big data architecture concerns, then applies them to the creation of an innovative system design. Participants apply theoretical knowledge to real-world scenarios, and the resulting discussions yield detailed insights. This interactive setting helps participants absorb techniques for architecting robust big data systems, drawing not only on Cloudera’s expertise but also on the collective experience of their peers.

Specifically, the workshop tackles advanced big data architecture facets such as data formats, transformations, transactions, real-time and batch processing, machine learning, scalability, fault tolerance, security, and privacy. By addressing these topics, the workshop systematically mitigates the risk of unsound architecture choices and suboptimal technology selections, culminating in a comprehensive and practical skillset.

Intended Audience:

  • Participants should mainly be architects, developer team leads, big data developers, data engineers, senior analysts, DevOps administrators, and machine learning developers who work on big data or streaming applications and are interested in how to design and develop such applications on CDP.

Learning Objectives of ARCH-492: Architecting Cloudera Edge to AI:

  • Cloudera Data Platform
  • Big Data Architecture
  • Building Scalable applications
  • Building Fault Tolerant Solutions
  • Security and Privacy
  • Deployment on Public, Private, and Hybrid Cloud
  • Team activity: Team Introductions
  • HDFS
  • HBase
  • Kudu
  • MapReduce
  • Spark, including Spark SQL and Spark ML
  • Hive
  • Impala
  • Relational Database Management Systems
  • Spark Streaming
  • Apache Flume
  • Apache NiFi
  • Apache Kafka
  • Oz Metropolitan
  • Architectural questions
  • Team activity: Review Metroz Use Cases and Logical Architecture
  • Definition
  • Minimizing risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Metroz Vertical Slice
  • Real time, near real time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Data consistency and ACID transactions
  • Stream processing guarantees
  • Machine Learning pipelines
  • Team activity: Metroz Processing
  • Three V’s of Big Data
  • Data Lifecycle
  • Data Formats
  • Transforming Data
  • Team activity: Metroz Data Requirements
  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: scalable airport terminal designs
  • Spark scalability and parallel processing
  • Scalable storage engines: HDFS, Ozone, Kafka and Kudu
  • Team activity: Scaling Metroz
  • Principles
  • Transparency
  • Hardware vs. Software redundancy
  • Tolerating disasters
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Application tolerance for failures
  • Team activity: Failures in Metroz
  • Principles
  • Security Architecture
  • Knox Security Architecture
  • Ranger Security Architecture
  • Setting security policies with Ranger
  • Threat Analysis
  • Team activity: Securing Metroz
  • Cluster sizing and evolution
  • On-premises vs. Cloud
  • Edge computing
  • Team activity: Deploying Metroz
  • Architecture artifacts
  • Team activity: Metroz Physical Architecture
  • Review of Uber and Lyft big data platforms
  • Review of Metroz CDP solution architectures

ARCH-492: Architecting Cloudera Edge to AI Course Prerequisites

  • To gain the most from the workshop, participants should have a working knowledge of popular big data and streaming technologies such as HDFS, Spark, Kafka, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed: there are no programming activities, and the focus is on architecture design.

Discover the perfect fit for your learning journey

Choose Learning Modality

Live Online

  • Convenience
  • Cost-effective
  • Self-paced learning
  • Scalability


  • Interaction and collaboration
  • Networking opportunities
  • Real-time feedback
  • Personal attention


  • Familiar environment
  • Confidentiality
  • Team building
  • Immediate application

Training Exclusives

This course comes with the following benefits:

  • Practice labs
  • Training by certified trainers
  • Access to the recordings of your class sessions for 90 days
  • Digital courseware
  • 24/7 learner support
