ARCH-492: Architecting Cloudera Edge to AI

Duration: 4 Days (32 Hours)

ARCH-492: Architecting Cloudera Edge to AI Course Overview:

“Designing Edge to AI Applications” is an intensive four-day workshop that delves into advanced aspects of big data architecture for building edge-to-AI applications. Covering streaming, operational data processing, analytics, and machine learning, it brings technical experts together to collaboratively explore solutions to demanding business challenges.

The workshop works through overarching big data architecture concerns before channeling them into the design of a new system. Through dynamic engagement, participants apply theoretical knowledge to real-world scenarios, prompting in-depth discussions and detailed insights. This interactive setting helps participants absorb techniques for architecting robust big data systems, drawing on both Cloudera’s expertise and the collective experience of their peers.

Specifically, the workshop tackles advanced big data architecture topics such as data formats, transformations, transactions, real-time and batch processing, machine learning, scalability, fault tolerance, security, and privacy. Addressing these topics systematically reduces the risk of unsound architecture choices and suboptimal technology selections, culminating in a comprehensive and practical skill set.

Intended Audience:

  • Participants should primarily be architects, developer team leads, big data developers, data engineers, senior analysts, DevOps administrators, and machine learning developers who work on big data or streaming applications and are interested in designing and developing such applications on CDP.

Learning Objectives of ARCH-492: Architecting Cloudera Edge to AI:

  • Cloudera Data Platform
  • Big Data Architecture
  • Building Scalable Applications
  • Building Fault Tolerant Solutions
  • Security and Privacy Deployment on Public, Private, and Hybrid Cloud
Introduction
  • Team activity: Team Introductions
Big Data Technologies
  • HDFS
  • HBase
  • Kudu
  • MapReduce
  • Spark, including Spark SQL and Spark ML
  • Hive
  • Impala
  • Relational Database Management Systems
  • Spark Streaming
  • Apache Flume
  • Apache NiFi
  • Apache Kafka
Workshop Application
  • Oz Metropolitan (“Metroz”) case study
  • Architectural questions
  • Team activity: Review Metroz Use Cases and Logical Architecture
Application Vertical Slice
  • Definition
  • Minimizing risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Metroz Vertical Slice
Application Processing
  • Real-time and near-real-time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Data consistency and ACID transactions
  • Stream processing guarantees
  • Machine Learning pipelines
  • Team activity: Metroz Processing
Application Data
  • Three V’s of Big Data
  • Data Lifecycle
  • Data Formats
  • Transforming Data
  • Team activity: Metroz Data Requirements
Scalable Applications
  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: scalable airport terminal designs
  • Spark scalability and parallel processing
  • Scalable storage engines: HDFS, Ozone, Kafka and Kudu
  • Team activity: Scaling Metroz
Fault Tolerant Applications
  • Principles
  • Transparency
  • Hardware vs. Software redundancy
  • Tolerating disasters
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Application tolerance for failures
  • Team activity: Failures in Metroz
Secure and Private Applications
  • Principles
  • Security Architecture
  • Knox Security Architecture
  • Ranger Security Architecture
  • Setting security policies with Ranger
  • Threat Analysis
  • Team activity: Securing Metroz
Deployment
  • Cluster sizing and evolution
  • On-premises vs. cloud
  • Edge computing
  • Team activity: Deploying Metroz
Software Architecture
  • Architecture artifacts
  • Team activity: Metroz Physical Architecture
Putting It All Together
  • Review of the Uber and Lyft big data platforms
  • Review of Metroz CDP solution architectures

ARCH-492: Architecting Cloudera Edge to AI Course Prerequisites

  • To gain the most from the workshop, participants should have a working knowledge of popular big data and streaming technologies such as HDFS, Spark, Kafka, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed; there are no programming activities, as the focus is on architecture design.

Discover the perfect fit for your learning journey

Choose Learning Modality

Live Online

  • Convenience
  • Cost-effective
  • Self-paced learning
  • Scalability

Classroom

  • Interaction and collaboration
  • Networking opportunities
  • Real-time feedback
  • Personal attention

Onsite

  • Familiar environment
  • Confidentiality
  • Team building
  • Immediate application

Training Exclusives

This course comes with the following benefits:

  • Practice labs
  • Training by certified trainers
  • Access to recordings of your class sessions for 90 days
  • Digital courseware
  • 24/7 learner support

Got more questions? We’re all ears and ready to assist!

Request More Details
