Building Data Lakes on AWS (ANBDLK)
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Introduction to Data Lakes
- Value and components of data lakes
- Common architectures
Module 2: Data Ingestion, Cataloging, and Preparation
- Data lake storage and ingestion
- AWS Glue crawlers and data catalog
- Lab: Set up a simple data lake
Module 3: Data Processing and Analytics
- Data processing with AWS Glue
- Analyzing data with Amazon Athena
Module 4: Building a Data Lake with AWS Lake Formation
- Features and benefits of AWS Lake Formation
- Lab: Build a data lake
Module 5: Additional Lake Formation Configurations
- Automate data lake creation
- Visualize data with Amazon QuickSight
- Lab: Data visualization using Amazon QuickSight
Module 6: Architecture and Course Review
- Post-course knowledge check
- Architecture review
Building Batch Data Analytics Solutions on AWS (DABATC)
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- EMR cluster architecture
- Interactive Demo: Launching an EMR cluster
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization
- Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark use cases
- Spark concepts
- Interactive Demo: Connect to an EMR cluster
- Practice Lab: Low-latency data analytics using Apache Spark
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Hive to process batch data
- Practice Lab: Batch data processing with Hive
Module 5: Serverless Data Processing
- Serverless data processing with AWS Glue
- Practice Lab: Orchestrate data processing with AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
- Interactive Demo: Client-side encryption with EMRFS
- Monitoring and troubleshooting
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
- Activity: Designing a batch data analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Building Data Analytics Solutions Using Amazon Redshift (DAREDS)
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
- Overview of Amazon Redshift
- Setting up your data warehouse
Module 2: Introduction to Amazon Redshift
- Architecture and features
- Interactive Demos and Practice Labs
Module 3: Ingestion and Storage
- Techniques and demos
- Practice Labs
Module 4: Processing and Optimizing Data
- Transformation and querying
- Resource management
Module 5: Security and Monitoring of Amazon Redshift Clusters
- Securing and monitoring clusters
Module 6: Designing Data Warehouse Analytics Solutions
- Use case review and workflow design
Building Streaming Data Analytics Solutions on AWS (DASTRM)
Module 1: Using Streaming Services in the Data Analytics Pipeline
- Importance and concepts of streaming data analytics
Module 2: Introduction to AWS Streaming Services
- Amazon Kinesis and Amazon MSK
- Demos and Practice Labs
Module 3: Using Amazon Kinesis for Real-Time Data Analytics
- Workloads and streams
- Demos and Practice Labs
Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis
Module 5: Using Amazon MSK in Streaming Data Analytics Solutions
- Use cases and clusters
- Demos and Practice Labs
Module 6: Securing, Monitoring, and Optimizing Amazon MSK
Module 7: Designing Streaming Data Analytics Solutions
- Use case review and workflow design
Together, these courses equip you with the skills to build and manage data lakes and batch, data warehouse, and streaming data analytics solutions on AWS.