Topic S. No. | Topic | Sub-topic | Activities | Difficulty Level | Time in Minutes | Learning Outcomes
1 | Orientation | 1-min Introduction | Discuss | 100 | 1 | In this discussion, students and the instructor will introduce themselves.
1 | Orientation | Invite and Add to Notion Curriculum | Lab | 200 | 5 | In this lab, the instructor will send students a Gmail invite to Notion, and students will join the Notion workspace.
1 | Orientation | Understand the Curriculum | Discuss | 100 | 10 | In this discussion, the instructor will walk students through the curriculum and what we plan to learn in the training.
2 | Environment Setup | Install VS Code in Local Computer | Video | 100 | 15 | In this video, we will learn how to install VS Code on the local computer and then
2 | Environment Setup | Install SQL Workbench in Local Computer | Lab | 200 | 20 | In this lab, we will install the Workbench tool on our local computer. This tool will help us connect to remote AWS SQL databases in upcoming labs.
2 | Environment Setup | Setup GitHub Credentials in Local Computer | Video | 200 | 10 | In this video, we will learn how to set up Git credentials on the system. We will then follow the steps to set up our own credentials.
2 | Environment Setup | Create GitHub Account | Lab | 100 | 5 | In this video, we will learn how to create a GitHub account, and then we will follow the steps to create our own GitHub account.
2 | Environment Setup | Install Jupyter Lab in Local Computer | Video | 200 | 30 | In this video, we will learn how to install JupyterLab on the computer and then follow the steps to install it on our own system.
2 | Environment Setup | Getting Started with SageMaker Studio Lab | Video | 100 | 10 | In this video, we will learn about the SageMaker Studio Lab environment and then create a request to get our own account.
2 | Environment Setup | Access to GitHub Teams | Discuss | 100 | 5 | In this discussion, the instructor will add all the students to the central GitHub repo.
2 | Environment Setup | Setup AWS Credentials in Local Computer | Video | 200 | 10 | In this video, we will learn how to set up AWS credentials on the system. We will then follow the steps to set up our own credentials.
2 | Environment Setup | Create AWS Free-tier Account | Lab | 200 | 10 | In this video, we will learn how to create an AWS free-tier account and then follow the same steps to create our own brand new account.
2 | Environment Setup | Snowflake Account Setup | Lab | 100 | 30 | In this lab, we will set up a free-trial Snowflake account that will give us access to the Snowflake warehouse system.
2 | Environment Setup | Live Demo - Environment Setup | Discuss | 200 | 10 | In this discussion, the instructor will show all the environment setup steps and how the fully set up environment looks.
2 | Environment Setup | Student’s demo - Environment Setup | Discuss | 100 | 30 | In this discussion, students will demonstrate how they approached the different steps of the environment setup process.
2 | Environment Setup | Invite and Add to Databricks Workspace | Lab | 100 | 10 | In this lab, we will add students to the Databricks workspace through email invites and sign-ups.
2 | Environment Setup | Setup Databricks Community Edition | Video | 100 | 30 | In this assignment, we will sign up for the Databricks Community Edition.
2 | Environment Setup | Setup Preset BI Visualization Cloud Account | Read | 100 | 20 | In this assignment, we will sign up for the Preset cloud service. We will be deploying our BI dashboards here.
2 | Environment Setup | Setup Airflow in Local Computer | Lab | 200 | 60 | Set up the Airflow orchestration tool on the local computer.
2 | Environment Setup | Setup Kafka in Local Computer | Lab | 200 | 30 | Set up the Kafka streaming tool on the local computer.
2 | Environment Setup | Setup DBeaver in Local Computer | Lab | 100 | 10 | Set up the DBeaver database access tool on the local computer.
2 | Environment Setup | Setup AWS Managed Airflow (MWAA) | Lab | 200 | 30 | Set up AWS Managed Airflow (MWAA).
3 | Theory and Notes | How to become a Data Engineer | Video | 100 | 10 | This short video will help us set the right expectations about prerequisites and the job market.
3 | Theory and Notes | Data Engineering Roadmap | Read | 200 | 5 | Get an overview of what the data engineering path looks like. This roadmap doesn’t include all the components but still provides valuable information.
3 | Theory and Notes | Getting Started with Google Colab | Video | 100 | 10 | Learn the basics of the Google Colab notebook environment.
3 | Theory and Notes | How Data Engineering Works | Video | 100 | 15 | Find answers to the very basic questions: what is data engineering, and how does it work?
3 | Theory and Notes | A Hypothetical Case Study | Read | 100 | 10 | This case study will help us understand some roles and requirements of data engineering in organizations.
3 | Theory and Notes | Data Model Designing | Video | 100 | 5 | In this short video, we will learn some basics of data modeling.
3 | Theory and Notes | Data Modeling Overview | Read | 200 | 20 | This blog post explains the basic concepts of data modeling and illustrates them with a retail case study.
3 | Theory and Notes | Queries | Read | 300 | 30 | This book excerpt will help us understand queries in great detail. Queries are fundamental to the work of data engineering.
3 | Theory and Notes | Data Modeling | Read | 300 | 30 | This book excerpt helps in understanding data modeling concepts.
3 | Theory and Notes | ETL vs ELT | Read | 100 | 15 | In this article, we will learn about the difference between the ETL and ELT concepts.
3 | Theory and Notes | Transformations | Read | 100 | 30 | Transformations
3 | Theory and Notes | Data Engineering | Read | 100 | 30 | Data Engineering
3 | Theory and Notes | Whiteboarding Meeting Case Study | Read | 100 | 30 | Whiteboarding Meeting Case Study
3 | Theory and Notes | Lakehouse Architecture | Read | 100 | 30 | Lakehouse Architecture
3 | Theory and Notes | ETL Process | Read | 100 | 15 | ETL Process
3 | Theory and Notes | Emerging Architectures for Modern Data Infrastructure | Read | 300 | 30 | Read the article to understand the emerging data pipelines and architectures.
3 | Theory and Notes | Big Data Architectures - Lambda & Kappa | Read | 200 | 30 | Big Data Architectures - Lambda & Kappa
3 | Theory and Notes | Building Data Mesh Architectures on AWS | Read | 300 | 30 | Data-first organizations are increasingly curious about a data mesh architecture. Learn how to design, build, and operationalize a data mesh architecture on AWS, one that helps businesses navigate their data challenges, optimize analytics processes, and deliver insights to the business faster.
3 | Theory and Notes | Serialization and Compression Technical Details | Read | 100 | 30 | Serialization and Compression Technical Details
3 | Theory and Notes | The Future of Data Engineering | Read | 100 | 30 | The Future of Data Engineering
3 | Theory and Notes | Data Warehousing Schemas | Read | 100 | 30 | Data Warehousing Schemas
3 | Theory and Notes | Full Data Stack Observability | Read | 100 | 30 | Full Data Stack Observability
3 | Theory and Notes | The Big Book of Data Engineering with Databricks | Read | 300 | 300 | The Big Book of Data Engineering with Databricks
3 | Theory and Notes | Data Security | Read | 200 | 15 | Data Security
4 | Databases and Warehouses | Setup AWS RDS Postgres Database | Lab | 100 | 30 | In this lab, we will set up yet another SQL database, Postgres, using the same Amazon RDS service.
4 | Databases and Warehouses | AWS DynamoDB | Video | 200 | 90 | AWS DynamoDB
4 | Databases and Warehouses | Intro to Data Warehousing with Amazon Redshift | Video | 100 | 5 | In this short video, we will learn about the Amazon Redshift warehouse.
4 | Databases and Warehouses | Create Amazon Redshift Warehouse and connect with Python | Lab | 200 | 20 | In this lab, we will create an Amazon Redshift warehouse cluster and connect to it with Python. We will then run some queries on it.
4 | Databases and Warehouses | Building a Data Lake on S3 | Lab | 400 | 120 | In this lab, we will create a full data lake on S3, using other AWS resources as required.
4 | Databases and Warehouses | Excel's Data Model | Video | 200 | 30 | Self-paced
4 | Databases and Warehouses | Data Warehouse Schema Design Concepts | Video | 100 | 20 | This video will help with a quick revision of basic database and data warehouse concepts. Feel free to skip or fast-forward through sections to save time.
4 | Databases and Warehouses | AWS RDS Overview | Video | 100 | 5 | These 2 short videos will help us understand the AWS RDS service.
4 | Databases and Warehouses | Query Postgres with Python | Lab | 300 | 60 | In this lab, we will perform CRUD operations on the cloud Postgres database that we created earlier. We will use Python for this.
4 | Databases and Warehouses | AWS RDS and Secrets Manager | Lab | 200 | 30 | In this lab, we will store our database credentials in AWS Secrets Manager and retrieve them programmatically using Python. Then we will run queries against our RDS cloud database to fetch some data.
4 | Databases and Warehouses | Setup AWS RDS MySQL Database | Lab | 200 | 45 | In this lab, we will set up our MySQL database in the cloud using the Amazon RDS service.
4 | Databases and Warehouses | Aurora vs MySQL | Video | 100 | 15 | In this short video, we will learn the difference between two very popular databases: MySQL and Aurora.
4 | Databases and Warehouses | Snowflake Data Query with Python | Lab | 200 | 60 | In this lab, we will connect to Snowflake from Python, run SQL queries, fetch the data, and analyze it.
4 | Databases and Warehouses | Databricks SQL Warehouse and Dashboard Overview | Lab | 200 | 10 | In this lab, we will learn about Databricks SQL by exploring the warehouse, running SQL queries, and creating dashboards. In part 1, we will get an overview of the Databricks warehouse.
4 | Databases and Warehouses | Databricks SQL Warehouse and Dashboard Part 2 - Hands-on | Lab | 200 | 30 | In this 2-part lab series, we learn about Databricks SQL by exploring the warehouse, running SQL queries, and creating dashboards. In part 2, we learn the tool and process with hands-on practice.
4 | Databases and Warehouses | Create Amazon Redshift Warehouse and query with GUI | Lab | 200 | 30 | In this lab, we will create an Amazon Redshift warehouse cluster (or reuse an existing one) and run simple SQL queries via the Redshift Editor GUI in the browser.
4 | Databases and Warehouses | Amazon Aurora Serverless v2 | Video | 200 | 15 | Amazon Aurora Serverless v2
5 | Data Lakes and Lakehouses | Learn Databricks SQL from the Experts | Video | 200 | 60 | Self-paced
5 | Data Lakes and Lakehouses | Databricks SQL Demo | Video | 100 | 30 | Self-paced
5 | Data Lakes and Lakehouses | S3 Data Lake Overview | Read | 200 | 15 | This short read explains the S3 data lake concepts.
5 | Data Lakes and Lakehouses | Big Data and Data Lake | Read | 200 | 30 | This reading will build the foundation of data warehouses and data lakes.
5 | Data Lakes and Lakehouses | Getting Started with Databricks SQL | Video | 100 | 30 | Getting Started with Databricks SQL
5 | Data Lakes and Lakehouses | Databricks Lakehouse | Video | 100 | 5 | This short video introduces us to the Data Lakehouse concept.
5 | Data Lakes and Lakehouses | Delta Lake on Databricks | Lab | 400 | 30 | Learn about an innovative tool called Delta Lake and how it works. Interact with this lake in the Databricks runtime environment.
5 | Data Lakes and Lakehouses | AWS User IAM and S3 Bucket | Lab | 100 | 50 | In this lab, we will create an IAM user and give it the AdministratorAccess permission. Then we will add this user's credentials to our local IAM configuration. Then we will create an S3 bucket using this user and upload datasets using the “aws s3 sync” and “aws s3 cp” commands.
5 | Data Lakes and Lakehouses | Delta Lakes Basics | Video | 100 | 5 | Understand the basics of Delta Lake.
5 | Data Lakes and Lakehouses | Ad Hoc Queries with Amazon Athena | Read | 300 | 60 | Read this to understand Athena in detail.
5 | Data Lakes and Lakehouses | Data Lake Tool Comparison | Read | 200 | 15 | Data Lake Tool Comparison
5 | Data Lakes and Lakehouses | Data Lakes | Read | 200 | 10 | Data Lakes process flow
5 | Data Lakes and Lakehouses | Why Lakehouse over Data warehouse | Read | 100 | 15 | Why Lakehouse over Data warehouse
6 | Data Ingestion | Snowpipe: Load data fast, analyze even faster | Video | 300 | 60 | Getting the volume and variety of today’s data into your data warehouse is paramount to obtaining immediate, data-driven insight. Unfortunately, legacy data warehouses require batch-oriented loading and scheduling at off-peak times to avoid contention with the crucial needs of data analytics users. Snowpipe is a new data loading service for Snowflake that significantly improves the process of making data available for analysis.
6 | Data Ingestion | Accelerating Data Ingestion with Databricks Autoloader | Video | 200 | 60 | Accelerating Data Ingestion with Databricks Autoloader
6 | Data Ingestion | Databricks Autoloader Hands-on Exercises | Lab | 300 | 120 | Databricks Autoloader Hands-on Exercises
7 | Data Processing and Transformation | DBT Tutorial (data build tool) | Video | 100 | 5 | In this video, we will learn the basics of the dbt tool.
7 | Data Processing and Transformation | Data Transformation with PySpark | Video | 300 | 90 | Watch this video series to get equipped with the basics of PySpark.
7 | Data Processing and Transformation | Overview of Amazon EMR Serverless | Video | 100 | 5 | Understand what EMR is and the pros and cons of its serverless feature.
7 | Data Processing and Transformation | AWS Lambda Introduction | Video | 200 | 10 | Watch the video to learn some basics of AWS Lambda.
7 | Data Processing and Transformation | AWS Elastic Beanstalk Vs Lambda | Video | 100 | 5 | Watch the video to get an idea of Elastic Beanstalk and how it differs from Lambda.
7 | Data Processing and Transformation | Data Engineering using Spark, Python and SQL | Video | 200 | 800 | Collection of 400+ short videos.
7 | Data Processing and Transformation | COVID-19 Data Engineering Pipeline | Lab | 400 | 180 | COVID-19 Data Engineering Pipeline
7 | Data Processing and Transformation | Data Transformation | Read | 300 | 30 | This book excerpt will help us understand data transformation concepts in great detail. Data transformation is essential and falls on the plate of data engineering teams.
7 | Data Processing and Transformation | SQL Zoo Fundamental Series | Lab | 300 | 90 | In this tutorial series, we will learn the fundamentals of SQL with hands-on practice.
7 | Data Processing and Transformation | Spark in Action, Second Edition | Read | 300 | 300 | Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
7 | Data Processing and Transformation | Data Analysis with Python and PySpark | Read | 300 | 300 | Data Analysis with Python and PySpark helps you solve the daily challenges of data science with PySpark. You’ll learn how to scale your processing capabilities across multiple machines while ingesting data from any source, whether that’s Hadoop clusters, cloud data storage, or local data files. Once you’ve covered the fundamentals, you’ll explore the full versatility of PySpark by building machine learning pipelines, and blending Python, pandas, and PySpark code.
7 | Data Processing and Transformation | PySpark Fundamentals Hands-on Practice | Lab | 300 | 45 | Learn the basics of PySpark with hands-on practice.
7 | Data Processing and Transformation | DBT - Know more | Read | 200 | 30 | DBT - Know more
7 | Data Processing and Transformation | How to Integrate the “Great Expectations” Python Library with Databricks | Lab | 300 | 60 | Great Expectations is an amazing Python library for data quality. It comes with integrations for Apache Spark, and dozens of preconfigured data expectations. Databricks is a top-tier data platform built on Spark. So you'd expect them to integrate seamlessly, but that is not quite the case.
7 | Data Processing and Transformation | AWS Glue Studio | Lab | 200 | 15 | Create an ETL job on a PySpark distributed cluster using the Glue Studio visual editor. Verify the result with Athena.
7 | Data Processing and Transformation | Transform your data with dbt and Serverless architecture | Read | 300 | 20 | Read this case to learn how organizations use dbt for transformation.
7 | Data Processing and Transformation | Four Reasons that make DBT a great time saver for Data Engineers | Read | 200 | 10 | Four Reasons that make DBT a great time saver for Data Engineers
7 | Data Processing and Transformation | Python and SQL Hands-on Exercises | Lab | 200 | NA | Python and SQL Hands-on Exercises
7 | Data Processing and Transformation | Hadoop and Spark Hands-on | Lab | 300 | 480 | Hadoop and Spark Hands-on
7 | Data Processing and Transformation | Python for Data Engineers | Read | 100 | 10 | Python for Data Engineers