Class: Data & AI from First Principles

Data & AI from First Principles

Overview

Introduction

How to Progress Through a Session

Foundation

Application

Technical Discussion

Value Discussion

The Book

Schedule

7/12 Session 0: Class Kickoff

7/19 Session 1: Data Systems I: Storage and Retrieval

Foundation

Application

Technical Discussion

Value Discussion

7/26 Session 2: Data Systems II: Consistency and the Cloud

Foundation

Application

Technical Discussion

Value Discussion

8/2 Session 3: Data Systems III: Processing and Streaming

Foundation

Application

Technical Discussion

Value Discussion

8/9 Session 4: The Lakehouse

Foundation

Application

Technical Discussion

Value Discussion

8/16 Session 5: MLOps

Foundation

Application

Technical Discussion

Value Discussion

8/23 Session 6: The Modern Data Architecture

Foundation

Application

8/30 Session 7: Managing Data in the Organization

Foundation

Application

Technical Discussion

Value Discussion

Overview

Introduction

We will focus specifically on timeless ideas that allow you to present tradeoffs specific to a use case’s needs. These concepts will not change even as the industry, products, and roles continue to evolve.

We will not be discussing technical how-to. For example, questions like “why am I getting this Spark bug” are off-topic. You will have plenty of opportunities to learn technical implementation through regular experience, certification programs, and other field enablement. That being said, we will get very low-level and technical to reach our ultimate goal of understanding the “why” behind the tech.

We’ll start with history and theory and gradually use this foundation to learn the Databricks product and the modern ecosystem. Many of the readings are several years old, but their continued relevance is a testament to their timelessness, which makes them critical if we want to truly think from first principles.

How to Progress Through a Session

Foundation

The foundation includes classic chapters, papers, and blogs from the field of data and analytics. While some of these were written by Databricks founders and employees, they are all vendor-agnostic contributions to the world. Some are many years old but marked the start of a new trend.

Application

Learn about the given products using your preferred learning style. This could be coding/tinkering or reading/listening to focus on areas you’re curious about. You will have more time in your day job to troubleshoot and implement specifics, but this class is about becoming aware of areas you haven’t had a chance to focus on so far. The goal is not to become an expert in everything but to create pointers in your head to areas you will later get deep into as you work.

Technical Discussion

We will have a small group discussion on technical topics that set the stage for why the product is valuable in a new way. Questions are open-ended and have multiple right answers. The goal is not to show off. In this training, you don’t need to memorize every detail to repeat to a customer, because you will learn that through other enablement.

Value Discussion

We will open up to the “why” behind our product, and practice articulating this value to different audiences with diverse needs. This is not “pitch training” – it’s about truly understanding the ideas behind products in the industry. Questions are open-ended and have multiple right answers.

The Book

In earlier sessions, we will be reading Martin Kleppmann, Designing Data-Intensive Applications (2017).

Schedule

7/12 Session 0: Class Kickoff

We’ll discuss the class structure and answer questions.

7/19 Session 1: Data Systems I: Storage and Retrieval

Foundation

Application

Technical Discussion

Value Discussion

7/26 Session 2: Data Systems II: Consistency and the Cloud

Foundation

Application

Technical Discussion

Value Discussion

8/2 Session 3: Data Systems III: Processing and Streaming

Foundation

Application

Technical Discussion

Value Discussion

8/9 Session 4: The Lakehouse

Foundation

Application

Technical Discussion

Value Discussion

8/16 Session 5: MLOps

Note: in the interest of time, we will not be covering model training in this course. For a theoretical introduction to model training, read “ISLR.” For a more practical introduction, try Andrew Ng’s ML courses. For deep learning specifically, use Jeremy Howard’s fast.ai course.

Per the Google 2015 paper: “Only a small fraction of real-world ML systems is composed of the ML code [...]. The required surrounding infrastructure is vast and complex.”

Foundation

Application

Technical Discussion

Value Discussion

8/23 Session 6: The Modern Data Architecture

Foundation

Application

8/30 Session 7: Managing Data in the Organization

Foundation

Application

Technical Discussion

Value Discussion