1 of 30

Lecture 36

Conclusion

Summer 2022

2 of 30

Announcements

  • Final Exam tomorrow (8/12) from 10am-1pm
    • [ACTION REQUIRED] Email data8@berkeley.edu if you have not received your seating assignment.
    • At this time, we cannot make any changes to exam times. Students who cannot take the exam due to unforeseen circumstances will need to request an incomplete.
  • Kevin’s OH will take place from 11:30a-12:30p @ Soda 511
  • Ellen’s OH will take place from 2pm-3pm @ Zoom
  • 12pm-2pm OH today cancelled for review session, other OH as usual

3 of 30

Meme Monday - Thursday Edition

4 of 30

5 of 30

What's Next?

6 of 30

Fall 2022 Connector Courses

Data C88S (Stat 88)

Prob and Stats in Data Science

UGBA 88

Data and Decision

PHYSICS 88

Data Science Applications in Physics

Data C88C (CS 88)

Computational Structures

DATA 88E

Economic Models

POLISCI 88

Scientific Study of Politics

EPS 88

Python and Earth Science

LEGALST 88

Taking Measure of the Justice System

7 of 30

DSUS Student Teams

Hone your skills as an educator and data scientist by working with Data Science Undergraduate Studies

Infrastructure: Improve autograding and DataHub software to support courses across campus.

Peer Consulting: Help fellow undergrads with data research, academic work, and data science technology.

External Pedagogy: Create a national community of practice for institutions to work with and learn from each other.

8 of 30

DSUS Student Teams

Hone your skills as an educator and data scientist by working with Data Science Undergraduate Studies

Connector Assistants: Help instructors of Data Science Connector courses deliver and teach material.

Modules: Create curriculum materials for Connectors, Data-Enabled Courses, or short explorations into DS (modules).

Human Context and Ethics: Integrate critical thinking about ethical issues in relation to technology into the Berkeley data science program and community.

9 of 30

Data Science Discovery Research

Be a student researcher in a program that connects students with hands-on data science research projects at non-profits, start-ups, institutions, etc. Students from underrepresented minority groups and first-time researchers receive priority.

https://data.berkeley.edu/discovery

10 of 30

Programming

The programming content in Data 8 is part of what you’ll learn about programming in CS 88 or CS 61A.

What’s left?

  • How to write larger programs and think about them.
  • The concepts and language features that support writing larger programs.
  • How programming languages are executed.

CS 88 is 3 out of 4 units of CS 61A, but with more connections to data science in the examples.

11 of 30

Human Contexts and Ethics

Data science studies the real world, and there are important ethical considerations in doing so.

  • The impact of data collection and analysis
  • Fairness and bias in both data collection and prediction
  • Institutions that use data, such as companies & gov’t
  • The relationship between data and the law
  • Frameworks for reasoning about these complex issues

Data C104 and Info 188 are the most popular courses for students continuing from Data 8.

12 of 30

Probability

The probability content in Data 8 is part of what you’ll learn about probability in a lower-div probability course: Data C88S (Stat 88), CS 70, Math 10B or 55, CivEng 93

While the Data Science major does not require a lower-division probability course, taking one is a good idea.

  • Understanding random events and probabilities for both categorical and numerical variables.
  • Concepts for reasoning about randomness.
  • Characteristics of commonly encountered distributions.

13 of 30

Data 100

Prerequisites: Data 8 & programming

Co-requisite: Linear algebra

We recommend taking linear algebra and/or lower-division probability beforehand.

Very much a sequel to Data 8:

  • Data manipulation and visualization
  • Linear regression, but with multiple variables
  • Prediction and inference
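
As a rough illustration of "linear regression, but with multiple variables", here is a minimal sketch that fits an intercept and several slopes by least squares using only the Python standard library. The function names (`solve_linear_system`, `fit_multiple_regression`) and the data are made up for this example; this is not how Data 100 implements regression, and the course itself works with dedicated numerical libraries.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return (intercept, slopes) minimizing squared prediction error.

    Hypothetical helper: solves the normal equations (X^T X) b = X^T y,
    where X has a leading column of ones for the intercept.
    """
    design = [[1.0] + list(map(float, row)) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    return coefs[0], coefs[1:]


# Made-up data generated from y = 1 + 2*x1 - 3*x2 with no noise.
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
response = [1 + 2 * x1 - 3 * x2 for x1, x2 in predictors]
intercept, slopes = fit_multiple_regression(predictors, response)
print(intercept, slopes)
```

With one predictor this reduces to the regression line from Data 8; the only change is that each row now carries several predictor values.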

14 of 30

Beyond Courses

While we have great resources in school, sometimes you can’t wait. There are amazing resources online:

  • Kaggle (Datasets)
  • Google Dataset Search (Datasets)
  • Analytics Vidhya (Blog/Tutorial)
  • Towards Data Science (Blog/Tutorial)
  • EdX/Coursera (MOOC)*

Get your hands dirty by trying it yourself!

15 of 30

Course Staff + AMA!

16 of 30

Data Science

17 of 30

History of Data Science

The history of data science is closely tied to statistics & computer/electrical engineering.

  • Roots of regression/least squares date back to the 1800s
  • 1962: The term "data science" was first coined in a book describing the science of analyzing large amounts of data
  • Early 2000s: Software-as-a-service & databases
  • 2010s: Powerful ML/analysis computation tools and further collection of data through Internet of Things

18 of 30

Why Data Science

  • Unprecedented access to data means that we can make new discoveries and more informed decisions
  • Computation is a powerful ally in data processing, visualization, prediction, and statistical inference
  • People can agree on evidence and measurement
  • Data and computation are everywhere: understanding and interpreting are more important than ever

19 of 30

Limitations of Data Science

  • Evidence and measurements are critical ingredients for good decision-making
    • ...but they’re not enough by themselves!
  • Data science is a powerful complement to qualitative analysis
    • ...but it’s not a replacement!

20 of 30

How to Analyze Data

Begin with a question from some domain, make reasonable assumptions about the data and a choice of methods.

Visualize, then quantify!

Perhaps the most important part: Interpretation of the results in the language of the domain, without statistical jargon.

21 of 30

How Not to Analyze Data

Begin with a question from some domain, make reasonable assumptions about the data and a choice of methods.

Visualize, then quantify!

Perhaps the most important part: Interpretation of the results in the language of the domain, without statistical jargon.

22 of 30

How to Analyze Data after Data 8

Begin with a question from some domain, make reasonable assumptions about the data and a choice of methods.

Visualize, then quantify! Do both using computation.

Perhaps the most important part: Interpretation of the results in the language of the domain, without statistical jargon.

23 of 30

The Design of Data 8

  • Table manipulation using Python
  • Working with whole distributions, not just means
  • Decisions based on sampling: assessing models
  • Estimation based on resampling
  • Understanding sampling variability
  • Prediction
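
To make "estimation based on resampling" concrete, here is a minimal sketch of a bootstrap percentile interval for a median, written with only the Python standard library rather than the tools used in Data 8. The function name `bootstrap_ci`, its parameters, and the sample values are assumptions made up for illustration, and the percentile lookup is a simple index-based approximation.

```python
import random
import statistics


def bootstrap_ci(sample, statistic=statistics.median,
                 repetitions=2000, level=0.95, seed=0):
    """Approximate a confidence interval by resampling with replacement.

    Hypothetical helper: draws `repetitions` resamples of the same size
    as `sample`, computes `statistic` on each one, and returns the middle
    `level` fraction of the sorted estimates.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(repetitions)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * repetitions)]
    upper = estimates[int((1 - tail) * repetitions) - 1]
    return lower, upper


# Made-up sample of ten measurements.
heights = [61, 64, 65, 66, 67, 68, 68, 70, 72, 75]
low, high = bootstrap_ci(heights)
print(f"Approximate 95% interval for the median: [{low}, {high}]")
```

The width of the interval is one way to see sampling variability: rerunning with a larger original sample would typically give a narrower interval.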

24 of 30

Fun Readings

25 of 30

Food for Thought

26 of 30

27 of 30

One Last Thought

28 of 30

Kevin’s Journey Through Cal

2017: Freshman in Bio & Latin

2018: Took Data 8 & Declared Data Science (After 3 switches)

2020: Joined Staff 🧑‍🏫 (& Became Friends w/ Ellen 🙌)

2020: Started Research via URAP

2020: Dropped Data Science 📈 & Declared Computer Science 🖥

2021: Started Master’s in EECS 🖥 & Course Development (DS198 & Data 6 & TU-UCB Partnership)

2022: Lecturing Data 8 🐻 & Starting R&D Job 💼

2025: … ?

29 of 30

Thank You

30 of 30

Special Thanks

Students: You!

Leads: Emily, Joshua, Oscar, Padma, Prasann

uGSIs: Angela, Atticus, Jeffrey, Katherine, Kelsey, Kinsey, Noah, Paul, Ron, Samiha, Sara, Yiyan

Tutors: Aarushi, Andrea, Angela, Carrie, Evan, Jacob, Julianna, Kevin, Kristen, Lucy, Mangai, Mingsha, Richard, Selina, Stephen, Vaidehi, Zach

Readers: Ethan, Jonathan, Karen, Lisa, Natalie, Nikil, Raymond, Rebecca, Zayan

Faculty: Ani Adhikari, David Wagner, John DeNero, Swupnil Sahai

Advising/Support: DS Advising and Fred Smith

Infrastructure: Balaji Alwar, Eric Vandusen, Yuvi Panda