National Workshop on Data Science Education

Foundations of Data Science: An Overview

Ani Adhikari

adhikari@berkeley.edu

24 June, 2019

Fundamental Principle

All students should have the opportunity to learn how to

- reason sensibly based on data,
- make and interpret inferences based on data,
- and think critically about social and ethical implications of data,

regardless of their chosen area of specialization.

This Session

- Data 8 and beyond
- Pedagogical approach
- Student response

Putting it into Practice

- Start with freshmen
- No prerequisites

- Teach data science as a way of thinking:
  - Computational thinking
  - Inferential thinking

- Craft a program that builds on this foundation:
  - Higher-level courses in data science, including ethics
  - Courses that use data science in other fields

Connector Courses

- “A connector course lets you weave together core concepts and approaches from Data 8 with complementary ideas or areas.”
- Can be taken concurrently with Data 8, or later
- Some connectors use Data 8 ideas in a domain; others go deeper into the theory and computation

data.berkeley.edu/education/courses

Human Context and Ethics

“HCE education explores how human, social, and institutional structures and practices shape technical work around computing and data, as well as how data, data analytics, machine learning, artificial intelligence, and computing permeate and shape our individual and social lives.”

https://data.berkeley.edu/education/human-contexts-and-ethics

More Classes, a Major, and a Minor

- Data 8: Foundations of Data Science (data8.org)
- Data 100: Principles and Techniques of Data Science (ds100.org)
- Prob 140: Probability for Data Science (prob140.org)
- Data 102: Data, Inference, and Decisions (under construction; pilot in Fall 2019)

Statistics in the 21st century

“a reinvention of statistical education in the era of pervasive computation”

– Data Science Education Rapid Action Team, 2015

How to Teach Inferential Thinking

Written at Berkeley in the 1970s, FPP (the textbook Statistics by Freedman, Pisani, and Purves) transformed the way statistics is taught.

Analyzing Data: Three Main Steps

- The question, from some domain; reasonable assumptions about the data; choice of method

- Visualization and calculations

- Interpretation of the results in the language of the domain, without statistical jargon

A Common Approach

- The question, from some domain; reasonable assumptions about the data; choice of method

- Visualization and calculations

- Interpretation of the results in the language of the domain, without statistical jargon

FPP Approach

- The question, from some domain; reasonable assumptions about the data; choice of method

- Visualization and calculations

- Interpretation of the results in the language of the domain, without statistical jargon

Data 8 Approach

- The question, from some domain; reasonable assumptions about the data; choice of method

- Visualization and computation

- Interpretation of the results in the language of the domain, without statistical jargon
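To illustrate what replacing formula-based calculation with computation can look like, here is a minimal sketch of a permutation test in plain Python. The data values, the function name `permutation_p_value`, and the choice of test are assumptions made for this example; they are not taken from the course materials.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, repetitions=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means
    by repeatedly shuffling the pooled values and re-splitting them."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)
        # Treat the first n_a shuffled values as a simulated "group A".
        simulated = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if simulated >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / repetitions


# Made-up measurements for two hypothetical groups.
treatment = [12.1, 14.3, 13.8, 15.0, 14.6, 13.2]
control = [11.0, 12.2, 11.7, 12.9, 11.4, 12.5]

p_value = permutation_p_value(treatment, control)
print(f"Estimated p-value: {p_value:.4f}")
```

The three steps still apply: the question and the assumption (group labels are exchangeable if there is no difference) come first, the loop does the computation, and the resulting proportion still has to be interpreted in the language of the domain.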

Data 8 Motto

Visualize, then Quantify

Approach to Computing

- Assume no programming background
- Students should not need a local installation
- Students should be able to interact with data during lecture
- Should be easy for students to experiment, recover from mistakes, create a narrative
- Should be easy for staff to distribute datasets and exercises
- System should incorporate some auto-grading
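To make the last requirement concrete, here is a minimal sketch of one way an automated check could work: a submitted function is run against a few known cases and the fraction passed is reported. The names `grade` and `student_mean` and the test cases are hypothetical; this is not the grading system the course actually uses.

```python
def grade(student_function, cases, tolerance=1e-9):
    """Run a function on (arguments, expected) pairs and return the
    fraction of cases it passes. Errors count as failed cases."""
    passed = 0
    for arguments, expected in cases:
        try:
            result = student_function(*arguments)
            correct = abs(result - expected) <= tolerance
        except Exception:
            correct = False  # a crash or a non-numeric answer fails the case
        if correct:
            passed += 1
    return passed / len(cases)


# Hypothetical student submission: the mean of a list of numbers.
def student_mean(values):
    return sum(values) / len(values)


# Each case pairs a tuple of arguments with the expected answer.
cases = [
    (([1, 2, 3],), 2.0),
    (([10],), 10.0),
    (([2, 4],), 3.0),
]

score = grade(student_mean, cases)
print(f"Passed {score:.0%} of checks")
```

Catching errors inside the loop means one failing case does not stop the remaining checks, so a student gets feedback on every case at once.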

Environment and Materials

- Jupyter notebooks; Python 3

- JupyterHub
  - Multi-user server for Jupyter notebooks
  - Browser-based computation in the cloud

Source: https://github.com/data-8

New Python datascience library: https://pypi.python.org/pypi/datascience/

Some Benefits

- Working in a medium that students love
- Open, transparent analyses; the students have all the data
- Real and important problems
- Encourages collaboration

The Students, Spring 2019

- 49% first-years, 35% second-years
- 55% female
- 21% consider themselves to be a member of an underrepresented ethnic or racial minority within UC Berkeley
- Over 60 different majors
- At the start of the term, 38% said, “I have no skill at programming”

Student Response, Spring 2019

- I never thought I would ever code or program but this class made it really approachable.
- Learn[ed] to code in a way that I feel will actually be useful for me in the future, even as someone in a social sciences major.
- Loved the problem-solving skills this class taught me and how Data 8 showed me the various ways data science could be applied to multiple disciplines!
- Data manipulation helped change literally how I see the world.