1 of 28

Suraj Rampure

rampure@ucsd.edu

Data 6: Introduction to Computational Thinking with Data

A new pre-foundations course at UC Berkeley

2 of 28

Overview and history

3 of 28

Data 8: Foundations of Data Science

Data 8 in Spring 2021:

  • 1395 students
  • 58% had prior programming experience
  • 40% had prior programming and statistics experience

Idea: create a new, small-scale pre-Data 8 course for students who are interested in data science but not confident in their abilities to succeed in Data 8.

4 of 28

“Studying primarily humanities, I wanted to gain experience in another fairly unrelated sector (like coding) to help with potential careers, but was apprehensive about taking a large course like Data 8. Since I have no experience I was worried I would fail Data 8, thus this course seems much more doable and supportive.”

“I have always wanted to learn Python and other programming languages, but I did not want to struggle in a class full of expert programmers where I lacked the basic knowledge to even pass. This class allows me to actually learn CS in a setting where the teachers understand that I am starting from ground zero.”

“I really need the small learning environment that Data 94 provides in order to learn the basics of coding to succeed at higher level classes required for the Data Science Major while also being able to ask questions and be more involved in class.”

5 of 28

History

The course “Introduction to Computational Thinking with Data” is not brand new.

  • Summer 2017: Data 8R, taught by Henry Milner
  • Summer 2020: Data 6, taught by Ian Castro
  • Spring 2021: Data 94, taught by Suraj Rampure
    • This is the offering that will form the basis of the course moving forward; it is what we will discuss today
    • “94” is a temporary course code at Berkeley
    • Course website: http://data94.org

6 of 28

Course goals

Tangential goal: Give students the tools they need to work on projects of their own without having to take any future coursework.

🤩

Entice students to study data science further.

Prepare and build confidence in students for when they pursue data science coursework.

💪

7 of 28

Content

8 of 28

Data 8 syllabus

  • Tables are introduced almost immediately
  • Programming constructs are introduced as necessary, not all at the start
    • e.g. functions are only introduced when covering the apply method
  • No while loops, dictionaries, or recursion; instead, an in-depth treatment of inference

9 of 28

Data 6 syllabus

Key differences from Data 8:

  1. Placement of tables (though the same datascience package is used)
  2. Additional programming constructs
  3. Placement and treatment of visualization

10 of 28

Placement of tables

  • In Data 6, tables are only introduced in Week 7, after fundamental programming constructs are covered
    • Done in hopes that students would grasp tabular manipulation more easily in this order
  • Issue: harder to motivate “data science” if tables don’t appear until later
  • Solution: interactivity and visualization

11 of 28

Additional programming constructs: while loops

  • Covered after if-statements and before for loops
  • Used as an opportunity to teach tracing
  • Allowed us to compare for loop and while loop implementations of the same procedure

Screenshot of lecture notes when tracing sum_first_n(4).
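The slide does not show the code for sum_first_n, so the following is only a minimal sketch, assuming sum_first_n(n) returns 1 + 2 + ... + n. It writes the same procedure once with a while loop and once with a for loop (sum_first_n_for is a hypothetical name used only here), the kind of side-by-side comparison described above; the comments trace sum_first_n(4).

```python
def sum_first_n(n):
    """Return 1 + 2 + ... + n using a while loop (assumed definition)."""
    total = 0
    i = 1
    while i <= n:
        total = total + i
        i = i + 1
    return total


def sum_first_n_for(n):
    """Same procedure with a for loop; range(1, n + 1) yields 1, 2, ..., n."""
    total = 0
    for i in range(1, n + 1):
        total = total + i
    return total


# Trace of sum_first_n(4):
#   i = 1: total = 0 + 1 = 1
#   i = 2: total = 1 + 2 = 3
#   i = 3: total = 3 + 3 = 6
#   i = 4: total = 6 + 4 = 10
#   i = 5: condition 5 <= 4 is False, so the loop ends
print(sum_first_n(4), sum_first_n_for(4))  # 10 10
```

In the while version, students must update i themselves; forgetting `i = i + 1` produces an infinite loop, which is one reason tracing is a useful exercise here.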

12 of 28

Additional programming constructs: lists

  • Indexing
    • Consequently, string indexing and manipulation
    • *Data 8 covers lists, but no list indexing
  • Discussed lists vs. arrays
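A short illustration of the list and string indexing mentioned above, in plain Python with arbitrary values. The last part hints at one list-vs-array contrast: with lists, + joins two lists end to end, whereas a NumPy array (which is what datascience's make_array returns) would add elementwise.

```python
values = [4, 9, 1, 2]

# Position-based access: indices start at 0
first = values[0]       # 4
last = values[-1]       # 2 (negative indices count from the end)
middle = values[1:3]    # [9, 1] (the slice stops before index 3)

# Strings are indexed the same way
word = "data science"
initial = word[0]       # 'd'
first_word = word[:4]   # 'data'
shout = word.upper()    # 'DATA SCIENCE'

# Lists vs. arrays: for lists, + joins the two lists end to end...
joined = values + [1, 1, 1, 1]      # [4, 9, 1, 2, 1, 1, 1, 1]
# ...so elementwise addition needs a loop or comprehension;
# with a NumPy array, values + 1 would do this directly.
plus_one = [v + 1 for v in values]  # [5, 10, 2, 3]

print(first, last, middle, initial, first_word, shout, joined, plus_one)
```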

Data 8 approach to creating arrays:

make_array(4, 9, 1, 2)

In Data 6:

13 of 28

Additional programming constructs: dictionaries

  • Discussed key-based vs. position-based data structures
    • Hoped this would ease the transition to tables and build computational thinking skills
  • Engaging examples (e.g. emojify)
  • Allowed us to discuss various file formats (e.g. JSON vs. CSV) before moving to tables
    • In an assignment, students indexed a JSON file from a Google Maps API query

Screenshot of lecture discussing a dictionaries example. (video)
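The slides name emojify as an example but do not include its code. Below is a hypothetical sketch, assuming emojify replaces words that appear as keys in a dictionary with the corresponding emoji. It also contrasts position-based and key-based access to the same information, then reads one made-up record as JSON (nested dictionaries and lists) and as CSV (flat rows), using only the standard library; the record's fields are illustrative, not an actual API response.

```python
import csv
import io
import json

# Hypothetical lookup table: keys are words, values are emojis
EMOJI = {"pizza": "🍕", "cat": "🐱", "happy": "😊"}


def emojify(sentence):
    """Replace each word that is a key in EMOJI with its emoji (hypothetical helper)."""
    words = sentence.split(" ")
    # dict.get(key, default) returns the word itself when it is not a key
    return " ".join(EMOJI.get(w, w) for w in words)


# Position-based vs. key-based access to the same point
point_list = [37.87, -122.26]                # must remember that index 0 is latitude
point_dict = {"lat": 37.87, "lng": -122.26}  # the key says what the value means

# JSON maps naturally onto nested dictionaries and lists (made-up record)
raw_json = '{"results": [{"name": "Berkeley", "location": {"lat": 37.87, "lng": -122.26}}]}'
parsed = json.loads(raw_json)
lat = parsed["results"][0]["location"]["lat"]

# The same record as CSV: flat rows and columns, closer to a table
raw_csv = "name,lat,lng\nBerkeley,37.87,-122.26\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))
lat_from_csv = float(rows[0]["lat"])  # CSV values are read in as strings

print(emojify("my cat is happy"))  # my 🐱 is 😊
print(point_list[0] == point_dict["lat"], lat, lat_from_csv)
```

Indexing into the JSON takes a chain of keys and positions, which is exactly the skill the Google Maps assignment exercised; the CSV version already looks like the rows of a table.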

14 of 28

Visualization

  • Different order
    • Data 8: scatter plots → line plots → bar plots → histograms
    • Data 6: bar plots → histograms → scatter plots → line plots
  • Less detailed treatment of histograms
    • Only frequency-based, rather than density-based
  • plotly instead of matplotlib
    • Interactive! (encourage others to adopt)
  • Maps
    • Covered in Data 8’s textbook, but not in any Data 8 assignments

Example map created in lecture, describing the year in which each Walmart in California was opened.
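To make the frequency-versus-density distinction concrete: in a frequency histogram each bar's height is the count in its bin, while in a density histogram each height is count / (number of values × bin width), so the bar areas sum to 1. The minimal sketch below computes both by hand for made-up data and bins; it does not use plotly or matplotlib, and the helper names are hypothetical.

```python
def bin_counts(values, bins):
    """Count values in each bin [bins[i], bins[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(bins) - 1)
    for v in values:
        for i in range(len(bins) - 1):
            left, right = bins[i], bins[i + 1]
            is_last = i == len(bins) - 2
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    return counts


def densities(values, bins):
    """Height of each bar in a density histogram: count / (n * bin width)."""
    n = len(values)
    counts = bin_counts(values, bins)
    return [c / (n * (bins[i + 1] - bins[i])) for i, c in enumerate(counts)]


ages = [18, 19, 19, 20, 21, 21, 22, 25, 30, 34]  # made-up data
bins = [18, 20, 22, 26, 34]                       # unequal bin widths

freq = bin_counts(ages, bins)
dens = densities(ages, bins)
area = sum(d * (bins[i + 1] - bins[i]) for i, d in enumerate(dens))

print(freq)             # [3, 3, 2, 2]
print(round(area, 10))  # 1.0
```

With unequal bin widths the two views disagree: the last two bins have the same count, but the widest bin's density is half as large. With equal widths, the two histograms have the same shape, which is part of why a frequency-only treatment is a reasonable simplification for an introductory course.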

15 of 28

Synergy

  • Recall, the goal of Data 6 is not to replace Data 8, but to prepare students for it
  • Data 6 → 8 students will be familiar with most of the first module in Data 8
    • Can focus on the newer details, like detailed histograms
  • Second module onwards: Data 6 → 8 students will be exposed to new material, but their programming maturity will allow them to focus more on the inferential ideas and (hopefully) not get stuck with syntax issues

16 of 28

Logistics

17 of 28

Enrollment

  • Enrollment was by application only, to ensure that all students in the course were new to programming; questions asked:
    • “Have you ever programmed before? If yes, please elaborate.”
    • “What is your expected graduation semester?”
    • “Why do you want to take this class?”
    • The only criterion was that students be new to programming
  • Enrollment was capped at 30 students
    • 62 students applied
    • 45 were given seats
    • 25 started the class; 21 were in it by the end of the second week
    • 18 finished the class

18 of 28

Weekly schedule

Guiding principle: per campus policy, a 3-unit course should not require students to work more than 9 hours per week.

  • 3 hours of lecture per week (MWF), held on Zoom
  • 1 hour of lab per week (immediately following lecture on Friday), also on Zoom
    • No lab assignment; instead, ~20 minutes of review + ~30 minutes of working on homework
  • ~5 hours per week on homework
    • Like other DS courses, content was hosted on DataHub and linked from the course website (http://data94.org)
    • This target was met, per students’ responses on weekly surveys
  • 4 hours of (optional) office hours per week (2 instructor, 2 TA/tutor)

19 of 28

Quick Checks and Ed

  • To facilitate active learning, each lecture contained a few breaks for students to answer short questions, called “Quick Checks”
    • Short answer or MCQ, and graded on completeness, not correctness
    • Students had a few minutes to answer, and then we went over the solution together
  • Quick Checks were hosted on Ed
    • Also the platform used for discussion (in lieu of Piazza or Campuswire)

Screenshot of a Quick Check on Ed.

20 of 28

Homeworks and datasets

  • Homework was assigned roughly weekly (9 total)
  • Homeworks were Jupyter Notebooks containing a mix of programming and interpretation questions
    • Used Otter Grader and Gradescope for grading
  • Homework 9: open-ended; students made GitHub Pages sites (example)

21 of 28

Examinations

  • 3 quizzes, each worth 5%, in lieu of 1-2 more substantial midterms
    • 1 each after Module 1, 2, and 3
    • Quiz 1 was in the form of a Jupyter Notebook; the prevalence of syntax errors made it hard to grade
    • Quizzes 2 and 3 were Gradescope online assessments, featuring a mix of multiple-choice, short-answer, and coding problems
  • Final exam: same format as Quizzes 2 and 3
    • Clobber policy

Screenshot from Quiz 2.

22 of 28

Reception

23 of 28

Students’ understanding by topic

  • Data from end-of-semester survey
  • Students had an easier time with tables and visualization than with earlier content
  • Students struggled with loops
    • We knew this during the semester from weekly surveys
  • Next time: cover for loops before while loops, and spend more time on the early material

24 of 28

Students’ future plans

  • Only 50% of students in the course will probably or definitely take Data 8
  • Even fewer will minor or major in DS
  • Most are interested in pursuing data science further
  • Likely factor: most students in the course were juniors or seniors, a result of how we marketed it

25 of 28

Students’ satisfaction

  • “How much do you feel like you learned in Data 94, in terms of new ideas and skills?”
    • Mean: 4.95/5
  • “How happy are you about your decision to take Data 94?”
    • Mean: 4.95/5
  • Not possible to conclude how well we prepared them for Data 8, since they haven’t taken it yet
    • Also can’t conclude if positive reception is due to content or small size

“This was an amazing course for a non-STEM major such as myself to really feel comfortable jumping in to Python. A class this small is such a rarity at Cal, especially for programming. It worked really well for me since I was mostly just looking to learn something new and challenge myself, but I also imagine this would be a great first step for someone wanting to continue with programming long-term. It was free from all the scary stereotypes that I'd heard from my friends in Data 8 or the intro CS series. I would advise just trusting the process, honestly. I felt very incompetent at first and while I'm still not that competent I learned way more than I thought I could.”

26 of 28

Moving forward

27 of 28

Future of Data 6

At Berkeley

  • Will be offered this summer at UC Berkeley, taught by Ian Castro (SU20 instructor) and Isaac Merritt (SP21 TA); largely the same course as the SP21 offering
    • Will be taught to students in the SEED program
  • Future after that currently unknown, but the goal is to have it be regularly offered

Elsewhere – who could adopt Data 6?

  • Institutions with a Data 8-like course that some students without prior experience struggle with
  • Institutions that want to teach an introductory programming course with a data focus

28 of 28

Thanks!