1 of 13

Ambuj K. Singh

Capstone Panel

2021 National Workshop on Data Science Education

2 of 13

Background

  • Funded by NSF Harnessing the Data Revolution Grant

  • Promoting Data Science across the Central Coast
    • UCSB
    • Cal Poly
    • SBCC
    • CSUSB

  • Fellows at UCSB and Cal Poly

  • Summer internships at UCSB

  • DS pipeline

3 of 13

Data Science Curriculum

4 of 13

Data Science Capstone

  • CMPSC 190DD/DE/DF and PSTAT 197A/B/C
    • 3-quarter sequence, cross-listed in CS and PSTAT
    • 55 students in 2020-2021, increasing to 80 in 2021-2022
    • 12 sponsored group projects at UCSB in 2020-2021

Joint capstone presentation with Cal Poly on June 2

5 of 13

Objectives

  • Prepare students with data-intensive methodologies. Study machine learning, fairness, ethics, and responsible data practice. Major focus on interpreting, explaining, and communicating the results of analyses.

  • Upon completing the course sequence, students will be able to:
    • understand the data science process and the structure and the role of each of its constituent steps
    • engineer the appropriate machine learning method for a given problem
    • design and implement evaluation studies to compare the quality of performed data analysis
    • understand technical trade-offs associated with working with “Big Data”
    • understand ethical implications of data science work
    • visualize the results of data analytical studies

6 of 13

Fall Topics

  • Machine learning
    • Methods
    • Software infrastructure & programming
  • Fairness
  • Privacy
  • Statistical traps and causal learning
  • Reading
    • Textbook: The Ethical Algorithm: The Science of Socially Aware Algorithm Design, M. Kearns and A. Roth, Oxford University Press, 2019.
    • Collection of papers and online material

Grading

  • Class participation
  • Homework sets
  • Mini project

7 of 13

Fall Mini Project Goals

  • Understand the entire cycle from data to model to public policy
  • Estimate dynamics parameters
  • Understand intervention and fairness
  • Wastewater-based epidemiology

Choi et al., Wastewater-based epidemiology biomarkers: Past, present and future, TrAC Trends in Analytical Chemistry, Volume 105, 2018, Pages 453-469
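
As an illustration of the "estimate dynamics parameters" goal, the sketch below fits a transmission rate to an infection curve. The slides do not say which dynamics model or fitting method the mini project used; the discrete-time SIR model, the grid search, the synthetic data, and every name and number here are assumptions made for the example only.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps):
    """Simulate a discrete-time SIR model (an assumed model, not the course's).

    Returns the number of infected individuals at each step, including step 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    infected = [i]
    for _ in range(steps):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


def fit_beta(observed, gamma, s0, i0, r0, candidates):
    """Pick the candidate beta whose simulated curve has the smallest squared error."""
    def squared_error(beta):
        simulated = simulate_sir(beta, gamma, s0, i0, r0, len(observed) - 1)
        return sum((a - b) ** 2 for a, b in zip(simulated, observed))
    return min(candidates, key=squared_error)


# Synthetic "observations" generated from a known beta (hypothetical values).
true_beta = 0.30
observed = simulate_sir(true_beta, 0.10, 990, 10, 0, 30)

# Candidate grid 0.05, 0.10, ..., 0.60.
candidates = [round(0.05 * k, 2) for k in range(1, 13)]
estimate = fit_beta(observed, 0.10, 990, 10, 0, candidates)
print(estimate)
```

On real wastewater or case data the observations would be noisy, so a continuous optimizer and uncertainty estimates would likely be preferable to a coarse grid.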

8 of 13

Causal Relationships in Disease Spread

  • Social vulnerability affects exposure and comorbidity
  • Exposure affects incidence
  • Comorbidity affects viral load
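
The three relationships above form a small directed graph. A minimal sketch of one way to encode it and ask which variables lie downstream of a given cause follows; the variable names and the `descendants` helper are hypothetical, chosen for illustration rather than taken from the course material.

```python
# Directed edges: cause -> list of direct effects, as listed on the slide.
CAUSAL_EDGES = {
    "social_vulnerability": ["exposure", "comorbidity"],
    "exposure": ["incidence"],
    "comorbidity": ["viral_load"],
    "incidence": [],
    "viral_load": [],
}


def descendants(graph, node):
    """Return every variable reachable from `node` by following causal edges."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


downstream = descendants(CAUSAL_EDGES, "social_vulnerability")
print(sorted(downstream))
```

In this encoding, social vulnerability reaches both incidence and viral load, which is why an intervention aimed at fairness has to account for it.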

Model disease dynamics

Intervene in a fair manner

9 of 13

Winter and Spring Leads

  • Instructors:
    • Alex Franks, Statistics and Applied Probability
    • Sang-Yun Oh, Statistics and Applied Probability
  • Other faculty mentors
    • Kate Kharitonova, Computer Science
    • Mike Ludkovski, Statistics and Applied Probability
    • Ambuj Singh, Computer Science
  • Project/Company mentors
  • Graduate TAs
    • Joshua Bang, Statistics and Applied Probability
    • Chau Tran, Statistics and Applied Probability
    • Jiajing Zheng, Statistics and Applied Probability
    • Sikun Lin, Computer Science

10 of 13

Winter and Spring Sponsors

11 of 13

Winter and Spring Projects

  • Projects
    • AI: Teaching a Machine to Learn Math (AppFolio)
    • Larval Fish Assemblage as an Indicator to Predict the Fisheries Catch (CalCOFI)
    • Speech and Text Analysis (Invoca)
    • Climate Change and Young Fish: the Relationship between pH and aspects of larval fish assemblage in the California Current (CalCOFI)
    • Mil Familias Data Analysis and Communication: Impact of Diabetes in the Latino Community (SDRI)
    • Mining Criminal Records based on HTML Data (Carpe Data)
    • How Pixel Differences can Affect Sensors in Self-driving Cars (FLIR)
    • Developing a data set for each of the 28 sites of the Long Term Ecological Research (LTER) Network (NCEAS)
    • UCSB Undergraduate Alumni Tracking (CSEP)
    • Bees + Flowers / GloBI Interactions (CCBER)
    • Tracking activity patterns in those recently infected by COVID-19 (Evidation Health)
    • Exploring and understanding a rare genetic mutation that causes early-onset Alzheimer’s (NRI)

12 of 13

Winter and Spring Curriculum

  • Reading assignments and class discussions
    • Textbook: Think Like a Data Scientist, by Brian Godsey
    • Additional targeted readings
      • Data ethics
      • Data representation and figure design

  • Weekly meetings with faculty and company/project mentors

  • Project (video) presentations
    • Mid-term group presentation at the end of Winter
    • Final presentation in Spring
      • Video overview
      • Lightning talk
      • Written report (documenting progress, code, etc.)
      • Presentation to sponsor/company

13 of 13

Lessons Learned

  • Diversity in student background is a challenge
    • Bootcamp planned in 21-22 prior to Fall course
  • High mentoring load
    • Difficult to scale
  • Excellent complement to classroom instruction