DSTL Lab Showcase
June 6, 2025
Welcome!
Prof. Samuel Lau
Asst. Teaching Prof. in HDSI
Pandas Tutor
Data science curriculum
LLMs in CS + DS Education
About the Lab
ContentGen
(PL: Ylesia)
Ayush, Jiaen, Gabriel
ClassBuzz
(PL: Owen)
Michelle, Jack, Chris, Parna
WISE
(PL: Andrew)
Sathvika, Achintya
Sam supervises 3 teams
A typical year in the lab
Fall: Recruit + teams form
Winter: Full-speed ahead!
Spring: Present work, write papers
Summer: Write papers, brainstorm new project ideas
SIGCSE 2025 in Pittsburgh
A typical week in the lab
All-hands meeting / lab social
Project team meeting
PLs (project leads) meet with Sam
Hear from our lab members!
ContentGen
Ylesia Wu, Ayush Shah, Gabriel Cha, Jiaen Yu
Demo clip 2
Entering API key + ~2 examples
Motivation
Project Process Overview
Code base overview
Frontend ↔ Backend
When the user opens a notebook:
- Send Notebook Content → Process Notebook Structure → Save to frontend
When the user selects a notebook cell + enters a message:
- Send relevant notebook details → 1st API call for question + answer generation → 2nd API call for JSON format correction → Send LLM response → Update chat interface + Insert new question cell
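The request flow above splits question generation into two LLM API calls: one to generate a question + answer, and one to correct the JSON format. A minimal Python sketch of that two-pass pattern, assuming a hypothetical `call_llm(prompt) -> str` function standing in for whatever API client ContentGen actually uses. The prompt wording, the `question`/`answer` keys, and the helper names `parse_question` and `generate_question` are illustrative, not taken from the contentgen package.

```python
import json
from typing import Callable, Optional

# Hypothetical stand-in for an LLM API client: takes a prompt, returns raw text.
LLMCall = Callable[[str], str]

# Assumed response schema; the real tool's JSON keys may differ.
REQUIRED_KEYS = {"question", "answer"}


def parse_question(raw: str) -> Optional[dict]:
    """Return the parsed object if `raw` is JSON with the required keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and REQUIRED_KEYS.issubset(data):
        return data
    return None


def generate_question(call_llm: LLMCall, cell_source: str, message: str) -> dict:
    # 1st API call: generate a question + answer about the selected cell.
    raw = call_llm(
        "Write one practice question and its answer about this notebook cell.\n"
        f"Cell:\n{cell_source}\n"
        f"Instructor request: {message}\n"
        'Respond only with JSON: {"question": "...", "answer": "..."}'
    )
    parsed = parse_question(raw)
    if parsed is not None:
        return parsed  # already well-formed, so the correction call is skipped

    # 2nd API call: ask the model to fix the format without changing the content.
    fixed = call_llm(
        'Rewrite the text below as valid JSON with keys "question" and "answer". '
        "Output the JSON only.\n\n" + raw
    )
    parsed = parse_question(fixed)
    if parsed is None:
        raise ValueError("LLM response could not be corrected into the expected JSON")
    return parsed
```

Injecting `call_llm` keeps the sketch testable without network access. The slide lists the second call as a fixed step; this version only makes it when the first reply fails to parse, which saves a request but is a design choice, not necessarily ContentGen's.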
Development Process
User Interviews
Instructors’ reaction to the tool
Usefulness
Ease of Use
After Interview - Next Steps
The Eval
Prompts
v0.1.0:
Naive one-paragraph prompt
v0.1.1:
More comprehensive prompt
- Context
- Instructions
- Response Format
- Notebook Details
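A minimal sketch of how a v0.1.1-style prompt could be assembled from the four sections listed above. The section wording, the `## ` headers, and the `NotebookDetails` fields are assumptions for illustration, not ContentGen's actual prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotebookDetails:
    # Hypothetical fields; the details ContentGen actually sends may differ.
    selected_cell: str
    preceding_cells: List[str] = field(default_factory=list)


def build_prompt(details: NotebookDetails, request: str) -> str:
    """Assemble a prompt from the four labeled sections listed for v0.1.1."""
    sections = {
        "Context": (
            "You help a data science instructor write practice questions "
            "for a Jupyter notebook."
        ),
        "Instructions": (
            "Write one question and its answer about the selected cell. "
            f"Instructor request: {request}"
        ),
        "Response Format": 'Return only JSON: {"question": "...", "answer": "..."}',
        "Notebook Details": "\n".join(
            ["Earlier cells:", *details.preceding_cells, "Selected cell:", details.selected_cell]
        ),
    }
    # Dicts preserve insertion order, so the sections appear in the order above.
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections.items())
```

Keeping each section as a separate piece makes it easy to add or swap one (for example, a notebook-structure section) without rewriting the whole prompt.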
Prompts
v0.1.1 vs. v0.1.4: Notebook Structure
Test Cases
Eval Data
Conditions compared:
- baseline
- detailed instructions
- detailed w/ notebook structure
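The slides don't list the eval's scoring criteria. As one hypothetical example of comparing the three conditions on shared test cases, the sketch below scores each condition by the fraction of responses that come back as well-formed question JSON. `evaluate`, `is_well_formed`, and the pass/fail criterion are assumptions; a real eval would likely also judge question quality.

```python
import json
from typing import Callable, Dict, List


def is_well_formed(raw: str) -> bool:
    """Hypothetical pass/fail check: a JSON object with question + answer keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"question", "answer"}.issubset(data)


def evaluate(
    conditions: Dict[str, Callable[[str], str]], test_cases: List[str]
) -> Dict[str, float]:
    """Return, for each prompt condition, the fraction of test cases that pass."""
    scores = {}
    for name, generate in conditions.items():
        passed = sum(is_well_formed(generate(case)) for case in test_cases)
        scores[name] = passed / len(test_cases)
    return scores
```

Running every condition over the same test cases keeps the comparison fair: any difference in scores comes from the prompt, not from the inputs.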
Reflection + Next Steps
Thank You!
Any questions?
Try it out yourself!
pip install contentgen
Class[Buzz]
Chris, Jack, Michelle, Owen, Parna
What is Class[Buzz]?
Poll Types
Building on Class[Buzz]
Original Site Demo
Updated Site Demo
UI Updates
AI Summaries
Additional Features
Exporting data to CSV
Added filesystem (+restructuring)
Deleting users
Editing & deleting polls
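Exporting data to CSV is mostly a matter of flattening stored records into rows. The slides don't detail Class[Buzz]'s stack, so this Python sketch using the standard `csv` module only illustrates the idea; the `FIELDS` schema and `export_responses_csv` are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical column schema for one poll response.
FIELDS = ["poll_id", "user", "response", "timestamp"]


def export_responses_csv(rows: Iterable[Mapping[str, str]]) -> str:
    """Serialize poll responses to CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys not in FIELDS instead of raising.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Using a CSV library rather than joining strings by hand matters for free-text poll answers: a response like `yes, definitely` contains a comma and is quoted automatically.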
Tech Stack
Frontend
Backend
Database
Code Base Overview
Life as a Class[Buzz] dev
Michelle’s Experience
hellooo
Jack’s Experience
The 8 weeks: const result = await groupResponses(sortedVotes);
Owen’s Experience
Hopes and Dreams
Conversation Trees
WISE
Watchful Intelligent Science Expert
WISE Team: Achintya, Sathvika, Andrew
Table of contents
01 Introduction: Problem Statement, Demo
02 Road Map: Development Process
03 Final Product
04 Research: Research Structure & Findings
01
Introduction
Imagine…
You find yourself asking…
What’s the best way to handle missing values in this context?
Why did this visualization break?
How do I interpret this pattern I’m seeing?
LLMs?
Introducing…
WISE
Demo Video
AI Assistant
Domain Aware Support
Smarter EDA Suggestions
Real Time & Convenient
Relevant Insights within the Jupyter Environment
Task-Specific Guidance
02
Roadmap
Journey: Fall → Winter → Spring
-Learning Tools/Tech
-Building an Extension in the Jupyter Environment
-Full Development & Iterative Design Process
-Features (DataFrame, Plots, Domain Knowledge)
-AI Capabilities
-Designing Research Study
-Conducting Study
-Analyzing Results
-Fine-Tuning/Refining Extension
03
Extension
Architecture
You (unpaid intern) → plot(), head() on adjQ4_TotSpend_USD_vFinal_FINAL_USETHIS_notNullCleaned_bkp_v3__real.csv??
→ Frontend event tracker → POST request → Backend
→ NotebookState.JSON → (Peter) Parser / Lexicon Luther (Regex, tokenization)
→ LLM → Dana Frames, Dr. Graphenstein
→ Graphenstein’s Monster (Refines)
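The parser stage in the WISE architecture (regex and tokenization over the notebook state sent by the frontend event tracker) can be approximated with a few regular expressions. A minimal sketch, assuming the frontend sends raw cell sources; the `build_notebook_state` and `to_payload` names, the tracked method list, and the state layout are hypothetical, not WISE's actual NotebookState schema.

```python
import json
import re
from typing import Dict, List

# Heuristic patterns, not a full Python parser (the tracked methods are an assumption).
# Captures the object name and method in calls like `df.head()` or `df.plot(kind="bar")`.
METHOD_CALL = re.compile(r"\b([A-Za-z_]\w*)\.(head|describe|info|plot|isna|dropna|fillna)\s*\(")
# Captures the file path passed to pd.read_csv("...").
READ_CSV = re.compile(r"""\bpd\.read_csv\(\s*['"]([^'"]+)['"]""")


def build_notebook_state(cells: List[str]) -> Dict[str, list]:
    """Summarize which datasets are loaded and which DataFrame methods are called."""
    state: Dict[str, list] = {"datasets": [], "calls": []}
    for index, source in enumerate(cells):
        state["datasets"].extend(READ_CSV.findall(source))
        for obj, method in METHOD_CALL.findall(source):
            state["calls"].append({"cell": index, "object": obj, "method": method})
    return state


def to_payload(state: Dict[str, list]) -> str:
    """Serialize the state as JSON, e.g. for the body of a POST request."""
    return json.dumps(state)
```

A regex pass like this is cheap enough to run on every execution event, but it misses aliasing and chained calls; Python's `ast` module would be the sturdier choice if the heuristics prove too brittle.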
Technical Challenges
Problem: How to understand the user’s notebook and analytical goals?
Solution: Notebook Understanding
Problem: How to make analysis relevant and useful?
Solution: Relevant AI Suggestions
Final Product
Turns Ambiguity into Action
Empowers Data Scientists of All Levels
Saves Time
04
Research
What helps a novice data scientist working with a dataset from an unfamiliar domain push past the domain-knowledge barrier?
Designing Our Protocol
Procedure
-Domain-Specific EDA Task Set Up by a PhD Student
-Task Introduction
-Task Execution
-Post-Task Interview
Participants
-7 PhD Students from Different Labs/Domains
-7 Novice Data Scientists (3rd-Year DSC Major Students)
Conducting the Study
-Researchers observe on mute
-Novice and expert interact during the task
-Separate post-task interviews about each participant’s experience
1-1.5 Hour Zoom Session
Some Recordings
Some Example Datasets
Telemetry Dataset
(Data Smith Lab)
Privacy and Security Domain
PI: HaoJian
DNA Dataset
(The Amariuta Lab)
Bioinformatics
PI: Tiffany Amariuta
Post-Observation Analysis
Data Collection
-Screen Recordings capture activity and verbal interaction
-Audio Transcriptions store dialogue
-Observer Notes (Observations of Key Moments)
-Post-Task Interview Transcripts
Data Analysis
-Qualitative Coding
>Interaction Patterns
-Quantitative Analysis
>Number of occurrences
-Iterative Process
Thematic Analysis
Qualitative Data Analysis
Braun, V., & Clarke, V. (2022). Toward good practice in thematic analysis: Avoiding common problems and be(com)ing a knowing researcher. International Journal of Transgender Health, 24(1), 1–6. https://doi.org/10.1080/26895269.2022.2129597
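The "Number of occurrences" step in the analysis amounts to tallying coded excerpts. A minimal sketch, assuming each coded excerpt is stored as a `(session_id, code)` pair; the record format and helper names are hypothetical, not the study's actual codebook or tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Assumed record format: (session_id, code) for each coded excerpt.
Excerpt = Tuple[str, str]


def count_codes(excerpts: Iterable[Excerpt]) -> Counter:
    """Total occurrences of each code across all sessions."""
    return Counter(code for _, code in excerpts)


def sessions_per_code(excerpts: Iterable[Excerpt]) -> Dict[str, int]:
    """Number of distinct sessions in which each code appears at least once."""
    seen: Dict[str, Set[str]] = {}
    for session, code in excerpts:
        seen.setdefault(code, set()).add(session)
    return {code: len(sessions) for code, sessions in seen.items()}
```

Reporting both numbers guards against one talkative pair dominating the totals: a pattern seen 10 times in a single session is different from one seen once in each of 7 sessions.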
Post-Interview Analysis
Evaluation & Next Steps
01 Rigorous Testing
Assess suggestions across domains and datasets
02 Finishing Paper
Hopefully submitting to the ACM CHI conference
03 Improving Extension
Implementing our final findings from the paper
04 Releasing Extension
Publishing the extension as a package to PyPI
Thank You!
CREDITS: This presentation template was created by Slidesgo, and includes icons, infographics & images by Freepik
Fall Recruitment
Fall 2025 Timeline
Week 1: Info session
Week 2: Applications due, interviews scheduled
Week 3: Interviews, members chosen
Week 4: Teams start working!
What we ask:
Sam’s tip for standing out: Be as personal and specific as possible!
Thanks for coming!