Stat Comp and Intro to Data Science
Wayne Tai Lee
Agenda
Universal definition: DataScience(data) = $
Photo by Markus Winkler on Unsplash
Data and services for data are now primary assets
Other roles also have “data” in their title
Some notable trends in data science
Data scientists are often trained only in modeling
Diagram: Model building
You definitely need to know analytics as well
Diagram: Model building, Analyze behavior
If you’re lucky, you get to define the data
Diagram: Data Protocol, Model building, Analyze behavior
How will people consume your output?
Diagram: Data Protocol, Model building, Productization, Analyze behavior
Pre-processing is more important than you think
Diagram: Data Protocol, Pre-processing, Model building, Productization, Analyze behavior, Support
Data scientists see the whole pipeline end to end, from a quantitatively rigorous perspective
Diagram: Data Protocol, Pre-processing, Model building, Productization, Analyze behavior, Support
If you were concerned about a career in data science...
Why learn computing? Efficiency gains
Why learn computing? Verify statistical theorems
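As one illustration of checking a theorem by simulation, the sketch below compares the simulated variance of a sample mean against the textbook result σ²/n. It assumes draws from Uniform(0, 1), whose variance is 1/12; the function name `simulated_var_of_mean`, the 10,000 repetitions, the sample sizes, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def simulated_var_of_mean(n, n_reps=10_000, seed=1):
    """Simulate n_reps sample means (each from n Uniform(0, 1) draws)
    and return the variance of those sample means."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    means = [statistics.fmean([rng.random() for _ in range(n)])
             for _ in range(n_reps)]
    return statistics.variance(means)

SIGMA_SQ = 1 / 12  # variance of a single Uniform(0, 1) draw

for n in (4, 16, 64):
    simulated = simulated_var_of_mean(n)
    theoretical = SIGMA_SQ / n
    print(f"n={n:3d}  simulated={simulated:.5f}  theory={theoretical:.5f}")
```

The simulated column should track σ²/n closely, shrinking by roughly a factor of 4 each time n quadruples.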
Why learn computing? Allows diverse approaches
Permutation test instead of a two-sample t-test: less dependent on a Normal distribution assumption
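A minimal sketch of a two-sided permutation test for a difference in means, using only the standard library. The function name, the 10,000 shuffles, and the example numbers are made up for illustration; the p-value uses the common (count + 1) / (permutations + 1) convention so that it is never exactly zero.

```python
import random
import statistics

def permutation_test(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Pools both samples, reshuffles the group labels many times, and counts
    how often the shuffled difference in means is at least as extreme as
    the observed one. No Normality assumption is involved.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        # small tolerance so floating-point round-off does not hide ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # +1 in numerator and denominator: the observed labeling counts as one permutation
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

group_a = [12.1, 14.3, 11.8, 15.2, 13.9, 16.0]  # made-up measurements
group_b = [10.2, 9.8, 11.5, 10.9, 12.0, 9.5]
diff, p = permutation_test(group_a, group_b)
print(f"observed difference = {diff:.3f}, p-value = {p:.4f}")
```

Unlike the t-test, the reference distribution here comes from reshuffling the data itself, which is why the approach leans less on Normality.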
Why learn computing? Reproducible + readable
Excel vs. Coding
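To make "reproducible + readable" concrete, here is a small sketch of an analysis written as code rather than spreadsheet clicks: every step (parse, drop missing values, summarize) is written down and reruns identically. The inline CSV, the column names, and the function names are hypothetical; in practice the data would be read from a file.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would be read from a file
RAW_CSV = """name,score
Ana,88
Ben,
Cai,92
Dee,75
"""

def load_scores(text):
    """Parse the CSV text and drop rows whose score is missing."""
    rows = csv.DictReader(io.StringIO(text))
    return [float(row["score"]) for row in rows if row["score"].strip()]

def summarize(scores):
    """Return a few summary statistics for the cleaned scores."""
    return {"n": len(scores), "mean": statistics.fmean(scores), "max": max(scores)}

summary = summarize(load_scores(RAW_CSV))
print(summary)
```

Anyone rerunning this script gets the same summary, and the decision to drop Ben's missing score is visible in the code instead of hidden in a manual edit.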
Why Python? Popularity = Support
Source (methodology not verified)
Expectations for 4000+ level courses
How to take this class?
First half focuses on data science
Second half focuses on coding
Diagram: Data science, Coding
You should study the tutorials at home
Week 1: Introduction; Case study
Week 2: Review + Coding; Case study continued
Study tutorials at home!
….
Week 3: Review + Coding; Case study continued
Study tutorials at home!
Course logistics
See the syllabus on Canvas (details differ slightly across sections)
Ed Logistics - HW0 Demo
How to ask questions online
Give it a meaningful title so others can find it
Explain what you are trying to do
Show how you are doing it
Test it out with small data
Describe what you expect vs. what you’re seeing
Be nice!
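A hypothetical example of code to paste (as text, not a screenshot) into a question that follows these guidelines: tiny inline data, the exact code that misbehaves, and what you expect versus what you see. The running-total bug is invented for illustration.

```python
# Hypothetical Ed question: "My running total resets every time. Why?"
scores = [3, 5, 2]  # small, made-up data that still shows the problem

def running_total_buggy(values):
    totals = []
    for v in values:
        total = 0          # the bug: the total is reset inside the loop
        total += v
        totals.append(total)
    return totals

print("Expected:", [3, 8, 10])
print("Got:     ", running_total_buggy(scores))
```

With three numbers instead of a full dataset, a reader can spot the problem at a glance and rerun the example in seconds.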
If you’re using AI tools, here’s a prompt:
"""You are a college instructor helping students with an assignment. Your job is to help clarify and guide my thinking by asking questions back without giving me the answers to the problem. Here are 2 examples: Question: create a simulation that demonstrates the sample average is unbiased for estimating the population mean. Answer: What does unbiased mean? Would you expect a single sample average to be exactly the same as the population mean?
Question: how should we evaluate a model? Answer: What is the purpose of the model? How would you know if the model was bad? What is the model being compared to?"""
ChatGPT
Google Colab Demo
Please use your LionMail account!
colab.research.google.com/
How to ask questions online - NO screenshots of code
Screenshots are reasonable for:
Random tangent - how to differentiate yourself
First question in data science
Lessons from installation - why conda?
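One quick check after installing with conda is to confirm which Python interpreter and environment your code is actually using. This sketch reads `sys.executable`, `sys.prefix`, and the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated; it is typically `None` when no conda environment is active. The function name is made up for illustration.

```python
import os
import sys

def python_environment_info():
    """Collect where the running interpreter lives and which conda env (if any) is active."""
    return {
        "executable": sys.executable,   # path to the Python binary in use
        "prefix": sys.prefix,           # root directory of the active environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if the variable is unset
        "version": sys.version.split()[0],
    }

for key, value in python_environment_info().items():
    print(f"{key:>10}: {value}")
```

If `executable` points somewhere unexpected, packages you installed may have gone into a different environment than the one running your code.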
Where is my computer? Notebooks vs. Python programs
Diagram: Browser, Colab notebook (similar to Jupyter Notebooks), Google Colab servers, Python
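Because the notebook's code runs on a remote server, a quick way to see "where your computer is" is to ask Python itself. Run in Colab, this would typically describe a Linux machine on Google's side rather than your laptop; run locally, it describes your own machine. The function name is made up for illustration.

```python
import platform
import socket

def where_is_my_code_running():
    """Describe the machine executing this code (not necessarily the one showing the browser)."""
    return {
        "hostname": socket.gethostname(),
        "operating_system": platform.system(),  # e.g. 'Linux', 'Darwin', 'Windows'
        "machine": platform.machine(),          # CPU architecture, e.g. 'x86_64'
        "python": platform.python_version(),
    }

print(where_is_my_code_running())
```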
Common mistakes
Lessons from installation - when working with Ed
Diagram: Browser, Jupyter notebook, some computer managed by Ed, Python
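A related lesson: a file path that works on your laptop may not exist on the computer that Ed (or Colab) runs your code on. This sketch, using a hypothetical file name `my_data.csv` and a made-up helper `check_data_file`, prints where Python is looking before you try to open the file.

```python
from pathlib import Path

def check_data_file(filename):
    """Print where Python will look for `filename` and report whether it exists there."""
    cwd = Path.cwd()       # working directory on the machine running the code
    path = cwd / filename  # relative file names are interpreted relative to cwd
    print("Current working directory:", cwd)
    print("Looking for:", path)
    return path.exists()

if not check_data_file("my_data.csv"):  # hypothetical file name
    print("Not found: upload the file to this machine or fix the path.")
```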
How I use Python