Pre-Announcement
databears.org/join
DS100: Fall 2018
Lecture 29 (Josh Hug / Fernando Perez): Conclusion
A Brief Look at What We’ve Done in this Class
Data Science Lifecycle
Formulate Question or Problem
Acquire and Clean Data
Exploratory Data Analysis
Draw Conclusions from Predictions and Inference
Reports, Decisions, and Solutions
Quick Fun Demo
Let’s look at that arbitrary DS100 dataset we had you create at the beginning of the semester.
Useful Libraries and Programming Tools
Core libraries and tools we used:
Less practice, but also:
Key Concepts
Sampling: Simple random / cluster / stratified samples.
Probability:
The regression problem:
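The three sampling schemes named on this slide can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the population (12 students in 3 discussion sections) and all function names are hypothetical, not taken from the course materials.

```python
import random

def simple_random_sample(population, n, rng):
    """Simple random sample: every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sample: a simple random sample drawn from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sample: pick whole clusters at random, keep all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(100)  # fixed seed so the draws are reproducible

# Hypothetical population: student IDs grouped by discussion section.
sections = {"A": [0, 1, 2, 3], "B": [4, 5, 6, 7], "C": [8, 9, 10, 11]}
population = [s for members in sections.values() for s in members]

srs = simple_random_sample(population, 4, rng)
strat = stratified_sample(sections, 2, rng)
clus = cluster_sample(sections, 1, rng)
```

Note the difference in what is randomized: the stratified sample always contains students from every section, while the cluster sample contains everyone from one randomly chosen section.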
Key Concepts
Loss function:
Specific loss functions:
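The specific loss functions listed on this slide did not survive extraction. As an assumption, the sketch below uses three common choices (squared, absolute, and Huber loss) and fits a constant model by searching a grid of candidate values; the data and names are illustrative only.

```python
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return abs(y - y_hat)

def huber_loss(y, y_hat, alpha=1.0):
    """Quadratic for small residuals, linear for large ones."""
    r = abs(y - y_hat)
    return 0.5 * r ** 2 if r <= alpha else alpha * (r - 0.5 * alpha)

def average_loss(loss, ys, theta):
    """Average loss of the constant prediction theta over all data."""
    return sum(loss(y, theta) for y in ys) / len(ys)

# Hypothetical data with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

# Brute-force search over candidate constants 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]
best_squared = min(candidates, key=lambda t: average_loss(squared_loss, ys, t))
best_absolute = min(candidates, key=lambda t: average_loss(absolute_loss, ys, t))
```

On this data the squared-loss minimizer lands on the mean (22.0), which the outlier drags upward, while the absolute-loss minimizer lands on the median (3.0).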
Key Concepts
Linear regression models:
Surprisingly, the data to the right is linear!
Training vs. test set:
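A minimal sketch of the training vs. test idea, assuming a one-feature linear model fit by least squares. The synthetic data (true slope 2, intercept 1, Gaussian noise) and the 80/20 split are assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]  # hypothetical data

# Shuffle the indices, then hold out 20% of the points as a test set.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:80], indices[80:]

slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
train_mse = mse([xs[i] for i in train_idx], [ys[i] for i in train_idx], slope, intercept)
test_mse = mse([xs[i] for i in test_idx], [ys[i] for i in test_idx], slope, intercept)
```

The model never sees the test points while fitting, so `test_mse` is an honest estimate of how the fit performs on new data.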
Key Concepts
Gradient descent: Repeatedly step in the direction opposite the gradient of the average loss over all data.
Convexity:
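Gradient descent on the average squared loss of a linear model can be sketched as follows. The learning rate, step count, and toy data are assumptions chosen so the iteration converges. Because this loss is convex, any minimum gradient descent reaches is a global minimum.

```python
def gradient(theta, xs, ys):
    """Gradient of (1/n) * sum (y - theta0 - theta1 * x)^2 with respect to theta."""
    n = len(xs)
    residuals = [y - (theta[0] + theta[1] * x) for x, y in zip(xs, ys)]
    g0 = -2 * sum(residuals) / n
    g1 = -2 * sum(r * x for r, x in zip(residuals, xs)) / n
    return [g0, g1]

def gradient_descent(grad_fn, theta, learning_rate, n_steps):
    """Take n_steps steps, each against the gradient at the current theta."""
    for _ in range(n_steps):
        g = grad_fn(theta)
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta

# Hypothetical noiseless data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta_hat = gradient_descent(lambda t: gradient(t, xs, ys), [0.0, 0.0],
                             learning_rate=0.05, n_steps=2000)
```

Here `theta_hat` approaches the intercept 1 and slope 2 that generated the data. A learning rate that is too large would make the iterates diverge instead.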
Key Concepts
Bias/Variance tradeoff:
Regularization and cross validation:
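One way to see regularization and cross validation working together: fit a ridge-penalized model for several penalty strengths and keep the one with the lowest k-fold validation error. The sketch below assumes a one-feature model with no intercept and the objective (1/n) Σ (y − θx)² + λθ²; the data and the candidate λ values are hypothetical.

```python
import random

def ridge_fit(xs, ys, lam):
    """Closed-form minimizer of (1/n) * sum (y - theta*x)^2 + lam * theta^2."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)

def mse(xs, ys, theta):
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validated_mse(xs, ys, lam, k):
    """Average validation MSE over k folds; each fold is held out once."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        theta = ridge_fit([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(mse([xs[i] for i in held_out], [ys[i] for i in held_out], theta))
    return sum(errors) / k

rng = random.Random(29)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [3.0 * x + rng.gauss(0, 1.0) for x in xs]  # hypothetical noisy data

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
cv_errors = {lam: cross_validated_mse(xs, ys, lam, k=5) for lam in lambdas}
best_lambda = min(cv_errors, key=cv_errors.get)
```

Larger λ shrinks the fitted coefficient toward zero (more bias, less variance); cross validation picks the λ that balances the two on held-out data.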
Key Concepts
The classification problem:
Logistic regression:
Example where we have 2 features and 2 classes.
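A small sketch of logistic regression with 2 features and 2 classes, trained by gradient descent on the average cross-entropy loss. The six data points, the learning rate, and the step count are assumptions for illustration, not the example from the original slide.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """P(class 1 | x) for theta = [bias, w1, w2] and x = (x1, x2)."""
    return sigmoid(theta[0] + theta[1] * x[0] + theta[2] * x[1])

def fit_logistic(X, y, learning_rate=0.1, n_steps=2000):
    """Gradient descent on the average cross-entropy loss."""
    theta = [0.0, 0.0, 0.0]
    n = len(X)
    for _ in range(n_steps):
        grad = [0.0, 0.0, 0.0]
        for x, label in zip(X, y):
            err = predict_proba(theta, x) - label  # per-point gradient factor
            grad[0] += err / n
            grad[1] += err * x[0] / n
            grad[2] += err * x[1] / n
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical data: class 0 in the lower left, class 1 in the upper right.
X = [(-2.0, -1.5), (-1.5, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
y = [0, 0, 0, 1, 1, 1]

theta = fit_logistic(X, y)
predictions = [1 if predict_proba(theta, x) >= 0.5 else 0 for x in X]
```

The decision boundary is the line where the predicted probability equals 0.5, that is, where `theta[0] + theta[1]*x1 + theta[2]*x2` is zero.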
Key Concepts
Evaluating classifiers:
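The metrics on this slide were lost in extraction; accuracy, precision, and recall are assumed here as the usual candidates. A minimal sketch on hypothetical labels:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical true labels and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # fraction of all predictions that are correct
precision = tp / (tp + fp)          # of predicted positives, fraction truly positive
recall = tp / (tp + fn)             # of true positives, fraction we caught
```

Here there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy, precision, and recall all equal 0.75.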
Key Concepts
Bootstrap: Lets us estimate the uncertainty in our estimate of a population parameter (for example, a confidence interval) using only one sample.
Pseudorandom number generator: Generates a sequence of "random-looking" numbers from a seed; the same seed always reproduces the same sequence.
Hypothesis testing:
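The bootstrap and the seeded pseudorandom number generator can be shown together. This is a minimal sketch of a percentile bootstrap confidence interval for the mean; the sample values, the 95% level, and the number of resamples are assumptions.

```python
import random
import statistics

def bootstrap_ci(sample, statistic, n_resamples, rng, level=0.95):
    """Percentile bootstrap interval: resample with replacement, recompute, sort."""
    estimates = sorted(
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical sample of 10 measurements.
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 8.7, 10.9, 11.8, 9.5, 12.6]

# Two generators with the same seed produce identical resamples.
ci_a = bootstrap_ci(sample, statistics.mean, 1000, random.Random(42))
ci_b = bootstrap_ci(sample, statistics.mean, 1000, random.Random(42))
```

Because both generators start from seed 42, `ci_a` and `ci_b` are identical, which is what makes a randomized analysis reproducible.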
A few (important!) odds and ends
Numerical issues
Condition number:
Higher dimensions:
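To make the condition number concrete, the sketch below computes the 2-norm condition number of a 2x2 matrix and shows how a nearly singular system amplifies a tiny change in the right-hand side. The matrices and helper names are hypothetical; in practice a library routine would be used instead of these hand-written formulas.

```python
import math

def condition_number_2x2(A):
    """Ratio of largest to smallest singular value of a 2x2 matrix."""
    (a, b), (c, d) = A
    frob_sq = a * a + b * b + c * c + d * d  # sum of squared singular values
    det = a * d - b * c                      # product of singular values (up to sign)
    disc = math.sqrt(max(0.0, frob_sq ** 2 - 4 * det ** 2))
    sigma_max = math.sqrt((frob_sq + disc) / 2)
    sigma_min = abs(det) / sigma_max
    return sigma_max / sigma_min

def solve_2x2(A, rhs):
    """Solve A x = rhs by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

A = [[1.0, 1.0], [1.0, 1.0001]]  # nearly singular: rows are almost identical
cond_A = condition_number_2x2(A)
cond_identity = condition_number_2x2([[1.0, 0.0], [0.0, 1.0]])

x_original = solve_2x2(A, [2.0, 2.0001])
x_perturbed = solve_2x2(A, [2.0, 2.0002])  # right-hand side changed by 0.0001
```

The identity matrix has condition number 1, while `A` has a condition number in the tens of thousands: changing the right-hand side by 0.0001 moves the solution from roughly (1, 1) to roughly (0, 2).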
Computational Topics That Concluded The Course
Labs
HWs and Projects
Data science lifecycle projects:
Grad project: Computer vision / image classification.
Course Reflections
Workflow Changes
Workflow changes this semester:
Curriculum Changes
Curriculum changes this semester:
Things We’d Like To Do Next Time
Things We’d Like to Hear About
HKN Survey coming soon. Here are some things we’d like to hear from you:
HKN Survey [~7:10/7:20 PM]
Ask Us Anything
Attendance:
yellkey.com/plan
What’s Next
Beyond Data 100
Things we didn’t focus on in Data 100:
Machine Learning
Recommended Courses
Machine learning courses:
Databases:
Intro to Deep Learning course - STAT 157, Spring 2019
Stat 157 Topics in Probability & Statistics: "Introduction to Deep Learning" (3 units)
TTh 3:30-5:00pm
The topic for this semester is Introduction to Deep Learning. This class provides a practical introduction to deep learning, including theoretical motivations and how to implement it in practice. As part of the course we will cover multilayer perceptrons, backpropagation, automatic differentiation, and stochastic gradient descent. Moreover, we introduce convolutional networks for image processing, starting from the simple LeNet to more recent architectures such as ResNet for highly accurate models. Secondly, we discuss sequence models and recurrent networks, such as LSTMs, GRU, and the attention mechanism. Throughout the course we emphasize efficient implementation, optimization and scalability, e.g. to multiple GPUs and to multiple machines. The goal of the course is to provide both a good understanding and good ability to build modern nonparametric estimators. The entire course is based on Jupyter notebooks to allow students to gain experience quickly. Supporting material can be found at www.diveintodeeplearning.org.
Instructors:
See Piazza (https://piazza.com/class/jkopvsyuy7g3u0?cid=1476) for more.
INFO 154: Data Mining and Analytics
More mathematical: CS 189, STAT 154
Recommended Courses
Probability/statistics foundations courses:
GSI recommended courses:
A few great (FREE) resources
A few more (from our Stanford colleagues):
Online Communities
GSIs recommended:
Applying Your Knowledge
There are countless untold stories lurking in publicly available datasets.
Let’s look at an example from a Democracy Working Group that I’m part of.
Applying Your Knowledge
Even though you are new to data science, you are also among the most skilled people in the world at data science.
Use your power wisely.
Student opportunities for spring 2019 (remember Tuesday?)
Don't forget the Data 001 Piazza!
https://piazza.com/class/j7s01y165odq5
Helping Out with Data100
Data 100 needs you!
Thanks to our incredible (u)GSI team!!!
This course would be literally impossible without a team like this.
Please thank them, be kind to them, and we hope some of you will be on this same slide next semester!
Aakash
Allen
Aman
Ananth
Andrew
Caleb
Daniel
Ed
Junseo
Manana
Mian
Neil
Patrick
Sasank
Scott
Sona
Simon
Sumukh
Suraj
Tiffany
Tony
William
END