Statistics Bootcamp Day 5
17 September 2021
Happy Friday!
Think/write/jot to yourself:
Take a moment to reflect on the past week:
Now think forward to the upcoming academic year:
Rose: A highlight, success, or something positive that happened
Thorn: A challenge you experienced, or something you could use more support with
Bud: New ideas or something you’re looking forward to
Overview of the week
Monday: mindset, descriptive vs inferential statistics, exponents & logarithms
Tuesday: workflow, Stata workshop
Wednesday: probability, graphing in Stata
Thursday: variables, prediction equations
Friday: goal setting, hidden curriculum, matrix algebra basics, reading calculus
Our learning objectives
...articulate both personal and work-related goals for the upcoming academic year/quarter
...uncover some of the hidden curriculum in graduate school and articulate strategies for continuing to do so
...understand what a matrix and a vector are
...and how to multiply matrices with vectors
...be able to represent a prediction equation in matrix notation
...understand and interpret basic calculus relevant to a statistics context (e.g. limits, derivatives, integrals)
These are stretch goals!
2021-2022 Goal-Setting (Personal & Work)
Individually:
5 minutes
2021-2022 Goal-Setting (Personal & Work)
In small groups:
Share some or all of your fall-quarter goals, if you feel comfortable.
Some potential discussion topics include:
10 minutes
What is hidden curriculum?
Implicit rules/norms and unspoken expectations → things that you will never be explicitly taught but that will be vital to your success in grad school
This is an equity issue: your background, privilege, & cultural capital influence how much of the hidden curriculum is already known to you!
So what’s the solution?
Q&A from yesterday’s exit ticket
Stata time to catch up on Days 3 & 4
15 minutes
Make sure you are comfortable...
If you can do all these things, you’re in great shape to start SOC 381!
A little bit of math
The dependent variable, or the outcome
The independent variable(s), or the predictor(s)
The intercept/constant
The regression coefficient(s)
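These labels name the pieces of a prediction equation like the one below (a reconstruction consistent with the equations later in the deck; the error term ε is included for completeness):

```latex
\underbrace{Y}_{\text{outcome}} \;=\; \underbrace{\beta_0}_{\text{intercept}} \;+\; \underbrace{\beta_1}_{\text{coefficient}}\,\underbrace{X}_{\text{predictor}} \;+\; \varepsilon
```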
Ordinary Least Squares (OLS) Regression
OLS regression uses a prediction equation to predict values of an outcome variable.
Happiness = β0 + β1traffic + ε
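A minimal Stata sketch of fitting this equation by OLS, assuming a dataset with hypothetical variables happiness and traffic is already in memory:

```stata
* Fit Happiness = b0 + b1*traffic + e by OLS (variable names are hypothetical)
regress happiness traffic
predict happiness_hat, xb   // store the predicted values b0 + b1*traffic
```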
Minimizing the sum of squared errors (SSE)
Y = 1025 X + 980
Dollars per tweet = 1025 (Millions of followers) + 980
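To read this fitted line with a made-up plug-in value: a user with 2 million followers would be predicted to earn 1025(2) + 980 = 3,030 dollars per tweet.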
Happiness = β0 + β1traffic + ε
Goal of OLS: Find values of β0 and β1 that minimize the sum of squared errors.
In this equation, Happiness, traffic, and ε are vectors (one entry per observation), while β0 and β1 are scalars.
How might we figure out what β0 and β1 should be?
Rules for combining scalars and vectors:
scalar * vector → “distribute”
vector + vector → add straight across
scalar + vector → add straight across
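Here is a small worked example of those rules, using made-up traffic values of 2, 5, and 1: β1 distributes into the traffic vector, and β0 is added straight across to every entry:

```latex
\beta_0 + \beta_1 \begin{pmatrix} 2 \\ 5 \\ 1 \end{pmatrix}
= \begin{pmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \end{pmatrix}
+ \begin{pmatrix} 2\beta_1 \\ 5\beta_1 \\ \beta_1 \end{pmatrix}
= \begin{pmatrix} \beta_0 + 2\beta_1 \\ \beta_0 + 5\beta_1 \\ \beta_0 + \beta_1 \end{pmatrix}
```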
Happiness = β0 + β1traffic + β2dogs + ε
Goal of OLS: Find values of β0, β1, and β2 that minimize the sum of squared errors.
Now Happiness, traffic, dogs, and ε are all vectors, while β0, β1, and β2 are scalars.
Stacking the predictor vectors side by side (with a column of 1s for the intercept) turns the right-hand side into a MATRIX times a vector of coefficients, plus the error vector.
Matrix-by-vector multiplication
A 3 x 2 MATRIX can multiply a 2 x 1 vector: the inner dimensions (2 and 2) must match!
The result is a 3 x 1 vector.
matrix * vector → go across rows in the first matrix/vector and down columns in the second matrix/vector, first multiplying then adding
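A quick way to check this rule by hand is to let Stata do the same multiplication. A minimal sketch with made-up numbers:

```stata
* Matrix-by-vector multiplication in Stata (illustrative numbers)
matrix A = (1, 2 \ 3, 4 \ 5, 6)   // a 3 x 2 matrix
matrix v = (10 \ 100)             // a 2 x 1 vector
matrix w = A * v                  // inner dimensions (2 and 2) match, so w is 3 x 1
matrix list w                     // w = (210 \ 430 \ 650)
```

Each entry of w comes from going across a row of A and down the column of v, first multiplying then adding: the first entry is 1(10) + 2(100) = 210.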
Happiness = β0 + β1traffic + β2dogs + ε
In matrix form, the Happiness vector equals the predictor MATRIX times the coefficient vector, plus the error vector.
General OLS regression equation in matrix form:
Y = Xβ + ε
The bold type indicates that these are vectors or matrices, not scalars
The beta is bold to indicate it is a vector of multiple coefficients
The X is bold to indicate a matrix
The beta comes after the X in order for the matrix multiplication to work properly
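Written out for the happiness example (a reconstruction, with a column of 1s carrying the intercept):

```latex
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\qquad
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & \text{traffic}_1 & \text{dogs}_1 \\
\vdots & \vdots & \vdots \\
1 & \text{traffic}_n & \text{dogs}_n
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
```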
Practice: Matrices
scalar * vector → “distribute”
vector + vector → add straight across
scalar + vector → add straight across
matrix * vector → go across rows in first matrix/vector and down columns in the second matrix/vector, first multiplying then adding
Goal of OLS: Find βs to minimize the sum of squared errors
Is there a better way to minimize the sum of squared errors than trial and error?
YES! CALCULUS!
Minimizing with respect to the sum of squared errors:
Y = β0 + e
We need to find the value of β0 that will give us the smallest sum of squared errors (SSE).
[Plot: SSE on the y-axis against all possible values of β0 on the x-axis; the curve dips to a single lowest point]
How can we mathematically describe this point?
We minimize the SSE at the point where the slope of the tangent line is equal to 0 → where the derivative = 0.
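As a worked illustration (not on the original slide, but following directly from this setup): for Y = β0 + e, write the SSE as a function of β0, differentiate, and set the derivative to 0:

```latex
\mathrm{SSE}(\beta_0) = \sum_{i=1}^{n} (Y_i - \beta_0)^2,
\qquad
\frac{d\,\mathrm{SSE}}{d\beta_0} = -2\sum_{i=1}^{n} (Y_i - \beta_0) = 0
\;\;\Longrightarrow\;\;
\hat{\beta}_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}
```

So the SSE-minimizing intercept in this no-predictor model is just the sample mean of Y.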
What is a derivative?
Essentially, the rate of change of a function at a given point.
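For reference (connecting to the "limits" objective), the formal definition: the derivative is the limit of the slopes of secant lines as the two points get closer together:

```latex
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
```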
See handout for questions to answer!
Example: new cases vs. total cases over time.
To go from new cases (a rate of change) to total cases (a running total), take the integral.
To go from total cases back to new cases, take the derivative.
Key Calculus Terms:
Probability density function (pdf)
Cumulative distribution function (cdf)
To go from the pdf to the cdf, take the integral of the pdf
To go from the cdf to the pdf, take the derivative of the cdf
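In symbols, with f the pdf and F the cdf:

```latex
F(x) = \int_{-\infty}^{x} f(t)\,dt
\qquad\text{and}\qquad
f(x) = \frac{d}{dx}F(x)
```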
Congrats on finishing bootcamp!
Fill out the post-bootcamp survey!
bit.ly/Bootcamp2021Feedback
& now time for ~happy hour~