1 of 57

Statistics Bootcamp Day 5

17 September 2021

2 of 57

Happy Friday!

Think/write/jot to yourself:

Take a moment for a bit of reflection on the past week:

What are 1-2 things you know, understand, or can do now that you didn’t or couldn’t at the start of this week?
What are 1-2 questions you have remaining from what we’ve covered during bootcamp?

Now think forward to the upcoming academic year:

What is one personal goal you have for this academic year?
What is one grad-school-related goal you have for this academic year?

3 of 57

Rose: A highlight, success, or something positive that happened

Thorn: A challenge you experienced, or something you can use more support with

Bud: New ideas or something you’re looking forward to

4 of 57

Overview of the week

Monday: mindset, descriptive vs inferential statistics, exponents & logarithms

Tuesday: workflow, Stata workshop

Wednesday: probability, graphing in Stata

Thursday: variables, prediction equations

Friday: goal setting, hidden curriculum, matrix algebra basics, reading calculus

5 of 57

Our learning objectives

...articulate both personal and work-related goals for the upcoming academic year/quarter

...uncover some of the hidden curriculum in graduate school and articulate strategies for continuing to do so

...understand what a matrix and a vector are

...and how to multiply matrices with vectors

...be able to represent a prediction equation in matrix notation

...understand and interpret basic calculus relevant to a statistics context (e.g. limits, derivatives, integrals)

6 of 57

Our learning objectives

...articulate both personal and work-related goals for the upcoming academic year/quarter

...uncover some of the hidden curriculum in graduate school and articulate strategies for continuing to do so

...understand what a matrix and a vector are

...and how to multiply matrices with vectors

...be able to represent a prediction equation in matrix notation

...understand and interpret basic calculus relevant to a statistics context (e.g. limits, derivatives, integrals)

These are stretch goals!

7 of 57

2021-2022 Goal-Setting (Personal & Work)

Individually:

Start breaking down your year-long goals into smaller chunks. What are some manageable goals for fall quarter that will help you towards these larger goals?
Consider making your goals SMART goals (specific, measurable, achievable, relevant, time-bound).
Consider: What support might you need to accomplish these goals? (from peers, older grad students, faculty, friends, family, etc.)

5 minutes

8 of 57

2021-2022 Goal-Setting (Personal & Work)

In small groups:

Share some or all of your fall-quarter goals, if you feel comfortable.

Some potential discussion topics include:

How/where might you find the support you need to accomplish these goals?
What would it look like to break down these quarter-long goals into weekly goals?

10 minutes

9 of 57

What is hidden curriculum?

10 of 57

What is hidden curriculum?

Implicit rules/norms and unspoken expectations → things that you will never be explicitly taught but will be vital to your success in grad school

This is an equity issue: your background, privilege, & cultural capital influence how much of the hidden curriculum is already known to you!

So what’s the solution?

11 of 57

Q&A from yesterday’s exit ticket

See tips & tricks document on bootcamp website!

12 of 57

Stata time to catch up on Days 3 & 4

15 minutes

Make sure you are comfortable...

using a do file to:

Set your working directory
Open a log file
Open a data file in .dta or .csv format

Reading code annotations provided by your TA
Storing all relevant files in a working directory that’s backed up

If you can do all these things, you’re in great shape to start SOC 381!

13 of 57

A little bit of math

14 of 57

The dependent variable, or the outcome

The independent variable(s), or the predictor(s)

The intercept/constant

The regression coefficient(s)

Ordinary Least Squares (OLS) Regression

OLS regression uses a prediction equation to predict values of an outcome variable.

15 of 57

Happiness = β₀ + β₁traffic + ε

16 of 57

Minimizing the sum of squared errors (SSE)

Y = 1025 X + 980

Dollars per tweet = 1025 (Millions of followers) + 980
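To make the prediction equation concrete, here is a minimal Python sketch that plugs a value into Y = 1025 X + 980. The function name and the 2-million-follower input are illustrative assumptions, not part of the original example.

```python
def predict_dollars_per_tweet(millions_of_followers):
    """Apply the slide's prediction equation Y = 1025 X + 980."""
    slope = 1025      # predicted change in dollars per additional million followers
    intercept = 980   # predicted dollars per tweet at zero followers
    return slope * millions_of_followers + intercept


# Hypothetical input: an account with 2 million followers.
predicted = predict_dollars_per_tweet(2)
print(predicted)  # 1025 * 2 + 980 = 3030
```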

17 of 57

Happiness = β₀ + β₁traffic + ε

Goal of OLS: Find values of β₀ and β₁ that minimize the sum of squared errors.

18 of 57

Happiness = β₀ + β₁traffic + ε

Goal of OLS: Find values of β₀ and β₁ that minimize the sum of squared errors.

19 of 57

Happiness = β₀ + β₁traffic + ε

Goal of OLS: Find values of β₀ and β₁ that minimize the sum of squared errors.

vector

20 of 57

Happiness = β₀ + β₁traffic + ε

Goal of OLS: Find values of β₀ and β₁ that minimize the sum of squared errors.

vector

scalar

How might we figure out what β₀ and β₁ should be?
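One way to start is trial and error: pick candidate values of β₀ and β₁, compute the sum of squared errors for each, and keep the pair with the smaller SSE. The sketch below does this in Python. The traffic and happiness values and both guesses are made up for illustration, and neither guess is claimed to be the OLS solution.

```python
def sum_of_squared_errors(b0, b1, x_values, y_values):
    """Return the SSE for the prediction equation y = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(x_values, y_values):
        predicted = b0 + b1 * x
        error = y - predicted      # observed minus predicted
        total += error ** 2        # square so negatives do not cancel
    return total


# Hypothetical data: hours in traffic and a happiness score.
traffic = [1, 2, 3, 4]
happiness = [9, 7, 6, 4]

# Two hypothetical guesses for (b0, b1).
sse_guess_a = sum_of_squared_errors(10, -1.5, traffic, happiness)
sse_guess_b = sum_of_squared_errors(8, -1.0, traffic, happiness)

print(sse_guess_a, sse_guess_b)  # guess A has the smaller SSE here
```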

21 of 57

Happiness = β₀ + β₁traffic + ε

Goal of OLS: Find values of β₀ and β₁ that minimize the sum of squared errors.

vector

scalar

scalar * vector → “distribute”: multiply every element of the vector by the scalar

vector + vector → add straight across: add the elements in matching positions

22 of 57

Happiness = β₀ + β₁traffic + ε

23 of 57

Happiness = β₀ + β₁traffic + ε

vector + vector → add straight across: add the elements in matching positions

scalar + vector → add straight across: add the scalar to every element of the vector
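The three rules so far (scalar * vector, vector + vector, scalar + vector) can be sketched in plain Python with lists standing in for vectors. The helper names and the numbers are illustrative assumptions.

```python
def scalar_times_vector(scalar, vector):
    """'Distribute' the scalar: multiply every element by it."""
    return [scalar * element for element in vector]


def vector_plus_vector(first, second):
    """Add straight across: element 1 with element 1, and so on."""
    if len(first) != len(second):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(first, second)]


def scalar_plus_vector(scalar, vector):
    """Add the scalar to every element of the vector."""
    return [scalar + element for element in vector]


# Hypothetical values for three people.
b0 = 10                     # scalar intercept
b1 = -1.5                   # scalar coefficient
traffic = [1, 2, 3]         # vector of predictor values
errors = [0.5, -0.5, 0.0]   # vector of error terms

# Happiness = b0 + b1 * traffic + errors, built one rule at a time.
happiness = scalar_plus_vector(
    b0, vector_plus_vector(scalar_times_vector(b1, traffic), errors)
)
print(happiness)  # [9.0, 6.5, 5.5]
```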

24 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

Goal of OLS: Find values of β₀, β₁, and β₂ that minimize the sum of squared errors.

25 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

Goal of OLS: Find values of β₀, β₁, and β₂ that minimize the sum of squared errors.

26 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

27 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

28 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

29 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

vector

scalar

vector

30 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

vector

scalar

vector

MATRIX

vector

scalar

31 of 57

Matrix-by-vector multiplication

MATRIX

vector

32 of 57

Matrix-by-vector multiplication

MATRIX (3 x 2)

vector (2 x 1)

These must match: the number of columns in the matrix (2) and the number of rows in the vector (2)!

33 of 57

Matrix-by-vector multiplication

MATRIX (3 x 2)

vector (2 x 1)

These must match: the number of columns in the matrix (2) and the number of rows in the vector (2)!

Result: vector (3 x 1)

34 of 57

Matrix-by-vector multiplication

MATRIX (3 x 2)

vector (2 x 1)

These must match: the number of columns in the matrix (2) and the number of rows in the vector (2)!

Result: vector (3 x 1)

matrix * vector → go across rows in first matrix/vector and down columns in the second matrix/vector, first multiplying then adding
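As a rough illustration of the rule above, the following Python sketch multiplies a 3 x 2 matrix by a 2 x 1 vector and checks that the inner dimensions match first. The function name and the numbers are assumptions chosen for the example.

```python
def matrix_times_vector(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("every row of the matrix must have the same length")
    # Inner dimensions must match: columns of the matrix vs rows of the vector.
    if n_cols != len(vector):
        raise ValueError("matrix columns must equal vector rows")
    # Go across each row and down the vector: multiply pairs, then add.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


matrix = [[1, 2],
          [3, 4],
          [5, 6]]        # 3 x 2
vector = [10, 1]         # 2 x 1

result = matrix_times_vector(matrix, vector)   # 3 x 1
print(result)  # [12, 34, 56]
```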

35 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

vector

MATRIX

vector

scalar

36 of 57

Happiness = β₀ + β₁traffic + β₂dogs + ε

vector

MATRIX

vector

scalar

37 of 57

General OLS regression equation in matrix form:

Yᵢ = Xᵢβ + εᵢ (with X and β in bold)

The “i” subscript indicates that these are vectors or matrices, not scalars

The beta is bold to indicate it is a vector of multiple coefficients

The X is bold to indicate a matrix

The β comes after the X in order for the matrix multiplication to work properly
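A small Python sketch of the matrix form, under assumed values: X gets a leading column of 1s so the intercept β₀ is multiplied by 1 in every row, and the β vector is ordered (β₀, β₁, β₂) to match the columns (constant, traffic, dogs). All numbers are hypothetical, and the error term is left out because these are predictions.

```python
def matrix_times_vector(matrix, vector):
    """Multiply each row of the matrix by the vector, then add."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix columns must equal vector rows")
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Columns of X: constant (always 1), traffic, dogs. Hypothetical values.
X = [[1, 1, 0],
     [1, 2, 1],
     [1, 3, 2]]            # 3 people x 3 columns

beta = [10, -1.5, 2]       # hypothetical b0, b1, b2 (3 x 1)

# Predicted happiness = X times beta, one value per person (3 x 1).
predicted_happiness = matrix_times_vector(X, beta)
print(predicted_happiness)  # [8.5, 9.0, 9.5]
```

Putting β after X is what lets the dimensions line up: (3 x 3) times (3 x 1) gives a 3 x 1 vector of predictions.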

38 of 57

Practice: Matrices

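scalar * vector → “distribute”: multiply every element of the vector by the scalar

vector + vector → add straight across: add the elements in matching positions

scalar + vector → add straight across: add the scalar to every element of the vector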

matrix * vector → go across rows in first matrix/vector and down columns in the second matrix/vector, first multiplying then adding

39 of 57

Goal of OLS: Find βs to minimize the sum of squared errors

There must be a better way to minimize the sum of squared errors than trial and error?

40 of 57

Goal of OLS: Find βs to minimize the sum of squared errors

There must be a better way to minimize the sum of squared errors than trial and error?

YES! CALCULUS!

41 of 57

Minimizing the sum of squared errors with respect to β₀:

Y = β₀ + e

We need to find the value of β₀ that will give us the smallest sum of squared errors (SSE).

42 of 57

Minimizing the sum of squared errors with respect to β₀:

Y = β₀ + e

We need to find the value of β₀ that will give us the smallest sum of squared errors (SSE).

All possible values of β₀

SSE
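To see what the curve of SSE against β₀ looks like numerically, this Python sketch tries a grid of candidate β₀ values for the intercept-only model Y = β₀ + e and keeps the one with the smallest SSE. The outcome values and the grid are hypothetical.

```python
def sse_for_intercept(b0, y_values):
    """SSE for the intercept-only model, where every prediction is b0."""
    return sum((y - b0) ** 2 for y in y_values)


# Hypothetical outcome values; their mean is 5.
y = [2, 4, 6, 8]

# Candidate values of b0 from 0 to 10 in steps of 0.5.
candidates = [step * 0.5 for step in range(21)]

# Trial and error: keep the candidate with the smallest SSE.
best_b0 = min(candidates, key=lambda b0: sse_for_intercept(b0, y))
print(best_b0, sse_for_intercept(best_b0, y))  # 5.0 20.0
```

In this example the best candidate lands on the mean of Y, which is what the calculus approach predicts for an intercept-only model.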

43 of 57

Minimizing the sum of squared errors with respect to β₀:

Y = β₀ + e

We need to find the value of β₀ that will give us the smallest sum of squared errors (SSE).

All possible values of β₀

SSE

44 of 57

Minimizing the sum of squared errors with respect to β₀:

Y = β₀ + e

We need to find the value of β₀ that will give us the smallest sum of squared errors (SSE).

All possible values of β₀

SSE

How can we mathematically describe this point?

45 of 57

All possible values of β₀

SSE

46 of 57

All possible values of β₀

SSE

We minimize the SSE at:

The point where the slope of the [tangent] line is equal to 0

→ where the derivative == 0.
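For the intercept-only model Y = β₀ + e, setting the derivative to 0 can be worked out by hand. The derivation below is a sketch that assumes n observations Y₁, ..., Yₙ (the index notation is added here for illustration).

```latex
\begin{align*}
\mathrm{SSE}(\beta_0) &= \sum_{i=1}^{n} (Y_i - \beta_0)^2 \\
\frac{d\,\mathrm{SSE}}{d\beta_0} &= -2 \sum_{i=1}^{n} (Y_i - \beta_0) = 0 \\
\sum_{i=1}^{n} Y_i &= n\beta_0 \\
\beta_0 &= \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}
\end{align*}
```

The second derivative is 2n, which is positive, so this point is a minimum rather than a maximum: the SSE is smallest when β₀ equals the mean of Y.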

47 of 57

What is a derivative?

Essentially, the rate of change of a function at a given point.
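One way to build intuition is to approximate a derivative numerically: nudge x a tiny bit in both directions and see how much the function changes. The sketch below uses the function f(x) = x² as an assumed example; the step size h is also an arbitrary choice.

```python
def numerical_derivative(f, x, h=1e-5):
    """Approximate the slope of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    """Example function: f(x) = x squared, whose exact derivative is 2x."""
    return x ** 2


slope_at_3 = numerical_derivative(square, 3.0)   # close to 6
slope_at_0 = numerical_derivative(square, 0.0)   # close to 0: bottom of the curve

print(round(slope_at_3, 4), round(slope_at_0, 4))
```

The slope near 0 at x = 0 matches the picture of a flat tangent line at the bottom of a U-shaped curve.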

48 of 57

What is a derivative?

Essentially, the rate of change of a function at a given point.

Write the equation of this function.
Where is the derivative of this function the largest?
Where is the derivative of this function the smallest?

See handout for questions to answer!

49 of 57

Where is the derivative of this function the largest?
Where is the derivative of this function the smallest?
What does this tell us substantively?

50 of 57

Where is the derivative of this function the largest?
Where is the derivative of this function the smallest?
What does this tell us substantively?
What does the total area under the curve represent? Why might we want to know this?

51 of 57

52 of 57

New cases

Area under the curve sums to the total number of cases
Y-values represent the number of new cases on that date

Total cases

The limit of the function as x→∞ is ???
Y-values represent the total number of cases through that date

New cases → take the integral → Total cases

Total cases → take the derivative → New cases
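With daily counts, "taking the integral" amounts to a running total and "taking the derivative" amounts to day-to-day differences. The Python sketch below shows the round trip on made-up daily case counts.

```python
from itertools import accumulate

# Hypothetical number of new cases reported on six consecutive days.
new_cases = [1, 3, 6, 4, 2, 0]

# Discrete "integral": running total of cases through each day.
total_cases = list(accumulate(new_cases))

# Discrete "derivative": change in the total from one day to the next.
recovered_new_cases = [total_cases[0]] + [
    total_cases[day] - total_cases[day - 1]
    for day in range(1, len(total_cases))
]

print(total_cases)          # [1, 4, 10, 14, 16, 16]
print(recovered_new_cases)  # [1, 3, 6, 4, 2, 0]
```

Once new cases drop to 0, the total stops growing, which is the idea behind asking what the total-cases curve approaches in the limit.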

53 of 57

Key Calculus Terms:

Derivative: The rate of change of a function
Limit: The value a function approaches
Integral: The area under a function

54 of 57

Talk about the limits of this function. What do they tell us?
Talk about the integrals of this function (assume that the total area under the curve is equal to 1 by definition). What do they tell us?

55 of 57

Talk about the limits of this function. What do they tell us?
Talk about the derivatives of this function. What do they tell us?

56 of 57

Probability density function (pdf)

Area under the curve sums to 1
Y-values represent the probability density at that x-value; for a continuous variable, the probability of any single exact x-value is 0, and probabilities come from areas under the curve

Cumulative distribution function (cdf)

The limit of the function as x→∞ is 1
Y-values represent the probability of getting that x-value or lower

pdf → take the integral of the pdf → cdf

cdf → take the derivative of the cdf → pdf
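The pdf/cdf relationship can be checked numerically. The sketch below assumes a standard normal distribution (the slides do not say which distribution is plotted) and uses `NormalDist` from Python's standard library. The integration range and step counts are arbitrary choices.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)


def integrate_pdf(lower, upper, steps=10_000):
    """Approximate the area under the pdf with the midpoint rule."""
    width = (upper - lower) / steps
    heights = (
        standard_normal.pdf(lower + (i + 0.5) * width) for i in range(steps)
    )
    return sum(heights) * width


def cdf_slope(x, h=1e-5):
    """Approximate the derivative of the cdf with a central difference."""
    return (standard_normal.cdf(x + h) - standard_normal.cdf(x - h)) / (2 * h)


# Integral of the pdf up to x = 1 should be close to the cdf at 1.
area_up_to_1 = integrate_pdf(-10, 1)
print(round(area_up_to_1, 4), round(standard_normal.cdf(1), 4))

# Derivative of the cdf at x = 0 should be close to the pdf at 0.
print(round(cdf_slope(0.0), 4), round(standard_normal.pdf(0.0), 4))
```

The lower bound of -10 stands in for negative infinity, since almost none of the area lies that far out in the tail.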

57 of 57

Congrats on finishing bootcamp!

Fill out the post-bootcamp survey!

bit.ly/Bootcamp2021Feedback

& now time for ~happy hour~