1 of 57

Statistics Bootcamp Day 5

17 September 2021

2 of 57

Happy Friday!

Think/write/jot to yourself:

Take a moment for a bit of reflection on the past week:

  • What are 1-2 things you know, understand, or can do now that you didn’t know, didn’t understand, or couldn’t do at the start of this week?
  • What are 1-2 questions you have remaining from what we’ve covered during bootcamp?

Now think forward to the upcoming academic year:

  • What is one personal goal you have for this academic year?
  • What is one grad-school-related goal you have for this academic year?

3 of 57

Rose: A highlight, success, or something positive that happened

Thorn: A challenge you experienced, or something you can use more support with

Bud: New ideas or something you’re looking forward to

4 of 57

Overview of the week

Monday: mindset, descriptive vs inferential statistics, exponents & logarithms

Tuesday: workflow, Stata workshop

Wednesday: probability, graphing in Stata

Thursday: variables, prediction equations

Friday: goal setting, hidden curriculum, matrix algebra basics, reading calculus

6 of 57

Our learning objectives

...articulate both personal and work-related goals for the upcoming academic year/quarter

...uncover some of the hidden curriculum in graduate school and articulate strategies for continuing to do so

...understand what a matrix and a vector are

...and how to multiply matrices with vectors

...be able to represent a prediction equation in matrix notation

...understand and interpret basic calculus relevant to a statistics context (e.g., limits, derivatives, integrals)

These are stretch goals!

7 of 57

2021-2022 Goal-Setting (Personal & Work)

Individually:

  1. Start breaking down your year-long goals into smaller chunks. What are some manageable goals for fall quarter that will help you towards these larger goals? Consider making your goals SMART goals (specific, measurable, achievable, relevant, time-bound).
  2. Consider: What support might you need to accomplish these goals? (from peers, older grad students, faculty, friends, family, etc.)

5 minutes

8 of 57

2021-2022 Goal-Setting (Personal & Work)

In small groups:

Share some or all of your fall-quarter goals, if you feel comfortable.

Some potential discussion topics include:

  • How/where might you find the support you need to accomplish these goals?
  • What would it look like to break down these quarter-long goals into weekly goals?

10 minutes

10 of 57

What is hidden curriculum?

Implicit rules/norms and unspoken expectations → things that you will never be explicitly taught but will be vital to your success in grad school

This is an equity issue: your background, privilege, & cultural capital influence how much of the hidden curriculum is already known to you!

So what’s the solution?

11 of 57

Q&A from yesterday’s exit ticket

  • See tips & tricks document on bootcamp website!

12 of 57

Stata time to catch up on Days 3 & 4

15 minutes

Make sure you are comfortable...

  • Using a do file to:
    • Set your working directory
    • Open a log file
    • Open a data file in .dta or .csv format
  • Reading code annotations provided by your TA
  • Storing all relevant files in a working directory that’s backed up

If you can do all these things, you’re in great shape to start SOC 381!

13 of 57

A little bit of math

14 of 57

Ordinary Least Squares (OLS) Regression

OLS regression uses a prediction equation to predict values of an outcome variable.

Parts of the prediction equation:

  • The dependent variable, or the outcome
  • The independent variable(s), or the predictor(s)
  • The intercept/constant
  • The regression coefficient(s)

15 of 57

Happiness = β0 + β1traffic + ε

16 of 57

Minimizing the sum of squared errors (SSE)

Y = 1025 X + 980

Dollars per tweet = 1025 (Millions of followers) + 980
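
As a minimal sketch, the prediction equation on this slide can be written as a small function. The function name and the example follower counts are hypothetical, chosen only for illustration; the slope (1025) and intercept (980) come from the slide.

```python
def predict_dollars_per_tweet(millions_of_followers):
    """Apply the slide's prediction equation: Y = 1025 X + 980."""
    slope = 1025      # predicted change in dollars per additional million followers
    intercept = 980   # predicted dollars per tweet at 0 followers
    return slope * millions_of_followers + intercept

# Hypothetical follower counts (in millions), for illustration only
for followers in [0, 1, 2.5]:
    print(followers, predict_dollars_per_tweet(followers))
```

Plugging in X = 0 returns the intercept, and each additional million followers raises the prediction by the slope.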

17 of 57

Happiness = β0 + β1traffic + ε

Goal of OLS: Find values of β0 and β1 that minimize the sum of squared errors.

20 of 57

Happiness = β0 + β1traffic + ε

Goal of OLS: Find values of β0 and β1 that minimize the sum of squared errors.

Happiness, traffic, and ε are vectors; β0 and β1 are scalars.

How might we figure out what β0 and β1 should be?
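
One way to get a feel for this question is plain trial and error: guess a few values of β0 and β1, compute the sum of squared errors for each guess, and keep the guess with the smallest SSE. The sketch below does that. The data and the candidate guesses are made up for illustration and are not from the slides.

```python
# Hypothetical data for five people (not from the slides)
traffic = [1.0, 2.0, 3.0, 4.0, 5.0]      # e.g., hours spent in traffic
happiness = [9.0, 7.0, 6.0, 4.0, 2.0]    # e.g., happiness score

def sse(b0, b1):
    """Sum of squared errors for the line happiness = b0 + b1 * traffic."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(traffic, happiness))

# Trial and error: try a few candidate (b0, b1) pairs and keep the best one
candidates = [(10.0, -1.0), (10.0, -2.0), (11.0, -1.5), (10.7, -1.7)]
best = min(candidates, key=lambda pair: sse(*pair))

for b0, b1 in candidates:
    print(f"b0={b0}, b1={b1}, SSE={sse(b0, b1):.2f}")
print("Lowest SSE among these guesses:", best)
```

Guessing only finds the best of the guesses we happened to try, which is why a more systematic method is worth having.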

21 of 57

Happiness = β0 + β1traffic + ε

scalar * vector → “distribute”

vector + vector → add straight across

23 of 57

Happiness = β0 + β1traffic + ε

vector + vector → add straight across

scalar + vector → add straight across
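
The three rules so far (scalar * vector, scalar + vector, vector + vector) can be sketched in a few lines of plain Python, with vectors stored as lists. The function names and all of the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def scalar_times_vector(scalar, vector):
    """'Distribute' the scalar: multiply every entry of the vector by it."""
    return [scalar * entry for entry in vector]

def vector_plus_vector(first, second):
    """Add straight across: first entry plus first entry, and so on."""
    if len(first) != len(second):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(first, second)]

def scalar_plus_vector(scalar, vector):
    """Add the scalar to every entry of the vector."""
    return [scalar + entry for entry in vector]

# Hypothetical values, for illustration only
b0, b1 = 10, -2
traffic = [1, 2, 3]
errors = [1, 0, -1]

# Happiness = b0 + b1 * traffic + errors, built one operation at a time
step1 = scalar_times_vector(b1, traffic)        # [-2, -4, -6]
step2 = scalar_plus_vector(b0, step1)           # [8, 6, 4]
happiness = vector_plus_vector(step2, errors)   # [9, 6, 3]
print(happiness)
```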

24 of 57

Happiness = β0 + β1traffic + β2dogs + ε

Goal of OLS: Find values of β0, β1, and β2 that minimize the sum of squared errors.

29 of 57

Happiness = β0 + β1traffic + β2dogs + ε

Happiness, traffic, dogs, and ε are vectors; β0, β1, and β2 are scalars.

30 of 57

Happiness = β0 + β1traffic + β2dogs + ε

Grouping the predictors into one MATRIX and their coefficients into one vector:

Happiness (vector) = β0 (scalar) + MATRIX of traffic and dogs × vector of β1 and β2 + ε (vector)

34 of 57

Matrix-by-vector multiplication

MATRIX (3 x 2) × vector (2 x 1) = vector (3 x 1)

The number of columns in the matrix (2) and the number of rows in the vector (2) must match!

matrix * vector → go across rows in first matrix/vector and down columns in the second matrix/vector, first multiplying then adding
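
A minimal sketch of this rule in plain Python, with the matrix stored as a list of rows. The function name and the numbers are hypothetical; the shapes (3 x 2 times 2 x 1) follow the slide.

```python
def matrix_times_vector(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("columns in the matrix must match rows in the vector")
    # Go across each row and down the vector: multiply pairs, then add them up
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical 3 x 2 matrix (3 people; columns are traffic and dogs)
X = [[1, 2],
     [2, 0],
     [3, 1]]
# Hypothetical 2 x 1 vector of coefficients (b1, b2)
b = [-2, 3]

result = matrix_times_vector(X, b)  # 3 x 1 vector, one entry per person
print(result)
```

The first entry, for example, is 1 * (-2) + 2 * 3 = 4: across the first row, down the vector, multiply, then add.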

35 of 57

Happiness = β0 + β1traffic + β2dogs + ε

Happiness (vector) = β0 (scalar) + MATRIX (traffic, dogs) × vector (β1, β2) + ε (vector)

37 of 57

General OLS regression equation in matrix form:

The “i” subscript indicates that these are vectors or matrices, not scalars

The X is bold to indicate a matrix

The beta is bold to indicate it is a vector of multiple coefficients; it comes after the X in order for the matrix multiplication to work properly
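
As a sketch of what the matrix form can look like for the happiness example, assume three respondents and keep the intercept β0 as a separate scalar, as in the earlier slides. The subscripts 1 to 3 number the hypothetical respondents; the symbols y, X, β, and ε under the braces are one common labeling, not necessarily the notation used in SOC 381.

```latex
\[
\underbrace{\begin{bmatrix} \text{Happiness}_1 \\ \text{Happiness}_2 \\ \text{Happiness}_3 \end{bmatrix}}_{\mathbf{y}\ (3 \times 1)}
= \beta_0 +
\underbrace{\begin{bmatrix} \text{traffic}_1 & \text{dogs}_1 \\ \text{traffic}_2 & \text{dogs}_2 \\ \text{traffic}_3 & \text{dogs}_3 \end{bmatrix}}_{\mathbf{X}\ (3 \times 2)}
\underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}}_{\boldsymbol{\beta}\ (2 \times 1)}
+
\underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{bmatrix}}_{\boldsymbol{\varepsilon}\ (3 \times 1)}
\]
```

Multiplying the 3 x 2 matrix by the 2 x 1 vector gives a 3 x 1 vector, so every term on the right-hand side can be added straight across.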

38 of 57

Practice: Matrices

scalar * vector → “distribute”

vector + vector → add straight across

scalar + vector → add straight across

matrix * vector → go across rows in first matrix/vector and down columns in the second matrix/vector, first multiplying then adding

40 of 57

Goal of OLS: Find βs to minimize the sum of squared errors

Is there a better way to minimize the sum of squared errors than trial and error?

YES! CALCULUS!

44 of 57

Minimizing the sum of squared errors with respect to β0:

Y = β0 + e

We need to find the value of β0 that will give us the smallest sum of squared errors (SSE).

[Plot: SSE (y-axis) against all possible values of β0 (x-axis)]

How can we mathematically describe this point?

46 of 57

[Plot: SSE (y-axis) against all possible values of β0 (x-axis)]

We minimize the SSE at the point where the slope of the [tangent] line is equal to 0

→ where the derivative = 0.
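
For the intercept-only model Y = β0 + e, this condition can be worked out by hand. The derivation below is a sketch that assumes n observations Y_1, ..., Y_n (notation added here for illustration).

```latex
\begin{align*}
\mathrm{SSE}(\beta_0) &= \sum_{i=1}^{n} (Y_i - \beta_0)^2 \\
\frac{d\,\mathrm{SSE}}{d\beta_0} &= -2 \sum_{i=1}^{n} (Y_i - \beta_0) \\
-2 \sum_{i=1}^{n} (Y_i - \beta_0) = 0
  \;&\Longrightarrow\; \sum_{i=1}^{n} Y_i = n\,\beta_0
  \;\Longrightarrow\; \beta_0 = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}
\end{align*}
```

So with no predictors, the value of β0 that minimizes the SSE is the mean of Y. The second derivative is 2n, which is positive, so this point is a minimum rather than a maximum.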

48 of 57

What is a derivative?

Essentially, the rate of change of a function at a given point.

  1. Write the equation of this function.
  2. Where is the derivative of this function the largest?
  3. Where is the derivative of this function the smallest?

See handout for questions to answer!
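
The idea of a derivative as the rate of change at a point can also be approximated numerically: nudge x slightly in both directions and see how much the function changes. The sketch below uses a hypothetical example function (x squared), not the function on the handout.

```python
def slope_at(f, x, h=1e-6):
    """Approximate the derivative of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    """Hypothetical example function (not the one on the handout): x squared."""
    return x ** 2

for x in [-2, 0, 3]:
    print(x, round(slope_at(f, x), 4))
```

For this function the slope is negative to the left of 0, zero at 0 (the bottom of the curve), and positive to the right of 0.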

50 of 57

  • Where is the derivative of this function the largest?
  • Where is the derivative of this function the smallest?
  • What does this tell us substantively?
  • What does the total area under the curve represent? Why might we want to know this?

52 of 57

New cases

  • Y-values represent the number of new cases on that date
  • Area under the curve sums to the total number of cases

Total cases

  • Y-values represent the total number of cases through that date
  • The limit of the function as x→∞ is ???

Take the integral of new cases to get total cases

Take the derivative of total cases to get new cases
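
With daily counts, the integral and the derivative have simple stand-ins: a running total and a day-to-day difference. The sketch below uses made-up daily counts (not real case data) to show that each operation undoes the other.

```python
from itertools import accumulate

# Hypothetical daily counts of new cases (not real data)
new_cases = [2, 5, 9, 4, 1]

# "Integral": running total of new cases gives total cases through each day
total_cases = list(accumulate(new_cases))

# "Derivative": day-to-day change in total cases recovers new cases
recovered_new_cases = [total_cases[0]] + [
    later - earlier for earlier, later in zip(total_cases, total_cases[1:])
]

print(total_cases)
print(recovered_new_cases)
```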

53 of 57

Key Calculus Terms:

  • Derivative: The rate of change of a function
  • Limit: The value a function approaches
  • Integral: The area under a function

54 of 57

  1. Talk about the limits of this function. What do they tell us?
  2. Talk about the integrals of this function (assume that the total area under the curve is equal to 1 by definition). What do they tell us?

55 of 57

  • Talk about the limits of this function. What do they tell us?
  • Talk about the derivatives of this function. What do they tell us?

56 of 57

Probability density function (pdf)

  • Area under the curve sums to 1
  • Y-values represent the probability density at that x-value (for a continuous variable, the probability of any single exact x-value is 0; probabilities come from areas under the curve)

Cumulative distribution function (cdf)

  • Y-values represent the probability of getting that x-value or lower
  • The limit of the function as x→∞ is 1

Take the integral of the pdf to get the cdf

Take the derivative of the cdf to get the pdf
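
The pdf/cdf relationship can be checked numerically. The sketch below assumes a standard normal distribution (the curve on the slide is not specified here, so this choice is an assumption) and uses only Python's built-in math module. The helper names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    """Probability that a standard normal value is x or lower."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def area_under_pdf(lower, upper, steps=10_000):
    """Approximate the integral of the pdf with the trapezoid rule."""
    width = (upper - lower) / steps
    heights = [normal_pdf(lower + i * width) for i in range(steps + 1)]
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

def slope_of_cdf(x, h=1e-5):
    """Approximate the derivative of the cdf with a central difference."""
    return (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)

# Integral of the pdf up to 1 vs. the cdf at 1 (using -10 as "far left")
print(area_under_pdf(-10, 1), normal_cdf(1))
# Derivative of the cdf at 1 vs. the pdf at 1
print(slope_of_cdf(1), normal_pdf(1))
# The cdf approaches 1 as x gets large
print(normal_cdf(10))
```

Each printed pair should agree closely: the area under the pdf up to a point matches the cdf at that point, and the slope of the cdf at a point matches the height of the pdf there.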

57 of 57

Congrats on finishing bootcamp!

Fill out the post-bootcamp survey!

bit.ly/Bootcamp2021Feedback

& now time for ~happy hour~