1 of 14

CSE 163

Statistical Testing

Suh Young Choi

🎶 Listening to: Hades Soundtrack

💬 Before Class: Do you have a favorite class at UW (besides CSE 163, of course)?


2 of 14

Announcements

Checkpoint 3 due TONIGHT!

Resubmission Cycle closes tomorrow (2/10)

    • Last opportunity to submit Pokemon!

THA 4 due Thursday (2/12)

Project EDA / Portfolio Milestone out now; due next Thursday (2/19)


3 of 14

Last Time

  • Objects
  • Classes
  • Inheritance

This Time

  • Statistics
  • Probability
  • Research & Science


4 of 14

What is Statistics?


“The study of models to gather, understand, and draw conclusions from real-world data.”

  • Hunter Schafer, 2020 (and probably other people too)

Some uses:

  • Predicting disease outbreaks
  • Product testing
  • Sports analytics
  • Machine learning

5 of 14

Statistics 101


Summary Statistics

  • Number of values in our dataset
  • Mean
    • The “average” value
  • Median
    • The value in the middle of our dataset
  • Standard deviation
    • How spread out our values are
  • Min, max, mode, range, etc.
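As a rough illustration, the summary statistics above can be computed with Python's built-in `statistics` module. The `scores` list is made-up data for this sketch, not something from the course.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
scores = [70, 85, 85, 90, 100]

count = len(scores)                      # number of values in our dataset
mean = statistics.mean(scores)           # the "average" value
median = statistics.median(scores)       # the value in the middle once sorted
mode = statistics.mode(scores)           # the most common value
stdev = statistics.stdev(scores)         # sample standard deviation (spread)
value_range = max(scores) - min(scores)  # distance between max and min

print("count:", count)
print("mean:", mean, "median:", median, "mode:", mode)
print("standard deviation:", round(stdev, 2))
print("min:", min(scores), "max:", max(scores), "range:", value_range)
```

Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the data were the whole population.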

Distributions

  • How our values are distributed
    • Uniform
    • Normal
    • Among others!

6 of 14

Uniform Dist.

Everything equally likely

Examples:

  • Probability of fair coin flip: Heads/Tails
  • Probability of fair dice roll: All sides
  • Perfect random number generator
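One way to see "everything equally likely" is to simulate many fair dice rolls and check that each side comes up about 1/6 of the time. This is only a sketch: the seed and the number of rolls are arbitrary choices, and Python's `random` module is a pseudo-random generator rather than a perfect one.

```python
import random
from collections import Counter

random.seed(163)  # arbitrary seed so the run is repeatable

# Roll a fair six-sided die many times.
rolls = [random.randint(1, 6) for _ in range(60_000)]
counts = Counter(rolls)

# Each side should show up in roughly 1/6 (about 0.167) of the rolls.
for side in range(1, 7):
    share = counts[side] / len(rolls)
    print(side, round(share, 3))
```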

7 of 14

Normal Dist.

Bell shaped, centered around mean

Examples:

  • Height
  • IQ
  • Grades
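To get a feel for the bell shape, the sketch below draws values from a normal distribution and checks how many land within one standard deviation of the mean (about 68% for a normal distribution). The mean of 170 and standard deviation of 10 are made-up numbers standing in for heights in centimeters.

```python
import random
import statistics

random.seed(163)  # arbitrary seed so the run is repeatable

# Hypothetical "heights": normal with mean 170 and standard deviation 10.
heights = [random.gauss(170, 10) for _ in range(50_000)]

mean = statistics.mean(heights)
stdev = statistics.stdev(heights)

# Fraction of values that fall within one standard deviation of the mean.
within_one = sum(abs(h - mean) <= stdev for h in heights) / len(heights)

print("mean:", round(mean, 1))
print("standard deviation:", round(stdev, 1))
print("share within one standard deviation:", round(within_one, 3))
```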

8 of 14

Central Limit Theorem says…

The distribution of sample means approaches a normal distribution as the size of each sample grows, even when the underlying data is not normally distributed!
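A small simulation can make this concrete. The sketch below averages fair dice rolls (a uniform, decidedly non-normal source) and shows the sample means clustering more tightly around 3.5 as each sample gets bigger. The helper name `sample_mean`, the sample sizes, and the number of repetitions are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(163)  # arbitrary seed so the run is repeatable

def sample_mean(sample_size):
    """Return the mean of `sample_size` fair dice rolls."""
    return statistics.mean([random.randint(1, 6) for _ in range(sample_size)])

# Maps each sample size to (mean of the sample means, their standard deviation).
results = {}
for sample_size in (1, 5, 50):
    means = [sample_mean(sample_size) for _ in range(5_000)]
    results[sample_size] = (statistics.mean(means), statistics.stdev(means))
    print(sample_size, round(results[sample_size][0], 2), round(results[sample_size][1], 2))
```

Plotting a histogram of `means` for each sample size would show the flat shape for single rolls turning into a bell shape for the larger samples.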

9 of 14

Hypothesis Tests

Null Hypothesis: the default assumption that nothing unusual is happening, matching our pre-existing expectations or probabilities

Alternative Hypothesis: the claim that something is different; strong enough evidence for it lets us reject the null hypothesis

p-value: the probability of seeing a result at least as extreme as ours in a world where the null hypothesis is true

  • If our p-value is lower than a chosen significance level (usually 0.05), the result is statistically significant and we reject the null hypothesis
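As a worked example, suppose we flip a coin 100 times and see 60 heads. With a null hypothesis of "the coin is fair", the p-value can be computed exactly by adding up the probability of every outcome at least as far from 50 heads as ours. The function name `two_sided_p_value` and the 60-out-of-100 scenario are invented for this sketch.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the equally likely flip sequences whose head count is
    # at least as far from the expected count as what we observed.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips

significance_level = 0.05
p = two_sided_p_value(heads=60, flips=100)

print("p-value:", round(p, 4))
if p < significance_level:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

In practice a statistics library would usually handle this calculation; writing it out by hand is only meant to show what the p-value is measuring.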

10 of 14

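Type 1 and Type 2 Errors

  • Type 1 error (false positive): rejecting the null hypothesis when it is actually true
  • Type 2 error (false negative): accepting** the null hypothesis when it is actually false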

** “Accept” is used here to mean “fail to reject”
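Both kinds of error can be seen in a simulation. The sketch below runs many 100-flip experiments, first with a fair coin (the null hypothesis is true, so every rejection is a Type 1 error) and then with a coin that lands heads 60% of the time (the null hypothesis is false, so every failure to reject is a Type 2 error). The coin setup, the 0.6 bias, and all names here are assumptions made for illustration.

```python
import random
from math import comb

random.seed(163)  # arbitrary seed so the run is repeatable

FLIPS = 100
ALPHA = 0.05  # significance level

def two_sided_p_value(heads, flips=FLIPS):
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    gap = abs(heads - expected)
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= gap)
    return extreme / 2 ** flips

# Precompute the reject / fail-to-reject decision for every possible head count.
rejects = [two_sided_p_value(k) < ALPHA for k in range(FLIPS + 1)]

def rejection_rate(prob_heads, experiments=2_000):
    """Share of simulated experiments in which the null hypothesis is rejected."""
    rejected = 0
    for _ in range(experiments):
        heads = sum(random.random() < prob_heads for _ in range(FLIPS))
        rejected += rejects[heads]
    return rejected / experiments

type_1_rate = rejection_rate(0.5)      # null is true, but we reject it
type_2_rate = 1 - rejection_rate(0.6)  # null is false, but we fail to reject it

print("Type 1 error rate:", round(type_1_rate, 3))
print("Type 2 error rate:", round(type_2_rate, 3))
```

The Type 1 rate should stay at or below the significance level, while the Type 2 rate depends on how biased the coin really is and how many flips we collect.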

11 of 14

Type 1 and Type 2 Errors

You visit Hall Health Center to check whether you have the flu. Your doctor says you don’t have it, but you actually do. What type of error did the doctor make?

Null Hypothesis: You don’t have the flu

Answer on Slido!

# 3550867

12 of 14

Choosing the Right Test

Factors to consider

  • Sample size
  • Type of variable
  • Number of variables being compared
  • Whether you know or assume the underlying distribution

Types of Tests

  • Parametric or nonparametric?
  • Sample size greater than or less than 30?
  • One-tailed, two-tailed, paired?
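As one example of a nonparametric test, a permutation test compares two groups without assuming the data follows any particular distribution, which can be handy for small samples. The sketch below is a minimal version under stated assumptions: the measurements in `group_a` and `group_b` are invented, and the number of shuffles is an arbitrary choice.

```python
import random
import statistics

random.seed(163)  # arbitrary seed so the run is repeatable

# Hypothetical measurements for two small groups, invented for illustration.
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
group_b = [10.4, 11.9, 12.2, 11.1, 12.8, 10.9]

# Test statistic: absolute difference in group means (a two-tailed test).
observed = abs(statistics.mean(group_a) - statistics.mean(group_b))

# Under the null hypothesis the group labels don't matter, so shuffle
# them repeatedly and see how often chance alone does as well.
pooled = group_a + group_b
num_shuffles = 10_000
at_least_as_extreme = 0
for _ in range(num_shuffles):
    random.shuffle(pooled)
    new_a = pooled[:len(group_a)]
    new_b = pooled[len(group_a):]
    if abs(statistics.mean(new_a) - statistics.mean(new_b)) >= observed:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / num_shuffles
print("observed difference:", round(observed, 2))
print("approximate p-value:", p_value)
```

A parametric alternative such as a t-test would instead assume the values in each group are roughly normally distributed.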


13 of 14

Data Science is Science!

Some problems that occur in science

  • p-hacking
  • Multiple hypothesis testing without correction
  • HARKing
  • Lack of internal/external validity
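To see why multiple hypothesis testing needs a correction, consider running 20 independent tests when every null hypothesis is actually true. The arithmetic below shows how likely at least one false positive becomes, and how one common fix, the Bonferroni correction (dividing the significance level by the number of tests), brings it back down. The choice of 20 tests is arbitrary, and Bonferroni is offered here as one example of a correction rather than the only option.

```python
alpha = 0.05    # significance level for a single test
num_tests = 20  # hypothetical number of independent tests

# Chance that at least one test comes out "significant" purely by luck.
chance_any_false_positive = 1 - (1 - alpha) ** num_tests

# Bonferroni correction: use a stricter threshold for each individual test.
bonferroni_alpha = alpha / num_tests
chance_after_correction = 1 - (1 - bonferroni_alpha) ** num_tests

print("uncorrected:", round(chance_any_false_positive, 3))
print("per-test threshold after correction:", bonferroni_alpha)
print("corrected:", round(chance_after_correction, 3))
```

Without a correction, a "significant" result somewhere among the 20 tests is more likely than not, which is exactly the opening that p-hacking exploits.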

Why does this matter?

  • Reliability
  • Reproducibility
  • Interpretability
  • Validity


14 of 14

Next Time

  • Machine Learning

Before Next Time

  • Complete Lesson 14 in Canvas for credit towards a Weekly Token
  • Turn in Checkpoint 3
  • Keep working on THA 4 ☺
