1 of 33

Basic Statistical Tools

2 of 33

Agenda

  • Simple Data Tools
  • Dangers in Simplification
  • Examples

3 of 33

Simple Data Tools

How Can We Quantify What We See?

An Overview

5 of 33

Challenges With Statistics

  • What are the challenges in statistics?
  • Discuss with partners
    • Data can be “dirty” - errors, bias, measurement
    • There are often unmeasurable influences on measurable variables
    • The measured variables aren’t necessarily what you want to measure
    • The mathematics is relatively easy - the interpretation is always hard

6 of 33

Statistics Process

  • Collect data
  • Organize it
  • Analyze it
  • Represent it
  • Interpret and then reflect on conclusions

7 of 33

Example Data

  • Data for student marks on a quiz
  • Available on the website as “Unit 1 Day 2 Class Data”
  • Follow the process
    • Collect (done), Organize, Analyze, Represent, Interpret and Reflect
    • For the Analyze step, find the centre
    • How spread out is the data around the centre?

8 of 33

Example Data

  • Data for student marks on a quiz
  • Centre:
    • Mean - 5.25
    • Median - 5.5
    • Mode - 7
  • Spread of data:
    • Variance - 6.39
    • Standard deviation - 2.53

10 of 33

Finding the Centre: Mean, Median, Mode

  • What does it mean to be average?
  • Approximately the “middle” of the data
    • Mathematically, this is often called the centre
  • Some data has a centre that can be interpreted
  • Other data has a meaningless centre
    • Bi-modal, skewed, multi-modal, exponential distribution

11 of 33

Finding the Centre: Mean, Median, Mode

  • Mode
    • Most common element
  • Median
    • Middle value of the sorted data: half of the elements are below it and half above
  • Mean (average)
    • Sum of elements divided by number of elements
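
The three definitions above can be checked with Python's standard `statistics` module. This is a minimal sketch; the list `marks` is a hypothetical set of quiz marks made up for illustration, not the class data set.

```python
import statistics

# Hypothetical quiz marks out of 10 (not the class data set).
marks = [2, 4, 5, 5, 7, 7, 7, 9]

mode = statistics.mode(marks)      # most common element
median = statistics.median(marks)  # middle of the sorted data
mean = statistics.mean(marks)      # sum of elements / number of elements

print("mode:", mode, "median:", median, "mean:", mean)
```

With an even number of elements there is no single middle value, so `statistics.median` averages the two middle values.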

12 of 33

Width of the Centre: Variance, Standard Deviation

  • Width of the centre
  • How the data spreads out from the centre
  • Easiest to interpret for normal distributions
    • Also called Gaussian distributions
  • Variance - average of the squared differences between each value and the mean
  • Standard deviation - square root of variance
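
As a rough illustration of how these two quantities are computed, the sketch below works through the sample variance by hand and compares it with the standard library. It assumes the sample convention (dividing by n - 1 rather than n), and `data` is a hypothetical list of values.

```python
import math
import statistics

# Hypothetical data, for illustration only.
data = [2, 4, 5, 5, 7, 7, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared distance of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Sample variance: divide by n - 1 (assumed convention).
sample_variance = sum(squared_diffs) / (n - 1)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_sd, 2))
print(round(statistics.variance(data), 2), round(statistics.stdev(data), 2))
```

Dividing by n instead gives the population variance (`statistics.pvariance`), which is slightly smaller for the same data.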

13 of 33

Width of the Centre: Variance, Standard Deviation

  • Standard deviation - square root of variance
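  • “Most” of the data is clustered around the mean
    • For a normal distribution, about 68% lies within one standard deviation and about 95% within two
  • Distribution has tails that extend far from the mean but hold very little of the data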
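
One way to see how much of the data is “most” is to simulate it. The sketch below draws hypothetical normally distributed values (mean 50, standard deviation 10, both chosen arbitrarily) and counts the fraction that falls within one and two standard deviations of the mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical normal data: the mean of 50 and SD of 10 are arbitrary choices.
sample = [rng.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)


def fraction_within(k):
    """Fraction of the sample within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in sample) / len(sample)


within_1 = fraction_within(1)
within_2 = fraction_within(2)
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```

For normal data the two fractions should come out near 0.68 and 0.95; skewed or multi-modal data can give quite different values.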

14 of 33

Fitting Curves to Data

  • Approximating the data as a single curve
    • For data with a single independent variable
  • There are many ways to fit curves to data
  • A popular method is called Least Squares
    • Minimizes (using Calculus) the sum of the squared differences between the curve and the data points
  • Each method has trade offs - there is no single “best” method
  • More later in the course!
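
For the simplest curve, a straight line, the Least Squares minimization has a closed-form answer. The sketch below is one possible implementation; `fit_line` is a hypothetical helper name and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Setting the derivatives of the summed squared error to zero
    # gives these two expressions for the best slope and intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Squaring the differences penalizes large misses heavily, so a single outlier can pull the line noticeably. That is one of the trade-offs of this method.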

15 of 33

Dangers in Simplification

What Can Go Wrong With Losing Information?

16 of 33

Golden Rules of Statistics

  • Correlation is not causation
  • Simplification removes information, decreases subtlety and nuance
    • Corollary: second order effects can reverse conclusions
  • Averages of averages are misleading unless each average is weighted by its group size
  • Data can be “dirty”
    • Bias, error, omissions
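
The rule about averages of averages can be seen with two groups of different sizes. The marks below are hypothetical: a small class with high marks and a larger class with lower marks.

```python
# Hypothetical marks for two classes of different sizes.
class_a = [90, 80]                   # 2 students
class_b = [60, 50, 70, 60, 60, 60]   # 6 students

mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)

# Naive approach: average the two class averages.
average_of_averages = (mean_a + mean_b) / 2

# Correct overall mean: pool every student together.
pooled = class_a + class_b
overall_mean = sum(pooled) / len(pooled)

# Weighting each class average by its size recovers the overall mean.
weighted = (mean_a * len(class_a) + mean_b * len(class_b)) / len(pooled)

print(average_of_averages, overall_mean, weighted)
```

The naive figure treats the two-student class as if it counted as much as the six-student class, which is the information lost when the group sizes are dropped.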

17 of 33

Example: Correlation is not Causation, Ex 1

18 of 33

Example: Correlation is not Causation, Ex 2

19 of 33

Example: Simplification Removes Information

  • What is going on here?

20 of 33

Qualitative and Quantitative Data

  • Qualitative:
    • Categories of data
    • More descriptive
    • What is your favourite pen colour?
    • How do you travel to school?
  • Quantitative:
    • Information that can be counted or measured
    • How many pens do you own?
    • How long did it take to get to school today?

21 of 33

Examples

Putting It Into Practice

23 of 33

Example 1

  • Describe the skew for the data shown:

  • Right Skew, Normal, Left Skew
  • Only the normal distribution has a single clear centre: its mean, median, and mode coincide, while in skewed data they separate
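
A quick rule of thumb for spotting skew without a graph is to compare the mean with the median: the mean is pulled toward the long tail. The sketch below applies that rule to hypothetical data sets; `describe_skew` is an illustrative helper, not a formal skewness statistic.

```python
import statistics


def describe_skew(data):
    """Rule of thumb: the mean is dragged toward the longer tail."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right skew"
    if mean < median:
        return "left skew"
    return "roughly symmetric"


# Hypothetical data sets, for illustration only.
long_right_tail = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
symmetric = [1, 2, 3, 4, 5]
long_left_tail = [1, 7, 9, 10, 10, 11, 11, 11, 12, 12]

for data in (long_right_tail, symmetric, long_left_tail):
    print(describe_skew(data))
```

The rule can fail for unusual distributions, so a histogram is still the safer check.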

24 of 33

Example 2

  • A group of students was surveyed to find out how many hats each of them owned
  • Graph and comment on the data and central tendency

  Hats  Freq
  0     3
  1     7
  2     3
  3     3
  4     2

25 of 33

Example 2

  • A group of students was surveyed to find out how many hats each of them owned
  • Graph and comment on the data and central tendency
  • 18 students surveyed
  • Mode: 1
  • Median: 1
  • Mean: 1.67
  • Std Dev: 1.28
  • Hat Data

  Hats  Freq
  0     3
  1     7
  2     3
  3     3
  4     2
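
The figures on this slide can be reproduced from the frequency table by expanding it into one value per student. This is a sketch using Python's `statistics` module; the variable names are illustrative. The quoted standard deviation of 1.28 matches the sample formula (dividing by n - 1), while the population formula gives about 1.25.

```python
import statistics

# Frequency table from the slide: number of hats -> number of students.
hat_frequencies = {0: 3, 1: 7, 2: 3, 3: 3, 4: 2}

# Expand the table into one entry per student.
hats = [value for value, freq in hat_frequencies.items() for _ in range(freq)]

n = len(hats)
mode = statistics.mode(hats)
median = statistics.median(hats)
mean = statistics.mean(hats)
sample_sd = statistics.stdev(hats)        # divides by n - 1
population_sd = statistics.pstdev(hats)   # divides by n

print(n, mode, median, round(mean, 2), round(sample_sd, 2))
```

The mean sits above the median and mode because a few students own three or four hats, which points to a right skew.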

26 of 33

Example 3

  • All of the IB students in the school were asked how many minutes a day they spent on mathematics outside of class
  • Is the data continuous or discrete?
  • Is it qualitative or quantitative?
  • Is it normal or skewed?
  • Visualize and then infer meaning
  • How to calculate mean, median, and mode in this case?

  Time (min)  # Students
  0 - 15      21
  15 - 30     32
  30 - 45     35
  45 - 60     41
  60 - 75     27
  75 - 90     11

27 of 33

Example 3

  • Continuous data (time), grouped into discrete bins
  • Quantitative
  • More or less normal data
    • Slight skew left
  • How to calculate mean, median, and mode in this case?
    • Assume every value sits at the midpoint of its bin, then calculate as usual
    • In this case, the midpoints are 7.5, 22.5, 37.5, etc.

  Time (min)  # Students
  0 - 15      21
  15 - 30     32
  30 - 45     35
  45 - 60     41
  60 - 75     27
  75 - 90     11
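
The midpoint approach can be sketched as follows, using the bins and counts from the table. The variable names are illustrative, and the result is only an estimate because the exact times inside each bin are unknown.

```python
# (lower bound, upper bound, number of students) for each bin in the table.
bins = [
    (0, 15, 21),
    (15, 30, 32),
    (30, 45, 35),
    (45, 60, 41),
    (60, 75, 27),
    (75, 90, 11),
]

total = sum(count for _, _, count in bins)

# Estimated mean: treat every student in a bin as sitting at its midpoint.
grouped_mean = sum((lo + hi) / 2 * count for lo, hi, count in bins) / total

# Modal class: the bin with the highest count.
modal_class = max(bins, key=lambda b: b[2])[:2]

# Median class: the first bin where the running count reaches half the total.
cumulative = 0
median_class = None
for lo, hi, count in bins:
    cumulative += count
    if cumulative >= total / 2:
        median_class = (lo, hi)
        break

print(total, round(grouped_mean, 2), median_class, modal_class)
```

The estimated mean falls in a lower bin than the modal class, which agrees with the slight left skew noted on the slide.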

28 of 33

Example 4

  • Our Measurements:
  • Most students in our class are fully grown (height)
    • Let’s find out!
  • Measure height and analyze

30 of 33

Example 5

  • How many times do you smile each day?
    • Write down a guess
  • How many times does an average adult smile each day?
    • Write down a guess
  • How many times does an average child smile each day?
    • Write down a guess
  • Guesses from class

32 of 33

Example 5

  • How many smiles would you expect from this class in one day?
  • What is the “middle”, and spread of data?
  • Add a child to the class
    • How does this change anything?

Potentially dubious reference on smiling

Smiles per day: Happy adult - 40-50, “Normal” adult - 20, Child - 400
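
The effect of adding a child to the class can be sketched numerically. The class guesses below are hypothetical, and the child's value of 400 is taken from the (potentially dubious) reference above.

```python
import statistics

# Hypothetical guesses of smiles per day for a class of ten students.
class_smiles = [15, 20, 20, 20, 20, 25, 25, 30, 40, 45]

# Add one child, using the reference's figure of 400 smiles per day.
with_child = class_smiles + [400]

mean_before = statistics.mean(class_smiles)
median_before = statistics.median(class_smiles)
mean_after = statistics.mean(with_child)
median_after = statistics.median(with_child)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single extreme value more than doubles the mean here but barely moves the median, so the median is the more robust “middle” when outliers are present.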

33 of 33

CREDITS

Special thanks to all the people who made and released these awesome resources for free:

  • Presentation template by SlidesCarnival
