1 of 57

Basic Statistical Tools

2 of 57

Agenda

  • Simple Data Tools
  • Dangers in Simplification
  • Examples

3 of 57

Simple Data Tools

How Can We Quantify What We See?

An Overview

5 of 57

Challenges With Statistics

  • What are the challenges in statistics?
  • Discuss with partners
    • Data can be “dirty” - errors, bias, measurement
    • There are often unmeasurable influences on measurable variables
    • The measured variables aren’t necessarily what you want to measure
    • The mathematics is relatively easy - the interpretation is always hard

6 of 57

Statistics Process

  • Collect data
  • Organize it
  • Analyze it
  • Represent it
  • Interpret and then reflect on conclusions

7 of 57

Example Data

  • Data for student marks on a quiz
  • Located here
  • Follow the process
    • Collect (done), Organize, Analyze, Represent, Interpret and Reflect
    • For the Analyze step, find the centre
    • How spread out is the data around the centre?

8 of 57

Example Data

  • Data for student marks on a quiz
  • Centre:
    • Mean - 5.25
    • Median - 5.5
    • Mode - 7
  • Spread of data:
    • Variance - 6.39
    • Standard deviation - 2.53

10 of 57

Finding the Centre: Mean, Median, Mode

  • What does it mean to be average?
  • Approximately the “middle” of the data
    • Mathematically, this is often called the centre
  • Some data has a centre that can be interpreted
  • Other data has a meaningless centre
    • Bi-modal, skewed, multi-modal, exponential distribution

11 of 57

Finding the Centre: Mean, Median, Mode

  • Mode
    • Most common element
  • Median
    • Middle value: half of the elements are less than it and half are more
  • Mean (average)
    • Sum of elements divided by number of elements

12 of 57

Mean

  • Example 1) Find the mean grade (%): 65, 62, 75, 80, 71, 15
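
As a quick check of Example 1, here is a minimal Python sketch that applies the definition (sum of elements divided by number of elements) and compares it with the standard library. The variable names are illustrative only.

```python
from statistics import mean

# Grades (%) from Example 1
grades = [65, 62, 75, 80, 71, 15]

# Mean = sum of elements divided by number of elements
manual_mean = sum(grades) / len(grades)

print(round(manual_mean, 2))    # 61.33
print(round(mean(grades), 2))   # same value from the standard library
```

Note how the single mark of 15 pulls the mean below five of the six grades.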

13 of 57

Weighted Mean

  • Example 2: This table shows the ages of students attending a university application workshop. What is the mean age?

Age               15  16  17  18  19
Number of people   9  14  13   6   2
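
The weighted mean for the age table in Example 2 can be sketched in Python as follows. Each age is weighted by the number of people of that age; the variable names are assumptions made for illustration.

```python
# Age table from Example 2: each age is weighted by its head count
ages = [15, 16, 17, 18, 19]
counts = [9, 14, 13, 6, 2]

total_people = sum(counts)

# Weighted mean = sum(value * weight) / sum(weights)
weighted_mean_age = sum(a * c for a, c in zip(ages, counts)) / total_people

print(total_people)        # 44
print(weighted_mean_age)   # 16.5
```

This is the same calculation as listing all 44 ages individually and taking the ordinary mean.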

14 of 57

Weighted Mean

Example 3: A teacher assigns the following weights:

A student achieves marks of 70%, 84% and 82% in the first 3 categories respectively. What is the student’s mark going into the exam?

What mark do they need on the final exam to earn a 78% in the course?

15 of 57

Grouped Data

Example 4: The following table shows recent exam results and how many students scored in each interval. Find the mean grade.

16 of 57

Median

Median: the middle data point when the data is listed in numerical order. If there is an even number of elements, use the mean of the middle two points.

Example 5: Find the median grade of the dataset used earlier:

65, 62, 75, 80, 71, 15

How does this differ from the mean? Why?
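
One way to explore the question is the following Python sketch, which finds the median of the Example 5 data by hand and with the standard library, then compares it with the mean. Dropping the mark of 15 at the end is an illustrative step that is not part of the original exercise.

```python
from statistics import mean, median

grades = [65, 62, 75, 80, 71, 15]

# Step 1: list the data in numerical order
ordered = sorted(grades)            # [15, 62, 65, 71, 75, 80]

# Step 2: even number of elements, so average the middle two
mid = len(ordered) // 2
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median)                # 68.0
print(median(grades))               # 68.0
print(round(mean(grades), 2))       # 61.33

# Illustration: remove the outlier (15) and compare again
without_outlier = [g for g in grades if g != 15]
print(mean(without_outlier))        # 70.6
print(median(without_outlier))      # 71
```

The outlier moves the mean by more than 9 marks but the median by only 3, which is why the median is preferred when a dataset contains a few extreme values.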

17 of 57

Mode

Mode: the most frequently appearing element.

  • If there are no repeating elements, there is no mode.
  • If two or more elements tie for most frequent, the data set is called bimodal (2), trimodal (3) or multimodal (4 or more).
  • This is the only measure of centre suitable for non-numeric (qualitative) data.
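
The rules above can be sketched in Python. The helper `modes` is a hypothetical function written for this illustration; it follows the convention stated here that data with no repeated elements has no mode. The second and third sample lists are made-up data.

```python
from collections import Counter

def modes(data):
    """Return the most frequent element(s); empty list if nothing repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []          # no repeating elements: no mode
    return [value for value, n in counts.items() if n == top]

print(modes([65, 62, 75, 80, 71, 15]))        # []
print(modes([1, 2, 2, 3, 3]))                 # [2, 3]  (bimodal)
print(modes(["bus", "walk", "bus", "car"]))   # ['bus'] (works for qualitative data)
```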

18 of 57

Choosing the appropriate measure:

  • Mean is the most widely used and reported
  • Median is used if:
    • Range of data is open ended (e.g. “x or greater” is a bin)
    • Dataset contains a few large outliers
    • Dataset is skewed
  • Mode is used for nominal/categorical data (data that is not ordered)

19 of 57

From the QuestionBank:

23 of 57

From the QuestionBank:

a) What is the value of ‘p’?

25 of 57

From the QuestionBank:

b) What is the modal class?

27 of 57

From the QuestionBank:

c) Approximately 50% of performances sold fewer than ‘a’ tickets. Find ‘a’.

29 of 57

Measures of centre with technology…

In Excel/Google Sheets (the same formulas work in both applications):

Mean: =AVERAGE(range)

Median: =MEDIAN(range)

Mode: =MODE(range)

Weighted mean (range1 holds the values, range2 the weights or frequencies):

=SUMPRODUCT(range1,range2)/SUM(range2)

32 of 57

Desmos

AP Stats 1.4

33 of 57

Width of the Centre: Variance, Standard Deviation

  • Width of the centre
  • How the data spreads out from the centre
  • Easiest to interpret for normal distributions
    • Also called Gaussian distributions
  • Variance - average of the squared distances from the mean
  • Standard deviation - square root of variance

34 of 57

Finding Standard Deviation

65, 62, 75, 80, 71, 15

Mark    Distance from the mean    (Distance from the mean)²
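
The table can be filled in with a short Python sketch like the one below. The slides do not say whether the population formula (divide by n) or the sample formula (divide by n - 1) is intended, so both are shown; that choice is an assumption to confirm against the course convention.

```python
marks = [65, 62, 75, 80, 71, 15]
n = len(marks)
mean_mark = sum(marks) / n

# One row per mark: (mark, distance from the mean, squared distance)
rows = [(m, m - mean_mark, (m - mean_mark) ** 2) for m in marks]
for mark, dist, sq in rows:
    print(f"{mark:>4} {dist:>10.2f} {sq:>12.2f}")

sum_sq = sum(sq for _, _, sq in rows)

# Variance is the average squared distance; standard deviation is its square root
population_sd = (sum_sq / n) ** 0.5
sample_sd = (sum_sq / (n - 1)) ** 0.5

print(round(population_sd, 2))   # 21.56 (divide by n)
print(round(sample_sd, 2))       # 23.62 (divide by n - 1)
```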

35 of 57

Width of the Centre: Variance, Standard Deviation

  • Standard deviation - square root of variance
  • “Most” of the data is clustered around the mean (for a normal distribution, about 68% lies within one standard deviation)
  • Distribution has long tails

36 of 57

Standard Deviation for Ungrouped Data on the GDC

37 of 57

Standard Deviation For Grouped Data on the GDC

38 of 57

Fitting Curves to Data

  • Approximating the data as a single curve
    • For single variable data
  • There are many ways to fit curves to data
  • A popular method is called Least Squares
    • Minimizes (using calculus) the sum of the squares of the differences between the curve and the data points
  • Each method has trade-offs - there is no single “best” method
  • More later in the course!
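
As a preview, here is a minimal sketch of Least Squares for the simplest case, a straight line through paired data. The function `fit_line` and the study-hours data are hypothetical, chosen only for illustration; the closed-form slope and intercept are what calculus gives when the sum of squared differences is minimized.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of products of deviations / sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, scores)
print(round(slope, 2), round(intercept, 2))   # 5.6 46.2
```

Other curve families (quadratic, exponential) and other error measures lead to different fits, which is the trade-off the slide refers to.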

39 of 57

Dangers in Simplification

What Can Go Wrong With Losing Information?

40 of 57

Golden Rules of Statistics

  • Correlation is not causation
  • Simplification removes information, decreases subtlety and nuance
    • Corollary: second order effects can reverse conclusions
  • Averages of averages are misleading unless the groups are the same size
  • Data can be “dirty”
    • Bias, error, omissions
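
To see the problem with averaging averages, consider this small Python sketch. The two classes and their marks are hypothetical data invented for the illustration.

```python
# Hypothetical quiz marks for two classes of different sizes
class_a = [9, 9, 10]                        # 3 students
class_b = [4, 5, 5, 6, 5, 5, 4, 6, 5, 5]    # 10 students

mean_a = sum(class_a) / len(class_a)        # about 9.33
mean_b = sum(class_b) / len(class_b)        # 5.0

# Averaging the two class means treats both classes as equally large
average_of_averages = (mean_a + mean_b) / 2

# The true overall mean counts every student once
all_marks = class_a + class_b
overall_mean = sum(all_marks) / len(all_marks)

print(round(average_of_averages, 2))   # 7.17
print(overall_mean)                    # 6.0
```

The two results agree only when each group mean is weighted by its group size, which is the weighted mean from earlier in the deck.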

41 of 57

Example: Correlation is not Causation, Ex 1

42 of 57

Example: Correlation is not Causation, Ex 2

43 of 57

Example: Simplification Removes Information

  • What is going on here?

44 of 57

Qualitative and Quantitative Data

  • Qualitative:
    • Categories of data
    • More descriptive
    • What is your favourite pen colour?
    • How do you travel to school?
  • Quantitative:
    • Information that can be counted or measured
    • How many pens do you own?
    • How long did it take to get to school today?

45 of 57

Examples

Putting It Into Practice

47 of 57

Example 1

  • Describe the skew for the data shown:

  • Right skew, normal, left skew
  • Only the normal (symmetric) case has a single, easily interpreted measure of central tendency; with skew, the mean, median and mode separate

48 of 57

Example 2

  • A group of students was surveyed to find out how many hats each of them owned
  • Graph and comment on the data and central tendency

Hats   Freq
0      3
1      7
2      3
3      3
4      2

49 of 57

Example 2

  • A group of students was surveyed to find out how many hats each of them owned
  • Graph and comment on the data and central tendency
  • 18 students surveyed
  • Mode: 1
  • Median: 1
  • Mean: 1.67
  • Std Dev: 1.28
  • Hat Data
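
The summary values above can be reproduced from the frequency table with a short Python sketch. The dictionary name is illustrative. Note that the quoted standard deviation of 1.28 matches the sample formula (divide by n - 1); the population formula gives a slightly smaller value.

```python
from statistics import mean, median, mode, stdev, pstdev

# Frequency table: number of hats -> number of students
hat_freq = {0: 3, 1: 7, 2: 3, 3: 3, 4: 2}

# Expand the table into one entry per student
hats = [h for h, f in hat_freq.items() for _ in range(f)]

print(len(hats))                 # 18
print(mode(hats))                # 1
print(median(hats))              # 1.0
print(round(mean(hats), 2))      # 1.67
print(round(stdev(hats), 2))     # 1.28 (sample standard deviation)
print(round(pstdev(hats), 2))    # 1.25 (population standard deviation)
```

The mean sits above the median and mode because the few students with 3 or 4 hats pull it to the right.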

50 of 57

Example 3

  • All of the IB students in the school were asked how many minutes a day they spent on mathematics outside of class
  • Is the data continuous or discrete?
  • Is it qualitative or quantitative?
  • Is it normal or skewed?
  • Visualize and then infer meaning
  • How to calculate mean, median, and mode in this case?

Time (min)   # Students
0 - 15       21
15 - 30      32
30 - 45      35
45 - 60      41
60 - 75      27
75 - 90      11

51 of 57

Example 3

  • Continuous data (time), reported in grouped intervals
  • Quantitative
  • More or less normal data
  • Slight skew left
  • How to calculate mean, median, and mode in this case?
    • Assume every value sits at the centre of its bin, then use the usual calculations
    • In this case, the values are 7.5, 22.5, 37.5, etc.
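
The midpoint approach can be sketched in Python as follows. The results are estimates, since the exact times inside each bin are unknown; the tuple layout and variable names are assumptions made for this illustration.

```python
# (lower bound, upper bound, number of students) for each interval
bins = [(0, 15, 21), (15, 30, 32), (30, 45, 35),
        (45, 60, 41), (60, 75, 27), (75, 90, 11)]

total = sum(f for _, _, f in bins)

# Estimated mean: treat every student in a bin as sitting at its midpoint
estimated_mean = sum((lo + hi) / 2 * f for lo, hi, f in bins) / total

# Modal class: the interval with the highest frequency
modal_class = max(bins, key=lambda b: b[2])[:2]

# Median class: first interval where the running total reaches half the data
running = 0
for lo, hi, f in bins:
    running += f
    if running >= total / 2:
        median_class = (lo, hi)
        break

print(total)                      # 167
print(round(estimated_mean, 2))   # 42.35
print(modal_class)                # (45, 60)
print(median_class)               # (30, 45)
```

Using midpoints, the mode estimate is 52.5 minutes and the median estimate is 37.5 minutes, on either side of the estimated mean.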

52 of 57

Example 4

  • Our Measurements:
  • Most students in our class are fully grown (height)
    • Let’s find out!
  • Measure height and analyze

54 of 57

Example 5

  • How many times do you smile each day?
    • Write down a guess
  • How many times does an average adult smile each day?
    • Write down a guess
  • How many times does an average child smile each day?
    • Write down a guess
  • Guesses from class

56 of 57

Example 5

  • How many smiles would you expect from this class in one day?
  • What is the “middle”, and spread of data?
  • Add a child to the class
    • How does this change anything?

Potentially dubious reference on smiling

Smiles per day: happy adult - 40-50, “normal” adult - 20, child - 400
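
The effect of adding a child can be sketched in Python. The class make-up below (eight “normal” adults and two happy adults) is hypothetical; only the per-person figures come from the reference above, which the slide itself flags as potentially dubious.

```python
from statistics import mean, median

# Hypothetical class: eight "normal" adults and two happy adults
adults = [20] * 8 + [40, 50]

# Add one child who smiles 400 times a day
with_child = adults + [400]

print(mean(adults), median(adults))                    # 25 20.0
print(round(mean(with_child), 2), median(with_child))  # 59.09 20
```

One extreme value more than doubles the mean while the median does not move, so the “middle” of the class depends heavily on which measure is used.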

57 of 57

CREDITS

Special thanks to all the people who made and released these awesome resources for free:

  • Presentation template by SlidesCarnival
