What is statistical inference?

Introduction: requirements and clarifications:

Posing an appropriate comparison investigative question using a given multivariate data set  (Pain point 1)

Discussing sample distributions

Discussing sampling variability including the variability of estimates (Pain point 2). (What happens if we take more than one sample? Do you think the ‘results’ from different samples will be the same, or will they be different? Why?)

Making an appropriate formal statistical inference  (Pain point 3)

Required quality of student response

Introduction (posing a ‘question’)

Purpose statement

Population and sample

Bootstrap sampling (it is a sampling process)

Sampling - a discussion (The Island)

Confidence Interval

Problem: posing a question

Posing an appropriate comparison investigative question using a given multivariate data set

Discussion: distribution

Discussing sample distributions

Review box plot

Sampling variability: (did we understand it?)

Discussing sampling variability including the variability of estimates

Inference: the core ‘business’ (we are dealing with the ‘difference of medians’ NOT the medians)

Making an appropriate formal statistical inference

Quality of student responses for M and E

Required quality of student response

Sampling variations_Confidence Interval_bootstrapping

The importance of being wrong: insightful discussion from Dr Nic on her blog

Important sites which might be helpful

iNZight: instruction

What to write? Some of the things we must avoid. Are we talking about the ‘sample’ or ‘population’? Sample = ‘definite article’; population = ‘no definite article’.

PSSS and OSEM (describing distribution of data)

Exemplar (NZ Crash Statistics)

What is statistical inference?

Dr Nic explains stats inference on her blog post

2016-05-07_22-39-46.png

Introduction: requirements and clarifications:

What is a ‘pain point’? It is a discussion point on which students struggle most.

Posing an appropriate comparison investigative question using a given multivariate data set  (Pain point 1)

Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation.

An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"

Discussing sample distributions

Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

Discussing sampling variability including the variability of estimates (Pain point 2). (What happens if we take more than one sample? Do you think the ‘results’ from different samples will be the same, or will they be different? Why?)

Link to the animations on sampling variability

Link for the short clip from Chris Wild's animation on Sampling Variability

Students need to show an understanding that if they were to take another sample from the population this is likely to result in different displays and summary statistics.
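A minimal simulation sketch of this idea in Python. It builds a made-up population (the numbers are hypothetical, not real survey data), draws several random samples of 30, and prints the median of each. The function name `sample_medians` and all parameter values are illustrative assumptions.

```python
import random
import statistics

def sample_medians(population, sample_size, n_samples, seed=0):
    """Median of each of several random samples drawn from one population."""
    rng = random.Random(seed)
    return [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population of 10,000 values (illustration only)
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 15) for _ in range(10_000)]

medians = sample_medians(population, sample_size=30, n_samples=5)
for i, m in enumerate(medians, start=1):
    print(f"Sample {i}: median = {m:.1f}")
```

Each sample comes from the same population, yet the printed medians differ from sample to sample. That difference is sampling variability.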

Making an appropriate formal statistical inference  (Pain point 3)

Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding about the nature of the confidence interval.

An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”

Required quality of student response

For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research.

For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.

Introduction (posing a ‘question’)

Purpose statement

Purpose of the investigation:

How to set the background (the context): 'There is a belief that...' or 'I want to test the claim that...'

How to make a hypothesis and justify it: "I suspect that... because..."

Summary of contextual research.

Year 12 Informal Inference: REVIEW

Population and sample

2016-04-13_22-46-32.jpg

Bootstrap sampling (it is a sampling process)

2016-04-13_22-46-32.jpg

Sampling - a discussion (The Island)

The Island_vardo_sampling - some considerations

Island in schools programme

random number generator

Census At School Resources: formal inference

Refer to this site for the Achievement Standard requirements. An exemplar is also uploaded for students’ reference.

Confidence Interval

Problem: posing a question

Posing an appropriate comparison investigative question using a given multivariate data set

Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation.

An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"

Discussion: distribution

Discussing sample distributions

Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

Review box plot

Murdoch Uni: Box plot: review

Sampling variability: (did we understand it?)

Link to the animations on Sampling variability

Discussing sampling variability including the variability of estimates

Students need to show an understanding that if they were to take another sample from the population this is likely to result in different displays and summary statistics.
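The effect of sample size on this variability can be sketched with a short simulation. The population below is made up for illustration, and the function name `spread_of_sample_medians` is a hypothetical helper, not part of any course software. It takes many samples of size 10 and of size 100 and measures how much the sample medians vary in each case.

```python
import random
import statistics

def spread_of_sample_medians(population, sample_size, n_samples=500, seed=0):
    """Standard deviation of the medians of many random samples."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(medians)

# Hypothetical population (illustration only)
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 15) for _ in range(10_000)]

spread_10 = spread_of_sample_medians(population, sample_size=10)
spread_100 = spread_of_sample_medians(population, sample_size=100)
print(f"n = 10:  spread of sample medians = {spread_10:.2f}")
print(f"n = 100: spread of sample medians = {spread_100:.2f}")
```

The medians from samples of 100 vary less than the medians from samples of 10, so estimates from larger samples are more stable.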

Inference: the core ‘business’ (we are dealing with the ‘difference of medians’ NOT the medians)

Making an appropriate formal statistical inference

Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding about the nature of the confidence interval.

An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”

Note: the difference of the medians is the parameter!
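A rough sketch of how a bootstrap confidence interval for the difference of two medians can be built, assuming a simple percentile approach (the middle 95% of the resampled differences). iNZight's exact procedure may differ. The data values and the function name `bootstrap_ci_median_diff` are made up for illustration.

```python
import random
import statistics

def bootstrap_ci_median_diff(group_a, group_b, n_resamples=1000, seed=0):
    """Middle 95% of bootstrap differences: median(a) - median(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group from itself, with replacement, at its own size
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    cut = n_resamples * 25 // 1000  # resamples trimmed from each tail (2.5%)
    return diffs[cut], diffs[-cut - 1]

# Hypothetical text messages per day (made-up values, illustration only)
adults = [41, 44, 45, 47, 48, 50, 52, 53, 55, 58, 60, 62]
teens = [18, 20, 22, 25, 27, 28, 30, 31, 33, 35, 36, 38]

lower, upper = bootstrap_ci_median_diff(adults, teens)
print(f"Bootstrap CI for the difference in medians: {lower} to {upper}")
```

The interval describes the difference of the medians, not either median on its own. For this made-up data the whole interval is above zero, so the matching inference would say that the adult median is likely to be higher, by an amount between the lower and upper limits.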

Quality of student responses for M and E

Required quality of student response

For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research.

For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.

Sampling variations_Confidence Interval_bootstrapping

Sampling Variability demonstration with sample size 10 & 100

The following is an important link from Dr Nic’s blog. All students of statistics must read it.

Dr Nic's blog post: Confidence Interval

The importance of being wrong: insightful discussion from Dr Nic on her blog

Important sites which might be helpful

Nayland College Mathematics site

MathsNZStudents_Jake Wills

iNZight: instruction

Link to iNZight instruction page

What to write? Some of the things we must avoid. Are we talking about the ‘sample’ or ‘population’? Sample = ‘definite article’; population = ‘no definite article’.

Achieved: Must cover every aspect of the statistical enquiry cycle

‘Use statistical methods to make a formal inference’ involves showing evidence of using each component of the statistical enquiry cycle.

Merit: Must justify aspects (an explanation for the choice of variables for the investigation).

‘Use statistical methods to make a formal inference, with justification’ involves linking components of the statistical enquiry cycle to the context, and/or to the populations, and referring to evidence such as sample statistics, data values, or features of visual displays in support of statements made.

Excellence: Must integrate statistical and contextual knowledge

‘Use statistical methods to make a formal inference, with statistical insight’ involves integrating statistical and contextual knowledge throughout the statistical enquiry cycle, and may include reflecting about the process and considering other relevant explanations.

Example

The purpose of this investigation is to see whether smoking affects the lung capacity of children. I wonder what the difference is between the Forced Expiratory Volume of child smokers, compared to children who do not smoke, in New Zealand. I assume it is lower, but how much lower is the FEV value?

What statistical word needs to be inserted into the investigative question?

Differentiating the sample and the population

Does this exemplar show that the student has a clear understanding of when they are talking about the sample and when they are talking about the population?

Use the definite article “the” when referring to sample data, and use no article when referring to the population.

Acknowledgement:

Anne Patel and Jake Wills.

This was presented at the NZAMT Conference.

PSSS and OSEM (describing distribution of data)

Main features of analysis: Position, Spread, Shape, Special (PSSS). Levels of analysis in categories: Obvious, Specific, Evidence, Meaning (OSEM). Notes are given beside each level.

Position (shift)

  • Obvious: Mean, median or mode (relative position). Median is better for data affected by extreme values.
  • Specific: Compare between groups.
  • Evidence: Numerical values.
  • Meaning

Spread

  • Obvious: Inter-quartile range (IQR). Do not use RANGE (affected by extreme values).
  • Specific: Wide or narrow.
  • Evidence: Numerical values.
  • Meaning: Link it to context, insightful comments.

Shape

  • Obvious: Skew or symmetric. Very high peak or low peak.
  • Specific: Further clarification. Compare between groups.
  • Evidence: Numerical value.
  • Meaning: Link it to the context.

Special

  • Obvious: Cluster, groups, extreme values, outliers.
  • Specific: Comparison between groups.
  • Evidence: Numerical values.
  • Meaning: Link it to the context.
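The notes on position and spread (prefer the median and the IQR when there are extreme values) can be checked with a small sketch. The data values are made up for illustration, and `iqr` and `value_range` are hypothetical helper names. Python's `statistics.quantiles` may place the quartiles slightly differently from the method used in class or in iNZight.

```python
import statistics

def iqr(values):
    """Inter-quartile range: upper quartile minus lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

scores = [12, 14, 15, 15, 16, 17, 18, 19, 20]  # hypothetical data
scores_with_extreme = scores + [95]            # same data plus one extreme value

for label, data in [("without extreme", scores), ("with extreme", scores_with_extreme)]:
    print(
        f"{label}: mean = {statistics.mean(data):.1f}, "
        f"median = {statistics.median(data)}, "
        f"range = {value_range(data)}, IQR = {iqr(data)}"
    )
```

One extreme value moves the mean and the range a long way, while the median and the IQR barely change.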

Problem

Posing a question

  • It must be a comparative question comparing a single variable over two or more different groups in the population.
  • The statistic of interest (mean/median)
  • The variable which is being investigated.
  • Groups clearly described.
  • Population must be identified.

Predictions for the question you posed: “I would expect the median…”.

Purpose: justify why you choose this variable.

Plan/data

Use iNZight software to generate bootstrap samples; explain bootstrap sampling.

Bootstrap sampling: what is it?

Population and sample size: discuss. Bootstrap sampling is resampling from a sample with replacement.
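A minimal sketch of a single bootstrap resample, to show what "with replacement" means. The sample values and the helper name `bootstrap_resample` are made up for illustration. iNZight does this step for you, many times over.

```python
import random

def bootstrap_resample(sample, rng):
    """Draw len(sample) values from the sample itself, with replacement."""
    return rng.choices(sample, k=len(sample))

sample = [23, 35, 41, 47, 52, 58, 64, 70]  # hypothetical original sample
rng = random.Random(3)

resample = bootstrap_resample(sample, rng)
print("Original sample:   ", sample)
print("Bootstrap resample:", sorted(resample))
```

The resample is the same size as the original sample and contains only values from it. Because each value is put back before the next draw, some values can appear more than once and others not at all.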

Analysis

(focus on sample)

PSSS and OSEM: describe the distribution of data

Discuss sample statistics

Inference

(Population: unseen; we don’t have much information about it.)

(Based on the sample analysis, decide on POPULATION characteristics.)

Terms & statements:

  • Tends to be
  • It is a safe bet to say
  • Likely
  • Unlikely
  • Words referring to ‘uncertainty’. (Probability)
  • Confidence level, confidence interval, 95% CI
  • Unable to make a call
  • ‘Population mean/median will be enclosed by the CI’
  • 95% confident that population mean/median will be within the CI.
  • Lower limit of bootstrap CI is….
  • Upper limit of bootstrap CI is….

Conclusion

Answer the posed question.

Summary.

Exemplar (NZ Crash Statistics)

Extra note:

Compare the OVS (overall visible spread) with the DBM (distance between medians).

If DBM ≥ ⅓ of OVS, it is likely that there is a considerable difference between the medians.

(This is only for preliminary discussion.)
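A short sketch of this check. It assumes the usual reading of overall visible spread as the distance from the lowest lower quartile to the highest upper quartile of the two box plots; that definition, the data values and the function name `dbm_over_ovs` are assumptions for illustration. Quartiles from `statistics.quantiles` may differ slightly from those shown by iNZight.

```python
import statistics

def dbm_over_ovs(group_a, group_b):
    """Return (DBM, OVS, DBM / OVS) for two groups of values."""
    q1_a, _, q3_a = statistics.quantiles(group_a, n=4)
    q1_b, _, q3_b = statistics.quantiles(group_b, n=4)
    # Distance between the two sample medians
    dbm = abs(statistics.median(group_a) - statistics.median(group_b))
    # Lowest lower quartile to highest upper quartile across both boxes
    ovs = max(q3_a, q3_b) - min(q1_a, q1_b)
    return dbm, ovs, dbm / ovs

group_a = [1, 2, 3, 4, 5, 6, 7]      # hypothetical data
group_b = [5, 6, 7, 8, 9, 10, 11]    # hypothetical data

dbm, ovs, ratio = dbm_over_ovs(group_a, group_b)
print(f"DBM = {dbm}, OVS = {ovs}, DBM/OVS = {ratio:.2f}")
print("DBM is at least one third of OVS:", ratio >= 1 / 3)
```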

Describing the distribution

Acknowledgement:

One of the exemplars from the NZQA site.