What is statistical inference?
Introduction: requirements and clarifications:
Posing an appropriate comparison investigative question using a given multivariate data set (Pain point 1)
Discussing sample distributions
Discussing sampling variability including the variability of estimates (Pain point 2). (What happens if we take more than one sample? Do you think the ‘results’ from different samples will be the same, or will they be different? Why?)
Making an appropriate formal statistical inference (Pain point 3)
Required quality of student response
Introduction (posing a ‘question’)
Purpose statement
Population and sample
Bootstrap sampling (it is a sampling process)
Sampling: a discussion (The Island)
Confidence Interval
Problem: posing a question
Posing an appropriate comparison investigative question using a given multivariate data set
Discussion: distribution
Discussing sample distributions
Review box plot
Sampling variability: (did we understand it?)
Discussing sampling variability including the variability of estimates
Inference: the core ‘business’ (we are dealing with the ‘difference of medians’ NOT the medians)
Making an appropriate formal statistical inference
Quality of student responses for M and E
Required quality of student response
Sampling variations_Confidence Interval_bootstrapping
The importance of being wrong: insightful discussion from Dr Nic on her blog
Important sites which might be helpful
iNZight: instruction
What to write? Some of the things we must avoid. Are we talking about the ‘sample’ or ‘population’? Sample= ‘definite article’; population= ‘no definite article’.
PSSS and OSEM (describing distribution of data)
Exemplar (NZ Crash Statistics)
Introduction: requirements and clarifications:
What is a ‘Pain Point’? A discussion point on which students struggle most.
Posing an appropriate comparison investigative question using a given multivariate data set (Pain point 1)
Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation. An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"
Discussing sample distributions
Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

Making an appropriate formal statistical inference (Pain point 3)
Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding of the nature of the confidence interval. An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”

Required quality of student response
For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research. For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.

Introduction (posing a ‘question’)
Purpose statement
Purpose of the investigation: set the background and the context, for example 'There is a belief that...' or 'I want to test the claim that...'. Make a hypothesis and justify it: "I suspect that... because...". Include a summary of the contextual research.
Population and sample

Bootstrap sampling (it is a sampling process)


Confidence Interval

Problem: posing a question
Posing an appropriate comparison investigative question using a given multivariate data set
Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation. An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"

Discussion: distribution
Discussing sample distributions
Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

Sampling variability: (did we understand it?)
Link to the animations on Sampling variability
Discussing sampling variability including the variability of estimates
Students need to show an understanding that taking another sample from the population is likely to result in different displays and summary statistics.
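The idea can be tried out with a short simulation. This is only a rough sketch: the "population" below is made up for illustration (it is not real text-message data), and the names `population`, `sample_medians` and `spread` are hypothetical. It takes many random samples of two different sizes and looks at how much the sample medians move around.

```python
import random
import statistics

# Hypothetical population: simulated daily text-message counts for 10,000 people.
# These values are invented for illustration only; they are not real data.
rng = random.Random(1)
population = [round(rng.gammavariate(2.0, 15.0)) for _ in range(10_000)]


def sample_medians(population, sample_size, n_samples, rng):
    """Take n_samples random samples and return the median of each one."""
    return [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


def spread(values):
    """Distance between the smallest and largest value."""
    return max(values) - min(values)


# 200 samples of size 30, then 200 samples of size 300.
medians_small = sample_medians(population, 30, 200, rng)
medians_large = sample_medians(population, 300, 200, rng)

print("Population median:", statistics.median(population))
print("Spread of medians, samples of 30: ", spread(medians_small))
print("Spread of medians, samples of 300:", spread(medians_large))
```

Each sample gives a slightly different median, and the medians from the larger samples should sit closer together. That is the variability of estimates students are asked to discuss.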

Inference: the core ‘business’ (we are dealing with the ‘difference of medians’ NOT the medians)
Making an appropriate formal statistical inference
Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding of the nature of the confidence interval. An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”
Note: the difference of the medians is the parameter!
Quality of student responses for M and E
Required quality of student response
For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research. For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.
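For readers who want to see what a bootstrap confidence interval for the difference of medians might look like in code, here is a minimal sketch. It assumes a simple percentile interval (the middle 95% of the resampled differences); iNZight's exact procedure may differ. The function name `bootstrap_ci_median_difference` and the two samples are hypothetical, and the data are simulated rather than real.

```python
import random
import statistics


def bootstrap_ci_median_difference(group_a, group_b, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for median(group_a) minus median(group_b)."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_resamples):
        # Resample each group from itself, with replacement, keeping its size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(statistics.median(resample_a) - statistics.median(resample_b))
    differences.sort()
    # Cut off an equal share of resampled differences at each end.
    cut = round((1 - level) / 2 * n_resamples)
    return differences[cut], differences[n_resamples - cut - 1]


# Simulated samples (assumption: invented numbers, not survey results).
data_rng = random.Random(7)
adults = [round(data_rng.gauss(60, 10)) for _ in range(80)]
teenagers = [round(data_rng.gauss(45, 10)) for _ in range(80)]

lower, upper = bootstrap_ci_median_difference(adults, teenagers)
print("Bootstrap interval for the difference of medians:", lower, "to", upper)
```

The interval is for the difference of the medians, not for either median on its own, which matches the parameter named in the note above.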

What to write? Some of the things we must avoid. Are we talking about the ‘sample’ or ‘population’? Sample= ‘definite article’; population= ‘no definite article’.
Achieved: Must cover every aspect of the statistical enquiry cycle. Using statistical methods to make a formal inference involves showing evidence of using each component of the statistical enquiry cycle.
 Merit: Must justify aspects (for example, an explanation for the choice of variables for the investigation). Using statistical methods to make a formal inference, with justification, involves linking components of the statistical enquiry cycle to the context, and/or to the populations, and referring to evidence such as sample statistics, data values, or features of visual displays in support of statements made.
 Excellence: Must integrate statistical and contextual knowledge. Using statistical methods to make a formal inference, with statistical insight, involves integrating statistical and contextual knowledge throughout the statistical enquiry cycle, and may include reflecting on the process or considering other relevant explanations.

Example
•The purpose of this investigation is to see whether smoking affects the lung capacity of children. I wonder what the difference is between the Forced Expiratory Volume of child smokers, compared to children who do not smoke, in New Zealand. I assume it is lower, but how much lower is the FEV value?
•What statistical word needs to be inserted into the investigative question?

Differentiating the sample and the population
•Does this exemplar show that the student has a clear understanding of when they are talking about the sample and when they are talking about the population?
•Use the definite article “the” when referring to sample data, and use no article when referring to the population.

Acknowledgement: Anne Patel and Jake Wills, who presented this at the NZAMT Conference.

PSSS and OSEM (describing distribution of data)
Main features of analysis (PSSS), levels of analysis in categories (OSEM), and notes:
Position (shift)
 Obvious: Mean, median or mode (relative position). The median is better for data affected by extreme values.
 Specific: Compare between groups.
 Evidence: Numerical values.
 Meaning:
Spread
 Obvious: Interquartile range (IQR). Do not use the range (it is affected by extreme values).
 Specific: Wide or narrow.
 Evidence: Numerical values.
 Meaning: Link it to the context, with insightful comments.
Shape
 Obvious: Skewed or symmetric. Very high peak or low peak.
 Specific: Further clarification. Compare between groups.
 Evidence: Numerical values.
 Meaning: Link it to the context.
Special
 Obvious: Clusters, groups, extreme values, outliers.
 Specific: Comparison between groups.
 Evidence: Numerical values.
 Meaning: Link it to the context.
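The "Evidence" entries call for numerical values. As a small illustration of why the IQR is preferred to the range, the sketch below computes both for a made-up sample, then again after one extreme value is added. The helper name `summarise` and the data are hypothetical, and the quartiles use Python's "inclusive" method, which may differ slightly from the quartiles iNZight reports.

```python
import statistics


def summarise(values):
    """Return the median, IQR and range of a sample as a dictionary."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Invented sample values, for illustration only.
sample = [10, 12, 14, 16, 18, 20, 22]
sample_with_extreme = sample + [90]

before = summarise(sample)
after = summarise(sample_with_extreme)
print("Without the extreme value:", before)
print("With the extreme value:   ", after)
```

In this example the single extreme value stretches the range from 12 to 80, while the IQR only moves from 6 to 7 and the median from 16 to 17.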

Problem
Posing a question
 It must be a comparative question comparing a single variable over two or more different groups in the population.
 The statistic of interest (mean/median).
 The variable which is being investigated.
 Groups clearly described.
 Population must be identified.
 Predictions for the question you posed: “I would expect the median ……..”.
Purpose: justify why you chose this variable.

Plan/data
 State that iNZight software will be used to generate bootstrap samples, and explain bootstrap sampling.
 Bootstrap sampling: what is it? Bootstrap sampling is resampling from a sample with replacement.
 Population and sample size: discuss.
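To make "resampling from a sample with replacement" concrete, here is a tiny sketch with invented numbers. The names `sample` and `resample` are hypothetical. One bootstrap resample has the same size as the original sample, but each value is drawn with replacement, so some values usually appear more than once and others not at all.

```python
import random

# Invented sample of eight values, for illustration only.
sample = [12, 15, 9, 22, 30, 18, 7, 25]

rng = random.Random(42)

# One bootstrap resample: same size as the sample, drawn WITH replacement.
resample = rng.choices(sample, k=len(sample))

print("Original sample:", sorted(sample))
print("One resample:   ", sorted(resample))
print("Values left out of this resample:", sorted(set(sample) - set(resample)))
```

Drawing without replacement (for example with `rng.sample`) would simply shuffle the same eight values, so every resample would give the same median. Replacement is what lets the resamples differ from one another.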
Analysis (focus on sample)
 PSSS and OSEM: describe the distribution of data.
 Discuss sample statistics.

Inference (Population: unseen; we don’t have much information about it. Based on the sample analysis, we decide on POPULATION characteristics.)
Terms & statements:
 Tends to be
 It is a safe bet to say
 Likely
 Unlikely
 Words referring to ‘uncertainty’ (probability)
 Confidence level, confidence interval, 95% CI
 Unable to make a call
 ‘Population mean/median will be enclosed by the CI’
 95% confident that population mean/median will be within the CI
 Lower limit of bootstrap CI is….
 Upper limit of bootstrap CI is….
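One way to practise these statements is to turn the lower and upper limits of a bootstrap interval for a difference into a sentence. The sketch below is only an illustration: the function `interpret_interval` and its wording are hypothetical, and it assumes the usual reading that an interval which includes zero does not allow a call on which group is greater.

```python
def interpret_interval(lower, upper, group_a, group_b, parameter, units):
    """Word a statement from an interval for (group_a minus group_b)."""
    if lower > 0:
        # Whole interval is above zero: group_a is called as the greater one.
        return (
            f"I am fairly sure that the {parameter} for {group_a} is greater than "
            f"for {group_b}, by between {lower} and {upper} {units}."
        )
    if upper < 0:
        # Whole interval is below zero: group_b is called as the greater one.
        return (
            f"I am fairly sure that the {parameter} for {group_b} is greater than "
            f"for {group_a}, by between {-upper} and {-lower} {units}."
        )
    # Interval includes zero: the difference could go either way.
    return (
        f"I am unable to make a call, because the interval from {lower} to {upper} "
        f"{units} includes zero."
    )


print(interpret_interval(
    12, 17,
    "adults in New Zealand", "teenagers in New Zealand",
    "median number of text messages sent per day", "text messages",
))
print(interpret_interval(-3, 4, "group A", "group B", "median", "units"))
```

The generated sentence is only a starting point; a student response still needs to be written in context and supported by the displays.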

Conclusion
 Answer the posed question. Summary.

Exemplar (NZ Crash Statistics)

Extra note:
 Compare the OVS (overall visible spread) with the DBM (distance between medians). If the DBM ≥ ⅓ of the OVS, it is likely that there will be a considerable difference between the medians.
(This is only for preliminary discussion.)
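The comparison can be sketched in a few lines. This is a rough illustration with invented data, and it rests on an assumption the note does not spell out: that the OVS is measured from the lowest lower quartile to the highest upper quartile of the two box plots. The function name `dbm_ovs` is hypothetical, and the quartiles use Python's "inclusive" method.

```python
import statistics


def dbm_ovs(group_a, group_b):
    """Return (DBM, OVS) for two samples.

    Assumption: OVS runs from the lowest lower quartile to the
    highest upper quartile across the two groups.
    """
    q1_a, median_a, q3_a = statistics.quantiles(group_a, n=4, method="inclusive")
    q1_b, median_b, q3_b = statistics.quantiles(group_b, n=4, method="inclusive")
    dbm = abs(median_a - median_b)
    ovs = max(q3_a, q3_b) - min(q1_a, q1_b)
    return dbm, ovs


# Invented samples, for illustration only.
group_a = [10, 12, 14, 16, 18, 20, 22]
group_b = [16, 18, 20, 22, 24, 26, 28]

dbm, ovs = dbm_ovs(group_a, group_b)
ratio = dbm / ovs
print("DBM:", dbm, "OVS:", ovs, "DBM/OVS:", ratio)
print("DBM is at least one third of OVS:", ratio >= 1 / 3)
```

Here the DBM is 6 and the OVS is 12, so the DBM is half of the OVS. As the note says, this check is for preliminary discussion only; the formal inference still comes from the bootstrap confidence interval.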


Describing the distribution

Acknowledgement: One of the exemplars from the NZQA site.
