# What is statistical inference?

Dr Nic explains statistical inference in a blog post.

# Introduction: requirements and clarifications

What is a ‘pain point’? It is a discussion point that students struggle with most.

### Posing an appropriate comparison investigative question using a given multivariate data set (Pain point 1)

Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation.

An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"

### Discussing sample distributions

Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

### Discussing sampling variability including the variability of estimates (Pain point 2)

What happens if we take more than one sample (multiple samples)? Do you think the ‘results’ from different samples will be the same, or will they be different? Why?

Link to the animations on sampling variability

Link to a short clip from Chris Wild's animation on sampling variability

Students need to show an understanding that, if they were to take another sample from the population, it would be likely to give different displays and summary statistics.
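To see this concretely, here is a small Python sketch (separate from the iNZight workflow) that draws five random samples of size 30 from a made-up population and prints each sample median. The population values are invented for illustration; the only point is that the medians change from sample to sample.

```python
import random
import statistics

# Made-up population: daily text counts for 5,000 people (illustrative only)
pop_rng = random.Random(1)
population = [pop_rng.randint(0, 60) for _ in range(5000)]

def sample_median(pop, n, rng):
    """Take one simple random sample of size n (without replacement) and return its median."""
    return statistics.median(rng.sample(pop, n))

rng = random.Random(2024)
medians = [sample_median(population, 30, rng) for _ in range(5)]

print("Population median:", statistics.median(population))
print("Medians of five samples of size 30:", medians)
```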

### Making an appropriate formal statistical inference  (Pain point 3)

Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding of the nature of the confidence interval.

An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”
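iNZight produces the bootstrap confidence interval for you. As a rough sketch of the idea behind it, assuming the common percentile method (resample each group with replacement, record the difference in medians, and keep the middle 95% of those differences), the Python below does the same calculation by hand. The data values are invented, and iNZight's exact settings (number of resamples, interval method) may differ.

```python
import random
import statistics

def bootstrap_diff_medians(group_a, group_b, reps=1000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for median(A) - median(B).

    Each replicate resamples each group separately, with replacement,
    keeping the original group sizes.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    # Drop (1 - level)/2 of the differences from each end; keep the middle part.
    cut = round((1 - level) / 2 * reps)
    return diffs[cut], diffs[reps - cut - 1]

# Invented example data: text messages sent per day
adults = [20, 35, 18, 42, 30, 25, 50, 28, 33, 38, 22, 45]
teens = [12, 15, 9, 20, 18, 7, 14, 11, 25, 16, 10, 13]

low, high = bootstrap_diff_medians(adults, teens, reps=1000, seed=42)
print(f"95% bootstrap CI for median(adults) - median(teens): {low} to {high}")
```

The interval is for the difference of the medians, which is the parameter, not for either median on its own.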

### Required quality of student response

For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research.

For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.

# Introduction (posing a ‘question’)

## Purpose statement

Purpose of the investigation:

How to set the background and context: 'There is a belief that...' or 'I want to test the claim that...'

How to make a hypothesis and justify it: "I suspect that... because..."

Summary of contextual research.

Year 12 Informal Inference: REVIEW

# Sampling - a discussion (The Island)

The Island (Vardo): sampling, some considerations

Island in schools programme

random number generator
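A random number generator is used to decide which houses to visit. A minimal Python equivalent, assuming the houses in a village are simply numbered 1 to N (the village size below is made up, and The Island's own numbering may differ), picks a simple random sample without replacement so that no house is chosen twice:

```python
import random

def choose_houses(num_houses, sample_size, seed=None):
    """Simple random sample of house numbers 1..num_houses, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, num_houses + 1), sample_size))

# Hypothetical village of 300 houses; choose 20 to visit
houses = choose_houses(300, 20, seed=7)
print(houses)
```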

Census At School Resources: formal inference. Refer to this site for the Achievement Standard requirements. An exemplar has also been uploaded for students’ reference.

# Problem: posing a question

### Posing an appropriate comparison investigative question using a given multivariate data set

Sufficient time needs to be allocated for students to research the context and acquire appropriate contextual knowledge. For all grades, students need to identify a purpose and pose an investigative question which is informed by this contextual knowledge. The question needs to be comparative, and needs to refer to the population and the parameter under investigation.

An appropriate question could be: "What is the difference between the median number of text messages sent per day by adults in New Zealand and the median number of text messages sent per day by teenagers in New Zealand?"

# Discussion: distribution

### Discussing sample distributions

Students need to discuss, in context, what they see in the displays of the sample distributions. This could include central tendency, spread, shift and unusual values.

## Review box plot

Murdoch Uni: Box plot: review
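A box plot is drawn from the five-number summary, and points beyond the 1.5 × IQR fences are usually shown separately. The Python sketch below computes these by hand for a small invented data set. It uses `statistics.quantiles` with `method="inclusive"`; iNZight and other software may use a slightly different quartile rule, so their values can differ a little.

```python
import statistics

def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def outlier_fences(data):
    """Values outside these fences (1.5 x IQR beyond the quartiles) are drawn as separate points."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 7, 8, 9, 12, 30]   # invented values
print(five_number_summary(data))   # (2, 4.0, 7.0, 9.0, 30)
print(outlier_fences(data))        # (-3.5, 16.5): 30 lies above the upper fence
```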

# Sampling variability (did we understand it?)

Link to the animations on sampling variability

### Discussing sampling variability including the variability of estimates

Students need to show an understanding that, if they were to take another sample from the population, it would be likely to give different displays and summary statistics.

# Inference: the core ‘business’ (we are dealing with the ‘difference of medians’ NOT the medians)

### Making an appropriate formal statistical inference

Students need to use the bootstrap confidence interval for the difference of the medians/means to answer their investigative question. The inference needs to identify the population and the parameter. Students also need to show an understanding of the nature of the confidence interval.

An appropriate formal inference could be: “I am fairly sure that, in New Zealand, the median number of text messages sent by adults is more than the median number of text messages sent by teenagers and that the difference in the medians is between 12 and 17 text messages per day.”

Note: the difference between the population medians is the parameter!

# Quality of student responses for M and E

### Required quality of student response

For Merit, students need to justify all findings with reference to evidence from the displays and statistics, and to link the purpose and findings to their research.

For Excellence, students need to integrate the statistical and contextual knowledge gained from their research throughout the response, and to reflect on the process. Reflection could be shown by considering other relevant explanations.

# Sampling variation, confidence intervals and bootstrapping

Sampling variability demonstration with sample sizes 10 and 100
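The same demonstration can be sketched in Python: draw many samples of size 10 and of size 100 from one invented population and compare how much the sample medians vary. The population, the number of repeats and the use of the standard deviation as the measure of spread are all assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_sample_medians(population, n, repeats=1000, seed=0):
    """Draw `repeats` random samples of size n; return the standard deviation of their medians."""
    rng = random.Random(seed)
    medians = [statistics.median(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(medians)

# Invented population of 10,000 values (illustrative only)
pop_rng = random.Random(99)
population = [pop_rng.gauss(50, 15) for _ in range(10_000)]

for n in (10, 100):
    print(f"n = {n:>3}: SD of sample medians = {spread_of_sample_medians(population, n):.2f}")
```

Larger samples give sample medians that cluster more tightly, which is why the bootstrap confidence interval narrows as the sample size grows.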

The following link to Dr Nic’s blog is important, and all statistics students should read it: Dr Nic's blog post: Confidence Interval

# Important sites which might be helpful

Nayland College Mathematics site

MathsNZStudents (Jake Wills)

# What to write? Some of the things we must avoid. Are we talking about the ‘sample’ or the ‘population’? Sample = definite article (‘the’); population = no definite article.

Achieved: Must cover every aspect of the statistical enquiry cycle. “Use statistical methods to make a formal inference” involves showing evidence of using each component of the statistical enquiry cycle.

Merit: Must justify aspects (an explanation for the choice of variables for the investigation). “Use statistical methods to make a formal inference, with justification” involves linking components of the statistical enquiry cycle to the context, and/or to the populations, and referring to evidence such as sample statistics, data values, or features of visual displays in support of statements made.

Excellence: Must integrate statistical and contextual knowledge. “Use statistical methods to make a formal inference, with statistical insight” involves integrating statistical and contextual knowledge throughout the statistical enquiry cycle, and may include reflecting about the process and considering other relevant explanations.

Example

• The purpose of this investigation is to see whether smoking affects the lung capacity of children. I wonder what the difference is between the Forced Expiratory Volume of child smokers, compared to children who do not smoke, in New Zealand. I assume it is lower, but how much lower is the FEV value?

• What statistical word needs to be inserted into the investigative question?

Differentiating the sample and the population

• Does this exemplar show that the student has a clear understanding of when they are talking about the sample and when they are talking about the population?

• It is useful to use the definite article “the” when referring to sample data, and no article when referring to the population.

Acknowledgement: Anne Patel and Jake Wills, who presented this at the NZAMT Conference.

# PSSS and OSEM (describing the distribution of data)

| Main features of analysis | Levels of analysis in categories | Notes |
|---|---|---|
| Position (shift) | Obvious | Mean, median or mode (relative position). The median is better for data affected by extreme values. |
| | Specific | Compare between groups. |
| | Evidence | Numerical values. |
| | Meaning | |
| Spread | Obvious | Interquartile range (IQR). Do not use the RANGE (it is affected by extreme values). |
| | Specific | Wide or narrow. |
| | Evidence | Numerical values. |
| | Meaning | Link it to the context; make insightful comments. |
| Shape | Obvious | Skewed or symmetric. Very high peak or low peak. |
| | Specific | Further clarification. Compare between groups. |
| | Evidence | Numerical values. |
| | Meaning | Link it to the context. |
| Special | Obvious | Clusters, groups, extreme values, outliers. |
| | Specific | Comparison between groups. |
| | Evidence | Numerical values. |
| | Meaning | Link it to the context. |
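For the Evidence step, the numerical values for position and spread can be computed for each group. A small Python sketch with invented data, using the median for position, the IQR for spread and the difference in medians as the shift (quartiles via `statistics.quantiles(..., method="inclusive")`, which may not match iNZight's rule exactly):

```python
import statistics

def position_and_spread(data):
    """Median (position) and interquartile range (spread) for one group."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return median, q3 - q1

def compare_groups(name_a, data_a, name_b, data_b):
    """Print and return the shift in medians and each group's IQR."""
    med_a, iqr_a = position_and_spread(data_a)
    med_b, iqr_b = position_and_spread(data_b)
    shift = med_a - med_b
    print(f"Position: median {name_a} = {med_a}, median {name_b} = {med_b}, shift = {shift}")
    print(f"Spread:   IQR {name_a} = {iqr_a}, IQR {name_b} = {iqr_b}")
    return shift, iqr_a, iqr_b

group_a = [3, 5, 6, 8, 10, 12, 15, 18, 20]   # invented values
group_b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
compare_groups("group A", group_a, "group B", group_b)
```

These numbers are only the Evidence; the Meaning still has to come from linking them to the context.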

Problem: posing a question

It must be a comparative question, comparing a single variable across two or more different groups in the population. The question must identify the statistic of interest (mean/median), the variable being investigated, the groups (clearly described) and the population.

Make a prediction for the question you posed: “I would expect the median …”.

Purpose: justify why you chose this variable.

Plan/data

We will use iNZight software to generate bootstrap samples; explain bootstrap sampling. Bootstrap sampling: what is it? Bootstrap sampling is resampling from the sample with replacement, where each resample is the same size as the original sample. Population and sample size: discuss.
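One bootstrap resample, written out in Python as a sketch (iNZight repeats this many times for you): the resample has the same size as the original sample and is drawn with replacement, so some values may appear more than once and others not at all. The sample values are invented.

```python
import random

def bootstrap_resample(sample, rng):
    """One bootstrap resample: same size as the sample, drawn WITH replacement."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(5)
original = [12, 15, 9, 20, 18, 7, 14, 11]   # invented sample values
resample = bootstrap_resample(original, rng)

print("Original:", original)
print("Resample:", resample)
```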

Analysis (focus on the sample)

Use PSSS and OSEM to describe the distribution of the data, and discuss the sample statistics.

Inference (the population is unseen: we don’t have much information about it)

Based on the sample analysis, decide on the POPULATION characteristics. Useful terms and statements: “tends to be”; “it is a safe bet to say”; “likely”; “unlikely”; words referring to uncertainty (probability); confidence level, confidence interval, 95% CI; “unable to make a call”; “the population mean/median will be enclosed by the CI”; “95% confident that the population mean/median will be within the CI”; “the lower limit of the bootstrap CI is …”; “the upper limit of the bootstrap CI is …”.
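Whether a call can be made depends on whether the bootstrap CI for the difference contains 0. A small Python helper, purely illustrative (the function name and sentence wording are assumptions, modelled on the example inference on this page), turns the CI limits into a statement:

```python
def describe_inference(lower, upper, group_a, group_b, unit):
    """Turn bootstrap CI limits for median(A) - median(B) into a sentence.

    If the interval contains 0, the data do not tell us which population
    median is larger, so we are unable to make a call.
    """
    if lower > 0:
        return (f"I am fairly sure that the population median for {group_a} is more than "
                f"for {group_b}, by between {lower} and {upper} {unit}.")
    if upper < 0:
        return (f"I am fairly sure that the population median for {group_a} is less than "
                f"for {group_b}, by between {-upper} and {-lower} {unit}.")
    return (f"The interval {lower} to {upper} {unit} contains 0, "
            f"so we are unable to make a call.")

print(describe_inference(12, 17, "adults", "teenagers", "text messages per day"))
print(describe_inference(-3, 4, "adults", "teenagers", "text messages per day"))
```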