Probability, Statistics and Errors in High Energy Physics
Wen-Chen Chang
Institute of Physics, Academia Sinica
章文箴
中央研究院 物理研究所
Outline
Why do we do experiments?
Why estimate errors?
Sources of Errors
Systematic Errors
Systematic effects is a general category which includes effects such as background, scanning efficiency, energy resolution, angle resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error.
Orear
Systematic Error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique
Bevington
Error=mistake?
Error=uncertainty?
Experimental Examples
a & b determined by calibration expt
η found from Monte Carlo studies
If not spotted, this is a mistake
If temp. measured, not a problem
If temp. not measured guess →uncertainty
Repeating measurements doesn’t help
The Binomial
n trials r successes
Individual success probability p
Variance
V ≡ σ² = ⟨(r − μ)²⟩ = ⟨r²⟩ − ⟨r⟩² = np(1 − p)
Mean
μ = ⟨r⟩ = Σ r P(r) = np
q ≡ 1 − p
A random process with exactly two possible outcomes which occur with fixed probabilities.
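For reference, the binomial probability of r successes in n trials, each with individual success probability p, has the standard form:

P(r; n, p) = \binom{n}{r}\, p^{r} (1-p)^{n-r}, \qquad r = 0, 1, \ldots, n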
Binomial Examples
[Plots: binomial distributions for p = 0.1, 0.2, 0.5, 0.8 and n = 5, 10, 20, 50]
Poisson
‘Events in a continuum’
The probability of observing r independent events in a time interval t, when the counting rate is μ and the expected number of events in the interval is λ = μt.
Mean
μ = ⟨r⟩ = Σ r P(r) = λ
Variance
V ≡ σ² = ⟨(r − μ)²⟩ = ⟨r²⟩ − ⟨r⟩² = λ
[Plot: Poisson distribution for λ = 2.5]
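For reference, the standard Poisson probability of observing r events when λ are expected is:

P(r; \lambda) = \frac{e^{-\lambda}\,\lambda^{r}}{r!}, \qquad r = 0, 1, 2, \ldots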
More about Poisson
Poisson Examples
[Plots: Poisson distributions for λ = 0.5, 1.0, 2.0, 5.0, 10, 25]
Examples
Binomial and Poisson
From an exam paper
A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift
(a) After 60 cars have passed
(b) After 1 hour
(a) Binomial, no success in 60 trials: 0.99^60 = 0.5472
(b) Poisson with λ = 60 × 0.01 = 0.6: e^(−0.6) × 0.6^0 / 0! = 0.5488
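A quick numerical check of both answers; a minimal sketch using only Python's standard library:

import math

# (a) Binomial: no lift from any of the 60 cars that have passed,
#     each car giving a lift with probability p = 0.01
p = 0.01
print("(a)", (1 - p) ** 60)                                  # ~0.5472

# (b) Poisson: in 1 hour ~60 cars pass (rate 1/min), so the expected
#     number of lifts is lam = 60 * 0.01 = 0.6; want P(0 lifts)
lam = 60 * p
print("(b)", math.exp(-lam) * lam ** 0 / math.factorial(0))  # ~0.5488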
Gaussian (Normal)
Probability Density
Mean
μ = ⟨x⟩ = ∫ x P(x) dx = μ
Variance
V ≡ σ² = ⟨(x − μ)²⟩ = ⟨x²⟩ − ⟨x⟩² = σ²
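The probability density referred to above is the usual Gaussian form:

P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)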
Different Gaussians
There’s only one!
Normalisation (if required)
Location change μ
Width scaling factor
Falls to 1/√e (≈ 0.61) of peak at x = μ ± σ
Probability Contents
68.27% within 1σ
95.45% within 2σ
99.73% within 3σ
90% within 1.645 σ
95% within 1.960 σ
99% within 2.576 σ
99.9% within 3.290σ
These numbers apply to Gaussians and only Gaussians
Other distributions have equivalent values which you could use if you wanted
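The Gaussian numbers above are easy to reproduce; a minimal sketch using only Python's standard library:

import statistics

g = statistics.NormalDist()   # standard Gaussian, mu = 0, sigma = 1

# Probability content within +-k sigma
for k in (1, 2, 3):
    content = g.cdf(k) - g.cdf(-k)
    print(f"within {k} sigma: {100*content:.2f}%")     # 68.27, 95.45, 99.73

# Number of sigma enclosing a given central probability
for prob in (0.90, 0.95, 0.99, 0.999):
    k = g.inv_cdf(0.5 + prob / 2)
    print(f"{100*prob:.1f}% within {k:.3f} sigma")      # 1.645, 1.960, 2.576, 3.290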
Central Limit Theorem
Or: why is the Gaussian Normal?
If a variable x is produced by the convolution (i.e. the sum) of variables x1, x2, … xN with means μi and variances Vi, then:
I) ⟨x⟩ = μ1 + μ2 + … + μN
II) V(x) = V1 + V2 + … + VN
III) the distribution of x tends to a Gaussian as N → ∞, whatever the individual distributions of the xi
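A minimal numerical illustration (assuming numpy is available): summing N uniform variables, which are far from Gaussian individually, quickly produces a Gaussian-looking distribution.

import numpy as np

rng = np.random.default_rng(seed=1)
N = 12                                   # number of uniforms summed per trial
trials = 100_000

# Each uniform on [0,1) has mean 1/2 and variance 1/12,
# so the sum has mean N/2 and variance N/12.
x = rng.uniform(0.0, 1.0, size=(trials, N)).sum(axis=1)

print(f"sample mean     = {x.mean():.3f}  (expect {N/2})")
print(f"sample variance = {x.var():.3f}  (expect {N/12:.3f})")

# Fraction within 1 sigma of the mean: ~0.683 if the sum is Gaussian
sigma = np.sqrt(N / 12)
frac = np.mean(np.abs(x - N/2) < sigma)
print(f"fraction within 1 sigma = {frac:.3f}  (Gaussian: 0.683)")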
Multidimensional Gaussian
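For reference, the general n-dimensional Gaussian density, with mean vector μ and covariance (error) matrix V, is:

P(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{T} V^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)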
Chi squared
Sum of squared discrepancies, scaled by expected error
Integrate all but 1-D of multi-D Gaussian
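For n independent measurements xi with expected values μi and errors σi, this sum is:

\chi^{2} = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^{2}}{\sigma_i^{2}}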
About Estimation
Probability calculus: Theory → Data. Given these distribution parameters, what can we say about the data?
Statistical inference: Data → Theory. Given this data, what can we say about the properties, parameters, or correctness of the distribution functions?
What is an estimator?
An estimator (written with a hat) is a function of the data whose value, the estimate, is intended as a meaningful guess for the value of the parameter. (from PDG)
What is a good estimator?
A perfect estimator is:
consistent, unbiased, and efficient (its variance is the minimum possible)
Minimum Variance Bound
One often has to work with less-than-perfect estimators
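The Minimum Variance Bound (the Cramér–Rao bound) mentioned above states that, for an unbiased estimator â of a,

V(\hat{a}) \;\ge\; 1 \Big/ \left\langle \left( \frac{d \ln L}{d a} \right)^{2} \right\rangle \;=\; -1 \Big/ \left\langle \frac{d^{2} \ln L}{d a^{2}} \right\rangle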
The Likelihood Function
Set of data {x1, x2, x3, …xN}
Each x may be multidimensional – never mind
Probability depends on some parameter a
a may be multidimensional – never mind
Total probability (density)
P(x1;a) P(x2;a) P(x3;a) …P(xN;a)=L(x1, x2, x3, …xN ;a)
The Likelihood
Maximum Likelihood Estimation
In practice usually maximise ln L as it’s easier to calculate and handle; just add the ln P(xi)
ML has lots of nice properties
Given data {x1, x2, x3, …xN} estimate a by maximising the likelihood L(x1, x2, x3, …xN ;a)
[Sketch: ln L versus a, with its maximum at the estimate â]
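A minimal sketch of this in Python (assuming numpy is available; the exponential-lifetime example and all numbers are illustrative, not from the slides): for an exponential with lifetime τ the ML estimate is simply the sample mean, and a scan of ln L has its maximum there.

import numpy as np

rng = np.random.default_rng(seed=2)
tau_true = 2.0
t = rng.exponential(tau_true, size=1000)        # the data {t_i}

def log_likelihood(tau, data):
    # ln L(tau) = sum_i ln P(t_i; tau), with P(t; tau) = exp(-t/tau)/tau
    return np.sum(-data / tau - np.log(tau))

# Scan ln L over a grid of tau values and pick the maximum
taus = np.linspace(1.0, 4.0, 2001)
lnL = np.array([log_likelihood(tau, t) for tau in taus])
tau_hat_scan = taus[np.argmax(lnL)]

print(f"analytic ML estimate (sample mean): {t.mean():.3f}")
print(f"ML estimate from the ln L scan:     {tau_hat_scan:.3f}")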
Properties of ML estimation
Consistent (no big deal)
Biased for finite N (may need to worry)
Efficient for large N: saturates the Minimum Variance Bound
Invariant: if you switch to using u(a), then û = u(â)
[Sketches: ln L versus a with its maximum at â, and ln L versus u with its maximum at û = u(â)]
More about ML
ML does not give goodness-of-fit
Fit P(x) = a₁x + a₀
will give a₁ = 0; constant P
L = a₀^N
Just like you get from fitting
Least Squares
[Plot: y versus x]
So ML ‘proves’ Least Squares. But what ‘proves’ ML? Nothing
Least Squares: The Really Nice Thing
N(degrees of freedom) = N(data points) − N(parameters)
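A minimal sketch of a least-squares straight-line fit with its χ², number of degrees of freedom, and p-value (assuming numpy and scipy are available; the data points are invented for illustration):

import numpy as np
from scipy import stats

# Invented example data: y measured at several x values, each with error sigma
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
sigma = np.full_like(y, 0.3)

# Weighted least-squares fit of y = a1*x + a0 (weights 1/sigma for Gaussian errors)
(a1, a0), cov = np.polyfit(x, y, deg=1, w=1.0 / sigma, cov="unscaled")

# Chi-squared of the fit and its p-value
chisq = np.sum(((y - (a1 * x + a0)) / sigma) ** 2)
ndof = len(x) - 2                      # N data points - N parameters
p_value = stats.chi2.sf(chisq, ndof)

print(f"a1 = {a1:.3f} +- {np.sqrt(cov[0, 0]):.3f}")
print(f"a0 = {a0:.3f} +- {np.sqrt(cov[1, 1]):.3f}")
print(f"chi2/ndof = {chisq:.2f}/{ndof}, p-value = {p_value:.2f}")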
Chi Squared Results
A large χ² comes from: bad measurements, an incorrect model or theory, underestimated errors, or bad luck
A small χ² comes from: overestimated errors, or good luck
Fitting Histograms
Often put {xi} into bins
Data is then {nj}
nj given by Poisson,
mean f(xj) =P(xj)Δx
4 Techniques
Full ML
Binned ML
Proper χ2
Simple χ2
What you maximise/minimise
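The standard expressions, with f_j ≡ f(x_j; a) the predicted content of bin j (this summary is a reading of the four labels, not taken verbatim from the slides), are:

Full ML: maximise \ln L = \sum_i \ln P(x_i; a) over the unbinned values
Binned ML: maximise \ln L = \sum_j \left( n_j \ln f_j - f_j \right) (Poisson likelihood of the bin contents)
Proper χ²: minimise \chi^2 = \sum_j (n_j - f_j)^2 / f_j (variance taken from the prediction)
Simple χ²: minimise \chi^2 = \sum_j (n_j - f_j)^2 / n_j (variance taken from the observed counts)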
Confidence Level: Meaning of Error Estimates
The Straightforward Example
Apples of different weights
Need to describe the distribution
μ = 68 g, σ = 17 g
All weights between 24 and 167 g (Tolerance)
90% lie between 50 and 100 g
94% are less than 100 g
96% are more than 50 g
Confidence level statements
Confidence Levels
(68%, 95%, 99%…)
(x < U, x > L, L < x < U)
(central, shortest…)
Maximum Likelihood and Confidence Levels
ML estimator (large N) has variance given by MVB
At the peak, for large N, ln L is a parabola (L is a Gaussian):
ln L(a) ≈ ln L(â) − (a − â)² / (2σ²)
[Sketch: ln L versus a]
Falls by 1/2 at a = â ± σ
Falls by 2 at a = â ± 2σ
Read off the 68%, 95% confidence regions
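A minimal numerical illustration of reading off the error from the points where ln L falls by 1/2 (assuming numpy; the exponential-lifetime example is illustrative, as above):

import numpy as np

rng = np.random.default_rng(seed=3)
t = rng.exponential(2.0, size=1000)      # data with true lifetime 2.0

def lnL(tau):
    return np.sum(-t / tau - np.log(tau))

taus = np.linspace(1.5, 2.5, 4001)
scan = np.array([lnL(tau) for tau in taus])
tau_hat = taus[np.argmax(scan)]

# 68% interval: where ln L has fallen by 1/2 from its maximum
inside = taus[scan >= scan.max() - 0.5]
lo, hi = inside.min(), inside.max()
print(f"tau_hat = {tau_hat:.3f}, 68% interval = [{lo:.3f}, {hi:.3f}]")

# Compare with the large-N MVB expectation sigma = tau_hat / sqrt(N)
print(f"MVB estimate of sigma = {tau_hat / np.sqrt(len(t)):.3f}")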
Monte Carlo Calculations
An Example
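As a generic illustration of a Monte Carlo calculation (assuming numpy; the function and numbers below are invented, not the slides' own example): propagate measurement errors through a non-linear function by simulating many pseudo-experiments, then compare with linear error propagation.

import numpy as np

rng = np.random.default_rng(seed=4)

# Suppose we measure a = 1.50 +- 0.05 and b = 0.80 +- 0.04 (invented numbers)
# and want the error on f = a / b**2.
n_experiments = 1_000_000
a = rng.normal(1.50, 0.05, n_experiments)
b = rng.normal(0.80, 0.04, n_experiments)
f = a / b**2

print(f"f = {f.mean():.3f} +- {f.std():.3f}  (Monte Carlo)")

# Compare with standard linear error propagation:
# sigma_f^2 = (df/da)^2 sigma_a^2 + (df/db)^2 sigma_b^2
df_da = 1 / 0.80**2
df_db = -2 * 1.50 / 0.80**3
sigma_f = np.sqrt((df_da * 0.05)**2 + (df_db * 0.04)**2)
print(f"f = {1.50/0.80**2:.3f} +- {sigma_f:.3f}  (linear propagation)")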
References