1 of 41

Probability, Statistics and Errors in High Energy Physics

Wen-Chen Chang

Institute of Physics, Academia Sinica

章文箴

中央研究院 物理研究所

2 of 41

Outline

  • Errors
  • Probability distribution: Binomial, Poisson, Gaussian
  • Confidence Level
  • Monte Carlo Method

3 of 41

Why do we do experiments?

  1. Parameter determination: determine the numerical value of some physical quantity.
  2. Hypothesis testing: test whether a particular theory is consistent with our data.

4 of 41

Why estimate errors?

  • We are concerned not only with the answer but also with its accuracy.
  • For example, speed of light 2.998×10^8 m/s
    • (3.09 ± 0.15) × 10^8:
    • (3.09 ± 0.01) × 10^8:
    • (3.09 ± 2) × 10^8:
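The three quoted results differ only in their errors, and the error is what decides how each compares with the accepted value; a quick check:

  (3.09 − 2.998)/0.15 ≈ 0.6 σ → consistent with the accepted value
  (3.09 − 2.998)/0.01 ≈ 9 σ → a serious discrepancy (a discovery, or a mistake)
  (3.09 − 2.998)/2 ≈ 0.05 σ → the error is so large the measurement says very little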

5 of 41

Source of Errors

  • Random (statistical) error: the inability of any measuring device to give infinitely accurate answers.
  • Systematic error: reproducible uncertainty introduced by the apparatus, calibration or technique.

6 of 41

Systematic Errors

Systematic effects is a general category which includes effects such as background, scanning efficiency, energy resolution, angle resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error.

Orear

Systematic Error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique

Bevington

Error=mistake?

Error=uncertainty?

7 of 41

Experimental Examples

  • Energy in a calorimeter E=aD+b

a & b determined by calibration expt

  • Branching ratio B=N/(ηNT)

η found from Monte Carlo studies

  • Steel rule calibrated at 15 °C but used in a warm lab

If not spotted, this is a mistake

If temp. measured, not a problem

If temp. not measured, guess → uncertainty

Repeating measurements doesn’t help

8 of 41

The Binomial

n trials, r successes

Individual success probability p (and q ≡ 1 − p)

Mean

μ = ⟨r⟩ = Σ r P(r) = np

Variance

V ≡ σ² = ⟨(r − μ)²⟩ = ⟨r²⟩ − ⟨r⟩² = np(1 − p)

A random process with exactly two possible outcomes which occur with fixed probabilities.
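For reference, the binomial probability itself (shown as an image on the original slide) is P(r) = n!/(r!(n−r)!) p^r q^(n−r). A minimal numerical check of the mean and variance formulas, assuming scipy is available:

  from scipy.stats import binom

  n, p = 10, 0.2
  mean, var = binom.stats(n, p, moments='mv')   # exact mean and variance
  print(mean, n * p)             # 2.0  2.0
  print(var, n * p * (1 - p))    # 1.6  1.6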

9 of 41

Binomial Examples

[Plots of the binomial distribution for n = 5, 10, 20, 50 and p = 0.1, 0.2, 0.5, 0.8]

10 of 41

Poisson

‘Events in a continuum’

The probability of observing r independent events in a time interval t, when the counting rate is μ and the expected number of events in the interval is λ = μt.

Mean

μ = ⟨r⟩ = Σ r P(r) = λ

Variance

V ≡ σ² = ⟨(r − μ)²⟩ = ⟨r²⟩ − ⟨r⟩² = λ

[Plot: Poisson distribution with λ = 2.5]
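For reference, the Poisson probability itself (shown as an image on the original slide) is

  P(r; λ) = e^(−λ) λ^r / r!,  r = 0, 1, 2, …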

11 of 41

More about Poisson

  • The binomial distribution approaches the Poisson distribution as n → ∞ with the mean np = λ held fixed (see the numerical check after this list).
  • The mean value of r for a variable with a Poisson distribution is λ and so is the variance. This is the basis of the well known n±√n formula that applies to statistical errors in many situations involving the counting of independent events during a fixed interval.
  • As λ→∞, the Poisson distribution tends to a Gaussian one.
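A minimal numerical check of the first two points, assuming scipy is available:

  from scipy.stats import binom, poisson

  # Binomial with large n and small p approaches Poisson with lambda = n*p
  n, p = 1000, 0.005                  # lambda = 5
  print(binom.pmf(3, n, p))           # ~0.1402
  print(poisson.pmf(3, n * p))        # ~0.1404

  # Poisson mean and variance are both lambda, hence the n ± √n rule
  lam = 100
  print(poisson.mean(lam), poisson.std(lam))   # 100.0  10.0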

12 of 41

Poisson Examples

[Plots of the Poisson distribution for λ = 0.5, 1.0, 2.0, 5.0, 10, 25]

13 of 41

Examples

  • The number of particles detected by a counter in a time t, in a situation where the particle flux φ and the detector response are independent of time, and where the counter dead-time τ is such that φτ << 1.
  • The number of interactions produced in a thin target when an intense pulse of N beam particles is incident on it.
  • The number of entries in a given bin of a histogram when the data are accumulated over a fixed time interval.

14 of 41

Binomial and Poisson

From an exam paper

A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift

(a) After 60 cars have passed

(b) After 1 hour

(a) 0.99^60 = 0.5472

(b) e^(−0.6) × 0.6^0 / 0! = 0.5488
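A quick check of both numbers, using only the Python standard library:

  import math

  # (a) Binomial: 60 cars pass, each fails to stop with probability 0.99
  print(0.99 ** 60)        # 0.5472

  # (b) Poisson: expected lifts in one hour = 60 × 0.01 = 0.6; P(0 lifts)
  print(math.exp(-0.6))    # 0.5488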

15 of 41

Gaussian (Normal)

Probability Density

Mean

μ = ⟨x⟩ = ∫ x P(x) dx = μ

Variance

V ≡ σ² = ⟨(x − μ)²⟩ = ⟨x²⟩ − ⟨x⟩² = σ²
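The density itself (shown as an image on the original slide) is the familiar

  P(x; μ, σ) = 1/(σ√(2π)) exp(−(x − μ)²/(2σ²))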

16 of 41

Different Gaussians

There’s only one!

Normalisation (if required)

Location change: μ

Width scaling factor: σ

Falls to e^(−1/2) ≈ 0.61 of its peak value at x = μ ± σ

17 of 41

Probability Contents

68.27% within 1σ

95.45% within 2σ

99.73% within 3σ

90% within 1.645 σ

95% within 1.960 σ

99% within 2.576 σ

99.9% within 3.290σ

These numbers apply to Gaussians and only Gaussians

Other distributions have equivalent values which you could use if you wanted.
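These fractions come straight from the Gaussian cumulative distribution; a minimal check, assuming scipy is available:

  from scipy.stats import norm

  for k in (1, 2, 3):
      print(k, norm.cdf(k) - norm.cdf(-k))   # 0.6827  0.9545  0.9973

  # number of sigma giving a chosen two-sided probability content
  for cl in (0.90, 0.95, 0.99, 0.999):
      print(cl, norm.ppf(0.5 + cl / 2))      # 1.645  1.960  2.576  3.290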

18 of 41

Central Limit Theorem

Or: why is the Gaussian Normal?

If a variable x is produced by the convolution (i.e. the sum) of variables x1, x2, … xN then:

  1. ⟨x⟩ = μ1 + μ2 + … + μN
  2. V(x) = V1 + V2 + … + VN
  3. P(x) becomes Gaussian for large N
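A minimal numerical illustration of all three statements, summing N uniform variables (the numbers here are illustrative):

  import numpy as np

  rng = np.random.default_rng(1)
  N, trials = 12, 100_000
  x = rng.uniform(size=(trials, N)).sum(axis=1)   # each x is a sum of N uniforms

  # means add (N × 0.5) and variances add (N × 1/12)
  print(x.mean(), N * 0.5)    # ≈6.0  6.0
  print(x.var(), N / 12)      # ≈1.0  1.0
  # a histogram of x is already very close to a Gaussian of this mean and width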

19 of 41

Multidimensional Gaussian
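The slide body did not survive extraction; the standard multidimensional form, for a vector x of dimension n with mean vector μ and covariance matrix V, is

  P(x) = 1/((2π)^(n/2) |V|^(1/2)) exp(−(1/2) (x − μ)^T V^(−1) (x − μ))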

20 of 41

Chi squared

Sum of squared discrepancies, scaled by expected error

Integrate all but 1-D of multi-D Gaussian
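In symbols, for measurements x_i with expected values μ_i and errors σ_i:

  χ² = Σ_i (x_i − μ_i)² / σ_i²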

21 of 41

22 of 41

About Estimation

Probability calculus: Theory → Data
Given these distribution parameters, what can we say about the data?

Statistical inference: Data → Theory
Given this data, what can we say about the properties or parameters or correctness of the distribution functions?

23 of 41

What is an estimator?

An estimator (written with a hat, e.g. â) is a function of the data whose value, the estimate, is intended as a meaningful guess for the value of the parameter a. (from the PDG)

24 of 41

What is a good estimator?

A perfect estimator is:

  • Consistent
  • Unbiassed
  • Efficient: its variance is the minimum possible, i.e. it attains the Minimum Variance Bound

One often has to work with less-than-perfect estimators
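The usual definitions behind these words (not spelled out in the extracted slide text), for an estimator â of a parameter a estimated from N data points:

  Consistent: â → a as N → ∞
  Unbiassed: ⟨â⟩ = a for any N
  Efficient: V(â) is as small as possible, i.e. it saturates the Minimum Variance Bound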

25 of 41

The Likelihood Function

Set of data {x1, x2, x3, …xN}

Each x may be multidimensional – never mind

Probability depends on some parameter a

a may be multidimensional – never mind

Total probability (density)

P(x1;a) P(x2;a) P(x3;a) …P(xN;a)=L(x1, x2, x3, …xN ;a)

The Likelihood

26 of 41

Maximum Likelihood Estimation

In practice usually maximise ln L as it’s easier to calculate and handle; just add the ln P(xi)

ML has lots of nice properties

Given data {x1, x2, x3, …xN} estimate a by maximising the likelihood L(x1, x2, x3, …xN ;a)

[Plot: ln L as a function of a, with its maximum at the estimate â]
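A minimal sketch of maximum likelihood at work: estimating the mean lifetime τ of an exponential decay by numerically minimising −ln L (the data, numbers and function names here are illustrative):

  import numpy as np
  from scipy.optimize import minimize_scalar

  rng = np.random.default_rng(0)
  t = rng.exponential(scale=2.0, size=1000)      # toy 'data' with true tau = 2

  def neg_log_L(tau):
      # P(t; tau) = (1/tau) exp(-t/tau); add the ln P's and flip the sign
      return np.sum(np.log(tau) + t / tau)

  result = minimize_scalar(neg_log_L, bounds=(0.1, 10.0), method='bounded')
  print(result.x)    # close to 2.0 (for the exponential, tau_hat = mean(t))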

27 of 41

Properties of ML estimation

  • It’s consistent

(no big deal)

  • It’s biased for small N

May need to worry

  • It is efficient for large N

Saturates the Minimum Variance Bound

  • It is invariant

If you switch to using u(a), then û=u(â)

[Plots: ln L vs a with its maximum at â, and ln L vs u with its maximum at û = u(â)]

28 of 41

More about ML

  • It is not ‘right’. Just sensible.
  • It does not give the ‘most likely value of a’. It’s the value of a for which this data is most likely.

  • Numerical Methods are often needed
  • Maximisation / Minimisation in >1 variable is not easy
  • Use MINUIT, but remember the minus sign: MINUIT minimises, so give it −ln L

29 of 41

ML does not give goodness-of-fit

  • ML will not complain if your assumed P(x;a) is rubbish
  • The value of L tells you nothing

Fit P(x)=a1x+a0

will give a1=0; constant P

L = a0^N

Just like you get from fitting

30 of 41

Least Squares

  • Measurements of y at various x with errors σ and a prediction f(x;a)
  • Probability of each point: a Gaussian about f(x;a) of width σ
  • ln L: the sum of the logs of these Gaussians

  • To maximise ln L, minimise χ²

[Plot: data points y vs x with error bars and the fitted curve f(x;a)]
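Spelled out (the formulas were images on the original slide), assuming Gaussian errors:

  P(y_i) = 1/(σ_i√(2π)) exp(−(y_i − f(x_i;a))²/(2σ_i²))

  ln L = −(1/2) Σ_i (y_i − f(x_i;a))²/σ_i² + constant = −χ²/2 + constant

so maximising ln L is exactly the same as minimising χ² = Σ_i (y_i − f(x_i;a))²/σ_i².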

So ML ‘proves’ Least Squares. But what ‘proves’ ML? Nothing

31 of 41

Least Squares: The Really nice thing

  • Should get χ² ≈ 1 per data point
  • Minimising χ² makes it smaller – the effect is 1 unit of χ² for each parameter adjusted. (The dimensionality of the multi-D Gaussian is decreased by 1.)

N_degrees of freedom = N_data points − N_parameters

  • Provides ‘Goodness of agreement’ figure which allows for credibility check
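The ‘goodness of agreement’ figure is the probability of getting a χ² at least this large by chance; a minimal sketch with illustrative numbers, assuming scipy is available:

  from scipy.stats import chi2

  chisq, n_data, n_params = 25.0, 22, 2
  ndf = n_data - n_params
  print(chi2.sf(chisq, ndf))    # p-value ≈ 0.20: perfectly credible agreement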

32 of 41

Chi Squared Results

Large χ2 comes from

  1. Bad Measurements
  2. Bad Theory
  3. Underestimated errors
  4. Bad luck

Small χ2 comes from

  1. Overestimated errors
  2. Good luck

33 of 41

Fitting Histograms

Often put the {x_i} into bins

The data are then the bin contents {n_j}

Each n_j is Poisson-distributed, with mean f(x_j) = P(x_j) Δx

4 Techniques

Full ML

Binned ML

Proper χ2

Simple χ2

[Plots: the data and the binned histogram, both as functions of x]

34 of 41

What you maximise/minimise

  • Full ML

  • Binned ML

  • Proper χ2

  • Simple χ2
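The four objective functions (formula images on the original slide) in their usual forms, writing n_j for the observed content of bin j and f_j for the predicted one:

  Full ML: maximise Σ_i ln P(x_i; a) over the unbinned data
  Binned ML: maximise Σ_j [ n_j ln f_j − f_j ] (the Poisson log-likelihood of each bin)
  Proper χ²: minimise Σ_j (n_j − f_j)² / f_j
  Simple χ²: minimise Σ_j (n_j − f_j)² / n_j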

35 of 41

Confidence Level:�Meaning of Error Estimates

  • How often do we expect to include “the true fixed value of our parameter” P0 within our quoted range, p ± δp, in a repeated series of experiments?
  • For the actual value P0, the probability that a measurement will give us an answer in a specific range of p is given by the area under the relevant part of the Gaussian curve. A conventional choice of this probability is 68%.

36 of 41

The Straightforward Example

Apples of different weights

Need to describe the distribution

μ = 68 g, σ = 17 g


All weights between 24 and 167 g (Tolerance)

90% lie between 50 and 100 g

94% are less than 100 g

96% are more than 50 g

Confidence level statements

37 of 41

Confidence Levels

  • Can quote at any level

(68%, 95%, 99%…)

  • Upper or lower or two-sided

(x < U,  L < x,  L < x < U)

  • Two-sided has further choice

(central, shortest…)

[Plot: Gaussian with an upper limit U, a lower limit L, and an alternative upper limit U′ marked]

38 of 41

Maximum Likelihood and Confidence Levels

ML estimator (large N) has variance given by MVB

At the peak, for large N, ln L is a parabola (L is a Gaussian)

[Plot: ln L vs a, parabolic about its maximum at â]

ln L falls by 1/2 at â ± σ

ln L falls by 2 at â ± 2σ

Read off the 68%, 95% confidence regions
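A minimal sketch of reading off the 68% interval from where ln L has dropped by 1/2, reusing the illustrative exponential-lifetime example from the maximum-likelihood slide:

  import numpy as np

  rng = np.random.default_rng(0)
  t = rng.exponential(scale=2.0, size=1000)

  def log_L(tau):
      return -np.sum(np.log(tau) + t / tau)

  taus = np.linspace(1.5, 2.5, 2001)
  lnL = np.array([log_L(tau) for tau in taus])
  inside = taus[lnL > lnL.max() - 0.5]     # points with Delta ln L < 1/2
  print(inside.min(), inside.max())        # roughly tau_hat ∓ one sigma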

39 of 41

Monte Carlo Calculations

  • The Monte Carlo approach provides a method of solving probability theory problems in situations where the necessary integrals are too difficult to perform.
  • Crucial element: random number generator.
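A minimal sketch of the idea: estimating an integral (here the area of a quarter circle, i.e. π/4) by throwing random numbers, with a statistical error that shrinks like 1/√N:

  import numpy as np

  rng = np.random.default_rng(42)
  N = 1_000_000
  x, y = rng.random(N), rng.random(N)          # uniform points in the unit square
  hits = np.count_nonzero(x**2 + y**2 < 1.0)   # how many fall inside the quarter circle

  p = hits / N
  print(4 * p)                                 # ≈ 3.14
  print(4 * np.sqrt(p * (1 - p) / N))          # binomial error on the estimate, ≈ 0.0016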

40 of 41

An Example

41 of 41

References