1 of 170

How to handle extremely rare events in a meaningful way

Scott Ferson

Institute for Risk and Uncertainty, University of Liverpool

Nuclear Energy - GREEN CDT, 26 July 2023

Photo credit: Tatyana Ginzburg, used under license

This presentation has animations. Select Slide Show / From Beginning to see them.

Turn on recording and use microphone

2 of 170

Today

  • Logistics
  • Mostly lecture (sorry)
  • But interrupt!
  • Remind me at the break
  • Discussion

  • Download these slides from

https://sites.google.com/view/greenrisk

NAFEMS is the International Association for the Engineering Modelling, Analysis and Simulation Community

https://en.wikipedia.org/wiki/Simulation_governance

Simulation governance is a managerial function concerned with assurance of reliability of information generated by numerical simulation. The term was introduced in 2011 [1] and specific technical requirements were addressed from the perspective of mechanical design in 2012.[2] Its strategic importance was addressed in 2015.[3][4] At the 2017 NAFEMS World Congress in Stockholm simulation governance was identified as the first of eight “big issues” in numerical simulation.

Edimu et al. (2010)

  • Risk analysis in engineering 26
  • So how are we doing? 7
  • Dependence 9
  • Kinds of uncertainty 8
  • Getting the inputs 18
  • Risk communication 3
  • Bayesian networks 4
  • Backcalculation 12
  • Conclusions 12

NAFEMS challenge

Fire alarm at 11 am

Look for the falconer on the roof next door

3 of 170

Open question

Suppose

A is in [2, 4]

B is in [3, 5]

What can be said about the sum A+B?

4 of 170

Deepwater Horizon Spill (2010)

Columbia Disaster (2003)

2007-2008 Financial Crisis

Fukushima Daiichi Nuclear Disaster (2011)

Air France Flight 447 Rio−Paris (2009)

Minneapolis Bridge Collapse (2007)

Credits (left to right):

https://www.telegraph.co.uk/news/worldnews/asia/japan/8863968/Fukushima-nuclear-disaster-timeline.html.

Unknown photographer - US Coast Guard - 100421-G-XXXXL- Deepwater Horizon fire (Direct link), public domain.

Pawel Kierzkowski, https://en.wikipedia.org/wiki/Air_France_Flight_447, CC BY-SA 3.0.

Dominic Alves from Brighton, England - Northern Rock Queue, CC BY 2.0.

The Ponte Morandi roadway fell 45 meters, killing 43 people. Image credit: The Daily Sabah

Kevin Rofidal, United States Coast Guard, https://en.wikipedia.org/wiki/I-35W_Mississippi_River_bridge, public domain.

https://www.youtube.com/watch?v=aXiZ3RHR3bg

RAAC (reinforced autoclaved aerated concrete), collapse of a school roof in Kent (2018) image credit: no credit

Morry Gash, NPR, https://www.npr.org/2017/08/01/540669701/10-years-after-bridge-collapse-america-is-still-crumbling?t=1614450713959

The British bank Northern Rock failed as a result of a bank run precipitated by the global financial crisis. It was the first British bank in about 150 years to fail due to a bank run.

Don’t confuse the Fukushima Daiichi nuclear plant with a more dramatic picture of natural gas storage tanks burn at a facility in Chiba Prefecture, near Tokyo, which was another consequence of the same earthquake and tsunami that affected the Fukushima Daiichi nuclear power plant.

Hurricane Katrina

2011: natural gas storage tanks burn at a facility in Chiba Prefecture, near Tokyo. It is not a nuclear power plant, but another consequence of the same earthquake and tsunami that affected the Fukushima Daiichi nuclear power plant

Tōhoku earthquake and tsunami

2010: failure of the blowout preventer led to eleven deaths, the largest-ever marine oil spill, environmental catastrophe, and costs to BP of more than $60B.

2009 crash with 228 fatalities of Air France flight 447 en route from Rio de Janeiro to Paris due to sensor failure and flawed human-software interaction

2011 explosive release of radioactive material in nuclear power plant meltdowns when cooling pumps lost battery power and backup generators were disabled by the tsunami resulting from the Tōhoku earthquake

https://programm.ard.de/TV/wdrfernsehen/kyrill---ein-orkan-fegt-durchs-land/eid_28111450974388

Ponte Morandi Collapse (2018)

5 of 170

Tacoma Narrows Bridge failure (1940)

Gare Montparnasse (1895)

Boiler explosion (1880s—1910s)

Tacoma Narrows (1940)

Challenger (1986)

Bhopal disaster (1984, 4000)

E. coli O157 (South Wales outbreak 2005; US 1993)

Fukushima Daiichi nuclear disaster 2011

Chernobyl 1986

Titanic (1912)

Minneapolis bridge (2007)

Mad cow

A radiation therapy machine, the Therac-25, caused six accidents in the mid-1980s in which patients were given massive overdoses of radiation because of a race-condition programming error. It is now a standard case study in software engineering on the overconfidence of the engineers and the lack of proper due diligence in resolving reported software bugs.

Therac-25 (late 1980s)

Bhopal disaster (1984)

Mad cow and E. coli O157 (late 1980s—today)

Hyatt Walkway Collapse (1981)

Challenger (1986)

Aloha Airlines Flight 243 (1988)

Chernobyl (1986)

Sampoong Department Store collapse (1995)

1988 explosive decompression at altitude of old aircraft with metal fatigue and possibly improper maintenance

Pyramid of Snefru at Maidum (2600 BCE)

6 of 170

Asteroid

Icelandic eruptions affecting European air travel (2011)

Chilean eruption

California wildfires (2012)

Drought, pestilence

Landslide

Haiti earthquake (2010, 160,000)

Indian Ocean tsunami (2004, 280,000)

Yellow River floods (1931, ~4 million)

7 of 170

Risk assessment applications

  • Pollution

heavy metals, pesticides, PMx, ozone, PCBs, EMF, RF, etc., thermal pollution

  • Engineered systems

traffic safety, bridge design, airplanes, spacecraft, nuclear plants, reliability of electricity distribution

  • Financial investments

portfolio planning, consultation, instrument evaluation

  • Occupational hazards

manufacturing and factory workers, farm workers, hospital staff

  • Food safety and consumer product safety

benzene in Perrier, E. coli in beef, salmonella in tomato, children’s toys

  • Ecosystems and biological resources

endangered species, fisheries and reserve management, entrainment and impingement

$

8 of 170

More and more important

  • To improve efficiencies, most regulatory agencies are shifting from standards-based to performance-based regulations

      • Fire codes no longer say the stair wells have to be so wide; they just say the occupants must be able to evacuate the building in a given period of time

Slaughterhouses needn’t spend money on old ag-school rules if they can otherwise control enteric contaminants

  • Risk analysis used to make the demonstrations

9 of 170

Risk analysis

  • Everywhere in science and engineering today

  • It was invented by the nuclear power industry

  • Major contributions from the United Kingdom

  • It is essential for next generation nuclear power

10 of 170

Throughout the nuclear industry

  • Containment reliability
  • Design and process safety
  • Human factors
  • Effects of thermal pollution from CWISs
  • Entrainment & impingement of aquatic biota
  • Reliability of electricity distribution networks
  • Occupational safety
  • Biotic control
  • Financial management

11 of 170

Risk

  • Chance that something bad happens, together with how bad it is

  • Chance is usually measured by probability

  • The pair of values may be given as a product, or as a graph when there are many pairs

12 of 170

Risk as product

  • Probability × severity = “Expected” injury

  • Would you prefer to kill 1000 people with probability 1%, or to surely kill 10 people?

  • To escape such decisions, risk analysts often prefer to show all outcomes graphically and let someone else decide
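The dilemma in the second bullet arises precisely because both options have the same product; a one-line check in R (the deck’s language):

```r
# "Expected" injury = probability x severity, for the two options above
ev_option1 <- 0.01 * 1000   # 1% chance of killing 1000 people
ev_option2 <- 1.00 * 10     # surely killing 10 people
ev_option1  # 10
ev_option2  # 10 -- identical expectations, very different risks
```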

13 of 170

Risk curve

[Figure: a risk curve — probability (log scale, 10−1 to 100) against badness, e.g., mortality or financial loss]

Sometimes called a Farmer curve (after the British nuclear regulator)

Probability may be expressed as:

  • unitless probability
  • percentage per demand or outcome
  • density per unit outcome
  • frequency per unit time or outcome
  • qualitative grades per unit (e.g., ‘rare’, ‘plausible’, etc.)

14 of 170

Reg said

  • No line between safe and unsafe

  • Nothing has no risk

  • But we can quantify the risks

https://web.archive.org/web/20121218194620/https://netfiles.uiuc.edu/mragheb/www/NPRE%20457%20CSE%20462%20Safety%20Analysis%20of%20Nuclear%20Reactor%20Systems/The%20Risk%20Assessment%20Methodology.pdf

© M. Ragheb, 1/8/2011

In 1986, IAEA estimated 4,000 projected deaths due to Chernobyl

15 of 170

Risk assessment

  • Quantification of risk

  • Basically, systematic “what-if” thinking

  • Same as safety assessment

  • Many confused synonyms
    • Assessment, characterization, evaluation, analysis, quantification

16 of 170

Risk management

  • What we’re going to do about the risks

  • Usually includes the development and application of strategies or policies to remove, reduce or work around risks

  • May also include monitoring risks and communicating risks

17 of 170

Old school risk analysis

  • Because they were pioneering the discipline, the early analysts made a few mistakes

  • Assuming independence too much

  • Using uniforms or flat priors for “I don’t know”

  • Using stochastic mixtures as a form of averaging, or simply using averages

  • The nuclear industry has not sustained its lead


Carelessness about ensembles is perhaps their fourth mistake

18 of 170

Traditional practice

  • Most assessments are deterministic

  • Some are deliberately conservative, but

    • degree of conservatism is opaque, unquantified, inconsistent

    • difficult to characterize risks, except in extreme situations

19 of 170

What’s needed

An assessment should tell us

  • How likely the various consequences are

  • How reliable the conclusions are

20 of 170

Probability distribution display

  • Density distribution or mass distributions

  • Cumulative distribution function (CDF)

  • Complementary CDF (exceedance risk)

[Figure: three displays of the same distribution — the density p(x); the CDF Pr(X ≤ x), rising from 0 to 1; and the complementary CDF Pr(x < X), falling from 1 to 0]

CDF is not core damage frequency!

21 of 170

Probabilistic models

  • The same as deterministic models except that point values are replaced by probability distributions

  • Well, almost — there are wrinkles:
      • Ensembles
      • Distribution shapes and tails
      • Dependencies
      • Backcalculations

22 of 170

Ensemble

  • Statisticians call it the “population” or “reference class”

  • E.g., dysfunction rates in prostate patients
      • 50% of attempts at sex fail, or
      • 50% of men are totally impotent

23 of 170

Specification of ensembles

  • Above all, be clear
    • 1% of lethal dose per day / lethal dose on 1% of days
    • Assessment units: birds, meals, days, fields, …
    • Assessment ensemble: all birds, or just those on site?

  • Mismatched input distributions yield gibberish

  • If the analyst can’t say what ensemble a distribution represents, the analysis may be meaningless

24 of 170

Combining distributions (convolution)

X

X+Y

=

+

Y

Convolution is getting the output distribution from the input distributions. Monte Carlo is one way to do this.
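A minimal Monte Carlo convolution sketch in R (the input distributions here are illustrative, not from the case study):

```r
# Monte Carlo convolution: approximate the distribution of X+Y
# by sampling each input and summing realization by realization
set.seed(42)
m <- 100000
X <- rnorm(m, mean = 5, sd = 1)   # example input X
Y <- runif(m, min = 2, max = 4)   # example input Y
Z <- X + Y                        # realization-by-realization sum
# ecdf(Z) approximates the convolution of the two inputs
mean(Z)                           # close to 5 + 3 = 8
```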

25 of 170

Integrated calculations

Z

Y

X

W

f(W,X,Y,Z)

This is unlike intrusive analyses that solve operations sequentially

Sample realizations from each input distribution, and combine them to compute a scalar output value. Repeat many times to accumulate a distribution of output values.

26 of 170

Typical risk analysis problems

  • Fault tree analysis, event trees, etc.
  • Bayes nets
  • Arithmetic expressions
  • Strength − stress comparisons

27 of 170

Case study: dike reliability

28 of 170

The inputs

Relative density of the revetment blocks

Δ = uniform(1.60, 1.65)

Block thickness

D = uniform(0.68, 0.72) meters

Slope of the revetment

α = atan(uniform(0.32, 0.34)) radians

Analysts’ “model uncertainty” factor

M = uniform(3.0, 5.2)

Significant wave height (average of the highest third of waves)

H = weibull(1.35 meters, 11)

Offshore peak wave steepness

s = normal(0.04, 0.0055)

How do we specify these distributions?

29 of 170

Analysis

[Figure: distributions of the inputs Δ, D, α, M, H (meters), and s, and the cumulative probability of the output Z]

Probability of Z being negative (i.e., dike failure) is very small: P = 0.000015

30 of 170

Script for the R programming language

# case study for reliability of the dike

m = 1000000 # number of replications

delta = runif(m, 1.60, 1.65) # density of the revetment blocks

D = runif(m, 0.68, 0.72) # block thickness

alpha = atan(runif(m, 0.32, 0.34)) # slope of the revetment

M = runif(m, 3.0, 5.2) # “model uncertainty” factor

H = rweibull(m, 11, 1.35) # significant wave height

s = rnorm(m, 0.04, 0.0055) # offshore peak wave steepness

Z = delta * D - (H * tan(alpha)) / (cos(alpha) * M * sqrt(s))

plot(ecdf(Z)); sum(Z<0)/m

Google “download R”

31 of 170

So how are we doing?

32 of 170

Dramatic technocratic failures

  • Kansai International Airport wishful thinking
  • Diablo Canyon Nuclear Power Plant empirically incorrect
  • Vioxx withdrawal missed decision context
  • Fukushima non-redundancy
  • Mars Climate Orbiter units
  • Google Flu drank the Kool-Aid
  • Ariane 5 integer overflow
  • Minneapolis bridge collapse “one in a billion”

33 of 170

These are not noble failures

  • They are not simply unfortunate but expected extreme tail events

  • Occur more often than our risk estimates predict

  • Managers never see the problems coming

  • Analysts are doing the risk estimations wrong

34 of 170

Famous ‘misunderestimations’

  • Shuttle risk was estimated at 1/10,000 per flight; it was actually 1 per 70 flights

  • Grossly understating financial risks precipitated the 2007−2008 recession

  • 100-year flood risks underestimated and uncertainties not communicated

  • Failure cascades in electricity distribution systems are more common than they are forecasted to be

1998 National Weather Service, US Army Corps

(RAWG 2005; USCPSOTF 2006)

35 of 170

We are understating risks and uncertainties

  • NASA estimated the risk of losing the Space Shuttle at 1/10,000 per flight, but its observed failure rate was 1/70 flights (Hamlin et al. 2011; Oberkampf and Roy 2010)

  • National Weather Service and the Army Corps of Engineers underestimated risks of “100-year floods” and flubbed flood risk communication (Morss & Wahl 2007; NRC 2006; NWS 1998)

  • Observed failures and near-misses found at the Diablo Canyon Nuclear Power Plant reveal gross understatement of its assessed risks (Lochbaum 2011)

  • Failure cascades in power grids are more common than forecasted (RAWG 2005; USCPSOTF 2006)

  • Understating risks in the financial industry caused the Great Recession (Savage 2012; Silver 2012)

  • An engineer called the 2007 Minneapolis bridge collapse a “billion to one” event

36 of 170

What are we doing wrong?

  • Underestimation of the uncertainties of original measurements
  • Unjustified use of independence assumptions
  • Overly precise dependence assumptions
  • Unjustified assumption of equiprobability
  • Inappropriate use of uniform distributions
  • Modelling epistemic uncertainty the same way as aleatory
  • Not accounting for model-form uncertainty
  • Modelling volitional choices with distributions as though they’re random
  • Using averaging as an aggregation
  • Making assumptions for the sake of mathematical convenience

Wishful thinking

37 of 170

These mistakes can compound

  • The errors all work in the same direction: underestimation of risks

  • Omitting uncertainties likewise leads to understating risks

38 of 170

Northeast Blackout of 2003

  • The most widespread blackout in history after the Southern Brazil blackout of 1999

  • 55 million people affected

  • Traced to a software bug in a control room alarm system in Ohio

  • Electric Reliability Organization was created in the aftermath

I reviewed a paper for Risk Analysis on reliability theory by candlelight.

39 of 170

Almost 200,000 km of lines, operated by 500 separate companies

40 of 170

Dependence

41 of 170

Dependence

  • Most variables are assumed to be independent

  • But some variables clearly aren’t
    • Density and porosity
    • Rainfall and temperature
    • Body size and skin surface area
    • Adjacent or chained delivery systems

42 of 170

Dependence can be complex

X

Y

43 of 170

Dependence is subtler than correlation

Knowing the correlation doesn’t tell you the dependence. Each plot has (Pearson) correlation 0.816

op <- par(las=1, mfrow=c(2,3), mar=1.5+c(4,4,1,1), oma=c(0,0,0,0), lab=c(6,6,7), mgp=c(3,1,0))

# the four panels of Anscombe's quartet
ff <- y ~ x
for (i in 1:4) {
  ff[[2]] <- as.name(paste("y", i, sep=""))
  ff[[3]] <- as.name(paste("x", i, sep=""))
  lmi <- lm(ff, data=anscombe)
  plot(ff, data=anscombe, col="red", pch=21, cex=2.4, bg="orange",
       xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
  abline(lmi, col="blue")
}

# two further data sets with the same correlation
x = c(4,8,13.269,17.5); y = c(4,10,8,12.5)
plot(x, y, col="red", pch=21, cex=2.4, bg="orange", xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
cor(x, y)
abline(lm(y ~ x), col="blue")

x = c(6,4,8,10,13,17.5,16,14,15,17.3); y = c(5.468,4,6,8,13,9.241,11,11.5,12,10)
plot(x, y, col="red", pch=21, cex=2.4, bg="orange", xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
cor(x, y)
abline(lm(y ~ x), col="blue")

par(op)

http://en.wikipedia.org/wiki/Correlation_and_dependence

http://en.wikipedia.org/wiki/File:Anscombe%27s_quartet_3.svg (four of these six graphs)


44 of 170

Case study: Vesely’s tank

What’s the chance the tank ruptures under pumping?

Vesely et al. 1981

45 of 170

Fault tree and its Boolean expression

E1

T

K2

E2

E3

S

E4

S1

E5

K1

or

and

or

or

or

R

E1 = T (K2 (S & (S1 (K1 R))))

46 of 170

Bill Vesely’s subjective probabilities

E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

Component            Probability
Pressure tank T      5 × 10−6
Relay K2             3 × 10−5
Pressure switch S    1 × 10−4
Relay K1             3 × 10−5
Timer relay R        1 × 10−4
On-switch S1         3 × 10−5
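Under the all-independent model, OR gates combine as 1−(1−x)(1−y) and AND gates as xy. A quick R check with the table’s probabilities reproduces the independent result of about 3.5×10−5 shown on the Results slide:

```r
# Fault tree E1 = T or (K2 or (S and (S1 or (K1 or R)))), all independent
orI  <- function(x, y) 1 - (1 - x) * (1 - y)  # OR of independent events
andI <- function(x, y) x * y                  # AND of independent events
t  <- 5e-6; k2 <- 3e-5; s <- 1e-4             # component probabilities
k1 <- 3e-5; r  <- 1e-4; s1 <- 3e-5            # from Vesely's table
e1 <- orI(t, orI(k2, andI(s, orI(s1, orI(k1, r)))))
e1   # about 3.5e-05
```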

47 of 170

Different dependency models

Vesely et al. (all variables independent)

E1 = T || (K2 || (S |&| (S1 || (K1 || R))))

Mixed dependencies

E1 = T || (K2 ∨ (S &r (S1 || (K1 // R))))

Correlated to no assumption about dependence

E1 = T || (K2 ∨ (S & (S1 || (K1 // R))))

No assumptions about any dependencies

E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

48 of 170

Results

Probability of tank rupturing under pumping (horizontal axis 10−5 to 10−3):

All variables independent          3.5×10−5
Mixed dependencies                 [3.499×10−5, 3.504×10−5]
Relaxing S–(S1,K1,R) dependence    [3.50×10−5, 1.35×10−4]
No dependence assumptions          [3×10−5, 1.4×10−4]

  • Independence assumption seems to be non-conservative
  • Some relaxations make a big difference, some don’t (have to do the analysis)
  • Even making no assumptions at all doesn’t make the result vacuous

49 of 170

Can’t always assume independence

  • If you know the mechanism of dependence, you can model it

  • Failing a known mechanism, your model should at least reproduce the observed statistical patterns

  • If the dependence is unknown, neither approach is available

  • Even small correlations can have a big effect on convolutions
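A sketch of the last point: two sums with identical marginals, one pair independent, the other with a modest correlation built from a shared factor (the parameters here are illustrative, not from the slides):

```r
# Effect of a modest correlation on the tail of a sum X+Y
set.seed(1)
m   <- 1e6
rho <- 0.3                            # assumed correlation
X  <- rnorm(m); E <- rnorm(m)
Yi <- rnorm(m)                        # independent of X
Yc <- rho * X + sqrt(1 - rho^2) * E   # correlated with X, same marginal
mean(X + Yi > 4)   # tail probability under independence
mean(X + Yc > 4)   # noticeably larger under correlation
```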

50 of 170

Kinds of uncertainty

51 of 170

Sources of uncertainties

  • Thermal noise (Johnson−Nyquist) electronics
  • Poisson counting noise (integrality) disintegrations
  • Sampling uncertainty (some of a population)
  • Unmodelled factors (unknown unknown) seismicity
  • Quantum mechanics (God does play dice) Schrödinger
  • Measurement uncertainty (plus-or-minus) ½ or 1 unit

  • Etymologically, random means unpredictable

52 of 170

Two kinds of uncertainty

  • Variability — aleatory uncertainty, Type A uncertainty, stochasticity, randomness, chance, risk
    → handled with Monte Carlo methods and probability theory

  • Incertitude — epistemic uncertainty, Type B uncertainty, ambiguity, ignorance, imprecision, true uncertainty
    → handled with interval arithmetic and constraint analysis

They can change in different contexts. Distinguishing them may be subtle, sometimes like dividing ice from snow. But it’s essential to do so, as getting hit with a snowball is different from getting hit with a block of ice.

Photo credit: Jason Hollinger, https://commons.wikimedia.org/wiki/File:Feathery_Snow_Crystals_(2217830221).jpg, behind translucent filter

53 of 170

Variability = aleatory uncertainty

  • Arises from natural stochasticity

  • Variability arises from
    • spatial variation
    • temporal fluctuations
    • manufacturing or genetic differences

  • Not reducible by empirical effort

54 of 170

Incertitude = epistemic uncertainty

  • Arises from incomplete knowledge

  • Incertitude arises from
    • limited sample size
    • mensurational limits (‘measurement uncertainty’)
    • use of surrogate data

  • Reducible with empirical effort

55 of 170

Propagating incertitude

Suppose

A is in [2, 4]

B is in [3, 5]

What can be said about the sum A+B?

The right answer for risk analysis is [5, 9]
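The sum follows the basic rule of interval arithmetic: the endpoints add. A minimal sketch in R, representing an interval as a (lo, hi) pair:

```r
# Interval sum: [2,4] + [3,5] = [5,9]
A <- c(2, 4)       # interval as (lo, hi)
B <- c(3, 5)
AplusB <- A + B    # endpoint-wise addition
AplusB             # 5 9
```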

56 of 170

Must be treated differently

  • Variability should be modeled as randomness with probability theory

  • Incertitude should be modeled as ignorance with methods of interval or constraint analysis

  • Imprecise probabilities or probability bounding can do both at once

57 of 170

Propagation methods

  • Analytical (delta method, Laplace transforms)

  • Monte Carlo simulation

  • Probability bounding (imprecise probabilities)

58 of 170

Relaxing all assumptions

[Figure, over the range 4 to 10 (legend: Laplace, Dependence, Shape): the cumulative distribution of Laplace’s unreasonable triangular density function; a region containing any CDF arising from dependent uniforms; and an outer region within which any CDF is possible — the same as the interval]

59 of 170

Aleatory or epistemic?

“These concepts only make unambiguous sense if they are defined within the confines of a model of analysis. In one model an addressed uncertainty may be aleatory, in another model it may be epistemic. So, the characterization of uncertainty becomes a pragmatic choice dependent on the purpose of the application.” (Der Kiureghian and Ditlevsen 2009)

60 of 170

“The probability of a component failure is 0.01”

  • An aleatoric interpretation
    • On average, 1 in every 100 components of this type will fail

  • An epistemic interpretation
    • There is 0.01 probability that all components of this type will fail

  • Very different characterizations of an uncertainty statement

  • Once understood, proper analysis is straightforward

(Jon Helton 2009)
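Helton’s two readings can give very different system-level answers. A sketch, assuming (hypothetically) a system that needs 100 such components all working:

```r
# "The probability of a component failure is 0.01", two readings
n <- 100                 # assumed number of components (illustrative)
p <- 0.01
aleatory  <- (1 - p)^n   # each component independently fails w.p. 0.01
epistemic <- 1 - p       # w.p. 0.01 the whole component type fails
aleatory                 # about 0.366: system survival is unlikely-ish
epistemic                # 0.99: system fine unless the type is flawed
```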

61 of 170

Getting the inputs

62 of 170

Expert elicitation

Maximum likelihood

Bayesian inference

PERT

Method of moments

Maximum entropy

Estimation

methods

the great frontier of making things up

A two-front assault

63 of 170

How to specify input distributions

  • Default distribution
  • PERT
  • Expert opinion
  • Maximum entropy
  • Fitted distribution (goodness of fit)
    • Method of matching moments
    • Maximum likelihood
  • Bayesian analysis
  • Empirical distribution

No data

Lots of data

64 of 170

Default distributions

  • Physicists overuse normal distributions
  • Risk analysts overuse uniform distributions
  • Reliability engineers overuse Weibulls
  • Ecologists overuse lognormals
  • Statisticians overuse beta distributions
  • Food safety overuses triangular distributions

Cooper’s quip

65 of 170

Arguments (frequently abused)

  • Sum of many independent comparable terms (normal)
  • Product of many independent factors (lognormal)
  • Largest of many values (Gumbel, Weibull, Fréchet)
  • Lifetimes with neither aging nor burn-in (exponential)
  • Symmetry (uniform)
  • Rare discrete events (Poisson)
  • Independent trials (binomial)

66 of 170

Engineers cannot always get data

  • New systems may have no performance history
    • spacecraft of new design or in a new environment
    • biology using novel genetic constructs that have never existed before
    • components so well constructed that there’s no failure data

  • Three ways to estimate probabilities without data
    • disaggregation into parts whose probabilities are easier to estimate (i.e., breaking it into subproblems)
    • expert elicitation (i.e., guessing)
    • modern statistical estimation

67 of 170

Rare events

  • Often the driving concern in analyses
  • Typically big consequences
  • Hardly ever characterized by good data
  • Perhaps never seen, or seen only once
  • How can we estimate the probability of a rare event?

68 of 170

Random sample data

  • Maximum likelihood says p is zero for never-seen events
    • Nobody believes this is a reasonable estimate
  • Bayesian estimator is more reasonable…

Just kidding

69 of 170

[Figure: posterior distributions for p under the Zellner, Bayes-Laplace, Jeffreys, and Haldane priors, shown as densities and cumulative probabilities over roughly 0 to 0.01]

Data “once out of 1000” — posterior means: Haldane 0.001, Jeffreys 0.0015, Zellner 0.001986, Bayes-Laplace 0.001996

Data “zero out of 1000” — posterior means: Haldane 0, Jeffreys 0.0005, Zellner 0.000942, Bayes-Laplace 0.000998

70 of 170

Random sample data

  • Maximum likelihood says p is zero for never-seen events
    • Nobody believes this is a reasonable estimate
  • The Bayesian estimator is many things
    • ‘Reasonable’ only in that it’s whatever you want
  • Modern estimators
    • Imprecise beta (Dirichlet) models
    • Confidence structures

71 of 170

Confidence structure (c-box)

  • Estimator of a (fixed) parameter
  • Gives confidence intervals for the parameter at any confidence level

[Figure: a c-box for the parameter θ, plotted as probability from 0 to 1 against parameter value; slicing between probability levels β and α gives an (α−β)100% confidence interval for θ]

72 of 170

Confidence boxes

  • Structures that infer confidence intervals, at any level, for a parameter

  • Can be propagated just like p-boxes

  • Allow us to compute with confidence

  • Generalize confidence distributions (e.g., Student’s t)

  • Different from confidence bands such as Kolmogorov–Smirnov or distributional bands

73 of 170

Estimators

  • Point estimates (e.g., sample mean)

  • Interval estimates (e.g., confidence intervals)

  • Distributional estimates (Bayesian posteriors)

  • P-box-shaped estimates (e.g., c-boxes)

74 of 170

Probability of rare event

  • Inference about a probability from binary data, k successes out of n trials

p ~ [beta(k, n−k+1), beta(k+1, n−k)]

the notation extends the use of the tilde

Data: k = 2, n = 10

[Figure: the c-box for p on [0, 1], bounded by the CDFs of beta(2, 9) and beta(3, 8)]
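For the slide’s data (k = 2, n = 10), confidence intervals come straight from the quantiles of the bounding betas; the 95% interval below coincides with the classical Clopper–Pearson interval:

```r
# c-box for a binomial probability from k successes in n trials:
# p ~ [beta(k, n-k+1), beta(k+1, n-k)]
k <- 2; n <- 10
# 95% confidence interval from the outer quantiles of the bounding betas
lo <- qbeta(0.025, k, n - k + 1)   # lower edge: beta(2, 9)
hi <- qbeta(0.975, k + 1, n - k)   # upper edge: beta(3, 8)
c(lo, hi)                          # roughly 0.025 and 0.556
```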

75 of 170

Binomial rate

[Figure: grid of c-boxes for the binomial rate p given data k/n — rows for n = 0, 1, 2, 3, 20, 200, 2000, with panels 0/0; 0/1, 1/1; 0/2, 1/2, 2/2; 0/3, 1/3, 2/3, 3/3; 0/20, 15/20, 20/20; 0/200, 150/200, 200/200; 0/2000, 1500/2000, 2000/2000]

Notice that the c-boxes in every row partition the vacuous ‘dunno’ interval

76 of 170

Zero out of 10^k trials

[Figure: the corresponding c-boxes crowd into the “zero corner” near the origin]

77 of 170

One out of 10^k trials

[Figure: the corresponding c-boxes fall in the “once corner”, within roughly 0 to 10^(1−k)]

78 of 170

Computing with c-boxes

[Figure: three c-boxes on [0, 1], one for each plan’s failure probability]

Plan A

249 out of 1,014 failed

Plan B

39 out of 60 failed

Plan C

17 out of 20 failed

What if we tried all three plans independently?

79 of 170

Conjunction (AND)

a = balchbox(100,25)

b = balchbox(60,39)

c = balchbox(20,17)

ABC = a %&% b %&% c

abc = a %|&|% b %|&|% c

[Figure: c-box on [0, 1] for the probability that all three plans fail]

Has the confidence performance!

80 of 170

Vesely’s pressurized tank

[Figure: six c-boxes estimated from sparse component test data]

tank T: 0/2000   relay K2: 3/500   pressure switch S: 1/150
on-off switch S1: 0/460   relay K1: 7/10000   timer relay R: 0/380

81 of 170

[Figure: c-box on roughly 0 to 0.03 for the top-event probability]

Top event, tank rupture under pressurization, assuming independence

Has the performance interpretation

82 of 170

Tight bounds or fat bounds

  • When the bounds are tight, and far from a threshold of concern, the analysis gives confidence about the conclusion

  • When the bounds straddle a threshold of concern, the analysis is saying you don’t have enough information to answer that question

83 of 170

R script


# Vesely's pressurized tank system from sparse sample data assuming independence

many = 10000

constant = function(b) if (length(b)==1) TRUE else FALSE

precise = function(b) if (length(b)==many) TRUE else FALSE

leftside = function(b) if (precise(b)) return(b) else return(b[1:many])

rightside = function(b) if (precise(b)) return(b) else return(b[(many+1):(2*many)])

sampleindices = function() round(runif(many)*(many-1) + 1)

pairsides = function(b) {i=sampleindices(); return(env(leftside(b)[i],rightside(b)[i]))}

env = function(x,y) if ((precise(x) && precise(y))) c(x,y) else stop('env error')

plotbox = function(x, xlab='', ylab='Cumulative probability', xlim=range(x), ylim=c(0,1), ...) {

plot(NULL,NULL,xlab=xlab,ylab=ylab,xlim=xlim,ylim=ylim, ...);

lines(ecdf(x[(many+1):(2*many)]),...); lines(ecdf(x[1:many]),...) }

beta = function(v,w) if ((v==0) && (w==0)) env(rep(0,many),rep(1,many)) else

if (v==0) rep(0,many) else if (w==0) rep(1,many) else sort(rbeta(many, v, w))

kn = function(k,n) return(pairsides(env(beta(k, n-k+1), beta(k+1, n-k))))  

orI = function(x,y) return(1-(1-x)*(1-y))

andI = function(x,y) return(x*y)

t = kn(0, 2000); k2 = kn(3, 500)

s = kn(1, 150); s1 = kn(0, 460)

k1 = kn(7, 10000); r = kn(0, 380)

e1 = orI(t, orI(k2, andI(s, orI(s1, orI(k1, r)))))

plotbox(e1)

84 of 170

Expert opinions on rare-event probabilities

  • Hardly ever any actual data
  • Sometimes we have experts
  • But how should we model bald assertions:

    • “1 in 10 million”

    • “about 1 in 1000”

    • “never been seen in 100 years”

85 of 170

In my experience

People using expert elicitation often take the pronouncements made by their informants exactly as they are specified

    • Sometimes even as point values

(If your experience is different, I’d like to chat)

86 of 170

Is there no uncertainty?

  • What is the uncertainty in estimates like
    • “1 in 10⁷”,
    • “about 1 in 1000”, or
    • “never been seen in over 100 years of observation”?

  • How should this uncertainty be captured and projected in computations?

  • We have suggested a tentative strategy based on c-boxes
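A c-box for such an assertion can be sketched numerically. The snippet below is a Python analogue of the R `kn` function earlier in these slides (the function name, variable names, and the 10,000-replicate default are my choices), applied to "never been seen in 100 years" read as 0 events in 100 trials:

```python
import numpy as np

rng = np.random.default_rng(1)

def kn_cbox(k, n, N=10_000):
    """C-box for a binomial probability after k events in n trials:
    left bound ~ Beta(k, n - k + 1), right bound ~ Beta(k + 1, n - k).
    A Beta with a zero shape parameter degenerates to a constant 0 or 1."""
    left = np.zeros(N) if k == 0 else np.sort(rng.beta(k, n - k + 1, N))
    right = np.ones(N) if k == n else np.sort(rng.beta(k + 1, n - k, N))
    return left, right

# "never been seen in 100 years" -> k = 0 events in n = 100 annual trials
lo, hi = kn_cbox(0, 100)
```

The zero count pins the left bound at zero, but the right bound, a Beta(1, 100), still admits annual probabilities of around one percent, which is the uncertainty a point value of zero would hide.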

87 of 170

Risk communication

88 of 170

[Figure: c-boxes for the probability p given sample results 0/0; 0/1, 1/1; 0/2, 1/2, 2/2; 0/3, 1/3, 2/3, 3/3; 0/20, 15/20, 20/20; 0/200, 150/200, 200/200; and 0/2000, 1500/2000, 2000/2000. The boxes tighten as the sample size grows.]

89 of 170

Express risk as “k out of n”

Match a calculated risk to a k-n c-box

[Figure: a calculated risk (confidence vs probability) next to the matching k-n c-box]

90 of 170

Equivalent binomial count

  • An imprecisely computed risk can be expressed as a p-box over [0,1]

  • Transform it into a natural-language statement of the form “k out of n

  • These are natural frequencies
    • Large uncertainties imply small denominators
    • Or even interval numerators if very large

  • People can understand them, possibly better than graphical depictions
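One naive way to realise the "large uncertainty, small denominator" idea is to search for the smallest denominator that fits inside a stated probability interval. This is only an illustration of the direction of the effect, not the calibrated c-box matching the slides describe:

```python
import math

def equivalent_count(lo, hi, max_n=10_000):
    """Smallest denominator n for which some integer count k puts
    k/n inside the probability interval [lo, hi]. Wide intervals
    are satisfied by small n; tight ones force larger n."""
    for n in range(1, max_n + 1):
        k = math.ceil(lo * n)
        if k / n <= hi:
            return k, n
    raise ValueError("no k/n inside the interval up to max_n")

print(equivalent_count(0.25, 0.45))    # wide interval -> (1, 3)
print(equivalent_count(0.407, 0.409))  # tight interval needs a larger denominator
```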

91 of 170

[Figure: two side-by-side plots of confidence vs probability]

92 of 170

Amazon Mechanical Turk

  • Check preference for sunglasses rated by buyers using various schemes
    • More frequent ‘excellent’ ratings should be better
    • Larger pool of buyers rating should be more reliable

  • Testing whether
    • Turkers can make rational choices
    • Natural frequencies are as good as or better than percentages
    • Larger denominators convey more reliability
    • Interval numerators can be understood

93 of 170

Pair A was rated excellent by 10 out of 100 customers.

Pair B was rated excellent by 80 out of 100 customers.

Pair A was rated excellent by 66 out of 198 customers.

Pair B was rated excellent by 2 out of 6 customers.

Pair A was rated excellent by 66 to 88 out of 198 customers.

Pair B was rated excellent by 33 out of 100 customers.

50% of customers rated Pair A as excellent.

Pair B was rated excellent by 50 out of 100 customers.

50% of customers rated Pair B as excellent.

Pair A was rated excellent by 2 out of 4 customers.

94 of 170

Findings

  80% vs 50%              rational
  10/100 vs 80/100        same sample size
  66/198 vs 2/6           same magnitude
  [66,88]/198 vs 33/100   even with ambiguity
  50% vs 50/100           prefer natural frequency
  2/4 vs 50%              unless very uncertain

Sample sizes ~300 “master turkers”

95 of 170

Bayesian networks

96 of 170

Bayesian networks (Bayes nets)

  • Probabilistic graphical models used to study the conditional dependencies of components (random variables) that make up a system

Slide thanks to Diego Hector Estrada Lugo

97 of 170

Bayes nets

  • Intuitive and relatively easy to implement
  • By introducing evidence you update outcomes
  • Causal (as opposed to evidential) reasoning
  • Needs lots of information about dependence among variables (conditional distributions)
  • When you don’t have empirical information, you need assumptions

Slide thanks to Diego Hector Estrada Lugo
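The update-with-evidence step can be sketched with a two-node net (Python; all the probabilities here are invented for illustration):

```python
# Two-node Bayesian network with made-up probabilities:
# F = system failure, A = alarm, with arc F -> A
p_F = 0.01
p_A_given_F, p_A_given_notF = 0.95, 0.05

# marginal probability of the evidence (alarm sounds)
p_A = p_A_given_F * p_F + p_A_given_notF * (1 - p_F)

# introducing the evidence updates the outcome: P(F | A) by Bayes' rule
p_F_given_A = p_A_given_F * p_F / p_A
print(round(p_F_given_A, 3))   # -> 0.161
```

Even this toy net needs the full conditional table P(A | F); real nets need one such table per node, which is the information burden the next slide mentions.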

98 of 170

Many applications

Slide thanks to Diego Hector Estrada Lugo

99 of 170

Limited applicability

  • But only practical for Gaussian distributions or discrete distributions


Slide thanks to Diego Hector Estrada Lugo

100 of 170

Backcalculation

101 of 170

Engineering design under uncertainty

  • Traditional design freezes requirements early
    • restricts future developments
    • specifies functions and performance levels

  • Often does not fully address uncertainty
    • assumptions may be incorrect
    • variation or unpredictability of conditions
    • entirely unknown facts that might be germane

  • Performance is often affected by uncertainties


102 of 170

A typical problem

  • How can we design a radiation shielding system if we can’t specify the radiation distribution?

  • Could plan with worst case analysis
    • Often wasteful
    • Can’t account for rare, even worse extremes

  • Could pretend we know the distribution
    • Unreasonable for new designs or environments

103 of 170

Backcalculation

  • Untangles an equation in uncertain numbers to satisfy a constraint

  • Finds B such that A + B = C, from estimates for A and constraints on C

  • Engineering design under uncertainty often needs backcalculation

(⊆ means “is inside of” or “is a subset of”)

Backcalculation finds B such that A + B ⊆ C, from estimates for A and constraints on C

104 of 170

Planning a remediation program

  • Suppose we have constraints (from toxicity studies) on the doses receptors can tolerate of some contaminant

  • Dose is the concentration of the contaminant in water times the intake of water by the receptor

  • How clean must the remediation program make the water in order to ensure those constraints aren’t violated?

105 of 170

Can’t just solve for concentration

When Concentration is put back into the forward equation, the resulting Dose is wider than planned

prescribed

known

unknown

forward equation

inverted equation

Dose = Concentration × Intake

Concentration = Dose / Intake

106 of 170

Can’t just solve for concentration

When Radiation is put back into the forward equation, the resulting Dose is wider than planned

prescribed

known

unknown

forward equation

inverted equation

Dose = Radiation × Exposure

Radiation = Dose / Exposure

107 of 170

Example

Suppose dose must be less than 3 milligrams

dose = [0, 3] mg

intake = [0.5, 4] liter

conc = dose / intake

[0, 6] mg liter⁻¹

dose = conc * intake

[ 0, 24] mg

eight times larger than planned tolerable levels

108 of 170

Untangling backcalculations

  • Solving for B given A + B = C

B = backcalc(A, C) = [C1 − A1, C2 − A2]

  • Solving for B given A × B = C

B = factor(A, C) = [C1 / A1, C2 / A2]

  • Sometimes called “tolerance solutions”
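For plain intervals these untanglings are one-liners. A Python sketch (the function names follow the slides; representing an interval as a (lo, hi) pair is my choice), replayed on the remediation example:

```python
def backcalc(A, C):
    """Tolerance solution B of A + B inside C, for intervals (lo, hi)."""
    lo, hi = C[0] - A[0], C[1] - A[1]
    if lo > hi:
        raise ValueError("constraint too tight: no tolerance solution")
    return (lo, hi)

def factor(A, C):
    """Tolerance solution B of A * B inside C, for positive-width intervals."""
    lo, hi = C[0] / A[0], C[1] / A[1]
    if lo > hi:
        raise ValueError("constraint too tight: no tolerance solution")
    return (lo, hi)

dose, intake = (0.0, 3.0), (0.5, 4.0)
conc = factor(intake, dose)    # -> (0.0, 0.75) mg per liter
# plug it back in: the interval product lands exactly on the dose constraint
check = (conc[0] * intake[0], conc[1] * intake[1])
```

Unlike naive interval division, the forward check here returns [0, 3] mg, not the inflated [0, 24] mg of the earlier slide.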

109 of 170

Correct solution

Dose must be less than 3 milligrams

dose = [0, 3] mg

intake = [0.5, 4] liter

conc = factor(intake, dose) = [0/0.5, 3/4] mg liter⁻¹

[0, 0.75] mg liter⁻¹

Check:

dose = conc * intake

[ 0, 3] mg

110 of 170

Doesn’t work for distributions either

Dose = Concentration × Intake / Bodymass

Concentration = Dose × Bodymass / Intake

When concentration is put back into the forward equation, the resulting dose is wider than planned

prescribed

known

unknown

111 of 170

Planned dose

Body mass

Intake

Concentration

0

2

4

6

8

0.06

0.04

0.02

0.00

0

2

4

6

8

0.4

0.2

0.0

0

2

4

6

8

0.06

0.04

0.02

0.00

0

2

4

6

8

0.03

0.02

0.01

0.00

Concentration = Dose × Bodymass / Intake

112 of 170

Large doses in the realized distribution (arising from the allowed distribution of concentrations) are more common than were originally planned

0

2

4

6

8

0.06

0.05

0.04

0.03

0.02

0.01

0

Dose

planned

realized

113 of 170

Normal approximation

  • If A+B=C, compute B as C−A under the assumption the correlation between A and C is r = sd(A)/sd(C)

To simulate normal deviates with correlation r, compute

Y1 = Z1 × σ1 + μ1

Y2 = (rZ1 + Z2 √(1−r²)) × σ2 + μ2

where Z1 and Z2 are independent standard normal deviates, and μ and σ denote the respective desired means and standard deviations

  • Uses Pearson correlation (not rank correlation)
  • Good for multivariate normal, and maybe more
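That recipe can be sketched in Python (the means and standard deviations below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
muA, sdA = 2.0, 0.5   # known addend A (assumed values)
muC, sdC = 5.0, 1.0   # prescribed target C

r = sdA / sdC         # prescribed correlation between A and C
z1, z2 = rng.standard_normal((2, N))
A = z1 * sdA + muA                                   # Y1 = Z1*sigma1 + mu1
C = (r * z1 + z2 * np.sqrt(1 - r**2)) * sdC + muC    # Y2 = (rZ1 + Z2*sqrt(1-r^2))*sigma2 + mu2

B = C - A   # backcalculated addend
# with r = sd(A)/sd(C), sd(B) = sqrt(sdC**2 - sdA**2), and
# A + B reproduces C exactly, so the planned dispersion is not inflated
```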

114 of 170

Normal approximation with non-normal distributions

115 of 170

Iterative (trial & error) approach

  • Initialize B with C−A
  • This distribution is too wide
  • Transform density p(x) to p(x)m
  • Rescale so that area remains one
  • Whenever m > 1 dispersion decreases
  • Vary m until you get an acceptable fit
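The sharpening step can be sketched on a discretized density (Python; the grid and the normal starting density are my choices):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2)      # start: standard normal density (unnormalized)
p /= p.sum() * dx

def sharpen(p, m, dx):
    """Raise the density to the power m and rescale so the area stays one.
    Whenever m > 1 the dispersion decreases."""
    q = p**m
    return q / (q.sum() * dx)

def sd(p, x, dx):
    mu = (x * p * dx).sum()
    return np.sqrt(((x - mu)**2 * p * dx).sum())

q = sharpen(p, 2.0, dx)
print(sd(p, x, dx), sd(q, x, dx))   # dispersion shrinks, here from ~1 to ~0.707
```

In the backcalculation loop one would start from p for C − A and raise m until the forward convolution A + B fits acceptably inside the planned dose distribution.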

116 of 170

By trial and error, you may be able to find a distribution of concentrations that yields something close to the planned distribution of doses. This is not always possible however.

0.06

0

2

4

6

8

0.05

0.04

0.03

0.02

0.01

0

Dose

117 of 170

Backcalculation with p-boxes

Suppose A + B = C, where

A = normal(5, 1)

C = {0 ≤ C, median ≤ 1.5, 90th %ile ≤ 35, max ≤ 50}

[Figure: p-box for A = normal(5, 1) on [2, 8], and p-box for the constraint C on [0, 60], with probability axes from 0 to 1]

118 of 170

Getting the answer

  • The backcalculation algorithm basically reverses the forward convolution

  • Any distribution totally inside B is sure to satisfy the constraint

[Figure: the backcalculated p-box B, on roughly [−10, 50], with probability axis from 0 to 1]

Algorithm for additive backcalculation of the kernel for B from the equation C = A + B, where the p-boxes for A and the constraint C are known

 N = number of percentiles used in the representation {e.g. 100}

{find the left side of the p-box for the distribution of B}

left limit on B at 0th percentile := left limit on C at 0th percentile − left limit on A at 0th percentile;

for i := 1 to N do begin
  set flag to "not done";
  for j := 0 to i−1 do
    if (left limit on C at ith percentile) ≤ (left limit on A at [i−j]th percentile) + (left limit on B at jth percentile)
    then set flag to "done";
  if flag is "done"
  then left limit on B at ith percentile := left limit on B at [i−1]th percentile {the one just below it}
  else left limit on B at ith percentile := left limit on C at ith percentile − left limit on A at 0th percentile;
end; {for i, left bound}

{find the right side of the p-box for the distribution of B}

right limit on B at Nth percentile := right limit on C at Nth percentile − right limit on A at Nth percentile;

for i := N−1 downto 0 do begin
  set flag to "not done";
  for j := N downto i+1 do
    if (right limit on C at ith percentile) ≥ (right limit on A at [i−j+N]th percentile) + (right limit on B at jth percentile)
    then set flag to "done";
  if flag is "done"
  then right limit on B at ith percentile := right limit on B at [i+1]th percentile {the one just above it}
  else right limit on B at ith percentile := right limit on C at ith percentile − right limit on A at Nth percentile;
end; {for i, right bound}

{end of algorithm}
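The left-bound half of the pseudocode translates nearly line for line into Python (a sketch; the bounds are stored as 0-indexed lists at equally spaced percentiles, and the right-bound loop would be the mirror image):

```python
def backcalc_left(A_left, C_left):
    """Left bounds of B for C = A + B, given left bounds of A and C
    at the same set of percentiles (a rendering of the pseudocode's
    left-bound loop)."""
    B = [C_left[0] - A_left[0]]
    for i in range(1, len(A_left)):
        done = any(C_left[i] <= A_left[i - j] + B[j] for j in range(i))
        # if the constraint is already satisfiable, carry the bound just
        # below; otherwise shift it by the smallest value of A
        B.append(B[i - 1] if done else C_left[i] - A_left[0])
    return B

# degenerate check: if A is the point value 2, B is just C shifted down by 2
print(backcalc_left([2, 2, 2, 2], [0, 1, 2, 3]))   # -> [-2, -1, 0, 1]
```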

119 of 170

Check it by plugging it back in

A + B = C* ⊆ C

[Figure: the realized C* = A + B plotted inside the constraint C, on [−10, 60]]

120 of 170

P-boxes

  • Natural compromise that can express both
    • Gross uncertainty like intervals and worst cases
    • Distributional information about tail risks

  • Needed to solve equations containing uncertain numbers
    • Uncertainty propagation
    • Constraint propagation
    • Backcalculation

121 of 170

NASA shielding example

  • Medical team says
    • crew must not get doses larger than 2000 grays
    • 95% of doses should be less than 750 grays
    • most doses should be less than 500 grays

  • Radiation distribution is uniform
    • minimum in [0.01, 0.05] rads per second
    • maximum in [0.08, 0.1] rads per second

  • Mission could last 30 to 200 hours

How are these constraints expressed with p-boxes?

122 of 170

P-boxes express these constraints

[Figure: p-boxes for dose D [Gy] (0 to 3000), radiation R [Gy s⁻¹] (0.0001 to 0.0011), and mission time T [s] (0 to 600000)]

What are the ordinate axes here? Exceedance risk

123 of 170

Can’t just invert the equation

Dose = Radiation × Time / Shielding

Shielding = Radiation × Time / Dose

When Shielding is put into the forward equation, the resulting Dose is wider than planned

prescribed constraint

unknown

known

forward equation

inverted equation

124 of 170

Wrong calculation

S = R × T / D

Tail doses more than 50 times larger than planned!

D = R × T / S

[Figure: the naively inverted shielding S (0 to 8) and the resulting dose D [Gy] extending to 200000, plotted as exceedance risk]

125 of 170

Correct way uses backcalculation

D = R × T / S

Wait, how could shielding vary?

The all-but-95% range of this p-box answer for S is about [0.95, infinity] (which is a lot thicker than the interval answer [0.72, infinity]) because the all-but-95% range of the allowed dose in this example was only 750 Gy, rather than 1000 Gy.

S = 1 / factor(R × T, D)

[Figure: the backcalculated shielding S (0 to 8) and the check that the realized dose D stays within the 0 to 3000 Gy constraint, plotted as exceedance risk]

126 of 170

P-boxes better than worst case

  • More realistic
    • Need not assume the worst case surely happens
    • Can account for rarity of extreme events

  • More expressive
    • Can distinguish between possible events
    • Captures complex tolerances (higher doses if infrequent)
    • Can consider more extreme events than would otherwise be feasible

  • Better results
    • Cheaper constraints on shielding
    • Safer outcomes for the crew

127 of 170

Precise distributions don’t work

  • Precise distributions can’t express the target

  • Prescribing a precise distribution of doses says we want some high doses

  • Any distribution to the left would be better

  • A p-box on the dose target expresses this idea

128 of 170

Backcalculation algebra

  • Can define untanglings for all basic operations

e.g., if A × B = C, then B = factor(A, C) = exp(backcalc(ln A, ln C))

  • Can chain them together for big problems

  • Assuming independence widens the result

  • Repeated terms need special strategies
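That log identity is easy to check on intervals (a Python sketch; positive endpoints only, and the (lo, hi) interval representation is mine):

```python
import math

def backcalc(A, C):
    # additive tolerance solution for intervals (lo, hi)
    return (C[0] - A[0], C[1] - A[1])

def factor(A, C):
    # multiplicative untangling via the identity exp(backcalc(ln A, ln C))
    lo, hi = backcalc(tuple(map(math.log, A)), tuple(map(math.log, C)))
    return (math.exp(lo), math.exp(hi))

A, C = (2.0, 4.0), (1.0, 8.0)
B = factor(A, C)                    # recovers (0.5, 2.0) up to rounding
fwd = (A[0] * B[0], A[1] * B[1])    # forward check lands back on (1.0, 8.0)
```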

129 of 170

When you need        And you have         Use this formula
to be inside         estimates for        to find the unknown

A + B ⊆ C            A, C                 B = backcalc(A, C)
                     B, C                 A = backcalc(B, C)

A − B ⊆ C            A, C                 B = −backcalc(A, C)
                     B, C                 A = backcalc(−B, C)

A × B ⊆ C            A, C                 B = factor(A, C)
                     B, C                 A = factor(B, C)

A / B ⊆ C            A, C                 B = 1/factor(A, C)
                     B, C                 A = factor(1/B, C)

A ^ B ⊆ C            A, C                 B = factor(log A, log C)
                     B, C                 A = exp(factor(B, log C))

130 of 170

Conclusion

  • Planning cleanup requires backcalculation

  • Monte Carlo methods don’t generally work except in a trial-and-error approach

  • Should express the dose target as a p-box

131 of 170

Conclusions

132 of 170

Decision making under risk and uncertainty

  • Estimate risks of bad things happening

  • Fashion more sustainable & more resilient designs and processes

  • Minimise adverse impacts, unintended effects

  • Plan mitigation, remediation or restitution

133 of 170

Don’t drink the Kool-Aid 1978

  • Modelling is dangerous because we sometimes believe our predictions

  • “We have to build the best models we possibly can and then not trust them.” Robert Costanza
  • Risk analysis helps us do that systematically

  • It also ensures we’re realistic about what is knowable and predictable

Don’t take the blue pill 1999

Be skeptical

Be sceptical

134 of 170

There’s a lot more

  • Uncertainty quantification (estimation & modelling)
  • Dependencies, common-cause, common-mode failures
  • Model-form uncertainty
  • Numerical uncertainty
  • Acceleration methods (line sampling, Latin hypercube)
  • Psychology of uncertainty & risk communication
  • Backcalculation & design
  • Validation
  • Sensitivity analysis
  • Decision analysis

Complain if they shortchange you

135 of 170

Kinds of calculations

  • Chance that stress exceeds strength
  • Probability of a failure or some scary event
  • How much can a quantity vary
  • How much could it vary and still be safe
  • How much must it vary to ensure safety
  • Where to focus empirical efforts
  • How best to control the outcome

Answers can be very different with variability/uncertainty

136 of 170

Better safety assessments and reliability analyses

  • Account for risks and uncertainties in systems and reliability analysis
  • Should replace average values with distributions and dependencies

component failure rates, restoration times, performance indices

                         Base case          Uncertainty
                         (no uncertainty)   considered

Probability of failure   0.009              0.0072
Frequency of failure     3.284              1.6660
Duration of failure      76.799             63.465

Edimu, M., C.T. Gaunt and R. Herman (2011). Using probability distribution functions in reliability analyses. Electric Power Systems Research 81 (4): 915−921. https://doi.org/10.1016/j.epsr.2010.11.022 [Table 6]

137 of 170

Steps in a probabilistic risk analysis

  • Clarify the questions and gather data
  • Formulate the model (identify variables and their interrelationships)
  • Specify distributions for each input
  • Specify dependencies and correlations
  • Run simulation
  • Conduct sensitivity studies
  • Present results
  • Discuss limitations of the assessment

138 of 170

Limitations of Monte Carlo

  • Needs precise, stationary distributions
  • Nonlinear dependencies often ignored
  • Unknown dependencies incorrectly modeled
  • Assumes model is well understood
  • Can be computationally expensive
  • Backcalculations difficult
  • Treats incertitude like variability
  • Thought by some to be too difficult

139 of 170

Common pitfalls of Monte Carlo

  • Muddled model
  • Infeasible correlation matrices
  • Inconsistent independence assumptions
  • Multiply instantiated variables
  • Too few replications
  • No sensitivity studies
  • Confounding of different populations

140 of 170

Monte Carlo simulation

How?

    • replace each point estimate with a probability distribution
    • repeatedly sample from each, tally answers in a histogram

Why?

    • simple to implement
    • fairly simple to explain
    • summarizes entire distribution of risk

Why not?

    • requires a great deal of empirical information
    • usually need to guess some things
    • only appropriate if uncertainty is statistical
    • routine assumptions can lead to non-protective conclusions

141 of 170

Probability bounds analysis

How?

    • specify what you are sure about
    • establish bounds on probability distributions
    • pick dependencies (no assumption, independence, perfect dependence, etc.)

Why?

    • account for uncertainty better than maximum entropy, etc.
    • puts bounds on Monte Carlo results
    • bounds get narrower with better empirical information

Why not?

    • does not yield second-order probabilities
    • best-possible results can sometimes be expensive to compute

142 of 170

Some of the available software tools

  • Crystal Ball
  • Sapphire
  • Cossan
  • Dakota
  • RAMAS Risk Calc

143 of 170

More information from Risk Institute

144 of 170

Questions?

145 of 170

NAFEMS challenge

  • NAFEMS is the international association for the engineering modelling, analysis and simulation community

  • For the links to the challenge, see the GREEN CDT in Nuclear Energy’s risk website https://sites.google.com/view/greenrisk


146 of 170

147 of 170

Woodpile

148 of 170

Traditional practice

  • Most assessments are deterministic

  • Some are deliberately conservative, but

    • degree of conservatism is opaque, unquantified, and can be inconsistent

    • difficult to characterize risks, except in extreme situations

149 of 170

Traditional uncertainty analyses

  • Worst case analysis

  • Taylor series approximations (delta method)

  • Normal theory propagation (NIST; mean value method)

  • Monte Carlo simulation

  • Stochastic PDEs

  • Two-dimensional Monte Carlo

150 of 170

Untenable assumptions

  • Uncertainties are small

  • Distribution shapes are known

  • Sources of variation are independent

  • Uncertainties cancel each other out

  • Linearized models good enough

  • Relevant science is known and modeled

151 of 170

Need ways to relax assumptions

  • Hard to say what the distribution is precisely

  • Non-independent, or unknown dependencies

  • Uncertainties that may not cancel

  • Possibly large uncertainties

  • Model uncertainty

152 of 170

Incertitude in data measurements

  • Periodic observations

When did the fish in my aquarium die during the night?

  • Plus-or-minus measurement uncertainties

Coarse measurements, measurements from digital readouts

  • Non-detects and data censoring

Chemical detection limits, studies prematurely terminated

  • Privacy requirements

Epidemiological or medical information, census data

  • Theoretical constraints

Concentrations, solubilities, probabilities, survival rates

  • Bounding studies

Presumed or hypothetical limits in what-if calculations

153 of 170

Simmer down…

  • I’m not saying we should just use intervals
  • Not all uncertainty is incertitude
  • Maybe most uncertainty isn’t incertitude
  • Maybe incertitude is entirely negligible
  • If so, probability theory is perfectly sufficient

We didn’t stop using Euclidean geometry when they invented non-Euclidean geometry

154 of 170

What I am saying is…

  • Some analysts face non-negligible incertitude

  • Handling it with standard probability theory requires assumptions that may not be tenable
    • Uniformity, Unbiasedness, Independence

  • Useful to know what difference it might make

  • Can be discovered by bounding probabilities

155 of 170

Bounding probability is an old idea

  • Boole and de Morgan
  • Chebyshev and Markov
  • Borel and Fréchet
  • Kolmogorov and Keynes
  • Dempster and Ellsberg
  • Berger and Walley

156 of 170

Several closely related ideas

  • Probability bounds analysis

  • Imprecise probabilities

  • Robust Bayesian analysis

  • Second-order probability

  • Hierarchical Bayes

Bounding approaches

  • Probability bounds analysis (PBA)
    • A rough and ready technology

  • Imprecise probabilities
    • More comprehensive but also more difficult

  • Robust Bayesian analysis
    • Handles updating rather than convolutions

  • Second-order probability
    • More difficult and less comprehensive

157 of 170

  • Verified computation of probabilities

  • Sensitivity analysis

  • Non-Laplacian uncertainty

158 of 170

Euclid

  • Given a line in a plane, how many parallel lines can be drawn through a point not on the line?

  • For over 20 centuries, the answer was ‘one’

159 of 170

Relax one axiom

  • Around 1850, Riemann and others developed non-Euclidean geometries in which the answer was either ‘zero’ or ‘many’
  • Controversial, but eventually accepted
  • Mathematics richer and applications expanded
  • Used by Einstein in general relativity

160 of 170

Relax one axiom

  • IP satisfies all the Kolmogorov axioms

  • Avoid sure loss and are fully ‘rational’ in the Bayesian sense (except that agents are not forced to either buy or sell any offered gamble)

  • Relaxes one decision theory axiom: assumption that we can always compare any two gambles

Richer mathematics, wider array of applications

161 of 170

Comparing p-boxes with maximum entropy distributions

[Figure: p-boxes versus maximum entropy distributions for eight states of knowledge: sample data and range; min and max; min, max and mode; mean and std; min and mean; min, max and mean; range and quantile; min, max, mean and std]

Maximum entropy solutions

When you know                        Use this shape

{minimum, maximum}                   uniform
{mean, standard deviation}           normal
{minimum, maximum, mode}             beta
{minimum, mean}                      exponential
{minimum, maximum, mean}             beta
{min, max, mean, stddev}             beta
{min, max, some quantiles}           piecewise uniform
{mean, geometric mean}               gamma
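The first row can be sanity-checked numerically: among densities on [0, 1], the uniform maximizes differential entropy. A Python sketch comparing it with a Beta(2,2) rival (my choice of rival, purely for illustration):

```python
import numpy as np

x = np.linspace(1e-6, 1 - 1e-6, 100_001)
dx = x[1] - x[0]

def entropy(p):
    # differential entropy  -integral of p ln p  on a uniform grid
    return -(p * np.log(p) * dx).sum()

uniform = np.ones_like(x)        # maxent density given only {min, max} = {0, 1}
beta22 = 6.0 * x * (1.0 - x)     # any rival density on [0, 1]

print(entropy(uniform), entropy(beta22))  # uniform gives 0; the rival is negative
```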

162 of 170

Comparing IP to Bayesian approach

  • “Uncertainty of probability” is meaningful

  • Operationalized as the difference between the max buying price and min selling price

  • If you know all the probabilities (and utilities) perfectly, then IP reduces to Bayes
  • Axioms identical except IP doesn’t use completeness

  • Bayesian rationality implies not only avoidance of sure loss & coherence, but also the idea that an agent must agree to buy or sell any bet at one price

163 of 170

Bayes fails incertitude

  • Traditional probability theory doesn’t account for gross uncertainty correctly

  • Output precision depends strongly on the number of inputs and not so much on their shapes

  • The more inputs, the tighter the answer

164 of 170

A few grossly uncertain inputs

[Figure: four uniform inputs and the probability distribution of their combined output]

165 of 170

A lot of grossly uncertain inputs...

[Figure: forty uniform inputs and the surprisingly tight probability distribution of their combined output]

Where does this surety come from?

What justifies it?

166 of 170

“Smoke and mirrors” certainty

  • Probability makes certainty out of nothing

  • It has an inadequate model of ignorance

  • Probability bounds analysis gives a vacuous answer if all you give it are vacuous inputs

167 of 170

Wishful thinking

Risk analysts often make assumptions that are convenient but are not really justified:

  1. All variables are independent of one another
  2. Uniform distributions model incertitude
  3. Distributions are stationary (unchanging)
  4. Specifications are perfectly precise
  5. Measurement uncertainty is negligible

168 of 170

You don’t have to do this

Removing wishful thinking creates a p-box

  1. Don’t have to assume any dependence at all
  2. An interval can be a better model of incertitude
  3. P-boxes can enclose non-stationary distributions
  4. Can handle imprecise specifications
  5. Measurement data with plus-minus, censoring
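Point 3 can be sketched directly: a p-box that encloses a non-stationary family of distributions is just the pointwise envelope of the candidate CDFs (Python; the drifting normal family is invented for illustration):

```python
import numpy as np
from math import erf

def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + erf((x - mu) / (sd * 2**0.5)))

x = np.linspace(-4.0, 8.0, 601)
# a non-stationary process whose mean drifts over [0, 2]: enclose every member
family = np.array([[normal_cdf(xi, mu, 1.0) for xi in x]
                   for mu in np.linspace(0.0, 2.0, 9)])
upper = family.max(axis=0)   # upper (left) bound of the p-box
lower = family.min(axis=0)   # lower (right) bound of the p-box
# every member of the family lies inside [lower, upper] pointwise
```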

169 of 170

Uncertainties expressible with p-boxes

  • Sampling uncertainty
    • Confidence bands, confidence structures

  • Measurement incertitude
    • Intervals

  • Uncertainty about distribution shape
    • Constraints (non-parametric p-boxes)

  • Surrogacy uncertainty
    • Modeling

170 of 170

Can uncertainty swamp the answer?

  • Sure, if uncertainty is huge

  • This should happen (it’s not “unhelpful”)

  • If you think the bounds are too wide, then put in whatever information is missing

  • If there is no more information, what justifies your belief? Who are you trying to fool?