1 of 170

How to handle extremely rare events in a meaningful way

Scott Ferson

Institute for Risk and Uncertainty, University of Liverpool

Nuclear Energy - GREEN CDT, 26 July 2023

Photo credit: Tatyana Ginzburg, used under license

This presentation has animations. Select Slide Show / From Beginning to see them.

Turn on recording and use microphone

2 of 170

Today

  • Logistics
  • Mostly lecture (sorry)
  • But interrupt!
  • Remind me at the break
  • Discussion

  • Download these slides from

https://sites.google.com/view/greenrisk

NAFEMS is the International Association for the Engineering Modelling, Analysis and Simulation Community

https://en.wikipedia.org/wiki/Simulation_governance

Simulation governance is a managerial function concerned with assurance of reliability of information generated by numerical simulation. The term was introduced in 2011 [1] and specific technical requirements were addressed from the perspective of mechanical design in 2012.[2] Its strategic importance was addressed in 2015.[3][4] At the 2017 NAFEMS World Congress in Stockholm simulation governance was identified as the first of eight “big issues” in numerical simulation.

Edimu et al. (2010)

  • Risk analysis in engineering 26
  • So how are we doing? 7
  • Dependence 9
  • Kinds of uncertainty 8
  • Getting the inputs 18
  • Risk communication 3
  • Bayesian networks 4
  • Backcalculation 12
  • Conclusions 12

NAFEMS challenge

Fire alarm at 11 am

Look for the falconer on the roof next door

3 of 170

Open question

Suppose

A is in [2, 4]

B is in [3, 5]

What can be said about the sum A+B?

4 of 170

Deepwater Horizon Spill (2010)

Columbia Disaster (2003)

2007-2008 Financial Crisis

Fukushima Daiichi Nuclear Disaster (2011)

Air France Flight 447 Rio−Paris (2009)

Minneapolis Bridge Collapse (2007)

Credits (left to right):

https://www.telegraph.co.uk/news/worldnews/asia/japan/8863968/Fukushima-nuclear-disaster-timeline.html.

Unknown photographer - US Coast Guard - 100421-G-XXXXL- Deepwater Horizon fire (Direct link), public domain.

Pawel Kierzkowski, https://en.wikipedia.org/wiki/Air_France_Flight_447, CC BY-SA 3.0.

Dominic Alves from Brighton, England - Northern Rock Queue, CC BY 2.0.

The Ponte Morandi roadway fell 45 meters, killing 43 people. Image credit: The Daily Sabah

Kevin Rofidal, United States Coast Guard, https://en.wikipedia.org/wiki/I-35W_Mississippi_River_bridge, public domain.

https://www.youtube.com/watch?v=aXiZ3RHR3bg

RAAC (reinforced autoclaved aerated concrete), collapse of a school roof in Kent (2018) image credit: no credit

Morry Gash, NPR, https://www.npr.org/2017/08/01/540669701/10-years-after-bridge-collapse-america-is-still-crumbling?t=1614450713959

The British bank Northern Rock failed as a result of a bank run precipitated by the global financial crisis. It was the first British bank in about 150 years to fail due to a bank run.

Don’t confuse the Fukushima Daiichi nuclear plant with a more dramatic picture of natural gas storage tanks burn at a facility in Chiba Prefecture, near Tokyo, which was another consequence of the same earthquake and tsunami that affected the Fukushima Daiichi nuclear power plant.

Hurricane Katrina

2011: natural gas storage tanks burn at a facility in Chiba Prefecture, near Tokyo. It is not a nuclear power plant, but another consequence of the same earthquake and tsunami that affected the Fukushima Daiichi nuclear power plant

Tōhoku earthquake and tsunami

2010: failure of the blowout preventer led to eleven deaths, the largest-ever marine oil spill, environmental catastrophe, and costs to BP of more than $60B.

2009 crash with 228 fatalities of Air France flight 447 en route from Rio de Janeiro to Paris due to sensor failure and flawed human-software interaction

2011 explosive release of radioactive material in nuclear power plant meltdowns when cooling pumps lost battery power and backup generators were disabled by the tsunami resulting from the Tōhoku earthquake

https://programm.ard.de/TV/wdrfernsehen/kyrill---ein-orkan-fegt-durchs-land/eid_28111450974388

Ponte Morandi Collapse (2018)

5 of 170

Tacoma Narrows Bridge failure (1940)

Gare Montparnasse (1895)

Boiler explosion (1880s—1910s)

Tacoma Narrows (1940)

Challenger (1986)

Bhopal disaster (1984, 4000)

E. coli O157 (South Wales outbreak 2005; US 1993)

Fukushima Daiichi nuclear disaster 2011

Chernobyl 1986

Titanic (1912)

Minneapolis bridge (2007)

Mad cow

A radiation therapy machine, the Therac-25, caused six accidents in the mid-1980s in which patients were given massive overdoses of radiation because of a race-condition programming error. It is now a standard case study in software engineering on the overconfidence of the engineers and the lack of proper due diligence in resolving reported software bugs.

Therac-25 (late 1980s)

Bhopal disaster (1984)

Mad cow and E. coli O157 (late 1980s—today)

Hyatt Walkway Collapse (1981)

Challenger (1986)

Aloha Airlines Flight 243 (1988)

Chernobyl (1986)

Sampoong Department Store collapse (1995)

1988 explosive decompression at altitude of old aircraft with metal fatigue and possibly improper maintenance

Pyramid of Snefru at Maidum (2600 BCE)

6 of 170

Asteroid

Icelandic eruptions affecting European air travel (2011)

Chilean eruption

California wildfires (2012)

Drought, pestilence

Landslide

Haiti earthquake (2010, 160,000)

Indian Ocean tsunami (2004, 280,000)

Yellow River floods (1931, ~4 million)

7 of 170

Risk assessment applications

  • Pollution

heavy metals, pesticides, PMx, ozone, PCBs, EMF, RF, etc., thermal pollution

  • Engineered systems

traffic safety, bridge design, airplanes, spacecraft, nuclear plants, reliability of electricity distribution

  • Financial investments

portfolio planning, consultation, instrument evaluation

  • Occupational hazards

manufacturing and factory workers, farm workers, hospital staff

  • Food safety and consumer product safety

benzene in Perrier, E. coli in beef, salmonella in tomato, children’s toys

  • Ecosystems and biological resources

endangered species, fisheries and reserve management, entrainment and impingement

$

8 of 170

More and more important

  • To improve efficiencies, most regulatory agencies are shifting from standards-based to performance-based regulations

      • Fire codes no longer say the stair wells have to be so wide; they just say the occupants must be able to evacuate the building in a given period of time

Slaughterhouses needn’t spend money on old ag-school rules if they can otherwise control enteric contaminants

  • Risk analysis used to make the demonstrations

9 of 170

Risk analysis

  • Everywhere in science and engineering today

  • It was invented by the nuclear power industry

  • Major contributions from the United Kingdom

  • It is essential for next generation nuclear power

10 of 170

Throughout the nuclear industry

  • Containment reliability
  • Design and process safety
  • Human factors
  • Effects of thermal pollution from CWISs
  • Entrainment & impingement of aquatic biota
  • Reliability of electricity distribution networks
  • Occupational safety
  • Biotic control
  • Financial management

11 of 170

Risk

  • Chance that something bad happens, together with how bad it is

  • Chance is usually measured by probability

  • The pair of values may be given as a product, or as a graph when there are many pairs

12 of 170

Risk as product

  • Probability × severity = “Expected” injury

  • Would you prefer to kill 1000 people with probability 1%, or to surely kill 10 people?

  • To escape such decisions, risk analysts often prefer to show all outcomes graphically and let someone else decide
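The dilemma in the second bullet arises precisely because both options have the same product; a one-line check in R (the deck’s language):

```r
# "Expected" injury = probability x severity, for the two options above
ev_option1 <- 0.01 * 1000   # 1% chance of killing 1000 people
ev_option2 <- 1.00 * 10     # surely killing 10 people
ev_option1  # 10
ev_option2  # 10 -- identical expectations, very different risks
```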

13 of 170

Risk curve

[Figure: a risk curve — probability (log scale, 10−1 to 100) against badness, e.g., mortality or financial loss]

Sometimes called a Farmer curve (after the British nuclear regulator)

Probability may be expressed as:

  • unitless probability
  • percentage per demand or outcome
  • density per unit outcome
  • frequency per unit time or outcome
  • qualitative grades per unit (e.g., ‘rare’, ‘plausible’, etc.)

14 of 170

Reg said

  • No line between safe and unsafe

  • Nothing has no risk

  • But we can quantify the risks

https://web.archive.org/web/20121218194620/https://netfiles.uiuc.edu/mragheb/www/NPRE%20457%20CSE%20462%20Safety%20Analysis%20of%20Nuclear%20Reactor%20Systems/The%20Risk%20Assessment%20Methodology.pdf

© M. Ragheb, 1/8/2011

In 1986, IAEA estimated 4,000 projected deaths due to Chernobyl

15 of 170

Risk assessment

  • Quantification of risk

  • Basically, systematic “what-if” thinking

  • Same as safety assessment

  • Many confused synonyms
    • Assessment, characterization, evaluation, analysis, quantification

16 of 170

Risk management

  • What we’re going to do about the risks

  • Usually includes the development and application of strategies or policies to remove, reduce or work around risks

  • May also include monitoring risks and communicating risks

17 of 170

Old school risk analysis

  • Because they were pioneering the discipline, the early analysts made a few mistakes

  • Assuming independence too much

  • Using uniforms or flat priors for “I don’t know”

  • Using stochastic mixtures as a form of averaging, or simply using averages

  • The nuclear industry has not sustained its lead


Carelessness about ensembles is perhaps their fourth mistake

18 of 170

Traditional practice

  • Most assessments are deterministic

  • Some are deliberately conservative, but

    • degree of conservatism is opaque, unquantified, inconsistent

    • difficult to characterize risks, except in extreme situations

19 of 170

What’s needed

An assessment should tell us

  • How likely the various consequences are

  • How reliable the conclusions are

20 of 170

Probability distribution display

  • Density distribution or mass distributions

  • Cumulative distribution function (CDF)

  • Complementary CDF (exceedance risk)

[Figure: three displays of the same distribution — the density p(x); the CDF Pr(X ≤ x), rising from 0 to 1; and the complementary CDF Pr(x < X), falling from 1 to 0]

CDF is not core damage frequency!

21 of 170

Probabilistic models

  • The same as deterministic models except that point values are replaced by probability distributions

  • Well, almost — there are wrinkles:
      • Ensembles
      • Distribution shapes and tails
      • Dependencies
      • Backcalculations

22 of 170

Ensemble

  • Statisticians call it the “population” or “reference class”

  • E.g., dysfunction rates in prostate patients
      • 50% of attempts at sex fail, or
      • 50% of men are totally impotent

23 of 170

Specification of ensembles

  • Above all, be clear
    • 1% of lethal dose per day / lethal dose on 1% of days
    • Assessment units: birds, meals, days, fields, …
    • Assessment ensemble: all birds, or just those on site?

  • Mismatched input distributions yield gibberish

  • If the analyst can’t say what ensemble a distribution represents, the analysis may be meaningless

24 of 170

Combining distributions (convolution)

X

X+Y

=

+

Y

Convolution is getting the output distribution from the input distributions. Monte Carlo is one way to do this.
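A minimal Monte Carlo convolution sketch in R (the input distributions here are illustrative, not from the case study):

```r
# Monte Carlo convolution: approximate the distribution of X+Y
# by sampling each input and summing realization by realization
set.seed(42)
m <- 100000
X <- rnorm(m, mean = 5, sd = 1)   # example input X
Y <- runif(m, min = 2, max = 4)   # example input Y
Z <- X + Y                        # realization-by-realization sum
# ecdf(Z) approximates the convolution of the two inputs
mean(Z)                           # close to 5 + 3 = 8
```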

25 of 170

Integrated calculations

Z

Y

X

W

f(W,X,Y,Z)

This is unlike intrusive analyses that solve operations sequentially

Sample realizations from each input distribution, and combine them to compute a scalar output value. Repeat many times to accumulate a distribution of output values.

26 of 170

Typical risk analysis problems

  • Fault tree analysis, event trees, etc.
  • Bayes nets
  • Arithmetic expressions
  • Strength − stress comparisons

27 of 170

Case study: dike reliability

28 of 170

The inputs

Relative density of the revetment blocks

Δ = uniform(1.60, 1.65)

Block thickness

D = uniform(0.68, 0.72) meters

Slope of the revetment

α = atan(uniform(0.32, 0.34)) radians

Analysts’ “model uncertainty” factor

M = uniform(3.0, 5.2)

Significant wave height (average of the highest third of waves)

H = weibull(1.35 meters, 11)

Offshore peak wave steepness

s = normal(0.04, 0.0055)

How do we specify these distributions?

29 of 170

Analysis

[Figure: distributions of the inputs Δ, D, α, M, H (meters), and s, and the cumulative probability of the output Z]

Probability of Z being negative (i.e., dike failure) is very small: P = 0.000015

30 of 170

Script for the R programming language

# case study for reliability of the dike

m = 1000000 # number of replications

delta = runif(m, 1.60, 1.65) # density of the revetment blocks

D = runif(m, 0.68, 0.72) # block thickness

alpha = atan(runif(m, 0.32, 0.34)) # slope of the revetment

M = runif(m, 3.0, 5.2) # “model uncertainty” factor

H = rweibull(m, 11, 1.35) # significant wave height

s = rnorm(m, 0.04, 0.0055) # offshore peak wave steepness

Z = delta * D - (H * tan(alpha)) / (cos(alpha) * M * sqrt(s))

plot(ecdf(Z)); sum(Z<0)/m

Google “download R”

31 of 170

So how are we doing?

32 of 170

Dramatic technocratic failures

  • Kansai International Airport wishful thinking
  • Diablo Canyon Nuclear Power Plant empirically incorrect
  • Vioxx withdrawal missed decision context
  • Fukushima non-redundancy
  • Mars Climate Orbiter units
  • Google Flu drank the Kool-Aid
  • Ariane 5 integer overflow
  • Minneapolis bridge collapse “one in a billion”

33 of 170

These are not noble failures

  • They are not simply unfortunate but expected extreme tail events

  • Occur more often than our risk estimates predict

  • Managers never see the problems coming

  • Analysts are doing the risk estimations wrong

34 of 170

Famous ‘misunderestimations’

  • Shuttle risk was estimated at 1/10,000 per flight; it was actually 1 per 70 flights

  • Grossly understating financial risks precipitated the 2007−2008 recession

  • 100-year flood risks underestimated and uncertainties not communicated

  • Failure cascades in electricity distribution systems are more common than they are forecasted to be

1998 National Weather Service, US Army Corps

(RAWG 2005; USCPSOTF 2006)

35 of 170

We are understating risks and uncertainties

  • NASA estimated the risk of losing the Space Shuttle at 1/10,000 per flight, but its observed failure rate was 1/70 flights (Hamlin et al. 2011; Oberkampf and Roy 2010)

  • National Weather Service and the Army Corps of Engineers underestimated risks of “100-year floods” and flubbed flood risk communication (Morss & Wahl 2007; NRC 2006; NWS 1998)

  • Observed failures and near-misses found at the Diablo Canyon Nuclear Power Plant reveal gross understatement of its assessed risks (Lochbaum 2011)

  • Failure cascades in power grids are more common than forecasted (RAWG 2005; USCPSOTF 2006)

  • Understating risks in the financial industry caused the Great Recession (Savage 2012; Silver 2012)

  • An engineer called the 2007 Minneapolis bridge collapse a “billion to one” event

36 of 170

What are we doing wrong?

  • Underestimation of the uncertainties of original measurements
  • Unjustified use of independence assumptions
  • Overly precise dependence assumptions
  • Unjustified assumption of equiprobability
  • Inappropriate use of uniform distributions
  • Modelling epistemic uncertainty the same way as aleatory
  • Not accounting for model-form uncertainty
  • Modelling volitional choices with distributions as though they’re random
  • Using averaging as an aggregation
  • Making assumptions for the sake of mathematical convenience

Wishful thinking

37 of 170

These mistakes can compound

  • The errors all work in the same direction: underestimation of risks

  • Omitting uncertainties likewise leads to understating risks

38 of 170

Northeast Blackout of 2003

  • The most widespread blackout in history after the Southern Brazil blackout of 1999

  • 55 million people affected

  • Traced to a software bug in a control room alarm system in Ohio

  • Electric Reliability Organization was created in the aftermath

I reviewed a paper for Risk Analysis on reliability theory by candlelight.

39 of 170

Almost 200,000 km of lines, operated by 500 separate companies

40 of 170

Dependence

41 of 170

Dependence

  • Most variables are assumed to be independent

  • But some variables clearly aren’t
    • Density and porosity
    • Rainfall and temperature
    • Body size and skin surface area
    • Adjacent or chained delivery systems

42 of 170

Dependence can be complex

X

Y

43 of 170

Dependence is subtler than correlation

Knowing the correlation doesn’t tell you the dependence. Each plot has (Pearson) correlation 0.816

op <- par(las=1, mfrow=c(2,3), mar=1.5+c(4,4,1,1), oma=c(0,0,0,0), lab=c(6,6,7), mgp=c(3,1,0))

# the four panels of Anscombe's quartet
ff <- y ~ x
for (i in 1:4) {
  ff[[2]] <- as.name(paste("y", i, sep=""))
  ff[[3]] <- as.name(paste("x", i, sep=""))
  lmi <- lm(ff, data=anscombe)
  plot(ff, data=anscombe, col="red", pch=21, cex=2.4, bg="orange",
       xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
  abline(lmi, col="blue")
}

# two further data sets with the same correlation
x = c(4,8,13.269,17.5); y = c(4,10,8,12.5)
plot(x, y, col="red", pch=21, cex=2.4, bg="orange", xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
cor(x, y)
abline(lm(y ~ x), col="blue")

x = c(6,4,8,10,13,17.5,16,14,15,17.3); y = c(5.468,4,6,8,13,9.241,11,11.5,12,10)
plot(x, y, col="red", pch=21, cex=2.4, bg="orange", xlim=c(3,19), ylim=c(3,13), xlab='', ylab='')
cor(x, y)
abline(lm(y ~ x), col="blue")

par(op)

http://en.wikipedia.org/wiki/Correlation_and_dependence

http://en.wikipedia.org/wiki/File:Anscombe%27s_quartet_3.svg (four of these six graphs)


44 of 170

Case study: Vesely’s tank

What’s the chance the tank ruptures under pumping?

Vesely et al. 1981

45 of 170

Fault tree and its Boolean expression

E1

T

K2

E2

E3

S

E4

S1

E5

K1

or

and

or

or

or

R

E1 = T (K2 (S & (S1 (K1 R))))

46 of 170

Bill Vesely’s subjective probabilities

E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

Component            Probability
Pressure tank T      5 × 10−6
Relay K2             3 × 10−5
Pressure switch S    1 × 10−4
Relay K1             3 × 10−5
Timer relay R        1 × 10−4
On-switch S1         3 × 10−5
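Under the all-independent model, OR gates combine as 1−(1−x)(1−y) and AND gates as xy. A quick R check with the table’s probabilities reproduces the independent result of about 3.5×10−5 shown on the Results slide:

```r
# Fault tree E1 = T or (K2 or (S and (S1 or (K1 or R)))), all independent
orI  <- function(x, y) 1 - (1 - x) * (1 - y)  # OR of independent events
andI <- function(x, y) x * y                  # AND of independent events
t  <- 5e-6; k2 <- 3e-5; s <- 1e-4             # component probabilities
k1 <- 3e-5; r  <- 1e-4; s1 <- 3e-5            # from Vesely's table
e1 <- orI(t, orI(k2, andI(s, orI(s1, orI(k1, r)))))
e1   # about 3.5e-05
```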

47 of 170

Different dependency models

Vesely et al. (all variables independent)

E1 = T || (K2 || (S |&| (S1 || (K1 || R))))

Mixed dependencies

E1 = T || (K2 ∨ (S &r (S1 || (K1 // R))))

Correlated to no assumption about dependence

E1 = T || (K2 ∨ (S & (S1 || (K1 // R))))

No assumptions about any dependencies

E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

48 of 170

Results

Probability of tank rupturing under pumping (horizontal axis 10−5 to 10−3):

All variables independent          3.5×10−5
Mixed dependencies                 [3.499×10−5, 3.504×10−5]
Relaxing S–(S1,K1,R) dependence    [3.50×10−5, 1.35×10−4]
No dependence assumptions          [3×10−5, 1.4×10−4]

  • Independence assumption seems to be non-conservative
  • Some relaxations make a big difference, some don’t (have to do the analysis)
  • Even making no assumptions at all doesn’t make the result vacuous

49 of 170

Can’t always assume independence

  • If you know the mechanism of dependence, you can model it

  • Failing a known mechanism, your model should at least reproduce the observed statistical patterns

  • If the dependence is unknown, neither approach is available

  • Even small correlations can have a big effect on convolutions
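A sketch of the last point: two sums with identical marginals, one pair independent, the other with a modest correlation built from a shared factor (the parameters here are illustrative, not from the slides):

```r
# Effect of a modest correlation on the tail of a sum X+Y
set.seed(1)
m   <- 1e6
rho <- 0.3                            # assumed correlation
X  <- rnorm(m); E <- rnorm(m)
Yi <- rnorm(m)                        # independent of X
Yc <- rho * X + sqrt(1 - rho^2) * E   # correlated with X, same marginal
mean(X + Yi > 4)   # tail probability under independence
mean(X + Yc > 4)   # noticeably larger under correlation
```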

50 of 170

Kinds of uncertainty

51 of 170

Sources of uncertainties

  • Thermal noise (Johnson−Nyquist) electronics
  • Poisson counting noise (integrality) disintegrations
  • Sampling uncertainty (some of a population)
  • Unmodelled factors (unknown unknown) seismicity
  • Quantum mechanics (God does play dice) Schrödinger
  • Measurement uncertainty (plus-or-minus) ½ or 1 unit

  • Etymologically, random means unpredictable

52 of 170

Two kinds of uncertainty

  • Variability — aleatory uncertainty, Type A uncertainty, stochasticity, randomness, chance, risk
    → handled with Monte Carlo methods and probability theory

  • Incertitude — epistemic uncertainty, Type B uncertainty, ambiguity, ignorance, imprecision, true uncertainty
    → handled with interval arithmetic and constraint analysis

They can change in different contexts. Distinguishing them may be subtle, sometimes like dividing ice from snow. But it’s essential to do so, as getting hit with a snowball is different from getting hit with a block of ice.

Photo credit: Jason Hollinger, https://commons.wikimedia.org/wiki/File:Feathery_Snow_Crystals_(2217830221).jpg, behind translucent filter

53 of 170

Variability = aleatory uncertainty

  • Arises from natural stochasticity

  • Variability arises from
    • spatial variation
    • temporal fluctuations
    • manufacturing or genetic differences

  • Not reducible by empirical effort

54 of 170

Incertitude = epistemic uncertainty

  • Arises from incomplete knowledge

  • Incertitude arises from
    • limited sample size
    • mensurational limits (‘measurement uncertainty’)
    • use of surrogate data

  • Reducible with empirical effort

55 of 170

Propagating incertitude

Suppose

A is in [2, 4]

B is in [3, 5]

What can be said about the sum A+B?

The right answer for risk analysis is [5, 9]
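The sum follows the basic rule of interval arithmetic: the endpoints add. A minimal sketch in R, representing an interval as a (lo, hi) pair:

```r
# Interval sum: [2,4] + [3,5] = [5,9]
A <- c(2, 4)       # interval as (lo, hi)
B <- c(3, 5)
AplusB <- A + B    # endpoint-wise addition
AplusB             # 5 9
```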

56 of 170

Must be treated differently

  • Variability should be modeled as randomness with probability theory

  • Incertitude should be modeled as ignorance with methods of interval or constraint analysis

  • Imprecise probabilities or probability bounding can do both at once

57 of 170

Propagation methods

  • Analytical (delta method, Laplace transforms)

  • Monte Carlo simulation

  • Probability bounding (imprecise probabilities)

58 of 170

Relaxing all assumptions

[Figure, over the range 4 to 10 (legend: Laplace, Dependence, Shape): the cumulative distribution of Laplace’s unreasonable triangular density function; a region containing any CDF arising from dependent uniforms; and an outer region within which any CDF is possible — the same as the interval]

59 of 170

Aleatory or epistemic?

“These concepts only make unambiguous sense if they are defined within the confines of a model of analysis. In one model an addressed uncertainty may be aleatory, in another model it may be epistemic. So, the characterization of uncertainty becomes a pragmatic choice dependent on the purpose of the application.” (Der Kiureghian and Ditlevsen 2009)

60 of 170

“The probability of a component failure is 0.01”

  • An aleatoric interpretation
    • On average, 1 in every 100 components of this type will fail

  • An epistemic interpretation
    • There is 0.01 probability that all components of this type will fail

  • Very different characterizations of an uncertainty statement

  • Once understood, proper analysis is straightforward

(Jon Helton 2009)
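Helton’s two readings can give very different system-level answers. A sketch, assuming (hypothetically) a system that needs 100 such components all working:

```r
# "The probability of a component failure is 0.01", two readings
n <- 100                 # assumed number of components (illustrative)
p <- 0.01
aleatory  <- (1 - p)^n   # each component independently fails w.p. 0.01
epistemic <- 1 - p       # w.p. 0.01 the whole component type fails
aleatory                 # about 0.366: system survival is unlikely-ish
epistemic                # 0.99: system fine unless the type is flawed
```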

61 of 170

Getting the inputs

62 of 170

Expert elicitation

Maximum likelihood

Bayesian inference

PERT

Method of moments

Maximum entropy

Estimation

methods

the great frontier of making things up

A two-front assault

63 of 170

How to specify input distributions

  • Default distribution
  • PERT
  • Expert opinion
  • Maximum entropy
  • Fitted distribution (goodness of fit)
    • Method of matching moments
    • Maximum likelihood
  • Bayesian analysis
  • Empirical distribution

No data

Lots of data

64 of 170

Default distributions

  • Physicists overuse normal distributions
  • Risk analysts overuse uniform distributions
  • Reliability engineers overuse Weibulls
  • Ecologists overuse lognormals
  • Statisticians overuse beta distributions
  • Food safety overuses triangular distributions

Cooper’s quip

65 of 170

Arguments (frequently abused)

  • Sum of many independent comparable terms (normal)
  • Product of many independent factors (lognormal)
  • Largest of many values (Gumbel, Weibull, Fréchet)
  • Lifetimes with neither aging nor burn-in (exponential)
  • Symmetry (uniform)
  • Rare discrete events (Poisson)
  • Independent trials (binomial)

66 of 170

Engineers cannot always get data

  • New systems may have no performance history
    • spacecraft of new design or in a new environment
    • biology using novel genetic constructs that have never existed before
    • components so well constructed that there’s no failure data

  • Three ways to estimate probabilities without data
    • disaggregation into parts whose probabilities are easier to estimate (i.e., breaking it into subproblems)
    • expert elicitation (i.e., guessing)
    • modern statistical estimation

67 of 170

Rare events

  • Often the driving concern in analyses
  • Typically big consequences
  • Hardly ever characterized by good data
  • Perhaps never seen, or seen only once
  • How can we estimate the probability of a rare event?

68 of 170

Random sample data

  • Maximum likelihood says p is zero for never-seen events
    • Nobody believes this is a reasonable estimate
  • Bayesian estimator is more reasonable…

Just kidding

69 of 170

[Figure: posterior distributions for p under the Zellner, Bayes-Laplace, Jeffreys, and Haldane priors, shown as densities and cumulative probabilities over roughly 0 to 0.01]

Data “once out of 1000” — posterior means: Haldane 0.001, Jeffreys 0.0015, Zellner 0.001986, Bayes-Laplace 0.001996

Data “zero out of 1000” — posterior means: Haldane 0, Jeffreys 0.0005, Zellner 0.000942, Bayes-Laplace 0.000998

70 of 170

Random sample data

  • Maximum likelihood says p is zero for never-seen events
    • Nobody believes this is a reasonable estimate
  • The Bayesian estimator is many things
    • ‘Reasonable’ only in that it’s whatever you want
  • Modern estimators
    • Imprecise beta (Dirichlet) models
    • Confidence structures

71 of 170

Confidence structure (c-box)

  • Estimator of a (fixed) parameter
  • Gives confidence intervals for the parameter at any confidence level

[Figure: a c-box for the parameter θ, plotted as probability from 0 to 1 against parameter value; slicing between probability levels β and α gives an (α−β)100% confidence interval for θ]

72 of 170

Confidence boxes

  • Structures that infer confidence intervals, at any level, for a parameter

  • Can be propagated just like p-boxes

  • Allow us to compute with confidence

  • Generalize confidence distributions (e.g., Student’s t)

  • Different from confidence bands such as Kolmogorov–Smirnov or distributional bands

73 of 170

Estimators

  • Point estimates (e.g., sample mean)

  • Interval estimates (e.g., confidence intervals)

  • Distributional estimates (Bayesian posteriors)

  • P-box-shaped estimates (e.g., c-boxes)

74 of 170

Probability of rare event

  • Inference about a probability from binary data, k successes out of n trials

p ~ [beta(k, n−k+1), beta(k+1, n−k)]

the notation extends the use of the tilde

Data: k = 2, n = 10

[Figure: the c-box for p on [0, 1], bounded by the CDFs of beta(2, 9) and beta(3, 8)]
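For the slide’s data (k = 2, n = 10), confidence intervals come straight from the quantiles of the bounding betas; the 95% interval below coincides with the classical Clopper–Pearson interval:

```r
# c-box for a binomial probability from k successes in n trials:
# p ~ [beta(k, n-k+1), beta(k+1, n-k)]
k <- 2; n <- 10
# 95% confidence interval from the outer quantiles of the bounding betas
lo <- qbeta(0.025, k, n - k + 1)   # lower edge: beta(2, 9)
hi <- qbeta(0.975, k + 1, n - k)   # upper edge: beta(3, 8)
c(lo, hi)                          # roughly 0.025 and 0.556
```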

75 of 170

Binomial rate

[Figure: grid of c-boxes for the binomial rate p given data k/n — rows for n = 0, 1, 2, 3, 20, 200, 2000, with panels 0/0; 0/1, 1/1; 0/2, 1/2, 2/2; 0/3, 1/3, 2/3, 3/3; 0/20, 15/20, 20/20; 0/200, 150/200, 200/200; 0/2000, 1500/2000, 2000/2000]

Notice that the c-boxes in every row partition the vacuous ‘dunno’ interval

76 of 170

Zero out of 10^k trials

[Figure: the corresponding c-boxes crowd into the “zero corner” near the origin]

77 of 170

One out of 10^k trials

[Figure: the corresponding c-boxes fall in the “once corner”, within roughly 0 to 10^(1−k)]

78 of 170

Computing with c-boxes

[Figure: three c-boxes on [0, 1], one for each plan’s failure probability]

Plan A

249 out of 1,014 failed

Plan B

39 out of 60 failed

Plan C

17 out of 20 failed

What if we tried all three plans independently?

79 of 170

Conjunction (AND)

a = balchbox(100,25)

b = balchbox(60,39)

c = balchbox(20,17)

ABC = a %&% b %&% c

abc = a %|&|% b %|&|% c

[Figure: c-box on [0, 1] for the probability that all three plans fail]

Has the confidence performance!

80 of 170

Vesely’s pressurized tank

[Figure: six c-boxes estimated from sparse component test data]

tank T: 0/2000   relay K2: 3/500   pressure switch S: 1/150
on-off switch S1: 0/460   relay K1: 7/10000   timer relay R: 0/380

81 of 170

[Figure: c-box on roughly 0 to 0.03 for the top-event probability]

Top event, tank rupture under pressurization, assuming independence

Has the performance interpretation

82 of 170

Tight bounds or fat bounds

  • When the bounds are tight, and far from a threshold of concern, the analysis gives confidence about the conclusion

  • When the bounds straddle a threshold of concern, the analysis is saying you don’t have enough information to answer that question

83 of 170

R script


# Vesely's pressurized tank system from sparse sample data assuming independence

many = 10000

constant = function(b) if (length(b)==1) TRUE else FALSE

precise = function(b) if (length(b)==many) TRUE else FALSE

leftside = function(b) if (precise(b)) return(b) else return(b[1:many])

rightside = function(b) if (precise(b)) return(b) else return(b[(many+1):(2*many)])

sampleindices = function() round(runif(many)*(many-1) + 1)

pairsides = function(b) {i=sampleindices(); return(env(leftside(b)[i],rightside(b)[i]))}

env = function(x,y) if ((precise(x) && precise(y))) c(x,y) else stop('env error')

plotbox = function(x, xlab='', ylab='Cumulative probability', xlim=range(x), ylim=c(0,1), ...) {

plot(NULL,NULL,xlab=xlab,ylab=ylab,xlim=xlim,ylim=ylim, ...);

lines(ecdf(x[(many+1):(2*many)]),...); lines(ecdf(x[1:many]),...) }

beta = function(v,w) if ((v==0) && (w==0)) env(rep(0,many),rep(1,many)) else

if (v==0) rep(0,many) else if (w==0) rep(1,many) else sort(rbeta(many, v, w))

kn = function(k,n) return(pairsides(env(beta(k, n-k+1), beta(k+1, n-k))))  

orI = function(x,y) return(1-(1-x)*(1-y))

andI = function(x,y) return(x*y)

t = kn(0, 2000); k2 = kn(3, 500)

s = kn(1, 150); s1 = kn(0, 460)

k1 = kn(7, 10000); r = kn(0, 380)

e1 = orI(t, orI(k2, andI(s, orI(s1, orI(k1, r)))))

plotbox(e1)

84 of 170

Expert opinions on rare-event probabilities

  • Hardly ever any actual data
  • Sometimes we have experts
  • But how should we model bald assertions:

    • “1 in 10 million”

    • “about 1 in 1000”

    • “never been seen in 100 years”

85 of 170

In my experience

People using expert elicitation often take the pronouncements made by their informants exactly as they are specified

    • Sometimes even as point values

(If your experience is different, I’d like to chat)

86 of 170

Is there no uncertainty?

  • What is the uncertainty in estimates like
    • “1 in 10⁷”,
    • “about 1 in 1000”, or
    • “never been seen in over 100 years of observation”?

  • How should this uncertainty be captured and projected in computations?

  • We have suggested a tentative strategy based on c-boxes
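A c-box for such an assertion can be sketched numerically. The snippet below is a Python analogue of the R `kn` function earlier in these slides (the function name, variable names, and the 10,000-replicate default are my choices), applied to "never been seen in 100 years" read as 0 events in 100 trials:

```python
import numpy as np

rng = np.random.default_rng(1)

def kn_cbox(k, n, N=10_000):
    """C-box for a binomial probability after k events in n trials:
    left bound ~ Beta(k, n - k + 1), right bound ~ Beta(k + 1, n - k).
    A Beta with a zero shape parameter degenerates to a constant 0 or 1."""
    left = np.zeros(N) if k == 0 else np.sort(rng.beta(k, n - k + 1, N))
    right = np.ones(N) if k == n else np.sort(rng.beta(k + 1, n - k, N))
    return left, right

# "never been seen in 100 years" -> k = 0 events in n = 100 annual trials
lo, hi = kn_cbox(0, 100)
```

The zero count pins the left bound at zero, but the right bound, a Beta(1, 100), still admits annual probabilities of around one percent, which is the uncertainty a point value of zero would hide.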

87 of 170

Risk communication

88 of 170

[Figure: c-boxes for the probability p given sample results 0/0; 0/1, 1/1; 0/2, 1/2, 2/2; 0/3, 1/3, 2/3, 3/3; 0/20, 15/20, 20/20; 0/200, 150/200, 200/200; and 0/2000, 1500/2000, 2000/2000. The boxes tighten as the sample size grows.]

89 of 170

Express risk as “k out of n”

Match a calculated risk to a k-n c-box

[Figure: a calculated risk (confidence vs probability) next to the matching k-n c-box]

90 of 170

Equivalent binomial count

  • An imprecisely computed risk can be expressed as a p-box over [0,1]

  • Transform it into a natural-language statement of the form “k out of n

  • These are natural frequencies
    • Large uncertainties imply small denominators
    • Or even interval numerators if very large

  • People can understand them, possibly better than graphical depictions
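One naive way to realise the "large uncertainty, small denominator" idea is to search for the smallest denominator that fits inside a stated probability interval. This is only an illustration of the direction of the effect, not the calibrated c-box matching the slides describe:

```python
import math

def equivalent_count(lo, hi, max_n=10_000):
    """Smallest denominator n for which some integer count k puts
    k/n inside the probability interval [lo, hi]. Wide intervals
    are satisfied by small n; tight ones force larger n."""
    for n in range(1, max_n + 1):
        k = math.ceil(lo * n)
        if k / n <= hi:
            return k, n
    raise ValueError("no k/n inside the interval up to max_n")

print(equivalent_count(0.25, 0.45))    # wide interval -> (1, 3)
print(equivalent_count(0.407, 0.409))  # tight interval needs a larger denominator
```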

91 of 170

[Figure: two side-by-side plots of confidence vs probability]

92 of 170

Amazon Mechanical Turk

  • Check preference for sunglasses rated by buyers using various schemes
    • More frequent ‘excellent’ ratings should be better
    • Larger pool of buyers rating should be more reliable

  • Testing whether
    • Turkers can make rational choices
    • Natural frequencies are as good as or better than percentages
    • Larger denominators convey more reliability
    • Interval numerators can be understood

93 of 170

Pair A was rated excellent by 10 out of 100 customers.

Pair B was rated excellent by 80 out of 100 customers.

Pair A was rated excellent by 66 out of 198 customers.

Pair B was rated excellent by 2 out of 6 customers.

Pair A was rated excellent by 66 to 88 out of 198 customers.

Pair B was rated excellent by 33 out of 100 customers.

50% of customers rated Pair A as excellent.

Pair B was rated excellent by 50 out of 100 customers.

50% of customers rated Pair B as excellent.

Pair A was rated excellent by 2 out of 4 customers.

94 of 170

Findings

  80% vs 50%              rational
  10/100 vs 80/100        same sample size
  66/198 vs 2/6           same magnitude
  [66,88]/198 vs 33/100   even with ambiguity
  50% vs 50/100           prefer natural frequency
  2/4 vs 50%              unless very uncertain

Sample sizes ~300 “master turkers”

95 of 170

Bayesian networks

96 of 170

Bayesian networks (Bayes nets)

  • Probabilistic graphical models used to study the conditional dependencies of components (random variables) that make up a system

Slide thanks to Diego Hector Estrada Lugo

97 of 170

Bayes nets

  • Intuitive and relatively easy to implement
  • By introducing evidence you update outcomes
  • Causal (as opposed to evidential) reasoning
  • Needs lots of information about dependence among variables (conditional distributions)
  • When you don’t have empirical information, you need assumptions

Slide thanks to Diego Hector Estrada Lugo
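The update-with-evidence step can be sketched with a two-node net (Python; all the probabilities here are invented for illustration):

```python
# Two-node Bayesian network with made-up probabilities:
# F = system failure, A = alarm, with arc F -> A
p_F = 0.01
p_A_given_F, p_A_given_notF = 0.95, 0.05

# marginal probability of the evidence (alarm sounds)
p_A = p_A_given_F * p_F + p_A_given_notF * (1 - p_F)

# introducing the evidence updates the outcome: P(F | A) by Bayes' rule
p_F_given_A = p_A_given_F * p_F / p_A
print(round(p_F_given_A, 3))   # -> 0.161
```

Even this toy net needs the full conditional table P(A | F); real nets need one such table per node, which is the information burden the next slide mentions.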

98 of 170

Many applications

Slide thanks to Diego Hector Estrada Lugo

99 of 170

Limited applicability

  • But only practical for Gaussian distributions or discrete distributions


Slide thanks to Diego Hector Estrada Lugo

100 of 170

Backcalculation

101 of 170

Engineering design under uncertainty

  • Traditional design freezes requirements early
    • restricts future developments
    • specifies functions and performance levels

  • Often does not fully address uncertainty
    • assumptions may be incorrect
    • variation or unpredictability of conditions
    • entirely unknown facts that might be germane

  • Performance is often affected by uncertainties


102 of 170

A typical problem

  • How can we design a radiation shielding system if we can’t specify the radiation distribution?

  • Could plan with worst case analysis
    • Often wasteful
    • Can’t account for rare, even worse extremes

  • Could pretend we know the distribution
    • Unreasonable for new designs or environments

103 of 170

Backcalculation

  • Untangles an equation in uncertain numbers to satisfy a constraint

  • Finds B such that A + B = C, from estimates for A and constraints on C

  • Engineering design under uncertainty often needs backcalculation

(⊆ means “is inside of” or “is a subset of”)

Backcalculation finds B such that A + B ⊆ C, from estimates for A and constraints on C

104 of 170

Planning a remediation program

  • Suppose we have constraints (from toxicity studies) on the doses receptors can tolerate of some contaminant

  • Dose is the concentration of the contaminant in water times the intake of water by the receptor

  • How clean must the remediation program make the water in order to ensure those constraints aren’t violated?

105 of 170

Can’t just solve for concentration

When Concentration is put back into the forward equation, the resulting Dose is wider than planned

prescribed

known

unknown

forward equation

inverted equation

Dose = Concentration × Intake

Concentration = Dose / Intake

106 of 170

Can’t just solve for concentration

When Radiation is put back into the forward equation, the resulting Dose is wider than planned

prescribed

known

unknown

forward equation

inverted equation

Dose = Radiation × Exposure

Radiation = Dose / Exposure

107 of 170

Example

Suppose dose must be less than 3 milligrams

dose = [0, 3] mg

intake = [0.5, 4] liter

conc = dose / intake

[0, 6] mg liter⁻¹

dose = conc * intake

[ 0, 24] mg

eight times larger than planned tolerable levels

108 of 170

Untangling backcalculations

  • Solving for B given A + B = C

B = backcalc(A, C) = [C1 − A1, C2 − A2]

  • Solving for B given A × B = C

B = factor(A, C) = [C1 / A1, C2 / A2]

  • Sometimes called “tolerance solutions”
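For plain intervals these untanglings are one-liners. A Python sketch (the function names follow the slides; representing an interval as a (lo, hi) pair is my choice), replayed on the remediation example:

```python
def backcalc(A, C):
    """Tolerance solution B of A + B inside C, for intervals (lo, hi)."""
    lo, hi = C[0] - A[0], C[1] - A[1]
    if lo > hi:
        raise ValueError("constraint too tight: no tolerance solution")
    return (lo, hi)

def factor(A, C):
    """Tolerance solution B of A * B inside C, for positive-width intervals."""
    lo, hi = C[0] / A[0], C[1] / A[1]
    if lo > hi:
        raise ValueError("constraint too tight: no tolerance solution")
    return (lo, hi)

dose, intake = (0.0, 3.0), (0.5, 4.0)
conc = factor(intake, dose)    # -> (0.0, 0.75) mg per liter
# plug it back in: the interval product lands exactly on the dose constraint
check = (conc[0] * intake[0], conc[1] * intake[1])
```

Unlike naive interval division, the forward check here returns [0, 3] mg, not the inflated [0, 24] mg of the earlier slide.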

109 of 170

Correct solution

Dose must be less than 3 milligrams

dose = [0, 3] mg

intake = [0.5, 4] liter

conc = factor(intake, dose) = [0/0.5, 3/4] mg liter⁻¹

[0, 0.75] mg liter⁻¹

Check:

dose = conc * intake

[ 0, 3] mg

110 of 170

Doesn’t work for distributions either

Dose = Concentration × Intake / Bodymass

Concentration = Dose × Bodymass / Intake

When concentration is put back into the forward equation, the resulting dose is wider than planned

prescribed

known

unknown

111 of 170

Planned dose

Body mass

Intake

Concentration

0

2

4

6

8

0.06

0.04

0.02

0.00

0

2

4

6

8

0.4

0.2

0.0

0

2

4

6

8

0.06

0.04

0.02

0.00

0

2

4

6

8

0.03

0.02

0.01

0.00

Concentration = Dose × Bodymass / Intake

112 of 170

Large doses in the realized distribution (arising from the allowed distribution of concentrations) are more common than were originally planned

0

2

4

6

8

0.06

0.05

0.04

0.03

0.02

0.01

0

Dose

planned

realized

113 of 170

Normal approximation

  • If A+B=C, compute B as C−A under the assumption the correlation between A and C is r = sd(A)/sd(C)

To simulate normal deviates with correlation r, compute

Y1 = Z1 × σ1 + μ1

Y2 = (rZ1 + Z2 √(1−r²)) × σ2 + μ2

where Z1 and Z2 are independent standard normal deviates, and μ and σ denote the respective desired means and standard deviations

  • Uses Pearson correlation (not rank correlation)
  • Good for multivariate normal, and maybe more
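That recipe can be sketched in Python (the means and standard deviations below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
muA, sdA = 2.0, 0.5   # known addend A (assumed values)
muC, sdC = 5.0, 1.0   # prescribed target C

r = sdA / sdC         # prescribed correlation between A and C
z1, z2 = rng.standard_normal((2, N))
A = z1 * sdA + muA                                   # Y1 = Z1*sigma1 + mu1
C = (r * z1 + z2 * np.sqrt(1 - r**2)) * sdC + muC    # Y2 = (rZ1 + Z2*sqrt(1-r^2))*sigma2 + mu2

B = C - A   # backcalculated addend
# with r = sd(A)/sd(C), sd(B) = sqrt(sdC**2 - sdA**2), and
# A + B reproduces C exactly, so the planned dispersion is not inflated
```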

114 of 170

Normal approximation with non-normal distributions

115 of 170

Iterative (trial & error) approach

  • Initialize B with C−A
  • This distribution is too wide
  • Transform density p(x) to p(x)m
  • Rescale so that area remains one
  • Whenever m > 1 dispersion decreases
  • Vary m until you get an acceptable fit
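The sharpening step can be sketched on a discretized density (Python; the grid and the normal starting density are my choices):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2)      # start: standard normal density (unnormalized)
p /= p.sum() * dx

def sharpen(p, m, dx):
    """Raise the density to the power m and rescale so the area stays one.
    Whenever m > 1 the dispersion decreases."""
    q = p**m
    return q / (q.sum() * dx)

def sd(p, x, dx):
    mu = (x * p * dx).sum()
    return np.sqrt(((x - mu)**2 * p * dx).sum())

q = sharpen(p, 2.0, dx)
print(sd(p, x, dx), sd(q, x, dx))   # dispersion shrinks, here from ~1 to ~0.707
```

In the backcalculation loop one would start from p for C − A and raise m until the forward convolution A + B fits acceptably inside the planned dose distribution.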

116 of 170

By trial and error, you may be able to find a distribution of concentrations that yields something close to the planned distribution of doses. This is not always possible however.

0.06

0

2

4

6

8

0.05

0.04

0.03

0.02

0.01

0

Dose

117 of 170

Backcalculation with p-boxes

Suppose A + B = C, where

A = normal(5, 1)

C = {0 ≤ C, median ≤ 1.5, 90th %ile ≤ 35, max ≤ 50}

[Figure: p-box for A = normal(5, 1) on [2, 8], and p-box for the constraint C on [0, 60], with probability axes from 0 to 1]

118 of 170

Getting the answer

  • The backcalculation algorithm basically reverses the forward convolution

  • Any distribution totally inside B is sure to satisfy the constraint

[Figure: the backcalculated p-box B, on roughly [−10, 50], with probability axis from 0 to 1]

Algorithm for additive backcalculation of the kernel for B from the equation C = A + B, where the p-boxes for A and the constraint C are known

 N = number of percentiles used in the representation {e.g. 100}

{find the left side of the p-box for the distribution of B}

left limit on B at 0th percentile := left limit on C at 0th percentile − left limit on A at 0th percentile;

for i := 1 to N do begin
  set flag to "not done";
  for j := 0 to i−1 do
    if (left limit on C at ith percentile) ≤ (left limit on A at [i−j]th percentile) + (left limit on B at jth percentile)
    then set flag to "done";
  if flag is "done"
  then left limit on B at ith percentile := left limit on B at [i−1]th percentile {the one just below it}
  else left limit on B at ith percentile := left limit on C at ith percentile − left limit on A at 0th percentile;
end; {for i, left bound}

{find the right side of the p-box for the distribution of B}

right limit on B at Nth percentile := right limit on C at Nth percentile − right limit on A at Nth percentile;

for i := N−1 downto 0 do begin
  set flag to "not done";
  for j := N downto i+1 do
    if (right limit on C at ith percentile) ≥ (right limit on A at [i−j+N]th percentile) + (right limit on B at jth percentile)
    then set flag to "done";
  if flag is "done"
  then right limit on B at ith percentile := right limit on B at [i+1]th percentile {the one just above it}
  else right limit on B at ith percentile := right limit on C at ith percentile − right limit on A at Nth percentile;
end; {for i, right bound}

{end of algorithm}
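The left-bound half of the pseudocode translates nearly line for line into Python (a sketch; the bounds are stored as 0-indexed lists at equally spaced percentiles, and the right-bound loop would be the mirror image):

```python
def backcalc_left(A_left, C_left):
    """Left bounds of B for C = A + B, given left bounds of A and C
    at the same set of percentiles (a rendering of the pseudocode's
    left-bound loop)."""
    B = [C_left[0] - A_left[0]]
    for i in range(1, len(A_left)):
        done = any(C_left[i] <= A_left[i - j] + B[j] for j in range(i))
        # if the constraint is already satisfiable, carry the bound just
        # below; otherwise shift it by the smallest value of A
        B.append(B[i - 1] if done else C_left[i] - A_left[0])
    return B

# degenerate check: if A is the point value 2, B is just C shifted down by 2
print(backcalc_left([2, 2, 2, 2], [0, 1, 2, 3]))   # -> [-2, -1, 0, 1]
```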

119 of 170

Check it by plugging it back in

A + B = C* ⊆ C

[Figure: the realized C* = A + B plotted inside the constraint C, on [−10, 60]]

120 of 170

P-boxes

  • Natural compromise that can express both
    • Gross uncertainty like intervals and worst cases
    • Distributional information about tail risks

  • Needed to solve equations containing uncertain numbers
    • Uncertainty propagation
    • Constraint propagation
    • Backcalculation

121 of 170

NASA shielding example

  • Medical team says
    • crew must not get doses larger than 2000 grays
    • 95% of doses should be less than 750 grays
    • most doses should be less than 500 grays

  • Radiation distribution is uniform
    • minimum in [0.01, 0.05] rads per second
    • maximum in [0.08, 0.1] rads per second

  • Mission could last 30 to 200 hours

How are these constraints expressed with p-boxes?

122 of 170

P-boxes express these constraints

[Figure: p-boxes for dose D [Gy] (0 to 3000), radiation R [Gy s⁻¹] (0.0001 to 0.0011), and mission time T [s] (0 to 600000)]

What are the ordinate axes here? Exceedance risk

123 of 170

Can’t just invert the equation

Dose = Radiation × Time / Shielding

Shielding = Radiation × Time / Dose

When Shielding is put into the forward equation, the resulting Dose is wider than planned

prescribed constraint

unknown

known

forward equation

inverted equation

124 of 170

Wrong calculation

S = R × T / D

Tail doses more than 50 times larger than planned!

D = R × T / S

[Figure: the naively inverted shielding S (0 to 8) and the resulting dose D [Gy] extending to 200000, plotted as exceedance risk]

125 of 170

Correct way uses backcalculation

D = R × T / S

Wait, how could shielding vary?

The all-but-95% range of this p-box answer for S is about [0.95, infinity] (which is a lot thicker than the interval answer [0.72, infinity]) because the all-but-95% range of the allowed dose in this example was only 750 Gy, rather than 1000 Gy.

S = 1 / factor(R × T, D)

[Figure: the backcalculated shielding S (0 to 8) and the check that the realized dose D stays within the 0 to 3000 Gy constraint, plotted as exceedance risk]

126 of 170

P-boxes better than worst case

  • More realistic
    • Need not assume the worst case surely happens
    • Can account for rarity of extreme events

  • More expressive
    • Can distinguish between possible events
    • Captures complex tolerances (higher doses if infrequent)
    • Can consider more extreme events than would otherwise be feasible

  • Better results
    • Cheaper constraints on shielding
    • Safer outcomes for the crew

127 of 170

Precise distributions don’t work

  • Precise distributions can’t express the target

  • Prescribing a precise distribution of doses says we want some high doses

  • Any distribution to the left would be better

  • A p-box on the dose target expresses this idea

128 of 170

Backcalculation algebra

  • Can define untanglings for all basic operations

e.g., if A × B = C, then B = factor(A, C) = exp(backcalc(ln A, ln C))

  • Can chain them together for big problems

  • Assuming independence widens the result

  • Repeated terms need special strategies
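That log identity is easy to check on intervals (a Python sketch; positive endpoints only, and the (lo, hi) interval representation is mine):

```python
import math

def backcalc(A, C):
    # additive tolerance solution for intervals (lo, hi)
    return (C[0] - A[0], C[1] - A[1])

def factor(A, C):
    # multiplicative untangling via the identity exp(backcalc(ln A, ln C))
    lo, hi = backcalc(tuple(map(math.log, A)), tuple(map(math.log, C)))
    return (math.exp(lo), math.exp(hi))

A, C = (2.0, 4.0), (1.0, 8.0)
B = factor(A, C)                    # recovers (0.5, 2.0) up to rounding
fwd = (A[0] * B[0], A[1] * B[1])    # forward check lands back on (1.0, 8.0)
```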

129 of 170

When you need        And you have         Use this formula
to be inside         estimates for        to find the unknown

A + B ⊆ C            A, C                 B = backcalc(A, C)
                     B, C                 A = backcalc(B, C)

A − B ⊆ C            A, C                 B = −backcalc(A, C)
                     B, C                 A = backcalc(−B, C)

A × B ⊆ C            A, C                 B = factor(A, C)
                     B, C                 A = factor(B, C)

A / B ⊆ C            A, C                 B = 1/factor(A, C)
                     B, C                 A = factor(1/B, C)

A ^ B ⊆ C            A, C                 B = factor(log A, log C)
                     B, C                 A = exp(factor(B, log C))

130 of 170

Conclusion

  • Planning cleanup requires backcalculation

  • Monte Carlo methods don’t generally work except in a trial-and-error approach

  • Should express the dose target as a p-box

131 of 170

Conclusions

132 of 170

Decision making under risk and uncertainty

  • Estimate risks of bad things happening

  • Fashion more sustainable & more resilient designs and processes

  • Minimise adverse impacts, unintended effects

  • Plan mitigation, remediation or restitution

133 of 170

Don’t drink the Kool-Aid 1978

  • Modelling is dangerous because we sometimes believe our predictions

  • “We have to build the best models we possibly can and then not trust them.” Robert Costanza
  • Risk analysis helps us do that systematically

  • It also ensures we’re realistic about what is knowable and predictable

Don’t take the blue pill 1999

Be skeptical

Be sceptical

134 of 170

There’s a lot more

  • Uncertainty quantification (estimation & modelling)
  • Dependencies, common-cause, common-mode failures
  • Model-form uncertainty
  • Numerical uncertainty
  • Acceleration methods (line sampling, Latin hypercube)
  • Psychology of uncertainty & risk communication
  • Backcalculation & design
  • Validation
  • Sensitivity analysis
  • Decision analysis

Complain if they shortchange you

135 of 170

Kinds of calculations

  • Chance that stress exceeds strength
  • Probability of a failure or some scary event
  • How much can a quantity vary
  • How much could it vary and still be safe
  • How much must it vary to ensure safety
  • Where to focus empirical efforts
  • How best to control the outcome

Answers can be very different with variability/uncertainty

136 of 170

Better safety assessments and reliability analyses

  • Account for risks and uncertainties in systems and reliability analysis
  • Should replace average values with distributions and dependencies

component failure rates, restoration times, performance indices

                         Base case          Uncertainty
                         (no uncertainty)   considered

Probability of failure   0.009              0.0072
Frequency of failure     3.284              1.6660
Duration of failure      76.799             63.465

Edimu, M., C.T. Gaunt and R. Herman (2011). Using probability distribution functions in reliability analyses. Electric Power Systems Research 81 (4): 915−921. https://doi.org/10.1016/j.epsr.2010.11.022 [Table 6]

137 of 170

Steps in a probabilistic risk analysis

  • Clarify the questions and gather data
  • Formulate the model (identify variables and their interrelationships)
  • Specify distributions for each input
  • Specify dependencies and correlations
  • Run simulation
  • Conduct sensitivity studies
  • Present results
  • Discuss limitations of the assessment

138 of 170

Limitations of Monte Carlo

  • Needs precise, stationary distributions
  • Nonlinear dependencies often ignored
  • Unknown dependencies incorrectly modeled
  • Assumes model is well understood
  • Can be computationally expensive
  • Backcalculations difficult
  • Treats incertitude like variability
  • Thought by some to be too difficult

139 of 170

Common pitfalls of Monte Carlo

  • Muddled model
  • Infeasible correlation matrices
  • Inconsistent independence assumptions
  • Multiply instantiated variables
  • Too few replications
  • No sensitivity studies
  • Confounding of different populations

140 of 170

Monte Carlo simulation

How?

    • replace each point estimate with a probability distribution
    • repeatedly sample from each, tally answers in a histogram

Why?

    • simple to implement
    • fairly simple to explain
    • summarizes entire distribution of risk

Why not?

    • requires a great deal of empirical information
    • usually need to guess some things
    • only appropriate if uncertainty is statistical
    • routine assumptions can lead to non-protective conclusions

141 of 170

Probability bounds analysis

How?

    • specify what you are sure about
    • establish bounds on probability distributions
    • pick dependencies (no assumption, independence, perfect dependence, etc.)

Why?

    • account for uncertainty better than maximum entropy, etc.
    • puts bounds on Monte Carlo results
    • bounds get narrower with better empirical information

Why not?

    • does not yield second-order probabilities
    • best-possible results can sometimes be expensive to compute

142 of 170

Some of the available software tools

  • Crystal Ball
  • Sapphire
  • Cossan
  • Dakota
  • RAMAS Risk Calc

143 of 170

More information from Risk Institute

144 of 170

Questions?

145 of 170

NAFEMS challenge

  • NAFEMS is the international association for the engineering modelling, analysis and simulation community

  • For the links to the challenge, see the GREEN CDT in Nuclear Energy’s risk website https://sites.google.com/view/greenrisk


146 of 170

147 of 170

Woodpile

148 of 170

Traditional practice

  • Most assessments are deterministic

  • Some are deliberately conservative, but

    • degree of conservatism is opaque, unquantified, and can be inconsistent

    • difficult to characterize risks, except in extreme situations

149 of 170

Traditional uncertainty analyses

  • Worst case analysis

  • Taylor series approximations (delta method)

  • Normal theory propagation (NIST; mean value method)

  • Monte Carlo simulation

  • Stochastic PDEs

  • Two-dimensional Monte Carlo

150 of 170

Untenable assumptions

  • Uncertainties are small

  • Distribution shapes are known

  • Sources of variation are independent

  • Uncertainties cancel each other out

  • Linearized models good enough

  • Relevant science is known and modeled

151 of 170

Need ways to relax assumptions

  • Hard to say what the distribution is precisely

  • Non-independent, or unknown dependencies

  • Uncertainties that may not cancel

  • Possibly large uncertainties

  • Model uncertainty

152 of 170

Incertitude in data measurements

  • Periodic observations

When did the fish in my aquarium die during the night?

  • Plus-or-minus measurement uncertainties

Coarse measurements, measurements from digital readouts

  • Non-detects and data censoring

Chemical detection limits, studies prematurely terminated

  • Privacy requirements

Epidemiological or medical information, census data

  • Theoretical constraints

Concentrations, solubilities, probabilities, survival rates

  • Bounding studies

Presumed or hypothetical limits in what-if calculations

153 of 170

Simmer down…

  • I’m not saying we should just use intervals
  • Not all uncertainty is incertitude
  • Maybe most uncertainty isn’t incertitude
  • Maybe incertitude is entirely negligible
  • If so, probability theory is perfectly sufficient

We didn’t stop using Euclidean geometry when they invented non-Euclidean geometry

154 of 170

What I am saying is…

  • Some analysts face non-negligible incertitude

  • Handling it with standard probability theory requires assumptions that may not be tenable
    • Uniformity, Unbiasedness, Independence

  • Useful to know what difference it might make

  • Can be discovered by bounding probabilities

155 of 170

Bounding probability is an old idea

  • Boole and de Morgan
  • Chebyshev and Markov
  • Borel and Fréchet
  • Kolmogorov and Keynes
  • Dempster and Ellsberg
  • Berger and Walley

156 of 170

Several closely related ideas

  • Probability bounds analysis

  • Imprecise probabilities

  • Robust Bayesian analysis

  • Second-order probability

  • Hierarchical Bayes

Bounding approaches

  • Probability bounds analysis (PBA)
    • A rough and ready technology

  • Imprecise probabilities
    • More comprehensive but also more difficult

  • Robust Bayesian analysis
    • Handles updating rather than convolutions

  • Second-order probability
    • More difficult and less comprehensive

157 of 170

  • Verified computation of probabilities

  • Sensitivity analysis

  • Non-Laplacian uncertainty

158 of 170

Euclid

  • Given a line in a plane, how many parallel lines can be drawn through a point not on the line?

  • For over 20 centuries, the answer was ‘one’

159 of 170

Relax one axiom

  • Around 1850, Riemann and others developed non-Euclidean geometries in which the answer was either ‘zero’ or ‘many’
  • Controversial, but eventually accepted
  • Mathematics richer and applications expanded
  • Used by Einstein in general relativity

160 of 170

Relax one axiom

  • IP satisfies all the Kolmogorov axioms

  • Avoid sure loss and are fully ‘rational’ in the Bayesian sense (except that agents are not forced to either buy or sell any offered gamble)

  • Relaxes one decision theory axiom: assumption that we can always compare any two gambles

Richer mathematics, wider array of applications

161 of 170

Comparing p-boxes with maximum entropy distributions

[Figure: p-boxes versus maximum entropy distributions for eight states of knowledge: sample data and range; min and max; min, max and mode; mean and std; min and mean; min, max and mean; range and quantile; min, max, mean and std]

Maximum entropy solutions

When you know                        Use this shape

{minimum, maximum}                   uniform
{mean, standard deviation}           normal
{minimum, maximum, mode}             beta
{minimum, mean}                      exponential
{minimum, maximum, mean}             beta
{min, max, mean, stddev}             beta
{min, max, some quantiles}           piecewise uniform
{mean, geometric mean}               gamma
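The first row can be sanity-checked numerically: among densities on [0, 1], the uniform maximizes differential entropy. A Python sketch comparing it with a Beta(2,2) rival (my choice of rival, purely for illustration):

```python
import numpy as np

x = np.linspace(1e-6, 1 - 1e-6, 100_001)
dx = x[1] - x[0]

def entropy(p):
    # differential entropy  -integral of p ln p  on a uniform grid
    return -(p * np.log(p) * dx).sum()

uniform = np.ones_like(x)        # maxent density given only {min, max} = {0, 1}
beta22 = 6.0 * x * (1.0 - x)     # any rival density on [0, 1]

print(entropy(uniform), entropy(beta22))  # uniform gives 0; the rival is negative
```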

162 of 170

Comparing IP to Bayesian approach

  • “Uncertainty of probability” is meaningful

  • Operationalized as the difference between the max buying price and min selling price

  • If you know all the probabilities (and utilities) perfectly, then IP reduces to Bayes
  • Axioms identical except IP doesn’t use completeness

  • Bayesian rationality implies not only avoidance of sure loss & coherence, but also the idea that an agent must agree to buy or sell any bet at one price

163 of 170

Bayes fails incertitude

  • Traditional probability theory doesn’t account for gross uncertainty correctly

  • Output precision depends strongly on the number of inputs and not so much on their shapes

  • The more inputs, the tighter the answer

164 of 170

A few grossly uncertain inputs

[Figure: four uniform inputs and the probability distribution of their combined output]

165 of 170

A lot of grossly uncertain inputs...

[Figure: forty uniform inputs and the surprisingly tight probability distribution of their combined output]

Where does this surety come from?

What justifies it?

166 of 170

“Smoke and mirrors” certainty

  • Probability makes certainty out of nothing

  • It has an inadequate model of ignorance

  • Probability bounds analysis gives a vacuous answer if all you give it are vacuous inputs

167 of 170

Wishful thinking

Risk analysts often make assumptions that are convenient but are not really justified:

  1. All variables are independent of one another
  2. Uniform distributions model incertitude
  3. Distributions are stationary (unchanging)
  4. Specifications are perfectly precise
  5. Measurement uncertainty is negligible

168 of 170

You don’t have to do this

Removing wishful thinking creates a p-box

  1. Don’t have to assume any dependence at all
  2. An interval can be a better model of incertitude
  3. P-boxes can enclose non-stationary distributions
  4. Can handle imprecise specifications
  5. Measurement data with plus-minus, censoring
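Point 3 can be sketched directly: a p-box that encloses a non-stationary family of distributions is just the pointwise envelope of the candidate CDFs (Python; the drifting normal family is invented for illustration):

```python
import numpy as np
from math import erf

def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + erf((x - mu) / (sd * 2**0.5)))

x = np.linspace(-4.0, 8.0, 601)
# a non-stationary process whose mean drifts over [0, 2]: enclose every member
family = np.array([[normal_cdf(xi, mu, 1.0) for xi in x]
                   for mu in np.linspace(0.0, 2.0, 9)])
upper = family.max(axis=0)   # upper (left) bound of the p-box
lower = family.min(axis=0)   # lower (right) bound of the p-box
# every member of the family lies inside [lower, upper] pointwise
```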

169 of 170

Uncertainties expressible with p-boxes

  • Sampling uncertainty
    • Confidence bands, confidence structures

  • Measurement incertitude
    • Intervals

  • Uncertainty about distribution shape
    • Constraints (non-parametric p-boxes)

  • Surrogacy uncertainty
    • Modeling

170 of 170

Can uncertainty swamp the answer?

  • Sure, if uncertainty is huge

  • This should happen (it’s not “unhelpful”)

  • If you think the bounds are too wide, then put in whatever information is missing

  • If there is no more information, what justifies your belief? Who are you trying to fool?