Public Notes: Logic of Risk - Prof Pasquale Cirillo

Logic of Risk

Link to Podcast:

https://www.youtube.com/watch?v=BNwJx_ZhkfY&list=PLgCR5H4IzggGqAlRNRVy7MaFQIzR-9qEs

Link to reference material:

https://www.pasqualecirillo.eu/LOR/

The following notes are reformatted official YouTube transcripts, with added paragraph titles for easier reading.

Table of Contents

1- Introduction

Episode 1: What is Risk

Episode 2: Harm as a component of risk

Episode 3: The possibility of harm, and intro to probability

2- Definitions of Probability

Episode 4: The classical definition of probability

Episode 5: Moving toward Frequentism

Episode 6: Frequentism, its pros and its cons

Episode 7: Probability as propensity

Episode 8: Propensity in play, from quantum mechanics to one-shot events

Episode 9: Probability does not exist, or the subjectivist approach

Episode 10: Fighting the hydra of frequentism, or why subjectivism is preferable

Episode 11: Fighting the hydra of frequentism, or why subjectivism is preferable

Episode 12: Keynes, Jeffreys, Jaynes and finally Kolmogorov

3- Risk Measures

Episode 13: Back to risk, introducing risk measures

Episode 14: Sigma-algebras and probability spaces

Episode 15: The cumulative distribution function of a random variable

Episode 16: The quantile function, the survival function and the PDF

Episode 17: Extremes and Outliers

Episode 18: Introducing (simple) moments

1- Introduction

Episode 1: What is Risk

Introduction to The Logic of Risk

Here on The Logic of Risk, I’d like to share what I’ve learned—and what I continue learning—about risk. My goal is to convey that risk isn’t something we should automatically avoid at all costs, because without taking risks, there are no opportunities, and consequently, not much real life. Of course, not all risks lead to opportunities, and that’s something we’ll explore together in detail.

I would also like to clarify certain concepts that the media often handle poorly, which in turn can cause social and political confusion. Think of the countless, often specious or misinformed debates about the risks of various diseases, technological decisions, investment choices, and so on. Or the casual comparisons that lump together entirely different types of risk. Or the many so-called “black swans” we see in the news, to the point that now it’s the poor white swan that’s the real exception. Then there are the people who confuse the concepts of mean and median, and write lengthy moralistic articles based on that confusion. And there’s “the risk zero dream,” which I’ll try to show you is basically the same as the evaporation of humanity. In short, there’s plenty to discuss and analyze.

We’ll be covering various topics—some might not seem strictly tied to risk at first glance, but I assure you they either are or will be. Occasionally, I’ll ask you to grab a pen and paper and write down a few notes to think about until the next episode. I hope you’ll join me in that; of course, it’s not required in any way. I don’t force my university students to do things, so I’m certainly not going to force you.

I think that’s enough of an introduction for our very first episode. But before we dive in, one last note: new episodes will be released weekly, and their length may vary. I’ll try to keep each one under half an hour, but I already know I’ll sometimes go over that. Some topics—like the history of risk—will span multiple episodes, others won’t. My plan is to cover just the essential material needed to move forward. Occasionally, I’ll recommend further reading for anyone who wants to dig deeper.

Defining Risk

So, what is risk? It might seem straightforward at first, but answering that question is more complicated than you probably think. We will need quite a few conceptual tools for a formal discussion. Let’s start simply. If you open a dictionary, the definition might vary slightly depending on which one you use, but we can summarize it like this: Risk is the possibility of incurring damage or loss.

Even at this basic level, you can see risk is already a compound concept. Mathematically, we often (though not always) express it as a product of multiple components. What are these components? So far, we have two: the damage—something negative for me, for society, or for someone else—and the possibility that this negative event might happen or might not. Damage alone isn’t enough. If I deliberately and (let’s be honest) rather foolishly smash my hand with a hammer, it’s no longer just a risk but a certainty. At the same time, the mere possibility of an event—something contingent that isn’t guaranteed—also isn’t enough. For instance, there’s no real “risk” in winning a lottery with a ticket someone else gave me for free, even though winning is intrinsically a probabilistic event.

Quick side note: you’ll notice that I tend to be very precise about how I use certain words, especially at the beginning, to avoid common misunderstandings. For example, I won’t use uncertainty as a synonym for possibility, probability, or eventuality, because uncertainty is something quite specific that actually completes the definition of risk. We’ll talk more about that in future episodes, but let’s at least hint at it now.

The Role of Uncertainty in Risk

If risk is the possibility of incurring damage, then in order to address or manage that risk, we need two things: we have to be able to identify the damage, and we need to attach some measure—or at least an estimate—of the probability that this damage might occur. Therefore, risk requires some kind of quantifiability in its dimensions.

What happens if we know something negative might happen, but we have no idea what the probability is? Or if we can’t even qualify the event—let alone quantify it? Consider this scenario: I tell you that if you turn the corner, something good or something bad will happen to you. I’m deliberately excluding the possibility that nothing happens at all. Obviously, you have no clue what’s behind that corner. It could be wonderful—something you’ve always wished for—or it could be as trivial as a kick in the backside. Maybe you’ll find ten thousand dollars on the ground, or maybe there’s a sniper waiting for you. You don’t know what’s there, nor do you know the likelihood of each outcome, apart from the trivial “something will happen with probability 1”—or “almost surely,” as we say in probability theory. (We’ll come back to why we say “almost” another time.)

This is a situation of uncertainty, not risk. When we can’t quantify the damage, can’t attach a probability to its occurrence, or can’t do both, we call it “uncertainty” rather than “risk.” We’ll see there are various nuances—gradations, if you will—of both uncertainty and risk. And it’s important to realize these concepts are dynamic, not static, despite what many people assume.

Refining the Definition of Risk

Let’s get back to risk. The dictionary definition is a good start, but after many years working on risk, I think it can be refined. Specifically, I like to combine insights from authors like Nassim Nicholas Taleb, Gerd Gigerenzer, Daniel Kahneman, and Nicholas Rescher (we’ll meet them in future episodes). For now, bear with me while I offer a definition of risk that is both theoretically sound and practically useful. The dictionary’s version doesn’t tell us why we might face risk, nor does it give us a helpful context for risk management strategies.

From now on, when we talk about risk, we’ll define it as “the possibility of a negative event—that is, one that causes damage—brought about by internal and/or external vulnerabilities or factors, which can potentially (at least partially) be contained or even prevented.”

Internal vs External Factors in Risk

Internal factors are often called endogenous, while external factors are exogenous. What’s new here? First, we clarify that risk stems from something we can identify—some vulnerability or factor—so we don’t blur the line with pure uncertainty. We also recognize that the causes can be internal or external. For example, if I am hit by a car while crossing the street, there might be a thousand explanations. Maybe I was careless and didn’t look both ways. Perhaps the driver was distracted or, worse, intoxicated. Maybe a tire blew out and the driver lost control. Some factors depend on me (internal/endogenous), while others do not (external/exogenous). In general, I can more effectively address internal factors than external ones.

Please notice that our definition adds that risk can potentially be contained, at least partially. This is the starting basis for risk management, because we’re discussing something quantifiable, whose causes we can at least partly trace, and therefore we have the hope and ambition to intervene.

The Subjectivity of Risk and Probability

It may look like a simple tweak, but this definition runs deeper than it appears. In just a few lines, it raises very non-trivial issues we’ll return to again and again. We noted that the damage must be quantifiable, and the probability of occurrence must also be quantifiable—at least within some confidence interval. Obviously, the notion of “damage” can be subjective: what constitutes damage depends on who’s experiencing it. In decision theory and economics, for example, they use utility functions to describe this subjectivity.

Losing a thousand dollars isn’t the same for a factory worker as it is for Jeff Bezos. Breaking a femur is different for someone in their 30s than for someone in their 90s. A major earthquake has a vastly different impact in a wealthy country with robust infrastructure, like Japan, than it would in a very poor country where people live in huts made of mud and straw and have no earthquake preparedness.

Even if, in everyday speech, we often treat risk as something objective, it’s actually subjective. Sometimes the subject is just you or me, other times it’s an entire group or society—but you can’t detach risk or its perception from the individual or group that could be harmed. This, of course, influences how we manage it, if only because there are constraints (physical, monetary, psychological, and so on).

On top of that—and we’ll dive much deeper in upcoming episodes—the concept of probability itself can be subjective. There are different philosophical “schools” of probability theory; some are more “objective,” like the axiomatic approach, but choosing one school over another is, ironically, subjective. Likewise, nothing is purely objective in estimating the probability of a risk event, regardless of the theoretical approach. Just picking a particular model depends on my personal preferences and experience, on the information available to me, and so forth. That’s one reason we consult experts for complex risks we’re unfamiliar with. Of course, experts are people too, and they face “model risk” (that is, the risk of being wrong in their approach). We’ll come back to that as well.

The Limitations of Risk Management

All this is not an elegant way of saying we cannot do anything about risk, hiding behind subjectivity. We can do a lot! I just want you to see, right here in this first episode, that things are more complex than the simplistic formula “potential loss times its probability.” Formulas—big, small, or in-between—are incredibly useful, but only when applied with an ounce of caution. We have to understand their limits.

Because, unless we fancy ourselves omnipotent and omniscient—attitudes you might see in certain politicians or media personalities—then we do have limitations. And ironically, recognizing limitations can become a strength in risk management. It’s all about knowing what they are and planning accordingly.

Risks We Take vs Risks We’re Subjected To

I want to wrap up this first episode with one final observation that might seem obvious but is actually quite important: there’s a difference between risks we take on and risks we’re subjected to. Risks we take on are those we choose to incur, like the so-called “choice of action.” We deliberately do—or don’t do—something that creates or amplifies one or more risks. Risks we’re subjected to are those that go beyond our control, such as large-scale natural disasters, pandemics, or economic/financial crises (especially if we are not in that sector; if we are, I tend to be less forgiving).

Clearly, risk management differs depending on which category you face. In the first, we can play offense; in the second, we’re forced to play defense. But as I’ve noted, risk and uncertainty are dynamic concepts, so it’s totally possible—and, in fact, quite common—to move from one category to the other. Take the last pandemic. Initially, it was something we were all subjected to. (Whether it was entirely unforeseen is a separate conversation—spoiler alert: “no” at the systemic level, “maybe” at the individual level.) Over time, however, our management of the situation shifted toward actively taking on certain risks.

Final Thoughts

When it comes to actively assuming risks, there’s an adage I find critical. Using Nassim Taleb's words: it’s better to assume risks you can quantify than to quantify risks you’ve already assumed.

Thank you so much for listening. If you’d like, you can share this podcast—available on all major platforms—and leave a review or five-star rating, whatever you think is fair. I hope to have you back for the next episode. Let me leave you with a quote from Sigmund Freud: “When you eliminate risk from your life, little remains.”

Episode 2: Harm as a component of risk

Introduction: Philosophical Foundations of Risk and Uncertainty

In these initial episodes, we are a bit more—allow me the expression—philosophical, and less quantitative. That’s not a quirk but a necessity. It’s crucial to understand the fundamental characteristics of risk and uncertainty before we can address their modeling in a satisfactory way. The clearer we are about the problems we are working on, the better equipped we will be to tackle them.

Defining Risk and Uncertainty

In the last episode, we gave a preliminary definition of risk, which I will repeat here for convenience. For us, a risk is the possibility of harm or damage—that is, an event with negative consequences—caused by internal and/or external vulnerabilities or reasons, which can potentially be contained, at least in part, if not entirely avoided. This definition expands and makes more operational the simpler version you’d find in various dictionaries. We also said that uncertainty is something else, which, with a bit of terminological abuse, we labeled as “unquantifiable risk.” But as I mentioned, that is just the start. To manage risk effectively, we need more precision.

Exploring the Concept of Harm

Today, I want to take a closer look with you at the concept of harm. Since risk is the possibility of harm—an event with negative consequences, and so on—how can we evaluate that harm? How can we weigh its inherent negativity?

The Problem of Subjectivity in Risk Evaluation

In the previous episode, I brought up the problem of subjectivity. It is very hard to separate potential harm from the subject to whom that harm applies—and also from the evaluator, as will become clear if those two individuals are not the same person. If we consider the loss of 1,000 dollars (from our example last week), we can assume nobody wants to lose that amount of money, but the impact certainly differs for someone who is poor versus someone who is wealthy.

Tackling Subjectivity: Keeney and Rescher's Approach

As far as I am concerned, one of my favorite ways of tackling this issue—and certainly one of the best ways to start, striving for both simplicity and rigor—is inspired by Keeney and Rescher. Let us begin by saying that the impact or magnitude of a harm, of some negative event, can be described succinctly by three dimensions: nature, extent, and timing.

Nature of Harm: Types of Negative Consequences

Nature is simply the type of harm we are assessing. Is it material? Financial? Psychological? A physical injury? Death? Boredom? The cancellation of our favorite TV series? Essentially, we care about its qualitative nature. We will see that the most sensible and reliable comparisons—so as not to fall into a long list of fallacies (which we will discuss later) and to avoid ethical complexities—are comparisons between harms of the same nature. We don’t want to compare apples and oranges.

Extent of Harm: Severity and Distribution

Extent—more quantitative than nature—can be divided into severity and distribution. Severity is some measure of the loss generated by the negative event. We use “loss” in a broad sense, not necessarily financial. If you like, it answers the question, “How much?” This question will be phrased according to the type of harm. So it will always be a conditional quantification. Of course, the context matters. Are we talking about units of time, money, health status? Knowing this is of paramount importance.

Distribution pertains to the type of individuals affected by the harm and their number. Are these well-identified individuals, or do they fall into what we might call a “statistical blur”? For example, are we talking about three firefighters who died trying to extinguish a specific fire, or more generally about on-the-job fatalities in a given country in 2024? When evaluating distribution, in order to facilitate (among other things) risk communication and to ensure reliable and acceptable estimates (remember, not objective estimates), we also need to look at our relationship with the people harmed. Are they us? Are they our family? Friends? Strangers? Do they live close to us, and thus potentially affect us? This is one of the many reasons why experts are expected to be both competent and independent.

Timing of Harm: Imminence and Duration

Timing is the third dimension of a harm’s magnitude. Are we talking about an imminent or a future negative event? Is it a one-time event or something that could recur? How long does exposure last? A disaster that kills 50,000 people in one go is one thing. A chronic situation that kills the same number over a generation is another. And yet another if we are talking about future generations, with hope that something might be done in the meantime.

Measuring and Comparing Risks: Quantifying Harm

To evaluate and compare risks—and thus to rank them—we need to measure them, quantify them, so we can assess their relative severity. Unfortunately, there is no guarantee that different negative factors can be measured in the same units, making them directly commensurable. As mentioned earlier, this is especially true if the harms differ in nature. How can we unambiguously compare a financial loss to pain, or to a prolonged feeling of boredom, or to other similar things? Is it worse to break a leg or lose 10,000 dollars? It gets even more complicated when the decision isn’t for us, but for someone else.

Issues with Quantitative Risk Comparison

And don’t think that the quantitative aspects are free of their own issues. Is a severe but less widespread harm better or worse than a less severe but more widespread harm? This brings to mind many comments about different contagious diseases, for example. I find certain absolute stances on these issues both discouraging and, paradoxically, ethologically fascinating. Perhaps the people who take these stances have preference orderings with zero ambiguity, or maybe they are just talking for the sake of talking.

A Provocative Example: Balancing Severity and Distribution

Let me offer a provocative, yet current, example—exaggerated, but not by too much—of the difficulty of balancing severity and distribution when evaluating the extent of harm. We are essentially discussing what we might call the risk profile. How can we compare a scenario where many people are incapacitated to one where just a few die? Someone might suggest using lost workdays as the unit of comparison. Mind you, it is not so unusual a choice as a comparison metric, regardless of what you might think. We will talk more about the value of life in later episodes.

Comparing Two Scenarios: Workdays Lost vs. Fatalities

Now, consider these two alternatives:

Situation 1: Two twenty-year-olds die, each with fifty potential working years ahead of them.
Situation 2: Temporary incapacitation of 15,000 workers due to an illness that causes, on average, 10 days of lost work.

In the first scenario, we can estimate workdays lost as 2 times 50 times 365. That comes to 36,500 lost workdays, and we are clearly exaggerating the number of workdays in a single year. In the second scenario, we have 15,000 workers times 10 lost workdays on average, that is, 150,000 lost workdays. So if we only look at workdays lost, the first situation appears preferable to the second. But I don’t have to imagine too hard to see the outrage on many faces—and I totally share that outrage.
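If you want to check the arithmetic yourself, here is a minimal Python sketch (mine, not from the episode; the numbers are exactly those above, including the deliberately exaggerated 365 workdays per year):

deaths, working_years_each, days_per_year = 2, 50, 365
situation_1 = deaths * working_years_each * days_per_year  # 36,500 lost workdays
workers, avg_days_lost = 15_000, 10
situation_2 = workers * avg_days_lost                      # 150,000 lost workdays
print(situation_1 < situation_2)  # True: by this metric alone, situation 1 "wins"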

The Illusion of Comparability in Risk

This is a simple but hardly uncommon example showing that comparing risk situations is anything but trivial. You can’t just calculate two quick numbers and put them side by side. I love calling these situations “illusions of comparability.” The fact that different parameters we use to estimate harm may not be commensurable—especially between different types of harm—is one of the fundamental and far-reaching aspects of risk management. It is crucial to grasp this, both for developing a healthy approach to risk and for avoiding problems for ourselves and others. We will come back to this concept more than once.

The Fallacy of Universal Risk Metrics

Just as we accept that distance and temperature can’t be expressed using the same unit, we need to be prepared to understand that in certain situations, different risks aren’t trivially comparable with a single number, even if they might look like it. They are simply not commensurable. Yet we often hear statements like, “The risk of contracting this disease is lower than the risk of being eaten by a shark.” When we talk about the fallacy of scale, we will start right there.

The Way Forward: Subjectivity and Value Scales in Risk Evaluation

So what do we do? Do we give up? Of course not—this is precisely why we are here. Whenever we consider a particular class of harm—that is, we fix the nature of what we are assessing—it is always possible to compare different situations within that class. Once we fix the nature of the harm, we are left with its extent (in its two components: severity and distribution) and timing.

Defining Risk Profiles: Levels of Severity, Distribution, and Timing

Without having a precise risk measure yet, let us assume that for each of these characteristics, we can assign one of three levels: high, medium, and low. So, a given harm might be characterized by a high level of severity but a low level of distribution. When it comes to timing, simplifying a bit for now (thus ignoring frequency, for instance), we say high means imminent and low means far-off in the future, with medium somewhere in between.

Ranking Risks: A Simple System of Triplets

The worst harm would then have the triplet: high, high, high—that is, high severity, high distribution, and high timing. Conversely, the least impactful harm would be low, low, low. Between these two extremes, we can list 25 other combinations of high, medium, and low. Altogether, that is 27 possibilities—3 to the power of 3 (three distinct levels for three different characteristics).

Transitivity and Ranking: A Logical Approach to Risk

Some of these situations can be easily ranked. For instance, we’d say that in terms of risk, the high, high, high scenario dominates all the others. If in terms of risk X dominates Y, that means X is riskier than Y. By the same logic, medium, high, high dominates medium, medium, high, as well as medium, high, medium, and low, high, high, because if two factors remain the same and one improves (that is, it becomes less dangerous), the risk cannot increase (to be precise, we should say it cannot get worse, but we will get back to this later). The same holds if one factor remains the same and the other two improve, or if all three improve. Transitivity also applies in general, meaning that if the triplet medium, medium, high dominates medium, medium, medium, then the triplet medium, high, high also dominates medium, medium, medium.

Visualizing the Triplets: Extra Materials

Now, because I don’t want to lose you—and I know listening to triplets is boring—it is best to sum all this up with a graphic that lays out these different triplets and shows how they compare. But how can we do that on a podcast? You will find the answer on thelogicofrisk.com. From now on, you will find any extra materials for each episode right there. For Episode 2, the relevant figure is already waiting for you.

Comparing Complex Risk Profiles: Incomplete Answers

It becomes trickier to determine whether medium, high, high dominates high, medium, high, because you don’t see a straightforward improvement in one factor while holding the others constant. In that case, for one factor that improves, another factor gets worse. For the time being, the best answer is that we can’t automatically say. We don’t have enough tools for that yet, so we will come back to it.
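For readers who like to tinker, here is a small Python sketch (my own illustration, not from the episode) of the 27 triplets and the dominance relation just described. The componentwise order is automatically transitive, and it leaves exactly the kind of pair discussed above unranked:

from itertools import product

LOW, MEDIUM, HIGH = 0, 1, 2  # a higher number means a more dangerous level
triplets = list(product((LOW, MEDIUM, HIGH), repeat=3))  # (severity, distribution, timing)
assert len(triplets) == 27  # 3 to the power of 3

def dominates(x, y):
    # x is riskier than y: no factor improves, and at least one gets worse
    return all(a >= b for a, b in zip(x, y)) and x != y

worst = (HIGH, HIGH, HIGH)
assert all(dominates(worst, t) for t in triplets if t != worst)  # worst beats all
print(dominates((MEDIUM, HIGH, HIGH), (HIGH, MEDIUM, HIGH)))  # False
print(dominates((HIGH, MEDIUM, HIGH), (MEDIUM, HIGH, HIGH)))  # False: not comparable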

The Role of Axiology in Risk Evaluation

If you are really curious, we can say that comparability and the possibility of consistent ranking will subjectively depend on who is evaluating the risk. That requires an axiological perspective, or in other words, a value scale. Two harms—especially if they differ in nature—are not automatically comparable or commensurable in their intrinsic qualities, but they can become so extrinsically through the evaluator. This, however, raises the issue of the evaluator’s reliability, which leads us to the question of experts and a host of related complications that will take more time to address. We will come back to them in due course.

Conclusion: The Complexity of Risk Management

Naturally, if we decided to use more characteristics to define the magnitude of a risk, we could talk about quadruplets, quintuplets, and so on, instead of triplets. But that detail isn’t really important right now. What matters is understanding that describing harm properly requires a profile, and this profile can be obtained by considering different features, each measured in some way. We are the ones who choose and manage these features, and therefore the harm profile—and, more generally, the risk profile. There are heuristics and conventions, but there is no single, always-valid solution, nor a single, absolute truth.

Closing Thoughts on Risk Assessment

Believing that we can find one universal metric—perhaps using some ready-made little model—to solve every problem is akin to returning to the lost-workdays example, and probably preferring fatalities. Risk assessment, starting with assessing harm (because then it all gets more complicated when we factor in probability), has both a factual and a normative component. The factual component is the scientific one, related to characterizing the harm, its severity, its distribution, its timing, and—later on—its probability. I said scientific, not objective. It is a matter of observing phenomena, gathering data, theorizing, creating and choosing models, and extracting information in a largely inductive manner, following the scientific method and best practices wherever possible.

But this information then needs to be processed, weighed, and assigned values. We need to decide what is acceptable and what is not. Let us remember: it is not the facts that decide, but people. People are the ones who transform facts into values. Of course, for those judgments to be intelligent and—if you will allow me the term—rational (we will return to the concept of rationality later), they must take the facts into account. However, we cannot separate the responsibility of comparing different negatives from the person doing the comparison, and then blame it all on “the nature of things.” In risk management, reality is never entirely separate from the subject, whether that subject is the risk manager or the potential victim. I can’t emphasize this enough. Claiming otherwise invites trouble and mere buck-passing.

Closing Quote

Well, I think that is enough to bore you with for today. I want to wrap up this episode with a quote from T.S. Eliot: “Only those who will risk going too far can possibly find out how far one can go.”

Episode 3: The possibility of harm, and intro to probability

Episode Overview: Defining Risk and Harm

In our first episode, we gave an operational definition of risk. Do you remember? We said that we call risk the possibility of harm, that is, a negative event caused by internal and/or external vulnerabilities that might be at least partially contained, if not avoided altogether. We also mentioned how it is necessary to delve deeper into this definition. That is why, in the second episode, we focused on one of its components: harm.

We highlighted how quantifying harm is trickier than it seems, especially if we want to do things properly. We talked about the need to create a harm profile, meaning we need to analyze harm from several perspectives like its nature, extent, and timing. For each of these characteristics, we aim to estimate and express levels of severity. We also noted that comparing different harms is far from trivial. Direct commensurability is often a luxury. We suggested the need for an axiological perspective, and an analysis not only of harm itself but also of the evaluator who is measuring that harm.

Shifting Focus: The Possibility of Harm

Today we are taking a step sideways. We are shifting to another fundamental element of our risk definition: the possibility of harm. Notice that our discussion on harm itself is not complete yet, but we need some tools before we can proceed.

When we talk about possibility, we are referring to the trait of something that may or may not happen. Something contingent or accidental. What we want to do here is figure out how to formally—and hopefully quantifiably—describe that vague expression “may or may not happen.”

For our purposes in risk management, we can’t do without a way to describe possibility. We have to quantify risk, and that means we need to quantify its components. Remember: if one of the building blocks of risk can’t be quantified, we slip into uncertainty, which is a different thing.

Formalizing Possibility: Introducing Probability

If we were in a classroom right now, this is when I would pull out one of my favorite questions: How can we formalize and quantify the possibility of harm? At that point, the classroom would go silent, and I would insist that I don’t bite. More silence. Then I would say something like, “Don’t be afraid,” emphasizing that no answer at this stage could be wrong—in fact, any response would help clarify our thinking. But still, silence. Then I would pull out a chocolate and promise it to whoever dares speak first.

Usually, at that point, the most common answers are: “Use probability,” “Use probability theory,” “Use probability calculus,” or some variation on that theme. I would end up handing out way more chocolates than just the one I promised.

Probability: A Partial Answer

Now, each of these answers would be undoubtedly correct but also partial. First, probability is one way to describe the possibility of an event, formalizing that “may or may not occur.” It is certainly the best-known approach—at least by name—and also the most widespread. But it is not necessarily the most useful in every situation.

We can say it is the most useful in the vast majority of cases, but there are definitely non-negligible situations where a different approach might be better. For example, courtrooms, where possibility theory or imprecise probability might be preferable.

The same goes for risk management when we find ourselves in those gray areas between fully quantifiable risk (a situation that basically only exists in theory) and radical uncertainty, that is, areas where some part of the randomness can be measured, but the rest cannot, or at least not precisely enough to be handled by probability theory.

Choosing Between Probability Schools

Of course, we will only select what is really needed for our goals and avoid purely speculative theories like free probability or various technicalities, even though they are absolutely fascinating for experts. Anyway, even if we decide to narrow our focus to only those cases where probability theory is standard and works well, there is something that usually surprises my beginner students: simply saying “probability” doesn’t say much.

What kind of probability? The classical interpretation? Frequentist? Subjectivist? Logicist? Axiomatic? Popper’s propensity theory? Yes, there are indeed multiple definitions of probability, each stemming from different schools of thought. I warned you it wasn’t all that simple. And as I said, choosing one school over another is an expression of preference, which is why I find it amusing that some people like to treat probability as “always and inevitably objective.”

The Subjectivist Approach to Probability

Let me be clear straight away: I’m not interested in the competition between different probability schools of thought. I’ve always found the dogmatism of some frequentists as silly as the dogmatism of certain Bayesians. Probability theory draws its true strength from being richly multifaceted.

If I had to express my personal opinion, though you can probably tell by now, I find the subjectivist approach the most suitable for the majority of cases. I lean toward a definition of probability rooted in the works of Ramsey and de Finetti, with some “correctives” from Jaynes to avoid the mind projection fallacy.

The Mind Projection Fallacy

This fallacy can appear in two different ways. First, there is the notion that our own worldview perfectly reflects reality—that is, that our viewpoint is the absolute Truth with a capital T (a fatal mistake in risk management, among other reasons, because it rules out model risk and leaves you wide open to disaster).

Second, there is the assumption that just because we don’t know something, it must be unknown or unknowable to everyone else. In other words, the mind projection fallacy is basically the perfect summary of how social media often work.

The Logic of Probability

Speaking of Edwin Thompson Jaynes, I have always loved the title of his textbook for Cambridge University Press: Probability Theory: The Logic of Science. For those with a technical interest, I suggest buying it. However, the title alone tells us something relevant: probability theory can be seen as a new logic of science, extending the classical logic that traces back to Aristotle (and many others, of course, with fantastic results).

While classical logic is perfectly at home in the Newtonian world, it finds itself shaky—sometimes impotent—when faced with the developments of the last century. It is quite hard to tackle quantum mechanics, quantum chemistry, computational biology, bioinformatics, or the thousands of applications of statistics in the human and social sciences without probability and ambiguity.

Examining Different Definitions of Probability

But let’s return to our main topic. I want to walk you through the major definitions of probability to understand their differences and strengths. And I will say it again: none is always and inherently superior to the others. None—including the one I, or any of you, might favor. The single exception might be the classical definition, whose field of application is quite limited and is mostly pedagogical in nature.

When we get to the applications, I will try to show you why I think that the subjectivist view might be better in certain areas, but I am the first to switch to a frequentist approach when it is quicker and does the job. Theory should adapt to practice, not the other way around. Every definition has its strengths and weaknesses, and if you are aware of them, you can enjoy the luxury of picking whichever best suits the problem at hand.

Application of Probability

There is some good news, though. Once we select a definition of probability, the way we play with it and work with it stays basically the same (barring some very picky technical details that we will skip in this podcast). So the definition of probability won’t affect how we can define risk measures, but it can influence how we interpret them.

Random Phenomena: Mass vs. One-shot

So let’s start our discussion by looking at the different ideas of probability and figuring out when each might be preferable. Later, we will talk about cases where probability alone won’t cut it and we need something else.

For now, we are interested in studying what are often called random phenomena, that is, phenomena whose outcomes are not certain but can happen in different ways, which we will call events, each with some probability of happening. When dealing with random phenomena, we make a distinction often credited, among others, to some works by my friend Nassim Taleb, whose various books I recommend, starting with Fooled by Randomness. The distinction is between mass random phenomena and single or one-shot phenomena.

Mass Random Phenomena

Mass random phenomena are those that might be hard or even impossible to predict in certain details—at the micro level, you might say—but that become predictable and manageable at a macro, more general level. A famous example from Polya, echoed by many, goes like this: Imagine you’re on an open-air terrace with no roof, and it starts to rain. The terrace is tiled with large slabs—let’s say there are 20 in total—and you want to figure out the probability that a raindrop falls on a particular tile.

For instance, if we label the tiles in some way, we might observe the first raindrop landing on tile 5, the second on tile 9, the third on tile 13, the fourth again on 5, and so on. Where will the 15th drop land? That’s really tough to answer. Even with a model like a Poisson process, any answer would still be a very bad approximation. However, if it keeps raining, we know that all tiles eventually get completely wet. This information, plus the data we gather on where and how raindrops have fallen, helps us study the event “15th raindrop on tile X.”
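A quick way to feel this micro/macro contrast is to simulate it. The sketch below (an illustration of mine, assuming purely for simplicity that drops land uniformly at random on the 20 tiles) shows that single drops stay unpredictable while the aggregate behavior does not:

import random

random.seed(42)  # arbitrary seed, for reproducibility
TILES = 20
hits = [0] * TILES

drops = 0
while min(hits) == 0:  # keep raining until every tile has been hit
    hits[random.randrange(TILES)] += 1  # micro level: which tile? hard to say
    drops += 1
print(drops)  # macro level: every tile does get wet, after finitely many drops

for _ in range(100_000):  # with many more drops, frequencies settle near 1/20
    hits[random.randrange(TILES)] += 1
print(min(hits) / sum(hits), max(hits) / sum(hits))  # both close to 0.05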

Single or One-shot Phenomena

Single or one-shot phenomena, on the other hand, are events that have never happened before, for which we have zero hope of gathering data. They can be of utmost importance, yet remain incredibly hard—or impossible—to solve under various definitions of probability.

For a one-shot phenomenon, there’s no historical information; at best we can draw analogies with phenomena we believe to be similar, but that’s not always helpful. Do you want an example? What is the probability of a terrorist attack in Piazza del Duomo in Milan, Italy? What’s the probability that a drone flying over an airport like JFK will cause a plane crash with many casualties? What’s the probability of a specific genetic mutation?

Our goal is to try to answer these questions, all the while recognizing that sometimes a precise answer simply can’t be found.

Basic Facts About Probability

No matter the definition of probability we use, all probabilists agree on some basic facts. Probability is one way to represent randomness. Given an event, its probability is expressed as a real number between 0 and 1. If you prefer, you can express probability as a percentage—so 0%, 10.56%, 100%, and so on. If an event has probability 0, we say it’s impossible, or more precisely almost impossible. If it has probability 1, we consider it certain, or better, almost certain.

I already mentioned the idea of “almost” in the first episode. Although we have not introduced the most technical details yet, we can already sense what this means. We have been repeating that there are multiple definitions of probability, so an event might be “almost certain” under one definition (probability 1) but have probability 0.87 under another. And even using the same definition of probability, you and I might have different theoretical models or different ways of estimating probability, so we might reach different conclusions.

The Assumption of Equivalence in Probability

So for now, when you hear “almost,” think... “in my opinion, according to my model.” One of the fascinating things about probability is that, when you really get down to the nitty-gritty, nothing is ever 100% certain—not even certainty itself.

We will make several assumptions when defining probability according to the different schools of thought. But there’s one assumption, which might be pushing it a bit, that I would like to introduce already: equivalence. It is crucial in mathematical finance, in decision theory, and in many branches of statistics.

Equivalence and Its Importance in Application

In our discussions and applications—when using different probabilities or measures of probability—we will assume they’re equivalent. In simple terms, this means that if an event is impossible according to one definition or model, it will also be impossible under any other definition or model.

If our probabilistic views are equivalent, my partner and I might disagree on what’s possible—I might say there’s a 26% chance of rain tomorrow, while my partner says 62%—but we will both assign probability zero to the event “Italian politics suddenly becomes trustworthy tomorrow.” (Yes, I know, that’s a low blow.)

Conclusion: Understanding Probability in Practice

Those familiar with mathematical finance know about changing measures between the market or physical probability P and the risk-neutral, or martingale-equivalent, probability Q. Those two measures are constructed to be equivalent. They will always agree, for example, on the fact that the price of an asset can’t be negative (ignoring fees and the like), but they may disagree on the probability that its price will exceed 100 dollars tomorrow. That difference between probabilities is called an error or a distortion, depending on our perspective, and it will matter for us.

Wrap-Up: Closing Thoughts

So, ok, yes, probability is a number between 0 and 1. Anything else? Well, plenty more. But we will continue with all this stuff next week.

As always, I want to end this episode with a quote. Today, I leave you with the words of Pierre-Simon Laplace: “The theory of probabilities is at bottom nothing but common sense reduced to calculus.”

2- Definitions of Probability

Episode 4: The classical definition of probability

Introduction to Probability
Hi there, once again welcome to The Logic of Risk. In this episode, we continue our discourse about probability.
Last week, we said that probability is a real number between 0 and 1, which we use to quantify the randomness of an event. A probability of 0 indicates that something is (almost) impossible, while 1 indicates (almost) certainty—or, if you prefer, certainty in practice. We also mentioned a few other points, and I just refer you to the last episode in case you missed it.

The Complement of an Event
We know that if an event has a given probability—say 20%—its negation (which we formally call the complement of that event) has a probability of 80%. In decimal form, that’s 0.2 and 0.8.

The Coin Flip Example
If I flip a coin—which we will assume is fair for now—there’s a 50% chance I’ll get heads and a 50% chance I’ll get tails. If the probability of heads is 0.5, then the probability of tails (its complement) is also 1−0.5=0.5, and together they sum to 1.
If I asked, “What’s the probability of getting heads or tails from flipping the coin?” you’d say 1, because one or the other will happen. The probability that an event or its complement happens is always 1.
However, even that coin-flip example isn’t entirely correct, because in reality, a coin might land on its edge, leading to a third possible outcome: neither heads nor tails. In an ideal game, we might decide to rule that out as impossible, but in real life it isn’t.
Someone might say, “That event is practically impossible—let’s just call it impossible.” And I might even agree—but let’s wait until we talk about “effective nullity” (or effective zerohood).
What happens if our model says the event is impossible, yet it actually happens? Do we yell “Black swan!” or curse bad luck and blame the evil eye?
If we don’t anticipate anything terribly negative from a coin landing on its edge, ignoring it might be okay. But if the consequence, though remote, is devastating, we should think twice.
This, by the way, is exactly what certain financial wizards did with the possibility of major banks defaulting during the 2007–2008 financial crisis, and we all saw how that turned out.

The Sample Space
In general, we should never (and I repeat: never) consider any event completely impossible or completely certain, unless we’re talking about special cases—usually purely logical in nature—following what is known as Cromwell’s rule. We’ll come back to that.
Studying probability theory means trying to answer slippery questions, and those answers often lead to even more slippery questions, though they do get more and more fascinating as we go.
A fundamental part of defining probability is analyzing the concept of an event—the object to which we assign probability. An event can be simple or composite.
Consider a standard six-sided die. We assume that if we roll it, it can only land on one of its faces—no crazy balancing acts for the moment.
If I roll the die once, the possible outcomes are: 1, 2, 3, 4, 5, or 6. So there are six possible results. If I gather these results in an ensemble—imagine physically putting the numbers 1 through 6 into a little bag—I get what we call the sample space in probability theory.
The sample space is just the set of all possible outcomes for a random phenomenon. In the case of a die, enumerating them is easy. In even slightly more complex scenarios, enumerating them becomes a lot less straightforward, though still usually doable.

Simple and Composite Events
An event may correspond to one specific outcome, which we call a simple or elementary event. For example, rolling the die once and getting exactly 3 is a simple event.
But we can also have composite events that group multiple outcomes. For instance, I might be interested in the probability of rolling an even number. In that case, the event manifests itself if I roll a 2, a 4, or a 6. That means I’m merging multiple outcomes into one single event.
Returning to the simple event of rolling a 3, I can also implicitly define its complement—namely, rolling anything but 3. That complement is a composite event consisting of rolling a 1, 2, 4, 5, or 6. Among all the events we can imagine, one is called the certain event—and yes, it’s essentially the entire sample space.
If my event is “any of the six sides of the die comes up,” it obviously has probability 1, because rolling the die once will definitely yield one of the numbers 1 through 6.
Another important event, the empty or impossible event, is the opposite of the entire sample space. Mathematically, it is represented by the empty set—the set that contains nothing, not even itself—and its probability is always 0. Stretching the concept slightly, we could assign the empty set to all those outcomes that really can’t happen, like rolling a 7 on a single six-sided die or rolling nothing at all.
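If it helps to see these notions concretely, here is a small Python sketch (mine, not from the episode) that expresses the die’s sample space and events as sets:

sample_space = {1, 2, 3, 4, 5, 6}  # all possible outcomes of a single roll
three = {3}                        # simple event: rolling exactly 3
even = {2, 4, 6}                   # composite event: rolling an even number
not_three = sample_space - three   # complement of {3}
certain = sample_space             # the certain event, probability 1
impossible = set()                 # the empty set, probability 0
print(not_three)                   # {1, 2, 4, 5, 6}
print(even & not_three)            # {2, 4, 6}: even numbers other than 3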

The Event Space
Another step forward: alongside the sample space, we define the event space, which is the set of all the events a random phenomenon can generate—that is, the set of all possible single outcomes and all possible combinations of those outcomes. So it is the set of every simple and composite event.
Of course, the event space depends on the phenomenon we’re considering. The event “rolling a 1 AND rolling a 2” is impossible if I roll the die only once. But with two rolls or two dice simultaneously, that event becomes possible.
When we get to the axiomatic definition of probability, we will formalize these concepts a bit more—without going overboard—but for now, that’s enough.
If any of you already know all this, I apologize for the many simplifications and approximations. They serve the purpose of helping everyone follow along. This is meant to be an exoteric, not esoteric, podcast—open and accessible to anyone interested, and not just a restricted club for initiates.

Mutually Exclusive and Independent Events
Two events are called mutually exclusive if they cannot happen simultaneously. If a box contains a red and a blue ball, and I draw one blindly, I will end up with either the red or the blue, but not both at once.
In the die example, if I get one face out of six, I can’t get a different face at the same time. Similarly, “getting a number lower than 2” and “getting a number greater than 4” exclude each other in a single roll.
If two events are mutually exclusive, the probability that either event happens (the probability of their union, we would say) is simply the sum of the probabilities of the individual events. Conversely, the probability that both happen together is necessarily zero.
But many events are not mutually exclusive, and in that case, we have to pay attention to what happens if they happen at the same time.
I might want to look at the probability of rolling a number less than 3 while also being odd. These events can definitely happen together—rolling a 1, for instance—so we’re interested in their intersection.
Two events are called independent if the probability of one does not depend on whether the other has happened. Otherwise, they are dependent. Working with dependent events is substantially more complicated than working with independent ones.
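Anticipating slightly the counting definition introduced just below, here is a sketch (again mine, for illustration) of these ideas on a single roll of a fair die:

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):  # counting probability: favorable outcomes over total outcomes
    return Fraction(len(event), len(sample_space))

low = {1}      # "a number lower than 2"
high = {5, 6}  # "a number greater than 4"
assert low & high == set()  # mutually exclusive in a single roll
assert prob(low | high) == prob(low) + prob(high)  # union rule: 1/6 + 2/6 = 1/2

odd = {1, 3, 5}
below_three = {1, 2}
print(prob(odd & below_three))  # 1/6: both happen together when we roll a 1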

The Classical Definition of Probability
All of this sets the stage for introducing a first definition of probability, the so-called classical definition. As mentioned, and as we will soon see, it is mainly of pedagogical value—barely useful outside certain well-defined, theoretical gambling scenarios.
The classical definition of probability has its roots in Bernoulli and Laplace, and even earlier in the Renaissance work of Pacioli and Cardano, later taken up by Pascal and Fermat.
According to this definition, the probability of an event is the ratio of the number of favorable outcomes (those that satisfy the event condition, without implying any sort of positive or negative judgment) to the total number of possible outcomes, assuming they are all equally likely. Clearly, we have to be able to count all those outcomes in the first place.

The Chevalier de Méré's Puzzle
One early puzzle that brought this “counting outcomes” approach into the limelight was posed by the Chevalier de Méré, a 17th-century nobleman and enthusiastic gambler. He was stumped by wagers that seemed straightforward but consistently lost him money.
Specifically, he asked Blaise Pascal, “Which is more likely: rolling at least one six in four throws of a single die, or rolling a double six in 24 throws of two dice?” The Chevalier assumed the two events would be equally likely.
However, by systematically enumerating all possible outcomes and comparing them to the total, Pascal showed that an intuitive guess can be misleading in probability. He discussed the results with Pierre de Fermat, poking some gentle fun at the arguments of the Chevalier de Méré.
If you’re curious, the event of rolling a double six in 24 throws of two dice is slightly less likely.
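Both probabilities are easy to reconstruct with the complement rule (“at least one six” is the complement of “no six at all”); here is a sketch of the computation:

from fractions import Fraction

p_at_least_one_six = 1 - Fraction(5, 6) ** 4  # four throws of one die
p_double_six = 1 - Fraction(35, 36) ** 24     # twenty-four throws of two dice
print(float(p_at_least_one_six))  # about 0.518
print(float(p_double_six))        # about 0.491: indeed slightly less likely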
This puzzle and others like it helped cement the idea of the classical definition—where each outcome is presumed to be just as likely as any other.
The classical definition is indeed linked to the “principle of indifference,” which says that in the absence of strong evidence to the contrary, we should not consider some outcomes more likely than others.
But is that always the right move? If I roll a standard six-sided die, it’s easy to assume each face—1, 2, 3, 4, 5, 6—has an equal chance, and that’s usually correct for a fair die.
If I consider a single roll of such a fair die, the probability of any one face is obviously 1/6 (about 17%). I have one favorable outcome out of six possibilities, each presumably equally likely.
But what if the die is chipped or we suspect it is loaded?
Then the principle of indifference doesn’t hold as strongly. Any time we can’t be confident that all outcomes are symmetric in probability, we need different tools or additional information. That’s one reason the classical definition, while elegant, doesn’t always match the complexities of real life.

Bertrand’s Paradox
Moreover, in discrete scenarios like dice, counting outcomes is relatively straightforward. But as soon as we move to continuous sample spaces—like choosing a random real number between 0 and 1—what does it mean for each number to be “equally likely”? That question leads to intriguing paradoxes, such as Bertrand’s paradox, which dramatically illustrates how a naive application of “equally likely outcomes” can fail in continuous probability.
In 1889, Joseph Bertrand asked what the probability is that a “random chord” in a circle exceeds the side length of the inscribed equilateral triangle. Depending on how one defines “random chord,” the answer varies, even though each of the three methods Bertrand proposed claims to use equal likelihood. This discrepancy undermines the principle of indifference when it is applied blindly to geometric situations.
Later, the excellent Edwin T. Jaynes resolved the paradox by advocating the principle of maximum ignorance: instead of assuming that all outcomes are equally likely in a simplistic way, we adopt the least biased distribution consistent with the information we have. This approach restores consistency and highlights the importance of being precise about what “random” truly means in any probability problem. We will return to this topic soon.
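To see the paradox with your own eyes, you can simulate Bertrand’s three methods. The sketch below (my own; the seed and sample size are arbitrary) draws chords of the unit circle in three different “uniform” ways and checks how often the chord beats sqrt(3), the side length of the inscribed equilateral triangle:

import math
import random

random.seed(0)  # arbitrary seed, for reproducibility
N, SIDE = 100_000, math.sqrt(3)

def random_endpoints():  # method 1: two uniform points on the circumference
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * math.sin(abs(a - b) / 2)  # resulting chord length

def random_radius():  # method 2: uniform chord midpoint along a radius
    d = random.uniform(0, 1)  # distance of the chord from the center
    return 2 * math.sqrt(1 - d * d)

def random_midpoint():  # method 3: uniform chord midpoint inside the disc
    while True:  # rejection sampling of a uniform point in the unit disc
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return 2 * math.sqrt(1 - (x * x + y * y))

for method in (random_endpoints, random_radius, random_midpoint):
    freq = sum(method() > SIDE for _ in range(N)) / N
    print(method.__name__, round(freq, 3))  # about 1/3, 1/2 and 1/4 respectively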

Criticism of the Classical Definition
There are many other reasons to criticize the classical definition of probability and demonstrate why it is unfit for real-world applications. But we will address those in the next episodes.

Next Episode Preview
Next week, we will introduce the frequentist definition of probability, developed by von Mises and others, along with some interesting—at least I hope—practical reflections. We will challenge the classical definition by presenting an alternative perspective, marking the beginning of our exploration of the different schools of thought on probability.

Episode 5: Moving toward Frequentism

Introduction to Probability and Risk Management

In today’s episode, we continue examining the concept and tool of probability, which, as we know, is essential for us in order to quantify and manage risk and, at least partially, also uncertainty.

Objectivist Schools of Thought

In recent episodes, we discussed some general considerations about probability, noting the existence of multiple schools of thought in the field of probability. Some of these schools are objectivist, meaning they view probability as something objective that exists independently of the human mind—an impartial entity, potentially measurable with precision using the right instruments, and comparable to physical concepts like mass, force, or velocity.

Von Mises’s frequentism falls into the objectivist universe. It has been revisited and reshaped by several scholars, including Donald Gillies, whose An Objective Theory of Probability, written in the 1970s, is a fundamental text for anyone wanting to explore objectivism in more detail.

However, Gillies’s perspective goes beyond simply rearranging von Mises’s ideas. Indeed, Gillies can be considered something of a bridge-builder between frequentism and another objectivist school of thought, namely the propensity school, whose major exponents are Charles Peirce and Karl Popper. Chronologically, Peirce came first, in the nineteenth century. His aim was to reconcile the pragmatism of which he was one of the leading minds with the notions of potentiality and contingency. But it was Karl Popper who, many years later—and we might say independently—made the propensity school widely known.

Subjectivist Schools of Thought

On the other side of the spectrum, we find subjectivist schools. For these schools, probability is inseparable from the subject; it’s not an object that exists on its own, but rather something intertwined with the ideas, the concepts, the qualities, and the limits of whoever uses it. Within the subjectivist world, we have, among others, the Bayesian viewpoint, the epistemic perspective, and the credibility approach. Major figures in this field include Frank Ramsey—who died at just 26 but made a fundamental contribution to probability theory—Leonard Savage, described by Milton Friedman as one of the few people worthy of the title “genius,” and Persi Diaconis, who, besides being a great mathematician, is also a professional magician. There are many more, such as Hans Bühlmann, Richard Jeffrey, and of course my dear Bruno de Finetti, a great Italian probabilist, to whom I’d like to devote an episode sooner or later.

Other Schools of Thought: Classical, Axiomatic, and Logicist

Outside of objectivism and subjectivism, we find certain schools such as the classical one, which can be seen as the forerunner or root of all subsequent developments and has traits of both major worldviews. We also find the axiomatic school and the logicist school.

The axiomatic approach can be seen as a way to dodge the debate, taking refuge in a somewhat metaphysical view of probability. A great representative of this school is the giant Andrey Kolmogorov. The axiomatic perspective is the standard approach to teaching probability, especially in mathematics or physics. Yet it is also increasingly used in economics and actuarial science, where in the past frequentism prevailed.

On the one hand, introducing the axiomatic theory as the first approach to probability is a sound choice, because it allows students to study probability as a branch of mathematics, applying the knowledge they have already acquired, without worrying too much about deeper epistemological issues. On the other hand, it’s a choice that can lead many students to think that probability is only that. They may then collide with real-world situations, where you have to get your hands dirty and where certain elegant theoretical properties do not hold.

The axiomatic construction, which works beautifully in theory, turns out to be inadequate for dealing with the many practical problems one encounters when working with data. Even something as basic as distinguishing between single events and large-scale random phenomena is not easy to handle from a strictly axiomatic perspective. We discussed these two types of phenomena in the previous episodes, which I encourage you to listen to if you haven’t yet—it will soon come in handy.

The logicist school, on the other hand, forms something of a bridge between the subjectivist and the objectivist perspectives—or at least, that’s how I see it, since I view logicism as an attempt to unite these two worlds. However, some scholars claim that logicism is a kind of subjectivism, while others say the logicist view is actually objectivist. In reading works by the great logicists—Keynes, Wittgenstein, Waismann, Reichenbach, or Kyburg—I personally find that seeing it as a “bridge” is the most appropriate stance. In logicism, probability is viewed as the degree of reliability of a given proposition, ranging from 0 (false) to 1 (true).

According to Friedrich Waismann, no concept—especially one of an empirical nature—is fully defined and permanent; rather, each concept is subject to continuous clarifications and revisions. It’s as though each concept were surrounded by a membrane that encloses it, giving it shape, but that membrane is porous, allowing meanings to flow in and out. As an example, consider the concept of money and how it has been challenged by cryptocurrencies such as Bitcoin. Indeed, among economists, the discussion is still ongoing. Some do not consider cryptocurrencies to be money—for instance, citing their volatility, which calls into question their use as a unit of account—while others argue they are. Or consider friendship, and how its definition has shifted in the era of social networks. Who do you call a friend? Clearly, probability is a porous concept as well.

The Classical Definition of Probability

But let’s rewind a bit and proceed step by step. In the last episode, we started with the classic definition of probability. As mentioned, it’s not very useful from a practical standpoint—it only applies to highly idealized situations and certain gambling scenarios. Nonetheless, it deserves discussion, both for its historical importance (it was the first formalization of the concept of probability) and because other schools, like the frequentist one, arose to address the shortcomings of the classical definition.

We said that, in the classic view, the probability of an event is nothing but the ratio of the number of favorable outcomes to the total number of possible outcomes, under the assumption—citing Pierre-Simon Laplace—that “nothing leads us to believe that any one of these outcomes should occur more frequently than the others, which makes them all equally possible in our eyes.”

Hence, probability is the result of a simple division: the numerator is the number of outcomes in which the event of interest occurs, and the denominator is the total number of possible outcomes for the random phenomenon in question.

So, once again, if we consider a six-sided die roll as our random phenomenon, we have a total of six possible outcomes (1 through 6). That will be our denominator. The numerator changes based on the event we want to study. If I’m interested in the probability of rolling a 5, there is only one favorable outcome out of six, so the probability is 1/6. If I want the probability of rolling an even number, there are three favorable outcomes: 2, 4, and 6. So the numerator is 3, and the denominator is still 6. The probability is therefore 3 over 6, that is 0.5. If I want the probability of rolling a number strictly less than 3, the favorable outcomes are just 1 and 2, so that’s 2 out of 6, or 1/3. If I say “less than or equal to 3,” then we have 1, 2, and 3—three favorable outcomes—so 3 out of 6, and so on.
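To make the counting mechanical, here is a minimal Python sketch (my own illustration, not something from the podcast; the function name is made up) that computes these classical probabilities by enumerating favorable outcomes over the six equally likely faces:

```python
from fractions import Fraction

def classical_probability(event, outcomes):
    """Classical definition: favorable outcomes / possible outcomes,
    under the assumption that all outcomes are equally likely."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = range(1, 7)  # the six faces of a fair die

print(classical_probability(lambda o: o == 5, die))      # 1/6: rolling a 5
print(classical_probability(lambda o: o % 2 == 0, die))  # 1/2: rolling an even number
print(classical_probability(lambda o: o < 3, die))       # 1/3: strictly less than 3
print(classical_probability(lambda o: o <= 3, die))      # 1/2: less than or equal to 3
```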

By counting the favorable outcomes and dividing by the total number of possible outcomes, we’re implicitly treating each possible outcome (or elementary event, to use the term from the last episode) as equally likely; we are not giving them different weights. In the roll of a fair six-sided die, the probability of each elementary event (rolling a specific side) is indeed 1/6. As we noted last time, and as in the words of Pierre-Simon Laplace just cited, the assumption of equiprobability is linked to the principle of indifference, according to which we should treat all outcomes as equally likely unless there is strong evidence to the contrary.

Limitations of the Classical Definition

Simply reading the classical definition reveals its main limitations, though its historical merits and its impetus for later developments are undeniable. First, in order to use the classical definition, we must be able to precisely count both the favorable outcomes and all possible outcomes. This implies that the number of outcomes must be finite. That’s a major restriction! Indeed, while the classical definition easily handles the probability of rolling a 6 with a single die or even the probability of rolling four threes when rolling ten dice, it becomes significantly more complicated if, for example, we want to determine how many die rolls we need on average before we get a 6. The classical definition doesn’t tell us what to do when the number of outcomes is potentially infinite.

Unfortunately for us—especially in risk management—there are only a few situations in which the outcomes of a random phenomenon are finite and easily countable. As mentioned, the classical definition can work in some simple gambling contexts, but even modest modifications of the game reveal its limitations. Under the classical definition, we are unable to say what the probability is that a company’s stock price will rise by X% or more tomorrow, or the probability that a bridge will be damaged by a severe storm.

Moreover, the classical definition of probability suffers from what probabilists call the “circularity problem.” By defining probability as the ratio of favorable to possible outcomes, we are assuming that these outcomes are equiprobable, as we said. In other words, we’re relying on the idea of probability (equiprobability) to define probability, as when we say that a coin is “fair” but end up defining fair to mean that heads and tails happen with equal probability.

Frequentist Approach to Probability

Many scholars have tried to solve the circularity problem, for instance by working on the notion of symmetry (a die is indeed a symmetrical object), but no solution has been fully satisfactory—especially since asymmetry is very common and, if you like, more interesting to analyze.

One of the best-known solutions to the problems of the classical definition is offered by the frequentist school, an example of the objectivist view of probability. If you’ve studied some probability—especially in social science departments like economics—there’s a good chance that frequentism is what you have encountered.

Prominent names in frequentism include John Venn (famous for Venn diagrams), Ronald Fisher (one of the fathers of modern statistics), Jerzy Neyman and Egon Pearson (both giants of statistics), and Richard von Mises—a polymath not to be confused with his brother Ludwig von Mises, the influential economist. In his treatise on probability, the logicist John Maynard Keynes suggests that one of the first frequentists in history was Aristotle. Other historically important figures in frequentism include Bernoulli (pronounced “ber-NOO-lee,” if we want to be pedantic), Gauss, and Laplace. We already mentioned Bernoulli and Laplace as among the founders of the classical definition of probability, but it’s undeniable that in some of their results they more or less implicitly used a frequentist viewpoint as well.

Conclusion and Preview

Two absolutely pivotal texts for the frequentist school are The Logic of Chance by John Venn (published in 1866), and Wahrscheinlichkeit, Statistik und Wahrheit (Probability, Statistics, and Truth), written by Richard von Mises. The latter was first published in 1928 and revised in 1936. It’s a clear, accessible essay, rich with interesting examples and almost devoid of formulas. In the original text, von Mises states that his goal is to “show how, starting from statistical observations and applying to them a clear and exact concept of probability, we can arrive at conclusions which are reliable and true, and in practical life just as useful as those obtained in any other exact science.” It’s definitely an objectivist point of view.

For frequentists, the probability of an event corresponds to the relative frequency of that event over time—or, better yet, the theoretical value of that relative frequency. The word “frequentism” derives precisely from this focus on frequency.

The central idea is that, when analyzing a random phenomenon, we record and study its manifestations over time, and we define probability as the ratio between the number of times the event of interest happens and the total number of observed outcomes of that random phenomenon over time. And since time can possibly go to infinity, we introduce the concept of limit.

Episode 6: Frequentism, its pros and its cons

Introduction to The Logic of Risk: Episode 6

Welcome to “The Logic of Risk,” Episode 6. Today, we continue talking about frequentism, going into some more detail. Are you ready? Let’s get started!

Recap of Frequentist Probability

In the previous episode, we said that, for frequentists, the probability of an event corresponds to its relative frequency over time, or, more precisely, to the theoretical value of that relative frequency. The idea is quite simple: when analyzing a random phenomenon, we record and study its occurrences over time, and we define probability as the ratio between the number of times the event of interest happens and the total number of outcomes we observe for that random phenomenon over time.

Dice Example to Illustrate Frequency

Let’s go back to the dice example, and suppose we’re interested in the probability of rolling a 5. If you have pen and paper handy, you might find it helpful to jot down what I will say; but don’t worry, I’ll try to keep it clear even if you’re just listening.

Imagine rolling a die once. We observe a 3—no 5. So, with just 1 roll, the relative frequency of rolling a five is 0 (meaning zero fives observed) out of 1 roll. Result? 0. Now I roll the die a second time and observe a 2. The relative frequency of rolling a 5 is still 0 out of 2—again, 0. I continue rolling the die and get, in order: 6, 6, 1, 4. After 6 rolls total, I’ve still observed 0 fives.

Then, on the seventh roll, I get a 5. So, I now have 1 five out of 7 rolls, and the relative frequency updates from 0 to 1/7, about 14%. On the eighth roll, I get another 5. Now it’s 2 fives out of 8 rolls, which is 2/8, or 25%. Then I roll a 3, so the frequency of fives becomes 2/9, roughly 22%.

Convergence to Limiting Value and Law of Large Numbers

If I imagine rolling the die many more times—say n times with n large—the relative frequency of fives will tend to converge toward a limiting value: 1/6, or about 17%. We call it a limiting value because, theoretically, you never actually reach it; it’s a theoretical value that appears in the frequentist view when the number of rolls goes to infinity.

In reality, we don’t need to wait for an infinite number of rolls to see probability emerge as a relative frequency. Bernoulli’s law of large numbers tells us that in a sequence of independent trials—like our dice rolls—as the number of trials grows, the relative frequency of a certain event converges to that event’s probability, with an error that becomes smaller and smaller. So if we’re satisfied with, say, a three-decimal-place approximation, we clearly don’t need infinite time—something I personally don’t have, and I doubt you do either!

Objectivity of Probability in Frequentism

Notice that by saying that the frequency converges to the event's probability, we are affirming that such a probability exists objectively. Indeed, frequentists believe (and it’s somewhat amusing to use the verb "believe" here) in the objectivity of probability.

In the dice example, a few hundred rolls are typically enough to get relative frequencies quite close to that 1/6 which we know is the probability of rolling a five with a fair die.

Frequentist Approach vs Classical Definition

You might say: “Why go to all this trouble just to arrive at the same result the classical definition gave us in two lines?” Well, the point is that in the frequentist definition, we haven’t assumed equal likelihood, nor have we needed to count all possible outcomes. We only counted how many times a five came up out of n rolls, with n large.

At this point, it may be helpful to restate it clearly:
For a frequentist, the probability of an event is the theoretical limiting value that its relative frequency tends toward if we repeat our random experiment an arbitrarily large number of times.

Applying Frequentism to Real-World Events

So, if I’m interested in estimating the probability that a certain portfolio will lose more than 10,000 euros in a single day, all I need to do is to observe that portfolio (or similar portfolios) over time and note how many times such a loss actually occurs. Notice that, by the classical definition, such an event would be practically intractable because I wouldn’t know how to precisely describe the sample space, in which I should be able to count a finite number of equally likely outcomes.

Pitfalls of the Frequentist Approach

It’s natural to ask what pitfalls might lurk behind the frequentist definition. And there are several. First, we’re not always able to repeat and observe a given phenomenon a large number of times. It’s not just a question of the time and money (should repeating the experiment and collecting data be costly). Personally, I would not set out to estimate the probability of a nuclear reactor meltdown using a frequentist approach. And not merely for monetary reasons—there’s something deeper.

Von Mises and Kollektiv: A Limitation of Frequentism

Recall the distinction between mass random phenomena and single events. For a mass random phenomenon, we might hope to collect enough data to study how its relative frequency behaves, but we can forget about that for single events. Von Mises was aware of this, which is why his analysis primarily deals with what he calls a “Kollektiv,” a collective in English.

A collective, he says, and I quote, is “a sequence or collection of uniform events or processes that differ in some observable attributes, such as color, number,” and so on. An example is all the molecules in a certain volume of gas, where the attribute might be each molecule’s velocity. Or all the wheat stalks grown by a certain farmer, where our interest lies in whether or not they have a certain parasite. For von Mises—and for frequentists generally—the probability of interest is the probability of observing a certain attribute in that collective.

Von Mises himself goes so far as to say that for single events, it’s impossible to properly define an objective probability. I can estimate the probability that a group of 40-year-old Frenchmen will die before turning 41, but not for one particular individual—say Jérôme from Paris. This relates to the reference class problem, which we will look at more closely in the next episodes.

Criticism by Bruno de Finetti and Scientific Dialogue

This clear limitation to mass random phenomena is one of the main criticisms by Bruno de Finetti, along with more technical points like the Regellosigkeitsaxiom (the axiom of irregularity). We’ll get into those when we discuss the subjectivist definition of probability. To be clear, de Finetti always considered von Mises’s work extremely important, appreciating and sharing many of its aspects—like the critique of the classical definition’s assumption of equiprobability. He also agreed that once probability is defined based on certain fundamental principles and minimal requirements (such as those we covered in Episode 3, and which we’ll revisit soon), the mathematics of probability must be the same, universally.

True scientists may have differences in perspective, but they always maintain a deep mutual respect. Scientific dialogue thrives on diversity of viewpoints and is enriched by a free exchange of ideas—provided, of course, there’s no room for absurdities, conspiracies, or anything like that.

Practical Considerations in Risk Management

I’d like to close this episode with some practical considerations on the frequentist definition of probability. In risk management, after all, we’re interested in using probability to quantify the chance of damage, and thus, ultimately, risk.

As mentioned, the frequentist approach is really only workable when you’re dealing with mass random phenomena, those you can observe repeatedly over time to collect enough outcomes. Put simply, you need a lot of data you trust—and remember, by the way, that trust is subjective—if you want to estimate a probability accurately. This automatically excludes one-shot events, which are very significant in risk management, but by definition offer no historical data for analysis. And while we might try analogy—looking at similar events observed elsewhere—that doesn’t really help if the event you’re considering is unique, or nearly unique.

Challenges with Rare Events and High Variability

Even in cases where the phenomenon is a collective—thus, in line with von Mises’s frequentist perspective—practical problems can still arise. If the event whose probability we want is rare, we might in theory be able to gather enough observations over a long enough period, but in practice this is often unrealistic. David Freedman, a statistics professor at Berkeley, pointed out that the frequentist method can’t be used to estimate the probability that an earthquake with a magnitude above 6.7 on the Richter scale will occur in the San Francisco Bay Area before 2030. Even though earthquakes of that magnitude happen several times a year worldwide, you’re not going to get enough occurrences specific to that region.

And if you’re interested in an earthquake above 8.5? Thankfully, those happen only every 20 or 30 years worldwide, so it’s basically pointless to rely on a relative frequency.

Limitations in High-Risk Scenarios and Tail Events

The weakness of the frequentist method is particularly obvious for random phenomena marked by high variability and extreme events. When we discuss heavy tails, sub-exponential tails, and fat tails (which are not at all the same thing, despite common misconceptions), we will see that even substantial data can tell us little or nothing about very rare and extreme events—especially if we focus on just calculating relative frequencies. We do need more robust tools. Phenomena characterized by heavy tails, long tails, sub-exponential tails, and fat tails include wealth, income, telecommunications networks, financial portfolios, pandemics, wars and terrorist attacks (when we assess the number of casualties), floods, storms, earthquakes, solar flares, and a thousand other things that are not exactly minor concerns in risk management.

Historical Bias and the Future of Risk Management

Moreover, unless we’re dealing with very regular phenomena—say, the thin-tailed ones, where extremely large or extremely small outcomes are negligible—the frequentist approach is easily victim of historical bias. This bias manifests itself when we assume the past is an optimal predictor of the future, and thus believe that enough historical data can accurately give us the probability we’re interested in.

For thin-tailed phenomena—many medical or physical measurements, for example—this is often true. If I gather enough data on people’s weights, I can estimate quite accurately the probability of someone being obese, anorexic, or of normal weight. But believing the same holds for fat-tailed phenomena is one of the most serious mistakes in risk management. If I run a nuclear power plant and I have not seen a major incident in the last 15 years, assuming one cannot occur in the future is, shall we say, foolish. Once again, think of the Lehman Brothers collapse, which many “geniuses” assumed was practically impossible because it had never happened before, giving it a relative frequency of zero—just like the zero frequency of fives in our first few dice rolls.

Conclusion: The Role of Risk in Frequentist Thinking

In risk management, there’s a simple saying that seldom fails: “If it has happened, it can happen again. If it has never happened, it can still happen.”

And the quote I’d like to leave you with comes from John Maynard Keynes, whom we will soon discuss:
“It has been already pointed out that no knowledge of probabilities can help us to know which conclusions are true, and that there is no direct relation between the truth of a proposition and its probability. Probability begins and ends with probability.”

Episode 7: Probability as propensity

Introduction to Propensity Theory 

Hi there, welcome to episode 7 of The Logic of Risk. Today we’re still talking about probability, but we’re going beyond the classical and frequentist definitions. We’re going to dive into the propensity perspective, and it’s going to be an important test for this podcast because, frankly, propensity theory is less intuitive than what we’ve seen so far. It might seem unnecessarily complicated—and maybe it is—but it’s a really important objectivist attempt to define probability. Plus, in certain fields, it actually can be a pretty solid point of view. I will do my best to make it clear and show where and when it could make sense.

Classical and Frequentist Definitions of Probability 

Remember, the classical definition sees the probability of an event as the ratio between the number of favorable outcomes (that is, those in which the event occurs) and the total number of possible outcomes, assuming all outcomes are equally likely. The frequentist, on the other hand, defines the probability of an event as the theoretical limit toward which the relative frequency of that event converges, if you observe your random phenomenon an arbitrarily large number of times.

In previous episodes, we discussed the limitations of both definitions. The classical definition is so limited that it only works in idealized situations or for simple games of chance—so it’s not really useful for risk management. The frequentist definition, meanwhile, is great for describing the probability of large-scale random phenomena but isn’t suitable for one-off events. And even when studying mass random phenomena, there are some caveats.

Challenges in Estimating Probabilities 

To reliably calculate relative frequencies, we need good data. And by “good data” I mean not only data of satisfactory quality (which goes without saying) but also data in sufficient quantity. Determining just how much isn’t trivial, and we’ll come back to that soon. It can range from a few dozen observations for fairly regular random phenomena—with limited variability and almost negligible extreme events—to several thousand, or even millions and billions, for more erratic and volatile phenomena that might feature non-negligible extremes.

Unfortunately (or maybe fortunately, from a scholar’s point of view), it’s the latter type of events that have the biggest impact and are therefore of most interest in risk management. Estimating relative frequencies for a highly erratic phenomenon based on scant data—and taking them at face value—is just plain foolish.
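To get a rough, back-of-the-envelope feel for “how much data” (this is my own illustration, not a claim from the episode), recall that the standard error of a relative frequency over n independent observations is sqrt(p(1-p)/n). Demanding, say, a 10% relative error on a small probability p already pushes n to the order of 100/p:

```python
import math

def n_required(p, rel_err=0.10):
    """Observations needed so that the standard error of the relative
    frequency, sqrt(p * (1 - p) / n), equals rel_err * p."""
    return math.ceil(p * (1 - p) / (rel_err * p) ** 2)

for p in (0.5, 0.01, 1e-6):
    print(f"p = {p:g}: roughly {n_required(p):,} observations")
# p = 0.5   -> 100
# p = 0.01  -> 9,900
# p = 1e-06 -> ~100,000,000
```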

The Problem of Historical Bias 

In our last episode, we also introduced the problem of historical bias. Historical bias pops up when we mistakenly assume that the past is a great predictor of the future and that, if we have enough data, we can expect the future to mirror the past. Our job then becomes simply to estimate the probability distribution, and then we’re done.

For those of you familiar with bank risk management, you know that this is, unfortunately, what gets done, under the so-called Basel Framework, with historical simulation and stress tests based on pre-observed scenario variations—for market, credit, and operational risks. All of these are situations marked by non-trivial tails, especially in credit and operational areas. And then everyone wakes up in a cold sweat when the estimates turn out to have underestimated the risk. After all, if something has never happened before, why should it happen? But don’t worry—the happy ending (and the promotion) always comes eventually. These guys usually end up becoming politicians.

If you’re not up to speed on bank risk management, that’s fine. We’ll do a few special episodes on that because I believe it’s essential to understand how banks decide (or should decide according to regulators) whether to grant you a mortgage or a credit card. So just wait.

Metaphors for Historical Bias 

To explain historical bias, there are plenty of metaphors and allegories, but one is especially effective—and I often use it because it can be adapted to explain other data management issues. Having a model or approach that suffers from historical bias is like driving a car while only looking in your rearview mirror. Sure, you can see clearly what’s behind you, but if you only look in the rearview mirror, sooner or later you’re going to crash into something (or someone). And then whose fault is it? Your model—the rearview mirror—or you for being an idiot? History will be the judge.

Another useful image is that of a runner trying to set a new world record. We all agree that to set a new world record, you have to run faster than the person who held it before—that is, you have to beat the previous record. Clearly, until we observe that new record, it’s not in our data. All we know is that it will be better than what we currently have, which then acts as a sort of lower (or upper) bound, depending on how you look at it.

So, tell me: why do we all accept this trivial fact about sports records, yet in finance, economics, or in the study of wars or pandemics, we take the historical maximum as an unsurpassable value—the worst of the worst (or the best of the best) that our models can predict? We’re literally shooting ourselves in the foot and then complaining. But we’ll have time to dive into those practical details later.

Introducing the Propensity Definition of Probability 

For now, let’s focus on the definition of probability as propensity. I should warn you that some simplifications will be necessary, so my apologies to any hardcore propensionists listening. You never know.

The propensity definition is relatively unknown outside the experts, but it has some interesting aspects that we’ll highlight. I’ll also mention that, from an application standpoint, the propensity view is virtually indistinguishable from the frequentist one when studying large-scale random phenomena. The difference shows up, at least in theory, when it comes to single events—cases where, as von Mises pointed out, the frequentist view doesn’t really even define probability.

The Origins and Development of Propensity Theory 

Propensity theory is mainly traced back to Peirce, who in some writings and letters around the turn of the nineteenth to the twentieth century outlined what we should more precisely call his dispositional view—though hardly anyone uses that term these days. For Peirce, probability is a natural disposition of things, a relational property comparable to weight in physics, which comes into play when the outcomes of an experiment or phenomenon aren’t certain but random.

But Peirce never fully formalized his ideas. It was Karl Popper who emerged as the champion of the propensity view. For those interested, I highly recommend reading his most famous article on the subject, The Propensity Interpretation of Probability, published in 1959 in the British Journal for the Philosophy of Science.

Popper’s View on Propensity and Its Implications 

For Peirce and Popper, probability is something objective, measurable, and independent of the observer. It’s nothing more than the measurement of the actual possibilities for an event to happen—not just abstract possibilities, but ones that are objectively and physically real. According to them, every random phenomenon is characterized by real, natural physical tendencies that lean toward a specific outcome. These tendencies are what we call propensities. So, probability is essentially the empirical measure of these propensities.

Closing Thoughts and Looking Ahead 

But before we get into that, we need to explore why Popper decided to develop the propensity view as, in his opinion, a superior objectivist definition compared to the frequentism of Venn or von Mises—even though, in the end, he remained somewhat of a supporter of the latter. We can do that by looking at some paragraphs from what Popper himself wrote in 1959.

Okay, this will be the starting point of our next episode. I know you might think I’m spending too much time discussing what probability is, but believe me—this is just the beginning! And we need this knowledge.

Episode 8: Propensity in play, from quantum mechanics to one-shot events

The Propensity School of Thought
In our last episode—which is essential listening if you haven’t already—we introduced the propensity school of thought as seen by Peirce and Popper in defining what probability is.

Think back to those physics and chemistry experiments we did at school. You would set up an experimental configuration—gathering all the elements needed, verifying conditions like temperature and light, and then following a specific sequence of operations to produce a particular outcome, such as a reaction. The outcome would occur almost every time, but not always, because a slight variation in the setup or a moment of inattention might lead to a different result. Repeating the experiment many times would yield the desired outcome a high percentage of the time, suggesting that the experiment had a propensity to produce that result.

That’s the essence of the propensity idea: once a random phenomenon is set up and its conditions verified, repeating it over time will yield certain outcomes with varying likelihoods, as if an invisible force were pushing events in particular directions. This force is the propensity, which we cannot observe directly but can approximate through our measurements—via probability.

Popper’s Perspective on Propensity
This idea—that we can’t observe the propensity directly but can only infer it from experimental outcomes—is not as strange as it might seem. In modern physics, similar concepts appear in discussions of fields, potentials, and wave functions.

Philosophically, Popper’s notion of propensity has an Aristotelian flavor, evoking the concept of potentiality. In our last episode, we ended by reading an excerpt from a 1959 article by Popper, where he argues that his propensity view is more reconcilable with quantum mechanics than the frequentist approach. More importantly, it is capable of handling single-case phenomena—something that neither von Mises nor Peirce considered feasible. These are, without a doubt, significant innovations.

Propensities and Quantum Mechanics
Regarding the compatibility between propensities and quantum mechanics, I’ll offer a few brief remarks now and promise that we will revisit this topic in detail later—perhaps even with a colleague of mine, a physicist who works with these issues every day.

Popper’s propensity interpretation lets us view quantum probabilities as real, physical tendencies rather than mere reflections of our ignorance. In quantum mechanics, the wave function assigns probabilities to different outcomes, and according to Popper, these probabilities represent inherent propensities of a system to behave in certain ways when measured. This means that even if we had complete knowledge of the wave function, the probabilistic nature of the outcomes remains an objective feature of reality.

The idea of an objective tendency fits neatly with quantum mechanics. The probabilities derived from the wave function are not just abstract statistical frequencies or subjective beliefs—they are physical dispositions built into the experimental setup itself.

Context Dependence in Probability
This interpretation also reinforces the notion that measurement and reality in the quantum world are deeply intertwined. The randomness we observe in quantum measurements isn’t merely a result of incomplete information; it’s a fundamental aspect of nature, suggesting that indeterminacy is intrinsic to the system.

Context dependence is another central idea in both quantum mechanics and Popper’s view. In quantum experiments, the results can vary dramatically depending on the specific conditions under which a measurement is made—such as the arrangement of detectors, the orientation of the measuring devices, or other aspects of the setup. This contextuality means that the properties we observe emerge not just from the system itself but from its interaction with the measurement apparatus. Similarly, in Popper's framework, the propensities are tied to these experimental contexts; different setups will naturally yield different probability distributions, reflecting the influence of context on the outcome.

Large-Scale vs. Single-Case Phenomena
Popper also points out that his propensity view converges with the frequentist approach when it comes to large-scale random events. For a mass phenomenon—a “Kollektiv,” as von Mises would say—the probability of an event is simply the limit toward which the relative frequency converges over an arbitrarily large number of independent repetitions of the experiment. The difference is mainly formal: for the propensionist, the limit of the relative frequency reflects an underlying tendency—a propensity—of the phenomenon to generate certain outcomes. In the end, the propensity idea is essentially an attempt to “physicalize” probability.

When it comes to large-scale phenomena, the criticisms of viewing probability as propensity are similar to those aimed at frequentism. Everything works well as long as we have high-quality, abundant data and we’re not dealing with extremely volatile phenomena characterized by tail events (that is, rare, extreme outcomes with significant impact).

However, Popper’s propensity view departs from frequentism when dealing with single-case, one-shot events. According to Popper, the propensity view can assign a probability to a single event if three conditions are met:

  1. We consider the one-shot phenomenon as representative of a sequence—a potential “Kollektiv” that is not yet actual but is virtually conceivable.
  2. This imaginary sequence is composed of events based on precise, carefully analyzed surrounding conditions.
  3. The virtual sequence, and hence the probability of the event, can be conceived as resulting from some hidden propensity.

Challenges in Propensity-Based Probability
Yet, one must ask: on what objective elements does the propensity theory base its formulation of objective probability values for a single event? After all, if relative frequencies can only be defined in terms of a virtual sequence of phenomena, how can we ensure objectivity in the single-case scenario?

While thought experiments and contextualization can, in theory, support a propensity-based approach, in practice it is challenging to guarantee the objectivity that relative frequencies provide for large-scale phenomena. Some propensionists propose “contextualization operations”—ensuring that the evaluator is fully informed about the history and dynamics of the conditions underlying the phenomenon. Rather than a single snapshot, we need the full movie to build a robust probability assertion.

Final Thoughts and Upcoming Topics
 As we’ll see shortly, the subjectivist view turns out to be the only one that is internally consistent for single-case phenomena. You may or may not favor that approach, but its coherence is undeniable.

So, why all this discussion on propensities? It isn’t just a philosophical-mathematical diversion. The ideas of Peirce and Popper have led to important studies and developments. I recommend Donald Gillies’ work, An Objective Theory of Probability. In it, Gillies leans more toward Peirce than Popper and even introduces his own version of a propensity theory, though he doesn’t call it that.

That wraps up Episode 8. We still have a few more episodes ahead on the definitions of probability. We’ll discuss logicism, the axiomatic view, and the subjectivist perspective before moving on to risk measures and their applications. That’s all for now—until next time, when we will discover that probability does not exist!

Episode 9: Probability does not exist, or the subjectivist approach

The Nature of Probability

It’s certainly worth asking: does probability “really” exist? And what on earth would it be? I’d answer no—it does not exist! Someone once asked me, rather ironically, why on earth I even bother with it, after I gave this answer (emphasized with the all-caps motto “PROBABILITY DOES NOT EXIST” in the English preface of Teoria delle probabilità).

Well, I could also say the reverse without contradiction: probability is everywhere, and it is—or at least should be—our “guide in thinking and acting,” which is why it interests me. The trouble is that realism (as Jeffreys cleverly observed) has the advantage that “language was created by realists, and even by very primitive realists,” and therefore “we have vast possibilities of describing the properties attributed to objects, but very little of describing those directly experienced as sensations.”

From this stems the obsession (which, for some, might be a sign of wisdom, seriousness, or acuity) with absolutizing, concretizing, even objectifying those things that are merely properties of our subjective attitudes. Otherwise, how would you explain the effort to make probability into something nobler than it is (to quote Jeffreys again), by hiding its subjective nature and dressing it up as objective? According to Hans Freudenthal’s imaginative take, it might be seen as a kind of odd modesty intended to stop us from seeing probability “as God made it”: it requires “a fig leaf,” and often it’s covered entirely with fig leaves, rendering it even invisible or unrecognizable.

Introduction to Subjective Probability

Ladies and gentlemen, Signori e Signore, these are some of the words Bruno de Finetti wrote for an Italian Encyclopedia, under the entry Probabilità, that is, probability. The translation is mine. Welcome back to The Logic of Risk, Episode 9. And as you might have guessed from this introductory reading, today we’re talking about subjective probability, the epistemological and Bayesian view, and we’re going to meet heavyweights like Frank Ramsey, Bruno de Finetti, Leonard Savage, Sir Harold Jeffreys, and many other friends.

In previous episodes, we considered three important definitions of probability. There’s the classical one, historically the first, and two expressions of the objectivist school: namely, frequentism à la von Mises and company, and the propensity view of Popper and his circle. Today, we’re looking at the other side of the coin, focusing on the subjectivist interpretation of probability.

The Core Idea of Subjective Probability

Okay, the idea is simple: probability and its assessment are inseparable from the evaluating subject. Probability isn’t something that exists on its own; rather, it’s something we create, develop, and modify in our minds as we face something that we perceive as random. Each of us, when estimating the probability of an event, whether we like it or not, is always influenced by our own beliefs, knowledge, available information, and cultural context. And this despite “the obsession with objectifying even those things that are merely properties of our subjective attitudes,” to quote de Finetti once more.

Let’s get one thing straight, putting aside one of the most banal criticisms leveled against probabilistic subjectivism. The fact that probability is subjective doesn’t imply that multiple people can’t reach an agreement, nor does it mean that any outlandish opinion is valid. In everyday life, we come to agreements with one another even when starting from different points of view. Think of price negotiations, couples deciding where to spend the weekend, agreements among friends, as well as commercial deals.

Ensuring Reliability of Subjective Probability

We’ll soon see that there are very effective ways to ensure the reliability of subjective probabilities, their comparability, and their communicability. For example, we can reason in terms of equivalence and coherence.

Remember equivalence? Two probability measures are said to be equivalent if they agree on what is impossible. In other words, if an event has zero probability under one measure, it will also have zero probability under the other. However, when it comes to what is possible—that is, events with positive probability—they can differ greatly. If two individuals have equivalent views, it’s already a great starting point. As for coherence, we’ll return to that in just a minute.

The Subjectivist View on Probability

One great merit of the subjectivist view is that it elegantly and uniquely solves the problem of distinguishing between mass random phenomena and singular events. For the subjectivist perspective, the definition of probability is the same, and so is its use. Notice, this doesn’t deny the difficulty of estimating the probability of certain phenomena—in fact, it doesn’t allow for exceptions: the defining rule remains the same.

Some scholars, usually critics, go so far as to say that subjectivism even erases the distinction between risk and uncertainty, making everything falsely measurable, even when it isn’t. In reality, as observed by Feduzi, Runde, and Zappia, things aren’t like that—especially in the “definettian” subjectivism—and we’ll return to this in the episode on uncertainty. But first, we need to lay the groundwork.

Ramsey, de Finetti, and Savage on Probability

If we look at what Frank Ramsey writes in his 1926 article “Truth and Probability,” probability isn’t something separate from individual knowledge—it doesn’t represent a body of universal knowledge. No, probability is intimately connected to the person who estimates it.

Ramsey’s ideas were later developed by de Finetti and Savage. Savage, a brilliant mind, was one of the greatest defenders of subjective probability against the attacks of objectivists. And he was the one who contributed the most to spreading de Finetti’s ideas in the Anglo-Saxon world.

De Finetti's Operational Epistemic View

According to de Finetti (as well as Ramsey and Savage), the numerical value of probability—that number between 0 and 1 that we assign to random events—can be expressed as the outcome of a bet. The de Finetti view, if we want to be pedantic, is called the operational epistemic view.

The probability of event E is the fair price of a bet that pays 1 euro (or 1 dollar, if you prefer) if the event E occurs—where “fair” is understood according to the personal evaluation of the individual.

For de Finetti, it makes no sense to talk about the probability of an event unless it’s in relation to the body of knowledge a person has. Subjective probability is therefore a tool to give a reliable measure of that which cannot be measured objectively.

Episode 10: Fighting the hydra of frequentism, or why subjectivism is preferable

Subjective Probability and de Finetti’s Perspective

Last time, we began talking about subjective probability by looking at the ideas of Bruno de Finetti. We said that, according to de Finetti—and many other subjectivists—the probability of a given event is the fair price of the bet that pays 1 euro if that event occurs, while you lose your wager if it does not. We also mentioned that the concept of fairness is understood as the personal evaluation of the person placing the bet, so what is fair for me may not be fair for you, and vice versa.

An important concept introduced by de Finetti, in his definition of probability as a bet, is that of coherence. The definition of the fair price—that is, the numerical value of the probability associated with a bet on a given event—is coherent if it does not expose the bettor to a sure loss (and therefore the counterparty to a sure gain), regardless of the outcomes of the events being wagered on. In other words, de Finetti excludes the so-called Dutch book, that kind of bet where the prices are set in such a way that the bettor ends up with a net loss no matter what the result is.

Coherence and Fair Betting

Simply put, excluding the Dutch book tells us that if, for me, the probability of rain is 0.3 (30 cents, as mentioned last time), then the probability of no rain must be 0.7, meaning I’d be willing to enter a bet where I pay 70 cents and receive 1 euro if it does not rain. In this way, the sum of the probabilities will be 1, as required by those minimal facts we discussed in the third episode. And this extends if I consider several events, like evaluating the chances of an investment.

In the same way, I will never consider as fair, and therefore valid foundations of probability, a negative price or a price exceeding 1. For those of you familiar with financial mathematics, the concept of no arbitrage might start to spin in your head.
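As a toy illustration of coherence (my sketch, not de Finetti’s own formalism; the function name is made up), one can mechanically check a book of betting prices on mutually exclusive, exhaustive events: each price must lie in [0, 1] and the prices must sum to 1, otherwise a Dutch book can be constructed against one side:

```python
def is_coherent(prices, tol=1e-12):
    """Check de Finetti coherence for betting prices on mutually
    exclusive, exhaustive events (each bet pays 1 if its event occurs).
    Incoherent prices expose one side to a sure loss (a Dutch book)."""
    if any(p < 0 or p > 1 for p in prices):
        return False
    return abs(sum(prices) - 1) < tol

print(is_coherent([0.3, 0.7]))   # True: rain at 30 cents, no rain at 70 cents
print(is_coherent([0.3, 0.8]))   # False: the bettor pays 1.10 for a sure 1 euro
print(is_coherent([-0.1, 1.1]))  # False: prices outside [0, 1] are never fair
```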

Alternative Definitions of Probability

The definition of probability as a coherent bet allows de Finetti to also provide an alternative definition, not tied to gambling, through the use of scoring rules, such as the Brier score. In a way, it anticipates and expresses in probabilistic terms Nassim Taleb’s idea of skin in the game. To avoid outlandish evaluations of probabilities, and to ensure their comparability, our estimates must be in line with the consequences we are willing to face, with our real choices.

And here the reference to the ideas of Ramsey about the fact that probabilities are observable through choices is natural. However, I won’t bore you further on this, also because the definition via scoring rules would require a mathematical formalization that isn’t compatible with a podcast.
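Still, since these are written notes, a tiny sketch does no harm. For a single binary event, the Brier score is just the squared gap between the elicited probability and the realized outcome; this illustration (mine, with made-up forecasts) averages it over several events:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between elicited probabilities and
    realized outcomes (1 if the event occurred, 0 otherwise).
    Lower is better; honest probabilities minimize the expected score."""
    pairs = list(zip(forecasts, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Three days of rain forecasts versus what actually happened:
print(brier_score([0.9, 0.3, 0.7], [1, 0, 1]))  # 0.0633...
```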

Criticism of Classical and Frequentist Probability

So, what about the classical or the frequentist definition? For de Finetti, these are pseudo-definitions. The classical one is tautological and useless, as it presupposes knowing what “equally likely” means at the moment we are defining probability itself. In other words, it’s the criticism of the circular definition, if you recall.

As for the frequentist view, he writes that it is “like the hydra with its seven heads, always presenting new variants of frequentist definition attempts. As the previous ones are nibbled away, more and more artificial and unhappy versions emerge. And indeed, to get rid of the indeterminacy of frequency, it is replaced with the unattainable limiting frequency, which is knowable only after the end of eternity!”

A Balanced Subjectivist Approach

Reading de Finetti is always entertaining, thanks to his use of imagery, his polemical and ironic verve, and the clarity of his thought. In reality, even if, from a philosophical point of view, de Finetti shuns objectivism, he is the first to say that one should not throw the baby out with the bathwater.

The use of relative frequencies, for example, if done with discernment, can prove useful to help and guide us in the subjective evaluation of the probabilities of mass random phenomena. Just as the propensional idea of contextualization is extremely useful in the case of one-shot events. And if I’m playing dice or roulette, it’s perfectly fine to use the classical approach to form a personal idea of the probabilities in question.

Personal Probability and the Role of Judgment

In the end, if you wish, the subjective view tells us that, as individuals, we define probability, but in doing so we can use all the tools we deem useful, as long as we remember that we are the ones making the choices. When I, Pasquale, enter my probabilistic bet, I do so based on what I know. And if I deem that to be not sufficient, then I gather information, analyze the data, use models, and consult experts in the field. And thanks to all this, and conditionally on it, I then elicit my probability judgment, which is always conditional and conditioned.

The verb “to elicit” is the technical term used when expressing a subjective probability.

Advantages of Subjectivist Probability

The estimation of subjective probability, even when based on data observation and relative frequencies, allows me to do things that are simply impossible for a frequentist. For example, assigning a probability to events that have never happened before, but which I consider important to take into account.

For a subjectivist, the fact that the relative frequency of a certain type of flood is 0 does not prevent me from still considering the event possible, and from assigning it a probability that may be small but not null. I will thus be less likely to fall into historical bias. Similarly, I will be less prone to model risk if I always leave open the possibility of being wrong, and perhaps I will commit to quantifying it.
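One classical way to formalize this (my illustration, not something discussed in the episode) is Laplace’s rule of succession: starting from a uniform prior on the unknown probability, after n trials with zero occurrences, the updated probability of the event on the next trial is 1/(n + 2), small but strictly positive:

```python
def rule_of_succession(successes, trials):
    """Posterior predictive probability of the event on the next trial,
    under a uniform Beta(1, 1) prior: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

# Fifty observed seasons with zero floods: the raw relative frequency
# is 0, but the updated subjective probability stays strictly positive.
print(rule_of_succession(0, 50))  # 1/52, about 0.0192
```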

Probabilities on Probabilities and Radical Probabilism

The idea of probabilities on probabilities then allows us to introduce a particular type of probabilistic subjectivism, emblematically represented by Richard Jeffrey. Paraphrasing the famous mythological and cosmological expression of Indian origin “Every turtle stands on another turtle,” Jeffrey coins his own “every probability stands on another probability,” defining what he calls radical probabilism.

This is the belief that nothing can be considered certain or precise (not even the data), so there is a constant need to update and review our beliefs, in an infinite regress of probabilities on probabilities.

Final Thoughts and Recommendations

In summary, for a subjectivist, probability is something personal, dependent on us and our interaction with the world. The best practical approach is to use the state-of-the-art in modeling, to look at the facts, to collect and analyze data, and to behave as rational and scientific people.

That’s it for today, but we will return to many of the concepts expressed very soon. Finally, here’s the quote. This time we go to the Netherlands in the second half of the 1600s, with Christiaan Huygens, who writes: “I believe that we know nothing for certain, but everything as probable.”

Until the next episode, with logicism and the axiomatic view.

Episode 11: Fighting the hydra of frequentism, or why subjectivism is preferable

Introduction 

Hi there. Welcome back to The Logic of Risk. This is episode 11, and we are nearing the end of our overview of probability definitions. In a couple of weeks, we’ll return to risk and how to quantify it. So, what are we dealing with today?

Today we’re talking about logicism, while in the next episode, we’ll complete our journey with the axiomatic definition of probability. Let’s go.

Objectivism vs. Subjectivism in Probability 

So far, we’ve discussed objectivism and subjectivism—diving into the debate between those who view probability as something objective, existing in itself, and those who argue that probability cannot be separated from the subject since it is formed in our minds and expresses our beliefs, inclinations, knowledge, context, and even our fears when facing what is random.

Apart from this dispute, there is the classical definition, which stands outside the debate for historical reasons and because of its extreme limitations. As I’ve mentioned several times, there are other definitions. Some try to bridge the gap between objectivism and subjectivism—and that’s the case with logicism. Others resolve the issue in a metaphysical way, abstracting themselves, choosing not to decide or “get their hands dirty,” and instead focusing on the mathematical properties and uses of probability. That’s the case with the axiomatic definition.

Other Schools of Thought in Probability 

There are a few other minor schools, but we won’t consider them. This choice is mainly due to two reasons.

Reason number 1: Some are merely refinements of what we’ve already discussed, which would lead us too deep into the details. A typical example is the predictive definition, which in its classical version is highly limiting and often flawed—suffering tremendously from historical bias and other issues—and in its modern version is nothing more than a nuance of the empirical Bayesian approach.

Reason number 2: Some definitions are purely speculative, with applications of little or no interest for us, not to mention that they’re often not even clear in their implications. One example is the “Best Systems” definition introduced by David Lewis, now a somewhat trendy topic in certain circles. From one point of view, it’s a kind of propensity-based approach, but it tries to improve the treatment of individual phenomena by borrowing from the epistemic perspective.

The Relevance of Logicism Today 

To be honest, some of my colleagues might even spare you the logicist discussion for this very reason. But personally, I’d consider that a grave mistake. If it’s true that for a long time the logicist view of probability was confined to academic debates, the recent resurgence of machine learning techniques is now leading to a practical application of the logicist definition.

I use the word “resurgence” because the first neural networks date back to the 1940s—only back then we didn’t have enough computing power. Some might then say that, in the end, logicism is nothing more than the Bayesian approach in disguise—but let’s take it step by step.

Logicism as an Extension of Logic 

Do you remember when, a few episodes ago, I mentioned Jaynes and his idea of probability as the logic of science (which is also the title of his monograph)? Well, the logicist definition of probability does just that. It attempts to go beyond classical logic, where a given proposition—which for us becomes an event—can only assume two values: false and true, or 0 and 1 if we use Boole’s coding.

For example, the proposition “the speaker of this podcast is named Pasquale” is true and takes the value 1, while “the speaker of this podcast is named John” is false and takes the value 0. Now, strictly speaking, I know one of these propositions is true and the other false—but someone might think I have two names (I haven’t excluded this possibility), so my being called “Pasquale John” would be an option to consider.

Beyond Binary Truth Values 

Now, classical propositional logic and its various evolutions work very well and allow us to describe and solve a host of fundamental problems. Just think of how essential that Boolean 0–1 coding is in computer science. However—and this is a big however—there are many situations where the truth or falsity of a proposition isn’t so black and white.

We might face partial truths or falsehoods, either because our knowledge is limited, because we’re talking about the future, or because a dichotomous description would be forced. If I say that tomorrow will be sunny, today I have no way to assert whether that statement is completely true or completely false. And if I mention rain, there’s a significant difference between a few drops and a torrential downpour.

The Role of Probability in Managing Uncertainty 

Now imagine that 0 and 1 don’t mean false and true, but rather that a given event is impossible (0) or certain (1). Or even better: that it either does not happen or does happen. Every value between 0 and 1 then becomes a quantifier of the possibility of our event happening. A certain event takes the value 1. An impossible event takes the value 0. A moderately possible event might have a value of, say, 0.5, while values like 0.2 or 0.8 would represent events that are scarcely or very likely. And so on.

In the logicist view of probability, probability expresses the degree of possibility of a proposition—of an event. It indicates how true that proposition is (hence, how certain) or false (hence, how impossible). More precisely, it expresses the extent to which one proposition informs us about the certainty and truth of another.

Analytical Philosophy and Logicism 

This management of contingency is one of the key points embraced by analytic philosophy. The logicist view of probability is widespread among analytic philosophers, who have been studying and dissecting it for decades—with results of absolute significance. Among them, one cannot fail to mention the great Kripke, a giant of analytic philosophy, a prodigy, and one of the foremost exponents of modal logic, intuitionistic logic, and many other areas.

Historical Foundations of Logicism 

The logicist view of probability goes much further back in time. In his Tractatus Logico-Philosophicus, Wittgenstein hints at his own view of probability—a view later independently expressed by John Maynard Keynes in his Treatise on Probability, a work highly praised by Russell.

Wittgenstein’s Perspective on Probability 

In proposition 5.153, Wittgenstein writes: “A proposition is in itself neither probable nor improbable. An event occurs or does not occur, there is no middle course.”

Then, in proposition 5.155, he adds: “The minimal unit for a probability proposition is this: The circumstances—of which I have no further knowledge—give such and such a degree of probability to the occurrence of a particular event.”

And in proposition 5.156 he concludes: “It is in this way that probability is a generalization. It involves a general description of a propositional form. We use probability only in default of certainty—if our knowledge of a fact is not indeed complete, but we do know something about its form.”

Conclusion and Next Steps 

The natural question that arises at this point is: is that number between 0 and 1 objective or subjective? Among the various logicists, some lean toward the objectivist perspective, while others favor the subjectivist one.

Among the objectivist logicists, we find Keynes, who argues that probability is not subjective, as it is not subject to human caprice. We will explore this more in the next episode.

Episode 12: Keynes, Jeffreys, Jaynes and finally Kolmogorov

In this episode we continue discussing logicism before moving on to the axiomatic definition of probability, the last topic in our overview. Let’s start!

Wittgenstein and the Logical Picture

In the previous episode, we talked about Wittgenstein. According to him, probability is a way of distilling a broader logical picture. Every statement perfectly mirrors its part of reality, so when we speak of probability, we’re really just focusing on a slice of a larger, complete tapestry of propositions.

In this view, probability emerges not from a vague property of the world but from the fact that any single statement only captures a part of the whole picture—much like Laplace's Demon illustrates our limited perspective.

Keynes and Objective Logical Probability

Then we talked about Keynes, and noted that according to him, probability is a logical relation between possibilities or degrees of truth.
Given two propositions, there exists only one probability relation between them. I can choose the propositions, but once that choice is made, the relationship between them is determined. It is a unique and objective relation, and any disagreement about it arises solely from a misinterpretation of the initial propositions.

According to Keynes, probability should be understood as the expression of the beliefs of a representative rational agent. In this idea of rationality, the economist Keynes truly shines.

Like Wittgenstein, Keynes holds the idea that probability arises from a lack of information. If we truly had all the necessary information, then the conclusions we logically arrive at would give us certainties. But given a fixed amount of incomplete information, what we obtain is an objective logical probability.

Ramsey’s Subjectivist Critique of Keynes

Speaking of subjectivism, you may recall that I introduced Ramsey. We mentioned that his subjectivism acts as a critique of the Keynesian view. In fact, for Ramsey—who was one of Keynes’s students—when Keynes speaks of probability relations between propositions he never defines them clearly. He merely states that these relations define a universal corpus, detached from individual subjects, and that in certain cases they can be clearly perceived.

Ramsey, however, rejects this, saying that he personally does not perceive them, and that perception, which is clearly subjective, cannot serve as the foundation for something objective. Thus, Keynes would need a more convincing argument—a need he never quite meets.

Conditional Reasoning and Criticisms of Keynes

It must be noted that Keynes’ approach is mainly a conditional one. It relates two or more events or propositions. If I know that A necessarily implies B, and if A occurs, then the occurrence of B is certain. If A does not occur, can I then exclude B? And if I observe only B, what can I say about A? This is, in essence, where Keynesian speculation begins.

Keynes’s assertions regarding the probability of simple propositions—individual facts, i.e. just C or just D without any evident relationships between them—are even weaker and have been subject to criticism. This is because such cases require the use of the principle of indifference, which Keynes seemingly excludes, thereby creating a contradiction.

The Principle of Indifference and Broader Critique

Recall that this principle, as we discussed earlier, requires that in the absence of contrary evidence every agent must assign the same probability to equally possible alternatives. However, it remains a controversial principle, not easily justifiable in absolute terms.

In short, if the Keynesian view seems complicated, it is because it is indeed complicated. Much like Popper’s propensity interpretation—which strives to reach conclusions that cannot be achieved objectively while overcomplicating the matter—the Keynesian approach faces similar criticisms.

Salmon’s Test: Admissibility, Ascertainability, Applicability

Let’s say that neither Keynesian logicism nor Popper’s propensity approach fully passes Salmon’s test, according to which a definition of probability is adequate only if it meets three criteria: admissibility, ascertainability, and applicability.

Admissibility means being compatible with the axioms of probability calculus—that is, with the axiomatic view (which we will discuss in a bit). I would say that all the definitions we have considered are admissible.

Ascertainability refers to whether a given definition can yield numerical and measurable probabilities. Here the discussion is more complicated, because the propensity approach is not truly capable of handling individual phenomena, much like frequentism—and so what numbers does it provide in such cases?

For logicism there are apparently no issues, partly because, as in subjectivism, the distinction between mass random phenomena and individual events is much more blurred, if not entirely absent.

However, regarding Keynes there is a nuance… just give me a minute.

Applicability simply refers to the possibility of practically using a given definition of probability. Here we must clarify what we mean by “practical.” If we refer to specific cases, then I would say that all definitions are applicable—even the classical one. But if we mean a general, uncompromising approach, then it’s easy to see almost all the definitions we have considered breaking down one by one.

With the possible exception of the subjective one, and yet… Still, I wonder how wise it is to limit ourselves to only a few tools.

Keynes and Imprecise Probability

Now, returning briefly to the ascertainability of the Keynesian view: the answer is—it depends. If we mean numerical probabilities, clearly the answer is yes.

However, there is in Keynes a truly excellent idea that was for years overlooked but is now coming back into fashion, and rightly so.
The numerical probability—the number between 0 and 1 that we are focusing on—is, for Keynes, only a specific case of the broader concept of probability, which in itself need not be quantifiable or comparable.

In this way, Keynes anticipates ideas such as imprecise probability, according to which it would be more appropriate to reason in terms of probability intervals—results that often run into the minimal facts we discussed in our early episodes.

Along these lines belong also some definitional considerations on indeterminacy, as well as various findings by different scholars on pseudo-probabilities, ambiguity, and more. These are cases that an orthodox approach might not even consider strictly admissible. But these are issues to which we will return in the future.

Carnap, Jaynes, and Jeffreys

Other scholars of the caliber of Carnap have tried to establish an objective logicist probability, but their results do not differ much from Keynes’s—unless one restricts probability to very specific branches of logic, neglecting its practical use. In doing so, they leave poor Salmon unsatisfied.

More interesting, in my view, is the approach of Jaynes and Jeffreys, which can be seen as a kind of logicism with a more subjectivist twist, as well as a commitment to inductivism.

For Jeffreys, in particular, probability expresses a relation between a proposition and a set of data. In other words, probability is a purely epistemological notion that expresses the degree of belief we have in the occurrence of an event based on the given evidence.

A reasonable degree of belief is one uniquely determined by the available information. According to Jeffreys, if two people have the exact same information and act logically and rationally, they should arrive at a single, identical probability evaluation. If not, one of them is in error—and the other may, if they wish, take advantage of it.

Jeffreys vs. de Finetti

This idea echoes Keynes, while at the same time standing in opposition to de Finetti, for whom it is entirely natural that two people—even with the same information—may disagree on probability.

This is because, according to de Finetti, probability is not rational; rationality is merely an invention of some scholars and does not exist in reality. In reality, there are our sensations, our more or less founded beliefs, and our limitations. What counts is the coherence of probability with each individual’s beliefs—and nothing else.

It is on individual events and on the possibility of always expressing numerical probabilities that Jeffreys aligns with de Finetti, distancing himself from Keynes and even from the frequentists. For Jeffreys, it is always possible to define a probability and elicit it—something that neither Keynes nor von Mises, nor, if you recall, Peirce, consider feasible.

Jeffreys, like Jaynes, was also a fervent Bayesian, advocating the use of Bayes’ theorem as a method of learning in statistics and dealing with uncertainty.

Indeed, if probability depends on the available information, it is natural—according to Jeffreys—that when this information changes, the subject updates the probability.

However, his logicist tendencies led him to define the so-called Jeffreys priors, which tend to be less subjective—not as closely tied to the evaluator’s personal beliefs, but more connected to empirical evidence.
This is exactly in line with his view of probability as a relation between a proposition, an event, and a set of data.

Jaynes and Entropy

In short, you can see that the logicist view of probability is quite rich because of its hybrid nature.

Jaynes, for example—similar in many respects to Jeffreys—tends to be more subjectivist than the latter, aligning more with de Finetti while still maintaining a logical approach.

The interesting thing about Jaynes, simply put, is his idea that probability is not the product of our knowledge or our beliefs, but rather the apophatic result of our ignorance. Probability derives from what we do not know.

Once again, Laplace’s demon makes an appearance. Jaynes formalizes this idea of probability as ignorance very effectively, using the physical concept of entropy.

But let’s close this logicist parenthesis here. I plan to add further details when the opportunity arises. As I mentioned last time, today—given the resurgence of machine learning—many logicist insights have interesting applications.

The Axiomatic Definition of Probability

The last definition I want to briefly consider with you, merely as an introduction (since it will accompany us in the rest of our conversations and we will have the chance to delve into it), is the axiomatic definition.

In reality, the axiomatic definition is a non-definition. It defines what probability is, but in a purely mathematical way, without concerning itself with everything we have said about subjectivity and objectivity.

According to the axiomatic definition, probability is something metaphysical that obeys certain mathematical properties, which are assumed as axioms. This definition—traced mainly to Kolmogorov, though it already found a strong supporter in Hilbert—stands above the fray; it does not get its hands dirty with objects, subjects, propositions, propensities, and all that mess.

No—in a grand spirit of mutual respect, the axiomatic definition agrees with all the other definitions, as long as they don’t cause trouble and satisfy its basic axioms.

This is why Salmon requires that the various definitions be admissible: they must be compatible with the axiomatic one. And why? Because the axiomatic definition is the basis of probability calculus, the starting point for all models and formalizations. There are alternative “metaphysical” approaches, but at the moment they are in the minority.

Kolmogorov’s Three Axioms

So, what are the properties required of probability to define the axiomatic view?
There are three, from which we can derive, according to mathematical criteria, fundamental theorems and methods for probability calculus.

Axiom 1 (Non-negativity):
The probability of an event in the sample space is a non-negative real number, that is, it is greater than or equal to 0.
Remember—the sample space (or state space) is nothing other than the set of all possible outcomes of a random phenomenon.

Axiom 2 (Normalization):
The probability of the certain event—defined as the occurrence of at least one of the simple events that make up the sample space—is equal to 1.
For example, in our case of a die roll, if I roll the die, something will happen with probability 1.
This is known as the normalization axiom, and it allows us to distinguish probability from other quantities used in measure theory, a very important branch of mathematics.

Axiom 3 (Sigma-additivity):
If we take a sequence of mutually exclusive events, the probability of the union of all these events is equal to the sum of the probabilities of the individual events.
Recall that two events are said to be mutually exclusive if the occurrence of one cannot happen simultaneously with the other.

This third axiom is the most subtle of the three, and its discussion could easily be the subject of two or three university lectures—depending on how meticulous or pedantic the professor is.

Here, we will simply say that the events in the sequence can be infinite in number, provided they are countable. That is, in a hypothetically crazy attempt to count them, we could do so using the natural numbers (0, 1, 2, 3, 4, …), which, as you know, are infinite.

In mathematics there exist different types of infinity, and countable infinity is the easiest to grasp. I won’t go any further, lest we fall into a black hole from which we might never escape—and perhaps go as crazy as poor Cantor.

Thus, the third axiom, also known as sigma-additivity, tells us that we can sum probabilities indefinitely, and that the sum of these probabilities will equal the probability of the union of all the considered events, provided they are mutually exclusive. If they are not, one must account for the intersections so as not to count the same sub-event more than once.

Implications and Philosophical Notes

From these axioms, we can immediately derive fundamental results—for example, that probability is a number between 0 and 1, that the probability of the null or impossible event is 0, or that if the probability of a given event is P (with P a number between 0 and 1), then the probability of its complement is 1–P, and so on.
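
If you like to check things by hand, here is a minimal Python sketch (my own illustration, not from the episode) of the axioms at work on a fair die. A computer can only verify additivity over finitely many events, so full sigma-additivity is taken on faith here.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                  # sample space of the die roll
p = {o: Fraction(1, 6) for o in omega}      # probability of each simple event

def prob(event):
    """Probability of an event (a subset of omega), obtained by additivity."""
    return sum(p[o] for o in event)

assert all(prob({o}) >= 0 for o in omega)          # Axiom 1: non-negativity
assert prob(omega) == 1                            # Axiom 2: normalization
odd, even = {1, 3, 5}, {2, 4, 6}
assert prob(odd | even) == prob(odd) + prob(even)  # Axiom 3, finite case: additivity
assert prob(omega - odd) == 1 - prob(odd)          # derived result: complement rule
```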

A small side note: even on Kolmogorov’s axiomatic view, de Finetti (and not only him) had objections. In fact, for de Finetti, just as the concept of the frequency limit—which requires waiting an infinite amount of time—is meaningless, so is summing probabilities ad infinitum.

For de Finetti, probability must deal with the finite, since we humans are incapable of handling infinity. After all, if probability is subjective, it also depends on our limitations—such as our inability to count to infinity.

Thus, for de Finetti, the third axiom should be replaced with one of simple additivity, where the sum over events is finite. The events can be many, but not infinite. Philosophically, I share this view.

Conclusion and Farewell

As mentioned, we will return to the axiomatic view several times in future episodes, since it represents the framework within which we will define many objects of interest—such as risk measures.

It is only when we put different models and tools into practical use that we cannot avoid getting our hands dirty, descending from Kolmogorov’s realm of ideal forms and returning to the debates between objectivists and subjectivists.

Alright, that’s enough for today—I believe I should stop now, also because we’re exceeding the 20-minute limit, but we do it for a good cause, as they say.

I’d like to close with a quote by Bertrand Russell:
“Probability is the most important concept in modern science, especially because no one has the slightest idea of what it means.”

Until next time, if you’re willing to take the risk.
And yes, we will be back to risk indeed.

3- Risk Measures

Episode 13: Back to risk, introducing risk measures

Defining Risk and Its Components

In the first episode of this podcast, we defined risk as the possibility of a loss—namely, an event with negative consequences—caused by internal and/or external vulnerabilities and reasons, which can potentially be mitigated, at least in part, if not completely avoided. In the subsequent episodes, we focused on specific parts of this definition.

Understanding Damage and Its Measurement

We started with the concept of damage, highlighting how quantifying it is less straightforward than one might initially think. We emphasized the need to develop a “damage profile” and noted that we will soon see why this is necessary—namely, why we need to look at damage from various perspectives, such as its nature, extent, and timing, in order to establish a basic starting point.

Even during our discussion of the concept of damage, we questioned the idea that its measurement can be entirely objective, introducing the need for an axiological perspective that takes into account the evaluator and their value system. Remember that, at the end of the day, risk management involves transforming data and facts into values and judgments. And these are rarely objective; they depend on the individual and the context. I probably don’t need to remind you how many things once considered acceptable no longer are today.

Although there will always be nostalgics, we’re fortunate not to distinguish anymore between the value of a slave’s life and that of a free man—just to mention an easy example. Or at least we pretend not to distinguish formally, having abolished slavery, though many masks fall away when the death of a fellow citizen resonates more than that of an immigrant.

Probability and Risk Assessment

After briefly examining damage, we turned our attention to the possibility of damage happening, looking at probability as one of the tools to manage this contingency. This led us to the various definitions of probability that we discussed together—if you haven’t listened yet, I encourage you to do so.

The underlying message is that there is no single definition of probability, that probability is not a monolith, nor a Revealed Truth carved in stone. Different schools of thought exist, each with its own dignity and each more or less useful depending on the situation. Certainly, as I’ve said several times, I find the subjectivist view more coherent; and if you want, nothing stops us from seeing the others as special cases. In the end, when I choose to be a frequentist, I’m expressing a personal preference, so the frequentist approach can easily be seen as a particular manifestation of my being a subjectivist.

What’s important is to remember the aspect of choice and not assume that the limit frequency, or propensity, or logical definition is “Truth” with a capital T. I know I’m repeating myself, but if we don’t get that through our heads, everything else might as well go down the drain.

The Next Steps in Risk Analysis

Returning to our definition of risk, we still have some essential elements to address: identifying internal or external vulnerabilities and reasons, and devising strategies to mitigate—or even avoid—risk. That’s what we’ll be focusing on from now on.

To do so, we need to refine the tools at our disposal, tools that are essential for carrying out the four key tasks of risk management:

  1. Identify risks.
  2. Quantify them.
  3. Prioritize them—that is, create a ranking of risks based on acceptability and available resources to address them.
  4. Communicate them clearly and efficiently.

If you think about it, in these last weeks of podcasting, we’ve covered a lot of ground, which I hope has been interesting for you. Considering that the best is yet to come, make yourselves comfortable, because I still have many hours of rambling stories for you.

Risk Measures: A More Formal Approach

Today, we’re going to take a small step forward and begin talking about risk in a more formal way. For instance, we’ll try to understand more about risk measures—those mathematical-statistical constructs that allow us to quantify and rank risks.

Under the common definition of “risk measure,” we find many quantities that I’m sure you’ve already encountered in your studies or at work. We can mention, for example, the various means—from arithmetic to geometric—the median, the mode, the standard deviation, the variance, the value-at-risk (VaR), the expected shortfall, and so on. If you don’t know them, don’t worry; we’ll redefine them when we need to use them.

To be precise, we can distinguish between positional measures—such as the mean, the median, and the value-at-risk—which help us get a sense of the order of magnitude of the quantities involved, and dispersion measures—such as the standard deviation, the variance, or the mean absolute deviation—which tell us how variable and volatile those quantities of interest are. I use “positional” in a rather general way, to include location measures, without being too technical. Purists will excuse me.

In quantifying risk, a positional measure will allow us to estimate, for example, the average potential loss, while a dispersion measure helps us understand how reliable that average potential loss is in reality.

Defining a Risk Measure

Before we delve into some formalization, how can we define a risk measure in general? A risk measure, as the name suggests, is something that allows us to measure risk, that is, the possibility of a loss, of a damage. In other words, it is a tool that transforms a potential risk into a number that we can then use for our evaluations.

Mathematically, a risk measure is a function mapping from the space of potential losses to the extended real numbers—that is, the set of real numbers, plus positive and negative infinity. Indeed, in some situations, we want our risk measure to be infinite, and we’ll later discuss what that means.

Mathematical Foundations: Sample and Event Spaces

If you recall, we talked about the sample space—or state space. We defined it as the fundamental set that contains all possible single outcomes of a random phenomenon. In probability, it’s customary to denote this space by the Greek letter Omega (in its uppercase form).

We also mentioned the event space, i.e., the set of all the events that a random phenomenon can generate—essentially, the set of all single outcomes and all possible combinations of single outcomes.

As you can see, the number of elements in the event space is substantially greater than the number in the sample space. Yes, it may seem trivial, but formally proving such facts may involve Cantor and cardinalities, which is a treacherous terrain we’ll avoid for now. We will stick to manageable examples.

Conclusion and Next Steps

From a mathematical point of view, the event space we’ve defined as the power set of the sample space directly forms what is called a sigma-algebra. If, on the other hand, we were to limit ourselves to a subset of the power set, we would have to verify this very important property. And what is a sigma-algebra?

Well, we’ll answer that in the next episode.

Episode 14: Sigma-algebras and probability spaces

Introduction to Sigma-Algebra

Now, let’s return to our discussion, and in particular to the question we ended with in the last episode.

What Is a Sigma-Algebra?

Given a set, such as our sample space Omega, a sigma-algebra or sigma-field on Omega is a family of subsets of Omega that satisfies a few properties—three properties, to be precise.

Properties of Sigma-Algebra

First property: Omega itself is in the sigma-algebra.
This is clearly true for the event space if it contains all the subsets of Omega, including Omega itself.

Second property: Closure under complementation.
This means that if I take a subset of Omega that belongs to the sigma-algebra, then its complement also belongs to the sigma-algebra. For example, if I roll a die, getting an odd number is an event, and getting an even number is its complement.
If the event space is the power set of Omega, as we said last time, closure under complementation is trivially satisfied. For every event defined as a subset of Omega, its complement is also a subset of Omega, and hence part of the event space.
This is also true in the case of the certain event, namely the entire sample space. In fact, the event space includes the null event, that is, the empty set, which is the complement of the certain event.
In the roll of a die, the certain event is that one of the numbers from 1 to 6 appears. The null event is that nothing comes up when I roll a die—which is clearly impossible in our idealized game, where a face must necessarily be revealed.

Third property: Closure under countable unions.
The third property tells us that the countable union of elements in the sigma-algebra defines a new event that belongs to the sigma-algebra. “Countable union” means it can be an infinite union, as long as it can be indexed by the natural numbers.
Simply put, if I take events from the event space and combine them by taking their union—say in a set containing {a, b, c}, I take a and b and unite them into {a, b}—then that pair belongs to the event space. If the event space is the power set, this is again trivially true.
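
For the computationally inclined, here is a small sketch (an illustration of mine, using a toy three-element set) that checks the three properties for the power set. With a finite Omega, countable unions reduce to finite ones.

```python
from itertools import chain, combinations

omega = frozenset({"a", "b", "c"})

def power_set(s):
    """All subsets of s, each as a frozenset."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

F = power_set(omega)

assert omega in F                             # property 1: Omega belongs to F
assert all(omega - A in F for A in F)         # property 2: closed under complements
assert all(A | B in F for A in F for B in F)  # property 3: closed under (finite) unions
```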

Measurable and Probability Spaces

Just as the sample space is usually called Omega, the event space is usually called F, written as a calligraphic (cursive) capital letter. The pair (Omega, F) is collectively called a measurable space.
Why “measurable”? Because it represents the minimal structure on which we can mathematically define a function that measures its components.
For us, this measure is probability, the probability measure, which we call P.
Probability is therefore a function that takes a subset of Omega—that is, an event belonging to F, the event space—and returns a number between 0 and 1, interpreted as the likelihood of that event’s occurrence.
We require this probability function P to respect the axioms of probability we listed when introducing the axiomatic definition.
Hence, P will assign the value 1 if we consider the certain event, the value 0 if we consider the null event, and something in between for every other possible event.
This function also lets us calculate the probability of a complement event as 1 minus the probability of the original event, and it can handle countable unions of mutually exclusive events.
The key here is sigma-additivity: in a sigma-algebra like the event space, we can consider countable unions of events, and we need probability to accommodate them.

On Proper Subsets and de Finetti’s View

Ok, an important remark: if the event space does not coincide with the power set (which is what we have been assuming so far), but is rather a proper subset of it, we are not automatically guaranteed that such an event space is a sigma-algebra. We would have to verify it in order to work with the standard axiomatic definition of probability.
If you recall, I also mentioned that for de Finetti, sigma-additivity makes no sense, and that finite additivity is preferable. In that case, we wouldn’t need sigma-algebras but algebras, where only finite unions of events must remain within the algebra, without worrying about infinite unions.
Essentially, nothing else changes. This has roots in de Finetti’s more subjective approach, where probability focuses on personal degrees of belief, and depends on our limitations, such as our inability, given our limited amount of time on earth, to count to infinity.
Again, for de Finetti, probability should not depend on the possibility of handling infinite collections.
But let us come back to the standard axiomatic approach.
So, again, P must satisfy the three axioms of the axiomatic definition.
The exact details—whether probability is objective or subjective—are issues we’ll set aside for now, because all the definitions we’ve seen so far are valid.
It’s not so much the mathematics that changes when going from one definition of probability to another, but rather how we interpret the different quantities. We will revisit these questions in more depth when we look at applications.

From Measurable Space to Random Variables

If (Omega, F) is a measurable space, then the triplet (Omega, F, P) is called a probability space, and that is all we need to start formalizing risk.
A probability space is the starting point for our calculations. In fact, starting from the probability space (Omega, F, P), we can define random variables, and these random variables will be the tool we use to represent any possible damages or losses, which we then measure with risk measures.

Discrete and Continuous Random Variables

What is a random variable?
Put simply, it’s just a quantity (mathematically a function) that can take different values depending on different random events.
Take the toss of a coin. We could define a random variable, which we call X, that takes the value 0 if tails comes up and 1 if heads comes up.
If the coin is fair, we can say that X is 1 with probability 1/2 and 0 with probability 1/2.
In this case, X is a discrete random variable because it can take on a finite number of values: 0 or 1.
A variable is called discrete if the number of values it can take is finite or countably infinite.
Specifically, our X here is a Bernoulli random variable, meaning it takes two values, 0 and 1, each with a given probability.
The equally likely scenario (1/2 and 1/2) is just a special case. More generally, a Bernoulli X might be 0 with a probability p of, say, 70% and 1 with a probability 1 – p of 30%; or 15% and 85%; and so on.
Other examples of discrete random variables include the binomial, the Poisson, and categorical quantities, among others.
Consider the number of emails you receive in a day. This can only be 0, 1, 2, 3, and so on—whole numbers; you cannot get 1.7 emails.
So a random variable describing the number of emails you receive is definitely discrete.

If a random variable can instead take a continuous range of values, we call it continuous.
Examples of continuous random variables include the normal, the exponential, the Pareto, and the lognormal.
The time you wait at a bus stop can be 3.2 minutes, 10 minutes, 5.17 minutes, or any other real number in an interval.
So, in this case, your random waiting time is a continuous random variable.
But we can make it discrete if, for example, we count only whole seconds and ignore smaller divisions of time.
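
To make the distinction concrete, here is a quick Python sketch that simulates both cases. The probability p and the intensity lam are arbitrary illustrative choices, not values from the episode.

```python
import random

random.seed(42)  # fixed seed, so the sketch is reproducible

p = 0.5  # probability of heads (value 1) for a fair coin
coin = [1 if random.random() < p else 0 for _ in range(10)]
print(coin)   # only 0s and 1s can appear: a discrete (Bernoulli) variable

lam = 0.2  # intensity; the mean waiting time is 1/lam = 5 minutes
waits = [round(random.expovariate(lam), 2) for _ in range(3)]
print(waits)  # any positive real number can appear: a continuous variable
```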

Why Random Variables and Risk Measures Matter

Why do we need random variables?
Simply because they allow us to be more flexible and to use more powerful tools—namely, risk measures.
Mathematically, it’s not very practical to work directly with heads and tails, or with broken bones, lost work hours, number of infected individuals, power blackouts, feelings of sadness, or any other random event in its raw form.
These events are better handled if we represent them by variables that take numerical values we can add, multiply, divide, transform, and so on—making it much easier to do calculations and analyze scenarios.

A coin toss, for example, always represents the same phenomenon, but by adjusting the random variable X associated with it, I can decide that tails is 0 and heads is 1, or tails is –1 and heads is 2, or any other values I find useful.
So I can start with the same real or imaginary random phenomenon and use it to describe different situations.
For example, we will see that a Bernoulli random variable can be used to study the default of a company, as well as my chance of winning at the casino.

Analyzing and Interpreting Risk

Once I define the most suitable random variable or variables to represent the potential damages or losses I want to protect myself from, a risk measure allows me to study those damages, those losses, to analyze them in depth, to quantify them, and to prioritize them.
For instance, I can ask myself what the expected loss might be, and then consider whether that average is really representative of the situation.
After all, I can get an average of, say, 10 from 9 and 11, but also from 0 and 20.
And if 9, 11, 0, and 20 represent millions of euros in losses, or the number of people injured in a disastrous event, there’s a clear difference between these situations—even though the average is the same.
So, we shall see that choosing the right risk measure is of paramount importance.
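
Here is a tiny Python sketch of this very point, using the made-up figures above.

```python
from statistics import mean, stdev

losses_a = [9, 11]   # say, millions of euros
losses_b = [0, 20]

print(mean(losses_a), stdev(losses_a))  # 10 and about 1.41
print(mean(losses_b), stdev(losses_b))  # 10 and about 14.14: same mean, very different dispersion
```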

I can also ask with what probability I might face the worst potential losses, thus focusing on what we call the tails.
Especially in fields like natural disaster management or certain areas of finance, damages and losses often follow subexponential, if not outright fat-tailed, distributions, such as the lognormal or the Pareto.
These imply that extremely large losses, though rare, can happen more frequently than a normal distribution might suggest.
That’s why looking at the tails of the distribution is crucial for proper risk management, as we will soon see, so don’t worry: we will be back.

Wrap-Up and Recommendations

I think today’s episode is already quite dense, so it seems wise to stop here without overdoing it.
Please stick with it—I encourage you not to give up listening.
Perseverance is key. It can take time to absorb some of these concepts, but I promise that in time, everything will become clearer.
And we’ll repeat anything that needs repeating.
Moreover, now you can also rely on the written text!
And on thelogicofrisk.com you will find a summary slide.

In the next episode, we’ll delve a bit more into random variables, for instance by talking about probability distributions.
We’ll also refine our discussion on risk measures, which we’ll want to have certain desirable properties for effective risk management.

Today, I recommend a little book by Darrell Huff that is both useful and fun.
Its title is How to Lie with Statistics.

Closing Quote

Finally, today’s closing quote is from Warren Buffett, the Oracle of Omaha—the famous American entrepreneur, investor, economist, and philanthropist. Someone who definitely knows a thing or two about risk.
“Only when the tide goes out do you discover who’s been swimming naked.”

With what we’ll see together, we’ll try to at least wear a minimal swimsuit—if wetsuits are out of reach.
Nothing against naturism, of course, as long as it’s a conscious choice.
It’s awkward to swim naked if you’re then ashamed about it.

Episode 15: The cumulative distribution function of a random variable

Recap and Learning Tips

If you recall, in the last two episodes we introduced some technical terminology and a bit of formalism. We talked about measurable spaces and probabilities, and then moved on to random variables. I hope the discussion has been sufficiently clear, and once again I ask you not to give up if something isn’t clear—in fact, please let me know. Please also note that on thelogicofrisk.com you can find additional materials—including full transcripts of every episode—to ensure clarity in case the audio quality or my pronunciation isn’t perfect.

Advice for New Listeners

If you are new here, a word of advice: please listen to everything discussed so far. We will need all these concepts. For pedagogical reasons, I tend to repeat myself, but I cannot dedicate the same amount of time to each concept every time.

Definition of Random Variables

Speaking of random variables, we mentioned that these are quantities that can take on different values—with given probabilities—in correspondence with different random events. We use them to describe randomness in a more flexible and mathematically manageable way.

Introducing the Cumulative Distribution Function (CDF)

One of the tools used to describe how a random variable assumes its various possible values, and the probability with which it takes each value, is called the cumulative distribution function (CDF) or cumulative probability function. It is usually denoted by the capital letter F. Please note that this capital F should not be confused with the calligraphic capital F, which we use to represent the sigma-algebra corresponding to the event space. Given a random variable (capital X), its cumulative distribution function F, evaluated at a lowercase x, tells us the probability that X is less than or equal to x. Remember that in probability theory uppercase letters denote random objects (such as random variables or probability measures), while lowercase letters denote specific values. Thus, while X represents the random variable, x is simply a placeholder for a particular value—say, 4. The value of F evaluated at x = 4 tells us the probability that X is less than or equal to 4. Clearly, F depends closely on our probability measure P, defined on the probability space (Ω, ℱ, P), where Ω is the sample space and ℱ (the calligraphic F) is the sigma-algebra of events. To spare you further repetition, I will stop specifying uppercase versus lowercase unless necessary. For any doubts, feel free to check the transcripts.

Support and Example of a Discrete CDF

The cumulative distribution function F is defined on a set of values the random variable can take. This set is called the support of the distribution. Let’s take an example. Assume that a given investment can generate only three outcomes. I deliberately choose small, easy-to-handle numbers. We denote the profits and losses of the investment by the random variable X. Based on historical data—or because someone we trust told us, or even by divine revelation—we know that we can expect a profit of 2 euros with a 40% probability, a zero profit (that is, 0) with a 10% probability, and a loss of 1 euro (i.e., –1) with a 50% probability. Thus, ordered from smallest to largest, the values –1 (the loss), 0, and 2 represent the support of the random variable X and serve as the reference points for its cumulative distribution function. This function is calculated as follows.

Constructing the CDF Step-by-Step

In our problem, what is the probability of observing a loss greater than one euro in absolute terms? It is clearly 0, since we know that at most we lose 1 euro (with a 50% probability). Therefore, our cumulative distribution function F will be zero for every value less than –1. For instance, –3.5 would represent a loss of 3.5 euros, but in our example that outcome is impossible—so its probability is zero. And what is the probability of losing exactly 1 euro? It is 50%, so our cumulative distribution function jumps from 0 to 0.5 at the point –1. In other words, the probability of observing a value less than or equal to –1 is 0.5 (we assign 0 probability to values lower than –1 and add 0.5 at –1). What is the probability of observing a value less than or equal to 0? Well, we can observe –1 with a probability of 50% and 0 with a probability of 10%, so our CDF at 0 jumps from 0.5 to 0.6. This is because we are accumulating the probabilities—which is why the CDF is also called the cumulative probability function. So, what is the probability of observing a value less than or equal to 1? It remains 0.6 (or 60%)—in other words, we can observe –1 and 0, but not 1. We must wait for the value 2 for another jump in F. The probability of observing any value strictly less than 2 remains fixed at 0.6, and then, only at 2, does it jump to 1. In fact, we have 0.6 plus 0.4 (the probability of observing 2). For values greater than 2, our cumulative function cannot increase further and will remain at 1, since values above 2 are not possible in this example. Once the value 2 is reached, we have encountered a certain event—that is, one of the possible outcomes (–1, 0, or 2) has occurred—and the probability reaches 1 and cannot go any higher.
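
If you would like to see this accumulation in code, here is a minimal Python sketch of the step-function CDF for our investment example.

```python
support = [-1, 0, 2]      # ordered outcomes: loss, break-even, profit
probs = [0.5, 0.1, 0.4]   # their probabilities

def cdf(x):
    """P(X <= x): accumulate the probabilities of all outcomes <= x."""
    return sum(p for v, p in zip(support, probs) if v <= x)

print(cdf(-3.5))  # 0: below the left endpoint, nothing has accumulated
print(cdf(-1))    # 0.5: the jump at -1
print(cdf(0))     # 0.6: 0.5 + 0.1
print(cdf(1))     # 0.6: no jump between 0 and 2
print(cdf(2))     # 1.0: all the probability has accumulated
```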

Key Properties of the CDF

Based on what we have just seen, we can already note some important points. The cumulative distribution function, the CDF, accumulates the probabilities associated with the random variable X. Hence, it represents the probability that a random observation is less than or equal to the specific value where we evaluate F. Being a cumulative probability, the more outcomes we consider (once we have sorted them from smallest to largest), the higher the cumulative probability becomes. F will always be 0 to the left of the smallest outcome (which can, as in our example, be a negative number) and will always be 1 from the largest outcome onward. These extreme outcomes are called the endpoints of the support—the left endpoint and the right endpoint. In our example these endpoints are finite values (–1 and 2), but sometimes—at the modeling level—we might assume the support to extend to minus and plus infinity. In that case, we say that the limit of F is 0 for values of x tending to minus infinity and 1 for values tending to plus infinity. As we move from the smallest outcome to the largest, the cumulative distribution function either remains constant or increases, but it can never decrease. It takes values between 0 and 1. In fact, it is a non-decreasing function that represents cumulative probability. In our example, F was 0 for values below –1, jumped to 0.5 at –1, remained constant at 0.5 between –1 and 0, jumped to 0.6 at 0, and finally reached 1 at 2. Once again, on thelogicofrisk.com you can find a graphical summary of what has been discussed so far.

Mathematical Expression of the CDF

A cumulative distribution function can be derived, as we did—and as you can clearly see in the picture on my website—by ordering the outcomes and manually accumulating the probabilities. Whether there are 3 outcomes or twenty, it is not a big problem. It may be tedious, of course, but it is entirely manageable. However, for modeling purposes it is often useful to express F as a genuine mathematical function. For example, I could define F for my random variable X as follows: F(x) = 0 for values of x less than 0, and F(x) = 1 – e^(–λx) for non-negative x, with e denoting Euler’s number, and lambda a positive parameter that, we shall see, is called intensity. This means that the random variable X is exponentially distributed—or in short, X is an exponential.

Continuous and Discrete Distributions

An exponential is an example of a continuous random variable (since it can take any value between 0 and plus infinity, not just discrete ones); it is widely used in statistics and has countless applications. We will encounter it again soon. Of course, I can also have discrete random variables, such as the one in the example we considered. Or we can think about the Bernoulli random variable from the last episode. Its cumulative distribution function is 0 for values of x strictly less than 0, equal to 1 – p for values from 0 (inclusive) up to (but not including) 1, and 1 for values of x equal to 1 and above.
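
Here is a short Python sketch of both CDFs. The intensity lam and the probability p are arbitrary illustrative values, not numbers from the episode.

```python
import math

def exponential_cdf(x, lam=2.0):
    """Continuous case: F(x) = 1 - exp(-lam * x) for non-negative x."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

def bernoulli_cdf(x, p=0.3):
    """Discrete case: P(X = 1) = p, P(X = 0) = 1 - p."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p   # only the outcome 0 lies at or below x
    return 1.0           # both outcomes lie at or below x

print(exponential_cdf(1.0))  # smooth growth: about 0.8647
print(bernoulli_cdf(0.5))    # flat between the jumps: 0.7
```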

Practical Implications and Motivation

From the cumulative distribution function, many pieces of information regarding a given random variable can be obtained. In fact, it characterizes the probabilistic behavior of the variable we are studying to model risk. But this is something we will explore next time. So, please hold on and be patient. You will soon see how all these technical concepts allow us to do some quite interesting things in practice. It is not just sterile blah blah blah—like when people blabber about artificial intelligence these days without any idea of what a basic linear regression is. You need strong foundations to build a sturdy and valuable house. And that's exactly what we're doing—we are laying the foundations. Strong foundations.

Closing Thoughts

Galileo Galilei, one of the fathers of modern science, used to say that “l'universo è scritto in lingua matematica,” which translates into “the universe is written in a mathematical language.” It appears we have to learn at least a bit of this language in order to do something meaningful. Trust me, it is totally worth it. What we are learning together will turn out useful in your daily life. Just wait for it.

Episode 16: The quantile function, the survival function and the PDF

CDF and Quantile Function

In the last episode, we introduced the CDF of a random variable X, and we said that from it we can obtain plenty of information about X. Let’s see. By inverting the cumulative distribution function—and purists will forgive me for the inaccuracies—you obtain the so-called quantile function, usually denoted with an uppercase Q or an uppercase F raised to -1. Without getting lost in details, the quantile function answers a question that is the inverse of the one answered by the CDF. The cumulative distribution function F(x) tells us the probability of observing a value less than or equal to a given lowercase x. The quantile function does the opposite. It tells us which lowercase x corresponds to a given cumulative probability p.

Examples of Quantiles and the Median

In our investment example from the last episode, I might ask: what is the first value of X that corresponds to a cumulative probability of 60%? And the answer would be 0. In technical terms, we would say that 0 is the 60% quantile or the sixtieth percentile. But since we are talking about investments, I could also ask what is the threshold value that splits the total probability in two, such that 50% of the outcomes are lower and 50% are higher. For us, it would be -1. And that value, which leaves 50% of the probability to the left and the same amount to the right, is called the median. It is indeed one of the most famous quantiles. A quantile (or percentile), as you have understood, is a value of X that corresponds to a given cumulative probability.

Value at Risk (VaR)

Another famous quantile is the Value at Risk, an ugly name given by the quants at JP Morgan to a simple tail quantile. Renaming things is a quirk many have, especially when they have to sell you something. Returning to Value at Risk, or VaR: given a specific probability value, let’s say 95%, the 95% VaR is nothing more than the value of X such that 95% of the probability lies to its left and 5% to its right. So if I have a distribution of losses, in which losses are nonnegative quantities, represented by a given cumulative distribution function, the 95% VaR will simply be the first value of x for which the uppercase F assigns a cumulative probability of 0.95.
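
As a sketch only, here is how one might compute a VaR in Python: for a discrete loss distribution, by accumulating probabilities until the chosen level is reached; for the exponential, by inverting F(x) = 1 – e^(–λx), which gives Q(p) = –ln(1 – p)/λ. The loss figures below are invented for illustration.

```python
import math
from fractions import Fraction

def var_discrete(support, probs, level=Fraction(95, 100)):
    """Smallest loss whose cumulative probability reaches the chosen level."""
    cumulative = Fraction(0)
    for value, p in sorted(zip(support, probs)):
        cumulative += p
        if cumulative >= level:
            return value

def var_exponential(lam, level=0.95):
    """Inverse of F(x) = 1 - exp(-lam * x), evaluated at the level."""
    return -math.log(1.0 - level) / lam

losses = [0, 1, 5, 20]  # made-up loss sizes
probs = [Fraction(1, 2), Fraction(3, 10), Fraction(3, 20), Fraction(1, 20)]
print(var_discrete(losses, probs))  # 5: 95% of the probability lies at or below it
print(var_exponential(lam=0.1))     # about 29.96
```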

Survival Function

Now, if the cumulative probability at lowercase x is 95%, what is the probability of observing a value greater than lowercase x? Well, it will be equal to 1 minus the probability of observing something less than or equal to x, i.e. 1 – 0.95, which is 0.05 or 5%. Given the cumulative distribution function F, I can then define a new function, uppercase S, equal to 1 – F, which I will call the survival function, and which assigns to each lowercase x the probability of observing a greater value. In our earlier investment example, if I take 0, what is the value of the survival function at 0? It will be 0.4, that is 1 – 0.6, which we saw was the value of the cumulative distribution function at 0. You see, it doesn’t take much to introduce objects such as the CDF, the quantile function, or the survival function. On my website, thelogicofrisk.com, I have prepared a small visual summary for you.
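
In code, the survival function is just one line on top of the CDF. Continuing the investment example (a sketch of mine):

```python
support, probs = [-1, 0, 2], [0.5, 0.1, 0.4]

def cdf(x):
    """P(X <= x) for the investment example."""
    return sum(p for v, p in zip(support, probs) if v <= x)

def survival(x):
    """P(X > x) = 1 - F(x)."""
    return 1.0 - cdf(x)

print(survival(0))  # 0.4, i.e. 1 - 0.6, as computed in the episode
```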

Probability Density Function (PDF)

There is one last important function associated with a random variable X. It is a function that, from a mathematical point of view, is obtained as the derivative of the cumulative distribution function F (uppercase). Do you remember the exponential random variable? We said that its CDF is 1 – e^(–λx) for non-negative values of x. Here, if we calculate the derivative with respect to x, we obtain λe^(–λx). While we use the uppercase F for the cumulative distribution function, for its derivative we use the lowercase f. The function f(x) is what we call the density function, or PDF, the probability density function.
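
As a quick numerical check (with an arbitrary λ = 2 of my choosing), the central-difference derivative of the exponential CDF should match λe^(–λx):

```python
import math

lam = 2.0  # arbitrary intensity for the illustration
F = lambda x: 1.0 - math.exp(-lam * x)   # exponential CDF
f = lambda x: lam * math.exp(-lam * x)   # its derivative, the density

x, h = 1.3, 1e-6
numerical = (F(x + h) - F(x - h)) / (2 * h)  # central-difference approximation
print(numerical, f(x))  # both roughly 0.1485
```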

Discrete vs Continuous Variables

To be precise, it’s called the density function if the random variable is continuous, and the probability mass function if it’s discrete. But if you always say density function or PDF, that’s fine; it’s understood from the context. In this podcast I’m not interested, at least not for the moment, and especially not if it’s not strictly necessary, in boring you with derivatives and integrals. So we won’t spend too much time on the density function as the derivative of the cumulative distribution function. What I’m interested in is explaining the meaning of the density function. If the cumulative distribution function gives the cumulative probability, the density function gives the point probability. But here you have to be careful, even if we allow ourselves to be a bit rough around the edges.

Understanding the Mass Function

In the case of discrete variables, what has been said is entirely true. The probability mass function in our example tells us that the probability of observing -1 is 50%, of observing 0 is 10%, and for 2 we have 40%. It also tells us that the probability of observing -0.5 or 3 is zero. For a Bernoulli variable, the density function tells us that the probability of observing 1 is a generic probability p, and that of observing 0 is 1 – p. Meanwhile, the probability of observing 0.5, -4, or 7 is zero. In a discrete variable, each possible outcome has a well-defined probability mass, which is why we speak of a probability mass function. The total probability, which as we know must be equal to 1, is divided over a certain finite or countable number of events and we can see it split into many small columns, much like in a histogram, if you’re familiar, and each column is a height that tells us how much probability mass there is at that point.

Density for Continuous Variables

For a continuous variable, the matter is not so simple, because the points, even in a small interval, are infinite, so the total probability equal to 1 must be distributed over an infinite number of values. That’s why we say that the mass at a single point, for a continuous random variable, is zero. However, the density can tell us what the probability is of falling within any small interval around the value of our interest. Simply put, from the density we cannot derive the probability of getting exactly 4.62, but we can derive, for example, the probability of getting a value between 4.61 and 4.63. Of course, the same value could also be obtained via the cumulative distribution function, taking the value calculated at 4.63 and subtracting the one calculated at 4.61 (try it with paper and pen to see why), but let’s say that, when it is well defined, the density is often easier to work with. For now, we won’t add anything more.
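
Here is a small sketch of that approximation, assuming for illustration an exponential variable with λ = 0.2 (a value that puts 4.62 well inside the support).

```python
import math

lam = 0.2
F = lambda x: 1.0 - math.exp(-lam * x)  # exponential CDF
f = lambda x: lam * math.exp(-lam * x)  # exponential density

exact = F(4.63) - F(4.61)         # probability of landing in (4.61, 4.63]
approx = f(4.62) * (4.63 - 4.61)  # density times interval width
print(exact, approx)              # both about 0.00159
```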

Summary of Key Concepts

So, summarizing, we have talked about random variables and some functions relevant to describe their meaning. We mentioned the CDF, the density function, the quantile function, and the survival function. If something is not clear, re-listen to the recent episodes, and if you have any questions, write to me.

The Nature of Random Variables

Regarding random variables, we said in general that they are variables that can take on different values, each with a given probability, corresponding to different random events. These random events are those described by the probability space (Omega, F, P). To be more precise, we should say that a random variable X is in fact a function that transforms the original probability space (Omega, F, P) into a new probability space, usually indicated by the triplet (R, B, P_X), where R represents the real line or a subset thereof, indicating the values that the random variable X can take; B is the so-called Borel sigma-algebra, that is, the event space of the random variable X, generated by the open subsets of R; and finally P_X represents the probability distribution of X, as described, for example, by its cumulative distribution function.

The Loss Space and Final Remarks

But at the end of the day, it’s like saying that the original probability space (Omega, F, P) is in English, and for me being Italian, or for you being German, it’s better to work in the language that is more familiar, so we use the random variable X to translate it into another probability space in Italian or German, which will be the space (R, B, P_X). Without going into further details, for the moment, I hope the basic idea is clear. Just know that the new probability space (R, B, P_X) is the one in which we quantify risk, i.e. the damages and their likelihood. This is also why, in risk management, it is sometimes called the loss space, to distinguish it more quickly from the original probability space.

Support, Extremes, and Outliers

We have also talked about the support of X, and defined it as the set of values that our random variable can take. This set can be the entire real line, if X can vary continuously from negative infinity to positive infinity, or any subset of that line. Now, connected to the concept of support, there is a fundamental distinction that, alas, even many experts—or so-called experts—ignore, using as synonyms two terms that are not synonyms. The two terms are “extreme” and “outlier.” But I’ll stop here; we’ll talk about that again in a couple of weeks.

Book Recommendation and Closing

In the meantime, I recommend an interesting book: The Case Against Reality by Donald Hoffman. It’s a book that, starting from the cognitive sciences, seems to excellently support the idea of subjectivity as defined by Bruno de Finetti. We’ll discuss it further.

Episode 17: Extremes and Outliers

Recap: Describing Random Variables

In the last few weeks we’ve talked about random variables and the key functions we use to describe them: the CDF, the PDF, the quantile function, and the survival function. If those words don’t sound familiar, I strongly suggest you listen to what we’ve already covered. Very briefly, a random variable is one that can take different values, each with a given probability, depending on which random event occurs. Those random events are exactly the ones described by the probability space (Ω, F, P). We also touched—just touched—on the loss space (R, B, P_X), but we said we’d save the deep dive for later.

Introducing Extremes vs Outliers

Today I want to tackle a distinction that’s crucial for anyone interested in risk and, as we’ll see, in the tails of distributions: the difference between an extreme and an outlier. Let’s start with outliers. “Outlier” is obviously an English word, and the English helps us define it. It comes from the verb to lie and the adverb out. So an outlier is something that lies outside something else. Outside what, exactly? Easy: outside the support.

Understanding Support

Remember the support? We defined it as the set of values a random variable can take. To be precise, it’s the set of values the variable can take with positive probability (in the discrete case) or where the density is positive (in the continuous case). In plain language, an outlier is a value our random variable cannot take. It’s impossible for us, because it’s not in the support. I can’t roll a 7 when I toss a single six-sided die. Seven simply isn’t in the support {1,2,3,4,5,6}.

What is an Extreme?

What about an extreme? An extreme is a value we observe rarely—sometimes very, very rarely, almost never—and it stands out because of its magnitude: very large (a maximum) or very small (a minimum). If we want to be picky, there are also rare events that are only rare, with no particular magnitude—situations where the outcomes of our random variable aren’t dimensional quantities like weight or height but rather things such as certain genetic mutations. We’ll come back to those when needed. For now, let’s stick to extremes and outliers. So: an extreme is rare—super-rare maybe—but possible. An outlier is NOT. And carve that NOT in your mind in letters the size of a house, please.

Examples of Extremes and Outliers

Example. A guy who is 2.3 metres tall (about 7 feet 6 inches for our American friends) is an extreme, not an outlier. How do we know? Well, we’ve seen people up to Robert Wadlow’s 2.72 metres (that is an astonishing 8 feet 11 inches), and we’ve seen Chandra Bahadur Dangi, who died at 76 and was 54.6 centimetres tall (just 1 foot 9 and a half inches tall). So 2.30 m is absolutely possible. In fact, medicine and biology tell us that, in theory, reaching 3 m—maybe a bit more—isn’t out of the question. At the other end, human physiology can cope with heights down to roughly 40 centimetres. If I found a data set with a girl listed at 6 m (about 19 feet 8 inches), that would be an outlier because such a height is impossible for a human.

Why the Distinction Matters

Why does the distinction matter? Simple: statistically, an outlier is an error, and we treat it as such. If you can correct it, do; otherwise throw it out and don’t include it in your analysis. Example—still on height. If my data say Mary is 169 m tall, I can assume it’s a typo and fix it to 1.69 m. The corrected value looks perfectly legit, so it won’t skew our analysis. Or suppose I’m analysing a city’s temperature at 2 p.m.: Monday 23 °C (about 73 °F for friends in the US), Tuesday 72 °C (about 161 °F), Wednesday 26 °C. At 72 °C the townsfolk would be slow-cooked—great for a pot roast or pulled pork, not so great for humans. So 72 is clearly an error. What do we do? Maybe it’s another typo and we flip the digits to 27 °C. Or we use a common time-series trick: replace 72 with the average of its two closest neighbours, 23 and 26—that’s 24.5 °C. In short, we either correct or remove an outlier.
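
A minimal sketch of the two patches just described, on the hypothetical three-day series (the digit flip and the neighbour average are the only techniques used):

```python
temps = [23.0, 72.0, 26.0]   # 2 p.m. readings: Monday, Tuesday, Wednesday (°C)

# Patch 1: treat 72 as a digit-swap typo and flip it back to 27.
flipped = float(str(int(temps[1]))[::-1])     # 72 -> 27.0

# Patch 2: the time-series trick: replace the impossible value
# with the average of its two closest neighbours.
patched = (temps[0] + temps[2]) / 2           # (23 + 26) / 2 = 24.5

print(flipped, patched)
```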

Handling Extremes

An extreme, on the other hand, stays—and we study it. Try removing it and I won’t talk to you anymore. You have exactly two options: analyse it together with all other observations, or analyse it separately, only among extremes. But never, ever delete or “fix” it.

The Role of Domain Knowledge

Here’s the takeaway: there are statistical methods to flag extremes and, partly, outliers, but the best way to tell them apart is to know the phenomenon you’re studying. If you’re not an expert, ask one. It’s simple and amazingly effective. When I recently worked on a cancer report, I handled most of the quantitative analysis—but I’m not an oncologist, I’m a statistician. So I discussed every single result with domain experts. To me the number 3 is just a three; to an oncologist it can mean much more when we’re talking about certain tumour characteristics. They might tell me that my “3” makes no sense and push me to re-check everything. Humility—humility is key.

Contextual Extremes and Mislabeling

Notice, too, that what counts as extreme often depends on context. A 2.10 m (6 ft 10.7 in) man is extreme in the general population, but far less so if I focus on basketball players. Once again, you must know the phenomenon you’re analysing. Also beware: sometimes we label a datum an outlier rather than an extreme simply because we’re wearing blinders—maybe we’re using a wrong model that doesn’t allow certain values.

Model Assumptions and Historical Bias

Remember historical bias? If my model assumes the maximum loss of my portfolio can’t exceed the worst I’ve seen in the past five years, I’m manufacturing a flood of outliers all by myself. Don’t laugh—plenty of financial models do exactly that, unfortunately.

Illusory Extremes and Black Swans

The reverse can happen, too: what looks like an extreme is really just a temporary illusion caused by lack of data. In a few episodes, when we talk about black swans, we’ll see that sometimes a black swan is simply an event we thought impossible because we were using the wrong support. But let’s not get ahead of ourselves. Still, let me give you a quick example. Think of the pandemic. In April and May 2020, after Italy had been battered by the first wave, a bunch of learned nitwits in Europe produced graphs showing France, the Netherlands, Germany way down with far fewer cases and deaths than Italy. Some even called Italy an outlier, broadcasting their cluelessness loud and clear. How can you call “outlier” a value that’s been observed and verified? I get their reasoning: their cute little regression model fits almost every country except Italy, therefore Italy is an outlier. Right—so reality must adapt to their model, not the other way around. Yours truly—and a few others—kept repeating, in vain: it isn’t an outlier, and it probably isn’t even an extreme, as some (more sensible) people claimed. In the end, Italy turned out to be neither outlier (it couldn’t be) nor extreme. Other countries caught up and sometimes surpassed it.

Practical Risk Considerations

In risk management, downgrading an extreme to an outlier—and maybe ignoring it—is fatal. No observation, especially an unusual one, should be dropped or “fixed” unless you really know what you’re doing. Distinguishing extremes and outliers is pretty easy when our variable X can take only a limited range of values and natural barriers exist. Height again: bounded between, say, 40 cm and 3 m. Values outside that interval are automatically outliers. Values that approach 40 cm or 3 m from inside are extremes. That doesn’t mean they shouldn’t be verified—unusual observations are always worth double-checking because of their high information value.
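
Here is a hedged sketch of that idea; the support [0.40 m, 3.00 m] follows the barriers above, while the 25 cm “extreme band” near each barrier is an arbitrary choice for illustration, not a standard threshold.

```python
def classify(height_m, lower=0.40, upper=3.00, band=0.25):
    """Label a human-height observation, given natural barriers on the support."""
    if height_m < lower or height_m > upper:
        return "outlier"    # impossible value: correct it or drop it
    if height_m < lower + band or height_m > upper - band:
        return "extreme"    # rare but possible: keep it and study it
    return "ordinary"

for h in (1.69, 2.80, 6.00):
    print(h, classify(h))   # ordinary, extreme, outlier
```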

Infinite Ranges and Expert Input

Things get trickier when no finite upper or lower limit exists and X can wander toward plus or minus infinity. Then domain knowledge is crucial, so ask for help if you’re not an expert, and always keep open the possibility that you’re wrong. We’ll talk about all this in detail very soon.

Looking Ahead: Moments

In the next episode we’ll start discussing moments—the mean, the variance, and all their assorted friends. Moments constitute an important family of risk measures. They’re fundamental, but, as usual, you have to know how to use them. With moments we can say a lot about our random variable X; we just want to avoid talking nonsense. Even if you’ve studied statistics, I’m sure I can give you an alternative angle you’ll find interesting.

Episode 18: Introducing (simple) moments

Introduction to Risk Measures and Moments

In this episode we finally start talking about risk measures in a more technical way, and to do that we need to introduce statistical moments. Ready? Then let’s get going—but don’t forget to hit that like button and recommend the podcast to anyone who might enjoy it.

What Are Moments?

Moments are fundamental tools for understanding how a random variable behaves and, for us, for quantifying risk. Under the broad label “moments” we can gather a whole family of measures you probably already know, or have at least heard named before: the different kinds of means, from the plain arithmetic to the harmonic, variance, skewness, kurtosis, and many combinations or generalisations of these ideas.

Why Moments Matter in Risk

The point of moments is to answer key questions in risk management. What is the average risk of a given choice? Are we facing a concentrated risk or something more volatile? Does the phenomenon we’re studying have fat tails that can generate extreme events, or thin tails that rarely do so? Does it satisfy the catastrophe principle or not? Being able to say something informed about these questions—and many others—matters to us a great deal.

Limitations of Moments

Like every statistical tool, and every risk measure, moments come with usage limits. It helps to know, for instance, the difference between a theoretical and an empirical moment, and to be clear on what a particular moment can actually tell us—and, above all, what it cannot tell us, no matter how much we wish it would.

Remember the crushes we had as teenagers, when we dreamed that the person we liked would say certain magic words but they never did? Moments often behave exactly the same way. Life can be so unfair!

Learning Approach: Intuition First

Seriously now, we can discuss moments on several levels. We’ll begin with intuition, and only later introduce formal definitions if they’re really needed. As I’ve said before, I’m not going to spend several minutes chanting something like “the definite integral over the support of capital‑X of x f(x) dx.”

That is not what we need to grasp the expected value of a continuous variable. We can do it in other ways and save the formulas for the moments when they matter.
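
(For anyone who does want to see that chant on paper exactly once, in symbols it is

$$\mathbb{E}[X] = \int_{\operatorname{supp}(X)} x \, f(x) \, dx,$$

and that’s all the chanting we’ll do.)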

What Moments Let Us Do

Moments are quantitative measures that help us capture the behaviour of a random variable; in many situations they let us understand more clearly what a distribution function—or its corresponding density—implies, and they let us detect fundamental traits in the data we have on hand.

Theoretical Moments

The first step is to distinguish theoretical moments from empirical ones. The theoretical moments of a random variable can be computed—sometimes easily, sometimes with effort—using standard mathematical tools once we know explicitly one of the functions we have discussed in the previous episodes.

Suppose, for example, that I know (or someone tells me) that my X follows an exponential distribution; I can sit down with paper and pencil, solve an integral, and get whatever moment I’m interested in. If I’m lazy, I let some computer algebra system crank out the result while I sip coffee.
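
For instance, a minimal sketch with the sympy computer algebra system; the exponential density with rate lambda is the textbook choice here.

```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)
f = lam * sp.exp(-lam * x)      # exponential density on [0, oo)

# k-th simple (raw) moment: integrate x^k * f(x) over the support.
for k in (1, 2):
    moment = sp.integrate(x**k * f, (x, 0, sp.oo))
    print(k, sp.simplify(moment))   # 1/lambda, then 2/lambda**2
```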

Yet theoretical moments can fail to exist for certain random variables: for some parameter settings of the density function, we simply can’t obtain a finite value for a given moment of interest.

Fortunately the list of random variables whose moments might be missing is limited—and that is a relief. The catch is that such variables, with the Pareto distribution as a classic example, are everywhere, and they often describe phenomena that worry us: financial losses (and profits) tend to follow a Pareto‑like pattern, as do damage figures produced by floods, earthquakes, and similar events.
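
As a preview, here is the worked integral for the mean of a Pareto variable, assuming the standard density with scale parameter x_m and tail parameter α (we’ll define this distribution properly later):

$$\mathbb{E}[X] = \int_{x_m}^{\infty} x \,\frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}\, dx = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{-\alpha}\, dx = \begin{cases} \dfrac{\alpha x_m}{\alpha-1} & \text{if } \alpha > 1, \\ \infty & \text{if } \alpha \le 1. \end{cases}$$

For α ≤ 1 the integral diverges: the first moment simply does not exist.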

The non‑existence of theoretical moments creates serious headaches in risk management—problems we’ll tackle after we introduce the main moments, in a few episodes from now.

Empirical Moments

Empirical moments, on the other hand, are computed directly from data. A random variable is a theoretical mathematical representation of a real‑world random phenomenon.

By collecting observations we can check whether a theoretical model we have in mind works, or craft one from the data themselves. When I compute the sample mean of my observations, their standard deviation, or anything of that sort, I am calculating empirical moments.

To a statistician, the data we collect are often a manifestation of an underlying random mechanism, described by some variable, and can be exploited to draw inferences about the phenomenon of interest.

Imagine I think human height is well approximated by a Gaussian random variable with a certain mean and standard deviation.

I can collect height data, check whether they look compatible with a Gaussian distribution—for instance, whether they are symmetric around their mean—and extract precious information: for men, an average of roughly 171 cm; for women, around 160 cm; both with a standard deviation near 6 cm.
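
A minimal sketch of that workflow, with simulated data standing in for a real survey (the Gaussian parameters are simply the figures quoted above):

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated stand-in for a survey of men's heights, in cm,
# using the quoted figures: mean about 171, standard deviation about 6.
heights = rng.normal(loc=171, scale=6, size=10_000)

print(heights.mean())        # empirical mean, close to 171
print(heights.std(ddof=1))   # empirical standard deviation, close to 6

# A crude symmetry check: for Gaussian-looking data,
# mean and median should nearly coincide.
print(np.median(heights))
```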

But note that computing empirical moments does not require already having a theoretical model in mind.

A tricky and crucial point: empirical moments always exist, in the sense that if I have enough data I can always calculate them.

The hitch is that sometimes they make little sense—less inferential and predictive value than used toilet paper—namely when the corresponding theoretical moments fail to exist. We’ll return to this issue quite soon.
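
A quick numerical sketch of the problem, using a Pareto variable whose theoretical mean is infinite (tail parameter 0.8 and scale 1 are assumptions made for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.8   # tail parameter below 1: the theoretical mean is infinite

for n in (10**2, 10**4, 10**6):
    u = rng.uniform(size=n)
    sample = (1.0 - u) ** (-1.0 / alpha)   # inverse-CDF sampling, scale x_m = 1
    # The sample mean always exists, but it typically keeps drifting
    # upward as n grows, instead of settling on a stable value.
    print(n, sample.mean())
```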

Types of Moments

Besides the theoretical/empirical split we also speak of simple (or raw) moments, central moments, and standardised moments.

Simple/raw moments describe how a random variable behaves on average, either in theory (based on a model and ignoring data) or empirically (looking at the data).

The best‑known example is the arithmetic mean, plain or weighted. Also in this group are the moments you get by taking the mean of transformations of the variable: the second simple moment, the third, the fourth, the fifth, and so on, plus the geometric and harmonic means.

Central moments measure how a variable behaves relative to its own mean. Does it condense tightly around the mean? Does it spread out, and if so, how? Does it disperse smoothly or form little clumps of values, which we’ll call clusters?

Variance belongs here, and so does the standard deviation (the square root of the variance), along with less famous but very useful quantities like the third central moment, sometimes called absolute skewness. Central moments can often be obtained conveniently by combining simple moments.

Standardised moments are nothing more than central moments divided by powers of the standard deviation. In statistics “standardisation” usually means dividing some quantity by another—often, but not always, the standard deviation or a function of it—to make results more comparable.

For now just take my word for it. The most famous standardised moments are skewness and kurtosis.

There are other families of moments, but it’s too early to tackle them; we don’t need to throw every steak on the grill at once.
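
To fix ideas, here is a minimal sketch computing one moment of each family from a made-up data set, including the convenient identity linking the variance to simple moments:

```python
import numpy as np
from scipy import stats

data = np.array([2.1, 2.5, 2.2, 7.9, 2.4, 2.3])   # made-up observations

raw2 = np.mean(data**2)                        # simple (raw) moment of order 2
central2 = np.mean((data - data.mean())**2)    # central moment of order 2: the variance
skewness = stats.skew(data)                    # standardised moment of order 3
kurtosis = stats.kurtosis(data, fisher=False)  # standardised moment of order 4

# Central moments from simple ones: variance = E[X^2] - (E[X])^2.
print(np.isclose(central2, raw2 - data.mean()**2))   # True
```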

How to Compute the Mean

What I want to teach—if you’ve never seen them—or refresh—if you already have—is how to compute simple moments. Let’s start with the mean, the one everyone knows.

Suppose five people I know weigh respectively 70, 75, 66, 86, and 59 kilograms. If I ask for the group’s average weight, you add the numbers, 70 + 75 + 66 + 86 + 59 = 356, and divide by 5, the number of people, arriving at 71.2 kg. That’s what we call the simple arithmetic mean.

In the simple arithmetic mean each observation carries the same weight (forgive the pun, given the example). All you do is add every observation and divide by how many observations you have.

But what if some values occur more often than others? Or—just as important for us—are more probable? In that case we switch to the weighted arithmetic mean.

Weighted Arithmetic Mean & Expected Value

Imagine our example changes a bit: there aren’t really five people, but seven—three of them weigh 70 kg. You could, of course, keep the old method and add seven numbers, or you can speed things up (especially when the numbers get large) by writing 70 × 3 + 75 + 66 + 86 + 59 and then dividing by 7.

That is equivalent to writing 70 × 3⁄7 + 75 × 1⁄7 + 66 × 1⁄7 + 86 × 1⁄7 + 59 × 1⁄7, i.e., multiplying each observation by its weight—its relative importance, its frequency—within the group. The value we obtain is approximately 70.86 kg.
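
In code, the frequency-weighted version looks like this (a direct transcription of the arithmetic above):

```python
weights_kg = [70, 75, 66, 86, 59]
counts     = [ 3,  1,  1,  1,  1]   # three of the seven people weigh 70 kg

n = sum(counts)                                  # 7 people in total
weighted_mean = sum(w * c for w, c in zip(weights_kg, counts)) / n
print(round(weighted_mean, 2))                   # 70.86 kg
```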

Another example: a one‑year investment, based on supposedly reliable historical data (and if any financial advisor praises “reliable historical data,” stand up and walk out), can yield +40 % with probability 5 %, +10 % with probability 55 %, −5 % with probability 30 %, and −10 % with probability 10 %. I ask you: what average return should we expect?

First, you verify the probabilities sum to 1, i.e., 100 %. If they didn’t, we’d be dealing with uncertainty, not mere risk; in that case we’d have to put the pen down and ask for clarification or go away. Let’s see: 5 % + 55 % + 30 % + 10 % = 100 %. Good.

Having verified that, we do nothing more than add the returns, weighting each one by its probability: 0.40 × 0.05 + 0.10 × 0.55 + (–0.05 × 0.30) + (–0.10 × 0.10). Multiply, add, and you get 0.05, an expected return of +5 %.
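
The same computation as a tiny sketch, with the probability check included:

```python
returns = [0.40, 0.10, -0.05, -0.10]   # possible one-year returns
probs   = [0.05, 0.55,  0.30,  0.10]   # their probabilities

# Step 1: check that we're dealing with risk, not uncertainty.
assert abs(sum(probs) - 1.0) < 1e-12

# Step 2: weight each return by its probability and add everything up.
expected_return = sum(r * p for r, p in zip(returns, probs))
print(round(expected_return, 4))       # 0.05, i.e. an expected +5%
```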

Note the adjective expected, which indicates something that hasn’t happened yet but that, on average, we might look for in the future—if the past is reliably informative, meaning it contains enough useful information to make sensible forecasts.

In probability theory a weighted arithmetic mean whose weights are probabilities is called the expected value.

More precisely, expected value is the official name for the theoretical weighted arithmetic mean, where the probabilities come from a specified model or assumed distribution, while in the empirical realm we usually say simply mean.

When you say mean and don’t add an adjective (geometric, harmonic, etc.), people generally assume you’re talking about the weighted arithmetic mean.

Bernoulli and Higher-Order Simple Moments

A classic theoretical illustration is the Bernoulli random variable, which equals 1 with probability p and 0 with probability 1 – p. Its theoretical weighted mean—the expected value—is 1 × p + 0 × (1 – p) = p.

Here’s a neat surprise: in a standard Bernoulli random variable the quantity p isn’t just the probability of success (that is, of observing a 1); it’s also the variable’s expected value.

If you’re more mathematically inclined, this will look trivial—we just took the expectation of an indicator of an event, so we got the event’s probability—but for everyone else it’s worth remembering.

The expected value, or weighted arithmetic mean if you prefer, is the simple moment of order 1. Written symbolically, the expected value of a random variable X appears as E[X], where E stands for expectation or expected value.

Higher-Order Moments: Squaring, Cubing, and More

Simple moments of higher order—second, third, fourth, fifth, and so on—are obtained by raising each outcome to the corresponding power and then taking the weighted arithmetic mean of these new values.

Let’s return to our investment. The possible returns are +40 % with probability 5 %, +10 % with 55 %, −5 % with 30 %, and −10 % with 10 %. Take each outcome and square it: 0.40 squared is 0.16, 0.10 squared is 0.01, –0.05 squared becomes 0.0025 (and the minus sign vanishes), and –0.10 squared is 0.01.

Now multiply each squared value by its original probability—super important: the probabilities stay untouched: 0.16 × 0.05 + 0.01 × 0.55 + 0.0025 × 0.30 + 0.01 × 0.10. Summing these we get 0.01525, about 1.5 %. This is the simple moment of order 2, also called the second moment.

As you can see, it is always positive: every outcome, even the negative ones, is squared, and we multiply only non‑negative numbers (that is, squared outcomes and probabilities).

For the simple moment of order 3—or third moment—you would cube the outcomes before applying the weighted mean; for order 4 you’d raise them to the fourth power, and so on. The corresponding probabilities always remain exactly what they were—never, ever raise probabilities to a power.
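
A minimal sketch of the general recipe, reusing the investment example to recover both the expected value and the second moment computed above:

```python
def raw_moment(outcomes, probs, k):
    """Simple (raw) moment of order k: raise each outcome to the k-th power,
    leave the probabilities exactly as they are, and take the weighted mean."""
    return sum((x ** k) * p for x, p in zip(outcomes, probs))

returns = [0.40, 0.10, -0.05, -0.10]
probs   = [0.05, 0.55,  0.30,  0.10]

print(round(raw_moment(returns, probs, 1), 5))   # 0.05    (the expected value)
print(round(raw_moment(returns, probs, 2), 5))   # 0.01525 (the second moment)
```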

Higher‑order simple moments are very helpful in revealing notable properties of a random variable—or, in the empirical setting, of a data set. They also come in handy for quickly computing certain central and standardised moments, as we shall see.

What’s Next?

In the next episode we’ll make a little trip back to the first thirty years of the twentieth century. In papers by leading probabilists like Kolmogorov, Nagumo, de Finetti, and Chisini we’ll find a fascinating and useful way of looking at the mean, one that will provide valuable tools for our risk analysis.