3 of 68

Acting Under Uncertainty

Agents may need to handle uncertainty, whether due to partial observability, nondeterminism, or a combination of the two. An agent may never know for certain what state it’s in or where it will end up after a sequence of actions.
Belief state-a representation of the set of all possible world states that it might be in and generating a contingency plan that handles every possible eventuality that its sensors may report during execution.
Despite its many virtues, however, this approach has significant drawbacks when taken literally as a recipe for creating agent programs:

• When interpreting partial sensor information, a logical agent must consider every logically possible explanation for the observations, no matter how unlikely. This leads to impossible large and complex belief-state representations.

4 of 68

Cont…

A correct contingent plan that handles every eventuality can grow arbitrarily large and must consider arbitrarily unlikely contingencies.
Sometimes there is no plan that is guaranteed to achieve the goalyet the agent must act. It must have some way to compare the merits of plans that are not guaranteed.
Suppose, for example, that an automated taxi!automated has the goal of delivering a passenger to the airport on time.
The agent forms a plan, A90, that involves leaving home 90 minutes before the flight departs and driving at a reasonable speed.
Even though the airport is only about 5 miles away, a logical taxi agent will not be able to conclude with certainty that “Plan A90 will get us to the airport in time.”

5 of 68

Cont…

Instead, it reaches the weaker conclusion “Plan A90 will get us to the airport in time, as long as the car doesn’t break down or run out of gas, and I don’t get into an accident, and there are no accidents on the bridge, and the plane doesn’t leave early, and no meteorite hits the car, and . . . .”
None of these conditions can be deduced for sure, so the plan’s success cannot be inferred. This is the qualification problem.
Nonetheless, in some sense A90 is in fact the right thing to do. What do we mean by this? we mean that out of all the plans that could be executed, A90 is expected to maximize the agent’s performance measure.
The performance measure includes getting to the airport in time for the flight, avoiding a long, unproductive wait at the airport, and avoiding speeding tickets along the way.

6 of 68

Cont…

The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can provide some degree of belief that they will be achieved.
Other plans, such as A180, might increase the agent’s belief that it will get to the airport on time, but also increase the likelihood of a long wait. The right thing to do the rational decision therefore depends on both the relative importance of various goals and the likelihood that, and degree to which, they will be achieved.
The remainder of this section hones these ideas, in preparation for the development of the general theories of uncertain reasoning and rational decisions.

7 of 68

Summarizing uncertainty

Let’s consider an example of uncertain reasoning: diagnosing a dental patient’s toothache.
Diagnosis—whether for medicine, automobile repair, or whatever—almost always involves uncertainty.
Let us try to write rules for dental diagnosis using propositional logic, so that we can see how the logical approach breaks down. Consider the following simple rule:
The problem is that this rule is wrong. Not all patients with toothaches have cavities; some of them have gum disease, an abscess, or one of several other problems:

Unfortunately, in order to make the rule true, we have to add an almost unlimited list of possible problems. We could try turning the rule into a causal rule:

8 of 68

Cont…

But this rule is not right either; not all cavities cause pain. The only way to fix the rule is to make it logically exhaustive: to augment the left-hand side with all the qualifications required for a cavity to cause a toothache. Trying to use logic to cope with a domain like medical diagnosis thus fails for three main reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.

9 of 68

Cont…

The connection between toothaches and cavities is just not a logical consequence in either direction. This is typical of the medical domain, as well as most other judgmental domains: law, business, design, automobile repair, gardening, dating, and so on.
The agent’s knowledge can at best provide only a degree of belief in the relevant sentences.
Our main tool for dealing with degrees of belief is probability theory. In the terminology of Section 8.1, the ontological commitments of logic and probability theory are the same that the world is composed of facts that do or do not hold in any particular case but the epistemological commitments are different: a logical agent believes each sentence to be true or false or has no opinion, whereas a probabilistic agent may have a numerical degree of belief between 0 (for sentences that are certainly false) and 1 (certainly true).

10 of 68

Cont…

Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance, thereby solving the qualification problem.
We might not know for sure what afflicts a particular patient, but we believe that there is, say, an 80% chance that is, a probability of 0.8 that the patient who has a toothache has a cavity.
That is, we expect that out of all the situations that are indistinguishable from the current situation as far as our knowledge goes, the patient will have a cavity in 80% of them.
This belief could be derived from statistical data 80% of the toothache patients seen so far have had cavities or from some general dental knowledge, or from a combination of evidence sources.
One confusing point is that at the time of our diagnosis, there is no uncertainty in the actual world: the patient either has a cavity or doesn’t. So what does it mean to say the probability of a cavity is 0.8? Shouldn’t it be either 0 or 1?

11 of 68

Cont…

The answer is that probability statements are made with respect to a knowledge state, not with respect to the real world.
We say “The probability that the patient has a cavity, given that she has a toothache, is 0.8.”
If we later learn that the patient has a history of gum disease, we can make a different statement: “The probability that the patient has a cavity, given that she has a toothache and a history of gum disease, is 0.4.”
If we gather further conclusive evidence against a cavity, we can say “The probability that the patient has a cavity, given all we now know, is almost 0.” Note that these statements do not contradict each other; each is a separate assertion about a different knowledge state.

12 of 68

Uncertainty and Rational Decisions

Consider again the A90 plan for getting to the airport. Suppose it gives us a 97% chance of catching our flight. Does this mean it is a rational choice? Not necessarily: there might be other plans, such as A180, with higher probabilities.
If it is vital not to miss the flight, then it is worth risking the longer wait at the airport. What about A1440, a plan that involves leaving home 24 hours in advance?
In most circumstances, this is not a good choice, because although it almost guarantees getting there on time, it involves an intolerable wait not to mention a possibly unpleasant diet of airport food.
To make such choices, an agent must first have preferences between the different possible outcomes of the various plans. An outcome is a completely specified state, including such factors as whether the agent arrives on time and the length of the wait at the airport. We use utility theory to represent and reason with preferences.

13 of 68

Cont…

Utility theory says that every state has a degree of usefulness, or utility, to an agent and that the agent will prefer states with higher utility.
The utility of a state is relative to an agent. For example, the utility of a state in which White has checkmated Black in a game of chess is obviously high for the agent playing White, but low for the agent playing Black.
But we can’t go strictly by the scores of 1, 1/2, and 0 that are dictated by the rules of tournament chess some players (including the authors) might be thrilled with a draw against the world champion, whereas other players might not. There is no accounting for taste or preferences: you might think that an agent who prefers jalapeno bubble-gum ice cream to chocolate chocolate chip is odd or even misguided, but you could not say the agent is irrational.

14 of 68

Cont…

A utility function can account for any set of preferences quirky or typical, noble or perverse. Note that utilities can account for altruism, simply by including the welfare of others as one of the factors.
Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions called decision theory:

The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of maximum expected utility (MEU).
Note that “expected” might seem like a vague, hypothetical term, but as it is used here it has a precise meaning: it means the “average,” or “statistical mean” of the outcomes, weighted by the probability of the outcome.

15 of 68

Cont…

Figure 13.1 sketches the structure of an agent that uses decision theory to select actions.

16 of 68

Basic Probability Notation

For our agent to represent and use probabilistic information, we need a formal language.
The language of probability theory has traditionally been informal, written by human mathematicians to other human mathematicians.
Appendix A includes a standard introduction to elementary probability theory; here, we take an approach more suited to the needs of AI and more consistent with the concepts of formal logic.
What probabilities are about: Like logical assertions, probabilistic assertions are about possible worlds. Whereas logical assertions say which possible worlds are strictly ruled out (all those in which the assertion is false), probabilistic assertions talk about how probable the various worlds are. In probability theory, the set of all possible worlds is called the sample space.

17 of 68

Cont…

The possible worlds are mutually exclusive and exhaustive two possible worlds cannot both be the case, and one possible world must be the case.
For example, if we are about to roll two (distinguishable) dice, there are 36 possible worlds to consider: (1,1), (1,2), . . ., (6,6).
The Greek letter Ω (uppercase omega) is used to refer to the sample space, and ω (lowercase omega) refers to elements of the space, that is, particular possible worlds.
A fully specified probability model associates a numerical probability P(ω) with each possible world.1 The basic axioms of probability theory say that every possible world has a probability between 0 and 1 and that the total probability of the set of possible worlds is 1:

18 of 68

Cont…

For example, if we assume that each die is fair and the rolls don’t interfere with each other, then each of the possible worlds (1,1), (1,2), . . ., (6,6) has probability 1/36.
On the other hand, if the dice conspire to produce the same number, then the worlds (1,1), (2,2), (3,3), etc., might have higher probabilities, leaving the others with lower probabilities.
Probabilistic assertions and queries are not usually about particular possible worlds, but about sets of them.
For example, we might be interested in the cases where the two dice add up to 11, the cases where doubles are rolled, and so on. In probability theory, these sets are called events.
In AI, the sets are always described by propositions in a formal language.
For each proposition, the corresponding set contains just those possible worlds in which the proposition holds.

19 of 68

Cont…

The probability associated with a proposition is defined to be the sum of the probabilities of the worlds in which it holds:

For example, when rolling fair dice, we have P(Total =11) = P((5, 6)) + P((6, 5)) = 1/36 + 1/36 = 1/18. Note that probability theory does not require complete knowledge of the probabilities of each possible world.
For example, if we believe the dice conspire to produce the same number, we might assert that P(doubles) = 1/4 without knowing whether the dice prefer double 6 to double 2.
Just as with logical assertions, this assertion constrains the underlying probability model without fully determining it.

20 of 68

Cont…

Probabilities such as P(Total =11) and P(doubles) are called unconditional or prior probabilities (and sometimes just “priors” for short); they refer to degrees of belief in propositions in the absence of any other information.
Most of the time, however, we have some information, usually called evidence, that has already been revealed.
For example, the first die may already be showing a 5 and we are waiting with bated breath for the other one to stop spinning.
In that case, we are interested not in the unconditional probability of rolling doubles, but the conditional or posterior probability (or just “posterior” for short) of rolling doubles given that the first die is a 5.
This probability is written P(doubles | Die1 =5), where the “ | ” is pronounced “given.” Similarly, if I am going to the dentist for a regular checkup, the probability P(cavity)=0.2 might be of interest;

21 of 68

Cont…

But if I go to the dentist because I have a toothache, it’s P(cavity | toothache)=0.6 that matters. Note that the precedence of “ | ” is such that any expression of the form P(. . . | . . .) always means P((. . .)|(. . .)).
It is important to understand that P(cavity)=0.2 is still valid after toothache is observed; it just isn’t especially useful. When making decisions, an agent needs to condition on all the evidence it has observed.
It is also important to understand the difference between conditioning and logical implication. The assertion that P(cavity | toothache)=0.6 does not mean “Whenever toothache is true, conclude that cavity is true with probability 0.6” rather it means “Whenever toothache is true and we have no further information, conclude that cavity is true with probability 0.6.”

22 of 68

Cont…

The extra condition is important; for example, if we had the further information that the dentist found no cavities, we definitely would not want to conclude that cavity is true with probability 0.6; instead we need to use P(cavity|toothache ∧ ￢cavity)=0.

23 of 68

Cont…

The definition of conditional probability, Equation (13.3), can be written in a different form called the product rule:

P(a ∧ b) = P(a | b)P(b) ,

The product rule is perhaps easier to remember: it comes from the fact that, for a and b to be true, we need b to be true, and we also need a to be true given b.

24 of 68

The language of propositions in probability assertion

Variables in probability theory are called random variables and their names begin with an uppercase letter. Thus, in the dice example, Total and Die1 are random variables.
Every random variable has a domain—the set of possible values it can take on. The domain of Total for two dice is the set {2, . . . , 12} and the domain of Die1 is {1, . . . , 6}.
A Boolean random variable has the domain {true, false} (notice that values are always lowercase);
For example, the proposition that doubles are rolled can be written as Doubles =true. By convention, propositions of the form A=true are abbreviated simply as a, while A=false is abbreviated as ￢a.
As in CSPs, domains can be sets of arbitrary tokens; we might choose the domain of Age to be {juvenile, teen, adult} and the domain of Weather might be {sunny, rain, cloudy, snow}.

25 of 68

Cont…

When no ambiguity is possible, it is common to use a value by itself to stand for the proposition that a particular variable has that value; thus, sunny can stand for Weather =sunny.
The preceding examples all have finite domains. Variables can have infinite domains, too either discrete (like the integers) or continuous (like the reals).
For any variable with an ordered domain, inequalities are also allowed, such as
Finally, we can combine these sorts of elementary propositions (including the abbreviated forms for Boolean variables) by using the connectives of propositional logic.
For example, we can express “The probability that the patient has a cavity, given that she is a teenager with no toothache, is 0.1” as follows:

26 of 68

Cont…

where the bold P indicates that the result is a vector of numbers, and where we assume a predefined ordering sunny, rain, cloudy, snow on the domain of Weather .
We say that the P statement defines a probability distribution for the random variable Weather. The P notation is also used for conditional distributions: P(X | Y ) gives the values of P(X =xi | Y =yj) for each possible i, j pair.

27 of 68

Cont…

For continuous variables, it is not possible to write out the entire distribution as a vector, because there are infinitely many values. Instead, we can define the probability that a random variable takes on some value x as a parameterized function of x.
For example, the sentence

expresses the belief that the temperature at noon is distributed uniformly between 18 and 26 degrees Celsius. We call this a probability density function.

28 of 68

Cont…

Probability density functions (sometimes called pdfs) differ in meaning from discrete distributions. Saying that the probability density is uniform from 18C to 26C means that there is a 100% chance that the temperature will fall somewhere in that 8C-wide region and a 50% chance that it will fall in any 4C-wide region, and so on.
We write the probability density for a continuous random variable X at value x as P(X =x) or just P(x); the intuitive definition of P(x) is the probability that X falls within an arbitrarily small region beginning at x, divided by the width of the region:

29 of 68

Cont…

where C stands for centigrade (not for a constant). In P(NoonTemp =20.18C)= 1C , note that 18C is not a probability, it is a probability density.
The probability that NoonTemp is exactly 20.18C is zero, because 20.18C is a region of width 0. Some authors use different symbols for discrete distributions and density functions; we use P in both cases, since confusion seldom arises and the equations are usually identical. Note that probabilities are unitless numbers, whereas density functions are measured with a unit, in this case reciprocal degrees.
In addition to distributions on single variables, we need notation for distributions on multiple variables.
Commas are used for this. For example, P(Weather , Cavity) denotes the probabilities of all combinations of the values of Weather and Cavity.
This is a 4×2 table of probabilities called the joint probability distribution of Weather and Cavity.

30 of 68

Cont…

We can also mix variables with and without values; P(sunny, Cavity) would be a two-element vector giving the probabilities of a sunny day with a cavity and a sunny day with no cavity.
The P notation makes certain expressions much more concise than they might otherwise be.
For example, the product rules for all possible values of Weather and Cavity can be written as a single equation:

instead of as these 4×2=8 equations (using abbreviations W and C):

31 of 68

Cont…

As a degenerate case, P(sunny, cavity) has no variables and thus is a one-element vector that is the probability of a sunny day with a cavity, which could also be written as P(sunny, cavity) or P(sunny ∧ cavity).
We will sometimes use P notation to derive results about individual P values, and when we say “P(sunny)=0.6” it is really an abbreviation for “P(sunny) is the one-element vector 0.6, which means that P(sunny)=0.6.”

32 of 68

Cont…

From the preceding definition of possible worlds, it follows that a probability model is completely determined by the joint distribution for all of the random variables the so-called full joint probability distribution.
For example, if the variables are Cavity, Toothache, and Weather , then the full joint distribution is given by P(Cavity,Toothache,Weather ).
This joint distribution can be represented as a 2×2×4 table with 16 entries.
Because every proposition’s probability is a sum over possible worlds, a full joint distribution suffices, in principle, for calculating the probability of any proposition.

33 of 68

Probability axioms and their reasonableness

The basic axioms of probability (Equations (13.1) and (13.2)) imply certain relationships among the degrees of belief that can be accorded to logically related propositions.
For example, we can derive the familiar relationship between the probability of a proposition and the probability of its negation:

34 of 68

Cont…

We can also derive the well-known formula for the probability of a disjunction, sometimes called the inclusion–exclusion principle:

This rule is easily remembered by noting that the cases where a holds, together with the cases where b holds, certainly cover all the cases where a ∨ b holds; but summing the two sets of cases counts their intersection twice, so we need to subtract P(a ∧ b).
Equations (13.1) and (13.4) are often called Kolmogorov’s axioms in honor of the Russian mathematician Andrei Kolmogorov, who showed how to build up the rest of probability theory from this simple foundation and how to handle the difficulties caused by continuous variables.

35 of 68

Cont...

If an agent has some degree of belief in a proposition a, then the agent should be able to state odds at which it is indifferent to a bet for or against a.
Think of it as a game between two agents: Agent 1 states, “my degree of belief in event a is 0.4.” Agent 2 is then free to choose whether to wager for or against a at stakes that are consistent with the stated degree of belief.
That is, Agent 2 could choose to accept Agent 1’s bet that a will occur, offering $6 against Agent 1’s $4. Or Agent 2 could accept Agent 1’s bet that ￢a will occur, offering $4 against Agent 1’s $6.
Then we observe the outcome of a, and whoever is right collects the money.
If an agent’s degrees of belief do not accurately reflect the world, then you would expect that it would tend to lose money over the long run to an opposing agent whose beliefs more accurately reflect the state of the world.

36 of 68

Cont…

But de Finetti proved something much stronger: If Agent 1 expresses a set of degrees of belief that violate the axioms of probability theory then there is a combination of bets by Agent 2 that guarantees that Agent 1 will lose money every time.
For example, suppose that Agent 1 has the set of degrees of belief from Equation (13.5). Figure 13.2 shows that if Agent 2 chooses to bet $4 on a, $3 on b, and $2 on ￢(a ∨ b), then Agent 1 always loses money, regardless of the outcomes for a and b.
De Finetti’s theorem implies that no rational agent can have beliefs that violate the axioms of probability.

38 of 68

INFERENCE USING FULL JOINT DISTRIBUTIONS

probabilistic inference—that is, the computation of posterior probabilities for query propositions given observed evidence.
We use the full joint distribution as the “knowledge base” from which answers to all questions may be derived.
Along the way we also introduce several useful techniques for manipulating equations involving probabilities.
We begin with a simple example: a domain consisting of just the three Boolean variables Toothache, Cavity, and Catch (the dentist’s nasty steel probe catches in my tooth).
The full joint distribution is a 2×2×2 table as shown in Figure 13.3.

39 of 68

Cont…

41 of 68

Cont…

43 of 68

Cont…

In other words, we can calculate P(Cavity | toothache) even if we don’t know the value of P(toothache)! We temporarily forget about the factor 1/P(toothache ) and add up the values for cavity and ￢cavity, getting 0.12 and 0.08.
Those are the correct relative proportions, but they don’t sum to 1, so we normalize them by dividing each one by 0.12 + 0.08, getting the true probabilities of 0.6 and 0.4.
Normalization turns out to be a useful shortcut in many probability calculations, both to make the computation easier and to allow us to proceed when some probability assessment (such as P(toothache)) is not available.
From the example, we can extract a general inference procedure. We begin with the case in which the query involves a single variable, X (Cavity in the example).

44 of 68

Cont…

Let E be the list of evidence variables (just Toothache in the example), let e be the list of observed values for them, and let Y be the remaining unobserved variables (just Catch in the example). The query is P(X | e) and can be evaluated as

45 of 68

INDEPENDENCE

48 of 68

BAYES’ RULE AND ITS USE

The product rule can actually be written in two forms:

49 of 68

Applying Bayes’ rule: The simple case

we perceive as evidence the effect of some unknown cause and we would like to determine that cause. In that case, Bayes’ rule becomes

The conditional probability P(effect | cause) quantifies the relationship in the causal direction, whereas P(cause | effect) describes the diagnostic direction.
In a task such as medical diagnosis, we often have conditional probabilities on causal relationships (that is, the doctor knows P(symptoms | disease)) and want to derive a diagnosis, P(disease | symptoms).
For example, a doctor knows that the disease meningitis causes the patient to have a stiff neck, say, 70% of the time.
The doctor also knows some unconditional facts: the prior probability that a patient has meningitis is 1/50,000, and the prior probability that any patient has a stiff neck is 1%.

50 of 68

Cont…

Letting s be the proposition that the patient has a stiff neck and m be the proposition that the patient has meningitis, we have

That is, we expect less than 1 in 700 patients with a stiff neck to have meningitis.
Notice that even though a stiff neck is quite strongly indicated by meningitis (with probability 0.7), the probability of meningitis in the patient remains small.
This is because the prior probability of stiff necks is much higher than that of meningitis.

51 of 68

Cont…

52 of 68

Using Bayes’ rule: Combining evidence

We have seen that Bayes’ rule can be useful for answering probabilistic queries conditioned on one piece of evidence for example, the stiff neck.
In particular, we have argued that probabilistic information is often available in the form P(effect | cause).
What happens when we have two or more pieces of evidence?
For example, what can a dentist conclude if her nasty steel probe catches in the aching tooth of a patient?
If we know the full joint distribution (Figure 13.3), we can read off the answer:

53 of 68

Cont…

For this reformulation to work, we need to know the conditional probabilities of the conjunction toothache ∧catch for each value of Cavity.
That might be feasible for just two evidence variables, but again it does not scale up. If there are n possible evidence variables (X rays, diet, oral hygiene, etc.), then there are 2n possible combinations of observed values for which we would need to know conditional probabilities.
We might as well go back to using the full joint distribution. This is what first led researchers away from probability theory toward approximate methods for evidence combination that, while giving incorrect answers, require fewer numbers to give any answer at all.

56 of 68

The Wumpus World in Artificial intelligence

The Wumpus world is a simple world example to illustrate the worth of a knowledge-based agent and to represent knowledge representation. It was inspired by a video game Hunt the Wumpus by Gregory Yob in 1973.
The Wumpus world is a cave which has 4/4 rooms connected with passageways. So there are total 16 rooms which are connected with each other.
We have a knowledge-based agent who will go forward in this world. The cave has a room with a beast which is called Wumpus, who eats anyone who enters the room.
The Wumpus can be shot by the agent, but the agent has a single arrow. In the Wumpus world, there are some Pits rooms which are bottomless, and if agent falls in Pits, then he will be stuck there forever.
The exciting thing with this cave is that in one room there is a possibility of finding a heap of gold. So the agent goal is to find the gold and climb out the cave without fallen into Pits or eaten by Wumpus.
The agent will get a reward if he comes out with gold, and he will get a penalty if eaten by Wumpus or falls in the pit.

57 of 68

Cont…

Following is a sample diagram for representing the Wumpus world. It is showing some rooms with Pits, one room with Wumpus and one agent at (1, 1) square location of the world.

58 of 68

THE WUMPUS WORLD REVISITED

We can combine of the ideas in this chapter to solve probabilistic reasoning problems in the wumpus world.
Uncertainty arises in the wumpus world because the agent’s sensors give only partial information about the world.
For example, Figure 13.5 shows a situation in which each of the three reachable squares—[1,3], [2,2], and [3,1]—might contain a pit.
Pure logical inference can conclude nothing about which square is most likely to be safe, so a logical agent might have to choose randomly.
We will see that a probabilistic agent can do much better than the logical agent.
Our aim is to calculate the probability that each of the three squares contains a pit. The relevant properties of the wumpus world are that
(1) a pit causes breezes in all neighboring squares, and
(2) each square other than [1,1] contains a pit with probability 0.2.

59 of 68

Cont…

The first step is to identify the set of random variables we need:
As in the propositional logic case, we want one Boolean variable Pij for each square, which is true iff square [i, j] actually contains a pit.
We also have Boolean variables Bij that are true iff square [i, j] is breezy; we include these variables only for the observed squares—in this case, [1,1], [1,2], and [2,1].

62 of 68

Cont…

To answer this query, we can follow the standard approach of Equation (13.9), namely, summing over entries from the full joint distribution.
Let Unknown be the set of Pi,j vari ables for squares other than the Known squares and the query square [1,3]. Then, by Equation (13.9), we have

63 of 68

Cont…

Surely, one might ask, aren’t the other squares irrelevant? How could [4,4] affect whether [1,3] has a pit? Indeed, this intuition is correct.
Let Frontier be the pit variables (other than the query variable) that are adjacent to visited squares, in this case just [2,2] and [3,1].
Also, let Other be the pit variables for the other unknown squares; in this case, there are 10 other squares, as shown in Figure 13.5(b).
The key insight is that the observed breezes are conditionally independent of the other variables, given the known, frontier, and query variables.
To use the insight, we manipulate the query formula into a form in which the breezes are conditioned on all the other variables, and then we apply conditional independence:

1 of 68

2 of 68

3 of 68

4 of 68

5 of 68

6 of 68

7 of 68

8 of 68

9 of 68

10 of 68

11 of 68

12 of 68

13 of 68

14 of 68

15 of 68

16 of 68

17 of 68

18 of 68

19 of 68

20 of 68

21 of 68

22 of 68

23 of 68

24 of 68

25 of 68

26 of 68

27 of 68

28 of 68

29 of 68

30 of 68

31 of 68

32 of 68

33 of 68

34 of 68

35 of 68

36 of 68

37 of 68

38 of 68

39 of 68

40 of 68

41 of 68

42 of 68

43 of 68

44 of 68

45 of 68

46 of 68

47 of 68

48 of 68

49 of 68

50 of 68

51 of 68

52 of 68

53 of 68

54 of 68

55 of 68

56 of 68

57 of 68

58 of 68

59 of 68

60 of 68

61 of 68

62 of 68

63 of 68

64 of 68

65 of 68

66 of 68

67 of 68

68 of 68