Let’s say we want to predict the outcome of an (American) football game.

Building a Model

To accomplish this, let's build a simple model of what we know about how football games are decided.

Essentially, the model says "the probability of winning (W) depends on the quarterback's performance (QBP), which in turn depends on his skill (QBS) and on the opponent's defense (D). W also depends on having homefield advantage (H)."

Let’s assume all the variables are binary so that e.g. W = 1 means a win; QBP = 0 means the quarterback performed poorly, etc.

Finally we have to specify the nature of the dependencies in the model. We will do this in code:

(def football-model
  {:vars {:QBS #{0 1} :D #{0 1} :H #{0 1} :QBP #{0 1} :W #{0 1}}
   :factors
   [{:vars #{:QBS}
     :CPT {{:QBS 0} 0.8 {:QBS 1} 0.2}}       ;1
    {:vars #{:D}
     :CPT {{:D 0} 0.6 {:D 1} 0.4}}
    {:vars #{:H}
     :CPT {{:H 0} 0.5 {:H 1} 0.5}}
    {:vars #{:QBP :D :QBS}
     :CPT {{:QBS 0 :D 0 :QBP 0} 0.5
           {:QBS 0 :D 0 :QBP 1} 0.5          ;2
           {:QBS 0 :D 1 :QBP 0} 0.9
           {:QBS 0 :D 1 :QBP 1} 0.1
           {:QBS 1 :D 0 :QBP 0} 0.1
           {:QBS 1 :D 0 :QBP 1} 0.9
           {:QBS 1 :D 1 :QBP 0} 0.5
           {:QBS 1 :D 1 :QBP 1} 0.5}}
    {:vars #{:W :QBP :H}
     :CPT {{:QBP 0 :H 0 :W 0} 0.8
           {:QBP 0 :H 0 :W 1} 0.2
           {:QBP 0 :H 1 :W 0} 0.75
           {:QBP 0 :H 1 :W 1} 0.25           ;3
           {:QBP 1 :H 0 :W 0} 0.4
           {:QBP 1 :H 0 :W 1} 0.6
           {:QBP 1 :H 1 :W 0} 0.3
           {:QBP 1 :H 1 :W 1} 0.7}}]})

;1 Implies p(QBS = 1) is 0.2

;2 p(QBP=1|D=0,QBS=0)=0.5

;3 p(W=1|QBP=0,H=1)=0.25

The model is a map with two entries: a :vars map giving each variable's possible values (used by the sampler later) and a :factors vector holding the probability tables. Each probability table is specified in {:vars #{...} :CPT {...}} format, where the :vars entry is for later programming convenience and the :CPT specifies the (conditional) probability of the target variable given the other variables. The numbered comments illustrate how to interpret the numbers.
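As a quick sanity check on the encoding, the :CPT rows for each fixed assignment of the non-target variables must sum to 1, so a factor over n binary variables should total 2^(n-1). A small REPL sketch (not part of the model code itself):

(for [{:keys [CPT]} (:factors football-model)]
  (reduce + (vals CPT)))
;; => (1.0 1.0 1.0 4.0 4.0), up to floating-point rounding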

Doing Inference

We want to answer queries such as:

  1. If I have a highly skilled QB facing a weak defense but do not have homefield advantage, what is the probability I win?
  2. If I lost and did not have homefield advantage, what is the probability my quarterback is highly skilled?

For our simple model we could easily calculate exact answers to these two queries. However, for more complicated models exact calculation quickly becomes infeasible, and we would want to use approximate inference instead.

The motivation for the approximate inference technique we are going to use is the following: the model we just defined gives a probability distribution over possible states of the world (e.g. {:QBS 1 :D 0 :QBP 0 :H 1 :W 1}). If we had a finite collection of these world-states sampled from the model's distribution, we could approximate answers to our queries using frequency counts over the samples. For example, the answer to query 1 above could be approximated by counting the relative frequency of wins versus losses among the sampled world-states in which we have a highly skilled QB facing a weak defense without homefield advantage. As the number of samples grows, this approximation becomes more accurate.
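Concretely, if samples already held such a collection of world-states drawn with {:QBS 1 :D 0 :H 0} fixed, the answer to query 1 would reduce to a relative-frequency count. A sketch (samples is assumed to exist; the sampler that produces it is developed below):

;; fraction of the sampled world-states that are wins
(/ (count (filter #(= 1 (:W %)) samples))
   (double (count samples)))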

Generating the required samples for this approximation technique is nontrivial for many distributions. A sampling technique that is broadly applicable with good performance is known as Gibbs sampling.

Gibbs Sampling

The algorithm is simple:

  1. Start at an arbitrary world-state s.
  2. For every variable v in the world-state s, update v (and hence s) by sampling from p(v | rest), i.e. the conditional probability of v given all the other variables in the state.
  3. Collect the sample s and repeat from step 2 until enough samples have been collected.

Intuitively, the algorithm stochastically resamples each variable given the current values of the others. Eventually the world-states wander around high-probability regions of the state space with proportionally high frequency and low-probability regions with proportionally low frequency, so the collected samples can be used to estimate probabilities under the model.

How do we compute the p(v | rest) terms? Using the definition of conditional probability:

p(v | rest) = p(v, rest) / p(rest), which is proportional to p(v, rest)

and in our factored model the joint p(v, rest) is just the product of the CPT entries matching the full state. So to resample a variable v we compute that product once for each possible value of v and normalize the results.
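For example, take the state {:QBS 1 :D 0 :QBP 1 :H 0} and consider resampling W. The two unnormalized joint values are 0.2 * 0.6 * 0.5 * 0.9 * 0.4 = 0.0216 for W = 0 and 0.2 * 0.6 * 0.5 * 0.9 * 0.6 = 0.0324 for W = 1; normalizing gives p(W = 0 | rest) = 0.4 and p(W = 1 | rest) = 0.6. The factors that do not involve W cancel in the normalization, which is exactly why this product-then-normalize trick works.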

Implementation and Results

Here is a quick implementation. (Note the dependency on incanter.stats/sample-multinomial, pulled in at the top of the helper functions):

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;HELPER FNS;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(use '[incanter.stats :only [sample-multinomial]])

(defn product [s] (reduce * s))

(defn normalize
" Totals all counts and returns proportions, e.g.

  In: {:a 2 :b 8}
  Out:{:a 0.2 :b 0.8}
"
  [d]
  (let [sum (reduce + (vals d))]
    (reduce (fn [m [k v]] (assoc m k (float (/ v sum)))) {} d)))

(defn factored-query
" For every factor in a bayes net looks up
  the factor's contribution to the prob of
  the query q and returns the product.
"
  [bn q]
  (product
    (for [{:keys [vars CPT]} (:factors bn)]
      (get CPT (select-keys q vars)))))
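For example, the joint probability of one complete world-state is just the product of the matching CPT entries (a hypothetical REPL call, reusing the numbers from the worked example above):

(factored-query football-model {:QBS 1 :D 0 :QBP 1 :H 0 :W 1})
;; => 0.2 * 0.6 * 0.5 * 0.9 * 0.6, i.e. roughly 0.0324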

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CORE FNS;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defn var-update
" Helper function to gibbs-sample. Calculates
  p(var|other vars) and returns a new state
  with the var updated with a sample from
  p(var|other vars).
"
  [model state var]
  (let [p (normalize
            (reduce
              (fn [m v]
                (assoc m v (factored-query model
                             (assoc state var v))))
              {} (get-in model [:vars var])))]
    (assoc state var
      (first (sample-multinomial 1 :probs (vals p)
                                   :categories (keys p))))))
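For instance, resampling :W in a state where the QB performed well and there is no homefield advantage should draw from roughly {0 0.4, 1 0.6}, matching the worked example earlier (a hypothetical REPL call):

;; returns the state with :W redrawn from ~{0 0.4, 1 0.6}
(var-update football-model {:QBS 1 :D 0 :QBP 1 :H 0 :W 0} :W)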

(defn gibbs-sample
" Starts at a random state assignment then
  for iters times updates each non-evidence
  variable and collects the full sampled states
  into a vector
"
  [model evidence iters]
  (let [free-vars (keys (apply dissoc (:vars model)
                                      (keys evidence)))
        init-state (reduce
                     (fn [m [k v]]
                       (assoc m k (get evidence k
                                       (rand-nth (seq v)))))
                     evidence
                     (:vars model))]
    (loop [state init-state
           samples []
           iter 0]
      (if (< iter iters)
        (recur
          (reduce (partial var-update model) state free-vars)
          (conj samples state)
          (inc iter))
        (conj samples state)))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;DEMO;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defn gibbs-infer [model evidence target]
  (let [samples (gibbs-sample model evidence 5000)]
    (normalize (frequencies (map target samples)))))
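Before turning to the queries, a quick sanity check (a hypothetical call): with no evidence at all, the estimated marginal of :H should come back close to its prior of 0.5/0.5:

(gibbs-infer football-model {} :H)
;; => roughly {0 0.5, 1 0.5}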

And to answer the initial queries:

Query 1:

(gibbs-infer football-model {:QBS 1 :D 0 :H 0} :W)

=> {0 0.5178964, 1 0.4821036}

Thus the probability of winning with a highly skilled QB facing a weak defense, without homefield advantage, is roughly 50/50. Probabilistic modeling is not a crystal ball!

Query 2:

(gibbs-infer football-model {:W 0 :H 0} :QBS)

=> {0 0.8254349, 1 0.17456509}

If we did not win and we lacked homefield advantage, there is a slightly higher probability our QB is low skilled than we would expect with no evidence: the prior probability of a highly skilled QB is 0.2, versus roughly 0.17 here.

Cites

PGM, BRML, incanter.org, clojure.org, etc.