BAYESIAN LEARNING
Introduction
Cont…
Cont…
Cont…
Cont…
BAYES THEOREM
Cont…
Cont…
according to Bayes theorem.
It is also reasonable that P(h|D) decreases as P(D) increases: the more probable it is that D will be observed independently of h, the less evidence D provides in support of h.
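The inverse dependence on P(D) can be checked numerically. The sketch below is only an illustration: the prior, the likelihood, and the candidate values of P(D) are made-up numbers, and `posterior` is a hypothetical helper name, not something defined in these slides.

```python
def posterior(prior_h, likelihood_d_given_h, prob_d):
    """Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    if not 0 < prob_d <= 1:
        raise ValueError("P(D) must lie in (0, 1]")
    return likelihood_d_given_h * prior_h / prob_d


# Illustrative (assumed) values: P(h) = 0.1 and P(D|h) = 0.5.
prior_h = 0.1
likelihood = 0.5

# P(D) can never be smaller than P(D|h) * P(h) = 0.05,
# so the candidate values start there.
evidence_values = [0.05, 0.1, 0.25, 0.5]

# Hold P(h) and P(D|h) fixed and vary only P(D).
posteriors = [posterior(prior_h, likelihood, p) for p in evidence_values]

for p_d, p_h_given_d in zip(evidence_values, posteriors):
    print(f"P(D) = {p_d:.2f}  ->  P(h|D) = {p_h_given_d:.2f}")
```

With P(h) and P(D|h) held fixed, each larger value of P(D) yields a smaller posterior, which matches the argument above.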
Cont…
Cont…
Cont…
Cont…
An Example
Cont…
Cont…
Cont…
BAYES THEOREM AND CONCEPT LEARNING
Brute-Force Bayes Concept Learning
Cont…
Cont…
Cont…
Cont…
given a world in which hypothesis h holds (i.e., given a world in which h is the correct description of the target concept c). Since we assume noise-free training data, the probability of observing classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore, P(D|h) = 1 if di = h(xi) for all di in D, and P(D|h) = 0 otherwise.
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
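The brute-force procedure with noise-free data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the slides' own algorithm listing: it assumes a uniform prior P(h) = 1/|H|, uses a small hypothetical hypothesis space of threshold classifiers, and the function name `brute_force_posteriors` is invented for the example. The likelihood P(D|h) is 1 when h agrees with every training label and 0 otherwise.

```python
from fractions import Fraction


def brute_force_posteriors(hypotheses, data):
    """Compute P(h|D) for every h, assuming a uniform prior and noise-free data."""
    prior = Fraction(1, len(hypotheses))  # assumed uniform prior P(h) = 1/|H|
    unnormalized = {}
    for name, h in hypotheses.items():
        # Noise-free likelihood: 1 if h matches every label, else 0.
        consistent = all(h(x) == d for x, d in data)
        likelihood = 1 if consistent else 0
        unnormalized[name] = likelihood * prior
    evidence = sum(unnormalized.values())  # P(D), by total probability
    if evidence == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: p / evidence for name, p in unnormalized.items()}


# Hypothetical hypothesis space: classify x as positive when x >= t.
hypotheses = {f"x>={t}": (lambda x, t=t: x >= t) for t in range(6)}

# Hypothetical training data: (instance, label) pairs.
data = [(1, False), (4, True)]

posteriors = brute_force_posteriors(hypotheses, data)
for name, p in posteriors.items():
    print(name, p)
```

Under these assumptions every hypothesis consistent with the data receives the same posterior, one over the number of consistent hypotheses, and every inconsistent hypothesis receives posterior 0.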
MAP Hypotheses and Consistent Learners
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
MAXIMUM LIKELIHOOD AND LEAST-SQUARED ERROR HYPOTHESES
Cont…
, where R represents the set of real numbers).
drawn from H.
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
directly from the exponent in the definition of the Normal distribution. Similar derivations can be performed starting with other assumed noise distributions, producing different results.
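The link between Normally distributed noise and squared error can be checked numerically. The sketch below is an illustration only: the data points, the noise standard deviation `sigma`, and the one-parameter hypothesis family h(x) = w * x are assumptions made for the example. It scores each candidate w by its Gaussian log-likelihood and by its sum of squared errors, then compares the two winners.

```python
import math


def sum_squared_error(w, data):
    """Sum over the training examples of (d - h(x))^2 for h(x) = w * x."""
    return sum((d - w * x) ** 2 for x, d in data)


def gaussian_log_likelihood(w, data, sigma):
    """ln p(D|h), assuming d = h(x) + e with e ~ Normal(0, sigma^2)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (d - w * x) ** 2 / (2 * sigma ** 2) for x, d in data)


# Hypothetical noisy observations of a roughly linear target.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Candidate hypotheses: slopes from 1.00 to 3.00 in steps of 0.01.
candidates = [i / 100 for i in range(100, 301)]
sigma = 0.5  # assumed noise standard deviation

w_ml = max(candidates, key=lambda w: gaussian_log_likelihood(w, data, sigma))
w_ls = min(candidates, key=lambda w: sum_squared_error(w, data))

print("maximum likelihood slope:", w_ml)
print("least-squared error slope:", w_ls)
```

The only term in the log-likelihood that depends on w is the negated squared error scaled by 1/(2 * sigma^2), so maximizing one is the same as minimizing the other for any fixed sigma. A different assumed noise distribution would change the exponent and therefore the error measure being minimized.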
MAXIMUM LIKELIHOOD HYPOTHESES FOR PREDICTING PROBABILITIES
Cont…
Cont…
Cont…
Cont…
Cont…
Cont…
becomes equal to 1. Hence P(di = 1|h, xi) = h(xi), which is equivalent to the first case in Equation (6.9). A similar analysis shows that the two equations are also equivalent when di = 0.
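The equivalence of the two forms can be verified mechanically. The compact expression is not shown in the surviving slide text, so this sketch assumes the standard form h(xi)^di * (1 - h(xi))^(1 - di) and compares it with the two-case definition; the function names are invented for the example.

```python
import math


def piecewise_prob(d, p):
    """Two-case form: P(d|h, x) is h(x) if d = 1 and 1 - h(x) if d = 0."""
    if d == 1:
        return p
    if d == 0:
        return 1 - p
    raise ValueError("d must be 0 or 1")


def compact_prob(d, p):
    """Assumed compact form: h(x)^d * (1 - h(x))^(1 - d)."""
    return (p ** d) * ((1 - p) ** (1 - d))


# Check both labels against several illustrative values of h(x).
checks = [
    math.isclose(piecewise_prob(d, p), compact_prob(d, p))
    for d in (0, 1)
    for p in (0.0, 0.25, 0.5, 0.9, 1.0)
]
print(all(checks))
```

When d = 1 the second factor has exponent 0 and equals 1, leaving h(x); when d = 0 the first factor equals 1, leaving 1 - h(x).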
Cont…
Cont…
Gradient Search to Maximize Likelihood in a Neural Net
Cont…
Cont…
Cont…
Cont…