Binary Cross-Entropy Loss
Session 7
ACM AI + ACM TeachLA
Slides Link:
https://teachla.uclaacm.com/resources
What is your favorite city?
Recap: Bayes’ Theorem
Bayes’ Review:
Problem:
What is the probability that Grogu is a Star Wars fan (h) given that he has watched The Mandalorian (D)?
h: Grogu is a Star Wars fan    D: Grogu has watched The Mandalorian (given)
Data Collected:
P(D) = 0.3
P(D|h) = 0.95
P(h) = 0.2
The probability Grogu is a Star Wars fan given that he has watched The Mandalorian:
P(h|D) = P(D|h) · P(h) / P(D) = (0.95)(0.2) / 0.3 ≈ 0.63, or 63%
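As a quick sanity check, here is a minimal Python sketch of this calculation (the variable names are ours, not from the slides):

```python
# Bayes' Theorem: P(h|D) = P(D|h) * P(h) / P(D)
p_D_given_h = 0.95  # P(D|h): watched The Mandalorian given Star Wars fan
p_h = 0.2           # P(h): prior probability of being a Star Wars fan
p_D = 0.3           # P(D): probability of having watched The Mandalorian

p_h_given_D = p_D_given_h * p_h / p_D
print(f"P(h|D) = {p_h_given_D:.2f}")  # 0.63, i.e. 63%
```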
Bayes’ Theorem and AI/ML?
Connection to ML
Maximum a Posteriori Hypothesis
Ex: which type of ad has the highest chance of being clicked, given a particular user?
Maximum a Posteriori Hypothesis
h_MAP = argmax over h of P(h|D)
      = argmax over h of P(D|h) · P(h) / P(D)
      = argmax over h of P(D|h) · P(h)
Why can we get rid of P(D)?
Number 1 | Number 2 | Number 3 | Number 4 | Number 5
2 | 4 | 6 | 8 | 9
What if we multiplied each number by 5?
10 | 20 | 30 | 40 | 45
Number 5 is still the largest! Dividing or multiplying every hypothesis's score by the same constant, like P(D), never changes which one is biggest, so we can drop it.
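A tiny Python sketch of the same idea (our own illustration, not from the slides):

```python
# Scaling every value by the same constant keeps the argmax in the same place.
values = [2, 4, 6, 8, 9]
scaled = [v * 5 for v in values]  # [10, 20, 30, 40, 45]

print(values.index(max(values)))  # 4 -> Number 5 is the largest
print(scaled.index(max(scaled)))  # 4 -> Number 5 is still the largest
```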
Maximum a Posteriori Hypothesis
What if our data is uniformly distributed?
Maximum Likelihood Estimation
If the prior P(h) is uniform (the same for every hypothesis), it is just another constant we can drop, and MAP reduces to Maximum Likelihood Estimation:
h_MLE = argmax over h of P(D|h)
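A tiny sketch of this reduction (hypothetical numbers of our own):

```python
# With a uniform prior, MAP and MLE pick the same hypothesis.
likelihood = {"h1": 0.95, "h2": 0.1}  # P(D|h)
prior      = {"h1": 0.5,  "h2": 0.5}  # uniform P(h)

map_scores = {h: likelihood[h] * prior[h] for h in likelihood}
print(max(map_scores, key=map_scores.get))  # 'h1'
print(max(likelihood, key=likelihood.get))  # 'h1' -- the same answer
```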
Last week, we learned about Bayes’ Theorem.
We can use Bayes’ Theorem to measure the probability of hypothesis h occurring given data D.
h : hypothesis (an event)
D : data (background information/event)
P(h|D) = P(D|h) · P(h) / P(D)
REMEMBER: Bayes’ Theorem is trying to find the probability of our hypothesis given some data
Do we want our probability to be high or low?
Hypothesis: I am funny
Data: My comedy TikTok got over 100k likes
Let’s Go Through an Example!
Problem:
Given that Grogu has watched The Mandalorian (D) is he a Star Wars fan or not?
What are our hypotheses?
h1 = Star Wars fan
h2 = not a Star Wars fan
Data Collected
P(D) = 0.3
P(D|h1) = 0.95
P(D|h2) = 0.1
P(h1) = 0.2
P(h2) = 0.8
h1 = Star Wars fan
h2 = not a Star Wars fan
Bayes’ Review:
Problem:
What is the probability that Grogu is a Star Wars fan (h) given that he has watched The Mandalorian (D)?
We need 3 things…
P(D|h) = 0.95 : Probability that Grogu watched The Mandalorian given he is a Star Wars fan
P(h) = 0.2 : Probability that Grogu is a Star Wars fan
P(D) = 0.3 : Probability that Grogu watched The Mandalorian
…to find P(h|D) : Probability that Grogu is a Star Wars fan given that he has watched The Mandalorian
h: Grogu is a Star Wars fan    D: Grogu has watched The Mandalorian (given)
The probability Grogu is a Star Wars fan given that he has watched The Mandalorian:
P(h|D) = P(D|h) · P(h) / P(D) = (0.95)(0.2) / 0.3 ≈ 0.63, or 63%
Rate your understanding! Bayes’ Theorem
How much do you understand Bayes’ Theorem?
1 2 3 4 5 6 7 8 9 10
1: If someone mentioned Bayes' Theorem I wouldn't know what they were talking about
5: I know what Bayes' Theorem is, but I wouldn't be able to solve a Bayes' Theorem problem
10: I know what Bayes' Theorem is used for and I know how to solve for P(h|D)
Maximum a Posteriori Hypothesis (MAP)
Maximum Likelihood Estimation (MLE)
Review
P(D) = 0.3
P(D|h1) = 0.95
P(D|h2) = 0.1
P(h1) = 0.2
P(h2) = 0.8
P(D|h1) · P(h1) = 0.95 × 0.2 = 0.19
P(D|h2) · P(h2) = 0.1 × 0.8 = 0.08
Since 0.19 > 0.08, we predict that hypothesis 1 is correct!
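Here is a small Python sketch of this MAP comparison, using the numbers above (the dictionary names are our own):

```python
# MAP: pick the hypothesis with the largest P(D|h) * P(h).
p_D_given_h = {"h1": 0.95, "h2": 0.1}  # P(D|h)
prior       = {"h1": 0.2,  "h2": 0.8}  # P(h)

scores = {h: p_D_given_h[h] * prior[h] for h in prior}
print(scores)                       # {'h1': 0.19..., 'h2': 0.08...}
print(max(scores, key=scores.get))  # 'h1' -> predict Star Wars fan
```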
Rate your understanding! MAP and MLE
How much do you understand Maximum a Posteriori Hypothesis and Maximum Likelihood Estimation?
1 2 3 4 5 6 7 8 9 10
1: If someone mentioned MLE or MAP I wouldn't know what they were talking about
5: I know what MAP and MLE are but I don't know how to do problems with them
10: I know what MAP and MLE are used for and I can solve for them
*Short* Recap of ML Classification
Training Data
[Images of animals: Inputs (X), each with a given Label (Y): Frog, Frog, Rabbit, Frog, Rabbit, Rabbit]
Steps for Machine Learning
1. Given the training data, the ML classification model (logistic regression) learns the relationship between Inputs (X) and Labels (Y).
2. The model makes predictions on inputs (e.g. Rabbit ✓, Rabbit ✗, Frog ✓).
3. We use a loss function to measure the performance of the model's predictions.
4. We use gradient descent to improve our model based on the loss function.
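To make these steps concrete, here is an illustrative training loop in Python (a sketch of the pipeline above, not the session's actual code; the synthetic data and hyperparameters are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # inputs (e.g. image features)
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # labels: 1 = rabbit, 0 = frog

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    a = 1 / (1 + np.exp(-(X @ w + b)))        # model output: P(rabbit | x)
    loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))  # BCE loss
    w -= lr * (X.T @ (a - y)) / len(y)        # gradient descent step on w
    b -= lr * np.mean(a - y)                  # gradient descent step on b

print(f"final BCE loss: {loss:.3f}")          # decreases as the model improves
```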
Rate your understanding! ML Framework
How much do you understand the machine learning framework?
1 2 3 4 5 6 7 8 9 10
1: I'm not sure what's going on
5: I have an idea of what it is, but still have questions
10: I completely get the gist of it
Binary Cross-Entropy (BCE) Loss
Some terms to recall
Entropy
We define entropy as the measure of uncertainty (or uniqueness) associated with a given distribution.
High entropy or low entropy?
High entropy: the items in the group are quite dissimilar.
Low entropy: the items in the group share lots of similarity.
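A small Python sketch of this definition (our own illustration; the probabilities are made up):

```python
import numpy as np

# Shannon entropy: H(p) = -sum(p * log2(p)), the uncertainty of a distribution.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))       # 1.0 bit   -> dissimilar group, high entropy
print(entropy([0.99, 0.01]))     # ~0.08 bit -> similar group, low entropy
```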
Rate your understanding! Entropy
How much do you understand Entropy?
1 2 3 4 5 6 7 8 9 10
1: I'm not sure what's going on
5: I have an idea of what it is, but still have questions
10: I completely get the gist of it
Entropy
For classification problems, we can use entropy to measure the amount of loss of our model
Recall: in classification models, we draw a line to separate groups
Goal for Machine Learning (Classification): draw a line that separates the two groups.
[Plot: blue points (Y), orange points (Y), and a separating line fit with binary cross-entropy loss]
Binary Cross Entropy (BCE)
BCE: tells us how good our model (separating line) is
BCE measures amount of dissimilarity (uncertainty)
Rate your understanding! Classification and BCE
How much do you understand Classification and Binary Cross Entropy (BCE) Loss?
1 2 3 4 5 6 7 8 9 10
1: I'm not sure what's going on
5: I have an idea of what it is, but still have questions
10: I completely get the gist of it
Binary Cross-Entropy Loss
Cross-entropy is a measure of the difference between two probability distributions.
In a binary classification problem, we can represent the cross-entropy loss as:
L(a, y) = -[ y · log(a) + (1 - y) · log(1 - a) ]
* Where a is our model's output, and y is the true value (label).
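A minimal Python sketch of this formula (the function name is ours):

```python
import numpy as np

# Binary cross-entropy for one prediction a in (0, 1) and label y in {0, 1}.
def bce(a, y):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

print(bce(0.9, 1))  # ~0.105: confident and correct -> low loss
print(bce(0.1, 1))  # ~2.303: confident but wrong   -> high loss
```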
Binary Cross-Entropy Loss
Binary Cross Entropy (BCE)
BCE measures amount of dissimilarity (uncertainty)
Cross-entropy is a measure of the difference between two probability distributions.
How do MLE and BCE tie together?
-- Maximum Likelihood Estimation
-- Binary Cross Entropy [Loss]
Binary Cross-Entropy Loss
y is either 1 (image is a cat) or 0 (image is not a cat).
P(x, y) is the probability of this ordered (input, label) pair existing in our training examples.
Two events are involved:
event y: we know we are looking at an image of a cat or not a cat
event x: a certain image exists
Of the two conditionals we could model, P(y|x) or P(x|y), for MLE we choose option 1, not option 2: our model's output a estimates P(y = 1 | x).
P(y = 1 | x) = a (image is a cat)
P(y = 0 | x) = 1 - a (image is not a cat)
Both cases can be written as one expression:
P(y | x) = a^y · (1 - a)^(1 - y)
Taking the log turns this into a sum:
log P(y | x) = y · log(a) + (1 - y) · log(1 - a)
MLE says to pick the model that maximizes this log-likelihood over the training data. Maximizing it is the same as minimizing its negative (using binary cross entropy loss):
BCE(a, y) = -[ y · log(a) + (1 - y) · log(1 - a) ]
So minimizing BCE loss is exactly maximizing the likelihood of our data (using mathematical magic... but not really, just logarithms).
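A short numerical check of this equivalence (the arrays are made-up examples, not from the session):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])             # true labels
a = np.array([0.9, 0.2, 0.7, 0.6, 0.1])   # model outputs P(y=1|x)

log_likelihood = np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))
mean_bce = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

print(log_likelihood)  # higher is better
print(mean_bce)        # lower is better
print(np.isclose(mean_bce, -log_likelihood / len(y)))  # True: same quantity
```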
Rate your understanding! Mathematical Derivation of BCE
How much do you understand the mathematical derivation of Binary Cross Entropy (BCE) Loss?
1 2 3 4 5 6 7 8 9 10
1: If someone mentioned BCE I wouldn't know what they were talking about
5: I know what BCE is used for, but the math just shown confuses me
10: I know what BCE is, and I get the gist of what the math meant
Thanks!
ACM AI