STATS / DATA SCI 315
Lecture 06
Softmax and Derivatives
Information Theory Basics
Softmax and Derivatives
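For reference in the derivations below, the softmax map itself (standard definition; the symbols o for the logits, ŷ for the predicted probabilities, and q for the number of classes are assumed from the lecture context):

\[
\hat{y}_j = \mathrm{softmax}(\mathbf{o})_j = \frac{\exp(o_j)}{\sum_{k} \exp(o_k)}, \qquad j = 1, \dots, q.
\]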
Recall the squared loss case
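For comparison, the squared-loss case gives the familiar "prediction minus target" gradient (a standard result, stated in the usual vector notation):

\[
l(\mathbf{y}, \hat{\mathbf{y}}) = \tfrac{1}{2}\,\lVert \hat{\mathbf{y}} - \mathbf{y} \rVert^2
\quad\Longrightarrow\quad
\frac{\partial l}{\partial \hat{\mathbf{y}}} = \hat{\mathbf{y}} - \mathbf{y}.
\]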
Applying chain rule
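Written out, the chain-rule step combines the loss gradient with the softmax Jacobian (a standard derivation; \(\delta_{kj}\) denotes the Kronecker delta, 1 if k = j and 0 otherwise):

\[
\frac{\partial l}{\partial o_j}
= \sum_{k} \frac{\partial l}{\partial \hat{y}_k}\,\frac{\partial \hat{y}_k}{\partial o_j},
\qquad
\frac{\partial \hat{y}_k}{\partial o_j} = \hat{y}_k \left( \delta_{kj} - \hat{y}_j \right).
\]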
Cross-entropy in terms of the o’s
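Substituting the softmax into the cross-entropy loss gives the log-sum-exp form (a standard identity, using the fact that the labels satisfy \(\sum_j y_j = 1\)):

\[
l(\mathbf{y}, \hat{\mathbf{y}})
= -\sum_{j} y_j \log \hat{y}_j
= -\sum_{j} y_j \Bigl( o_j - \log \sum_{k} \exp(o_k) \Bigr)
= \log \sum_{k} \exp(o_k) - \sum_{j} y_j o_j.
\]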
Gradient of loss w.r.t. o
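Differentiating the log-sum-exp form above yields the gradient with respect to the logits, which has exactly the "prediction minus target" shape seen in the squared-loss case:

\[
\frac{\partial l}{\partial o_j}
= \frac{\exp(o_j)}{\sum_{k} \exp(o_k)} - y_j
= \hat{y}_j - y_j.
\]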
Gradient of loss w.r.t. weights
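One more chain-rule step reaches the weights; a sketch assuming the logits come from a linear layer \(\mathbf{o} = \mathbf{W}\mathbf{x} + \mathbf{b}\) (the usual softmax-regression setup, assumed here):

\[
\frac{\partial l}{\partial \mathbf{W}} = (\hat{\mathbf{y}} - \mathbf{y})\,\mathbf{x}^{\top},
\qquad
\frac{\partial l}{\partial \mathbf{b}} = \hat{\mathbf{y}} - \mathbf{y}.
\]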
Cross-entropy also works with soft observed labels
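The derivation never uses the fact that y is one-hot, only that its entries sum to 1; so y may be any probability vector (e.g., smoothed or annotator-averaged labels, an illustrative use case), and the loss and gradient formulas above apply unchanged:

\[
l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j} y_j \log \hat{y}_j,
\qquad
\frac{\partial l}{\partial o_j} = \hat{y}_j - y_j
\quad \text{for any } y_j \in [0, 1],\ \sum_{j} y_j = 1.
\]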
Information Theory Basics
Entropy
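The standard discrete definition, read as the expected surprisal of a draw from P (the log base is a convention; the natural log, i.e., units of nats, is assumed here):

\[
H(P) = -\sum_{j} P(j) \log P(j) = \mathbb{E}_{x \sim P}\bigl[ -\log P(x) \bigr].
\]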
Cross-entropy
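Correspondingly, the cross-entropy of Q relative to P is the expected surprisal of Q under draws from P (same notation and log convention as above); the classification loss earlier is exactly H(y, ŷ) averaged over examples:

\[
H(P, Q) = -\sum_{j} P(j) \log Q(j) = \mathbb{E}_{x \sim P}\bigl[ -\log Q(x) \bigr],
\qquad
H(P, P) = H(P).
\]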
KL divergence or relative entropy
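The three quantities fit together as \(D_{\mathrm{KL}}(P \,\|\, Q) = H(P, Q) - H(P)\), which is nonnegative and zero exactly when P = Q (Gibbs' inequality):

\[
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{j} P(j) \log \frac{P(j)}{Q(j)} = H(P, Q) - H(P) \ge 0.
\]

A quick numerical check of this identity (a minimal NumPy sketch; the two distributions are made-up illustrative values, not from the lecture):

    import numpy as np

    p = np.array([0.7, 0.2, 0.1])  # reference distribution P (illustrative values)
    q = np.array([0.5, 0.3, 0.2])  # model distribution Q (illustrative values)

    entropy = -np.sum(p * np.log(p))        # H(P)
    cross_entropy = -np.sum(p * np.log(q))  # H(P, Q)
    kl = np.sum(p * np.log(p / q))          # D_KL(P || Q)

    print(np.isclose(cross_entropy, entropy + kl))  # prints: True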