1 of 13

STATS / DATA SCI 315

Lecture 04

Regression wrap-up

2 of 13

The Normal Distribution and Squared Loss

3 of 13

Gaussian distribution

4 of 13

Linear regression and Gaussian distribution

  • Linear model with Gaussian errors: y = w⊤x + b + ε, with ε ~ N(0, σ²)
  • This gives us a likelihood of observing a particular y for a given x: p(y | x) = (1/√(2πσ²)) exp(−(y − w⊤x − b)² / (2σ²)) (sketched in code below)
  • Note that this likelihood is a function of w and b for fixed x, y
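A minimal NumPy sketch of evaluating this likelihood; the particular x, y, w, b, and σ values below are made up for illustration:

    import numpy as np

    def gaussian_likelihood(y, x, w, b, sigma):
        """Likelihood of y given x under y = w @ x + b + Gaussian noise with std sigma."""
        mu = np.dot(w, x) + b
        return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    # Fix a single (x, y) pair; the likelihood is then a function of (w, b).
    x, y, sigma = np.array([1.0, 2.0]), 0.7, 0.5
    print(gaussian_likelihood(y, x, w=np.array([0.3, 0.1]), b=0.0, sigma=sigma))
    print(gaussian_likelihood(y, x, w=np.array([0.2, 0.2]), b=0.1, sigma=sigma))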

5 of 13

Maximum likelihood

  • Likelihood of the entire dataset (assuming independence among samples): P(y | X) = ∏ᵢ p(y⁽ⁱ⁾ | x⁽ⁱ⁾)
  • Maximum likelihood principle: maximize this (over parameters)
  • We can take the log (a monotonic transformation) and attach a minus sign
  • Then, equivalently, minimize the negative log likelihood
  • The equivalence is for the minimizers, not for the value of the objective function! (see the numerical check below)
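A minimal numerical check of that last point, with made-up data and with b and σ held fixed for simplicity (all values below are illustrative assumptions): the NLL and the sum of squared errors take different values but share the same minimizer.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=20)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=20)   # toy data with true w = 2, b = 1
    b, sigma = 1.0, 0.5                                   # hold b and sigma fixed; vary only w

    w_grid = np.linspace(0.0, 4.0, 401)
    sse = np.array([np.sum((y - w * x - b) ** 2) for w in w_grid])
    nll = 0.5 * len(x) * np.log(2 * np.pi * sigma ** 2) + sse / (2 * sigma ** 2)

    # Different objective values, same minimizing w.
    print(sse.min(), nll.min())
    print(w_grid[np.argmin(sse)], w_grid[np.argmin(nll)])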

6 of 13

Expression for the negative log likelihood

  • Is it clear to everyone how to derive this?
  • Since the variance σ² is a positive constant, this shows that the MLE is equivalent to minimizing the squared error (sum or average); the expression is written out below
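For reference, a sketch of the derivation in the notation of the previous slides, starting from the per-sample Gaussian likelihood and the independence assumption:

    -\log P(\mathbf{y} \mid \mathbf{X})
      = -\sum_{i=1}^{n} \log p\left(y^{(i)} \mid \mathbf{x}^{(i)}\right)
      = \sum_{i=1}^{n} \left[ \frac{1}{2}\log\left(2\pi\sigma^2\right)
        + \frac{1}{2\sigma^2}\left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\right)^2 \right]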

7 of 13

From Linear Regression to Deep Networks

8 of 13

Linear model as a beginning for deep learning

  • Understand the complex/unfamiliar by relating it to something simple/familiar
  • We have only talked about linear models
  • Deep learning builds much more complex models
  • However, we can think about the linear model in the language of NNs
  • To begin, let us rewrite a linear model in “layer” notation

9 of 13

  • Number of output nodes = 1
  • Number of input nodes = d
  • Number of layers = 1 (we don’t count the input layer)
  • Fully connected (or dense) layer: all inputs connect to all outputs
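A minimal sketch of this picture, assuming PyTorch (the choice d = 3 and the random mini-batch are just for illustration): linear regression is a single fully connected (dense) layer with d inputs and one output.

    import torch
    from torch import nn

    d = 3                    # number of input nodes (features)
    net = nn.Linear(d, 1)    # one dense layer: all d inputs connect to the single output node

    x = torch.randn(5, d)    # a mini-batch of 5 examples
    y_hat = net(x)           # computes x @ w.T + b; shape (5, 1)
    print(y_hat.shape)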

10 of 13

Cartoonish picture of a biological neuron

Dendrites = input terminals

Nucleus = computational unit

Axon = output wire

Axon terminal = output terminal

Axons connect to other neurons via connections called synapses (not shown)

11 of 13

Neuronal computation

  • Info 𝑥𝑖 arriving from other neurons (or sensors, e.g., retina) is received in the dendrites
  • Info is weighted by synaptic weights 𝑤𝑖 determining the effect of the inputs
    • Positive weights – activation
    • Negative weights – inhibition
  • Simple linear weighting gives the result y = Σᵢ 𝑤𝑖 𝑥𝑖
  • After applying a nonlinearity 𝜎, the result 𝜎(y) is sent to other neurons via the axon for further processing (sketched below)
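A minimal NumPy sketch of this cartoon computation (the input values, weights, and the choice of a sigmoid for 𝜎 are illustrative assumptions):

    import numpy as np

    def neuron(x, w):
        """Cartoon neuron: weighted sum of inputs, then a nonlinearity."""
        y = np.dot(w, x)                  # simple linear weighting: y = sum_i w_i * x_i
        return 1.0 / (1.0 + np.exp(-y))   # sigma(y), sent on to other neurons

    x = np.array([0.5, -1.0, 2.0])        # info x_i arriving from other neurons
    w = np.array([0.8, -0.3, 0.1])        # synaptic weights: positive = activation, negative = inhibition
    print(neuron(x, w))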

12 of 13

Neuroscience provides high-level inspiration

  • Simple units can be cobbled together to produce far more interesting and complex behavior than a single neuron
  • Need the right connectivity (DL engineers provide this)
  • Need the right learning algorithm (DL uses backprop, which is unlikely to be used by the brain)
  • As far as these high-level ideas are concerned, DL does derive its inspiration from neuroscience

13 of 13

On biological vs artificial neurons

  • The cartoonish picture is imprecise
  • There is evidence (see the paper if interested) that a single biological neuron actually needs an artificial NN with several layers to model its complexity
  • Deep learning today draws little direct inspiration from neuroscience
  • Airplanes might have been inspired by birds
  • But ornithology has not been the primary driver of aeronautics innovation
  • Deep learning inspiration comes from math, stats, and CS