1 of 13

STATS / DATA SCI 315

Lecture 04

Regression wrap-up

2 of 13

The Normal Distribution and Squared Loss

3 of 13

Gaussian distribution

4 of 13

Linear regression and Gaussian distribution

  • Linear model with Gaussian errors: y = w⊤x + b + ε, with ε ~ N(0, σ²)
  • This gives us a likelihood of observing a particular y for a given x: p(y | x) = (1/√(2πσ²)) exp(−(y − w⊤x − b)² / (2σ²)) (sketched in code below)
  • Note that this likelihood is a function of w and b for fixed x, y
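A minimal NumPy sketch of evaluating this likelihood; the particular x, y, w, b, and σ values below are made up for illustration:

    import numpy as np

    def gaussian_likelihood(y, x, w, b, sigma):
        """Likelihood of y given x under y = w @ x + b + Gaussian noise with std sigma."""
        mu = np.dot(w, x) + b
        return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    # Fix a single (x, y) pair; the likelihood is then a function of (w, b).
    x, y, sigma = np.array([1.0, 2.0]), 0.7, 0.5
    print(gaussian_likelihood(y, x, w=np.array([0.3, 0.1]), b=0.0, sigma=sigma))
    print(gaussian_likelihood(y, x, w=np.array([0.2, 0.2]), b=0.1, sigma=sigma))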

5 of 13

Maximum likelihood

  • Likelihood of the entire dataset (assuming independence among samples): P(y | X) = ∏ᵢ p(y⁽ⁱ⁾ | x⁽ⁱ⁾)
  • Maximum likelihood principle: maximize this (over parameters)
  • We can take the log (a monotonic transformation) and attach a minus sign
  • Then, equivalently, minimize the negative log likelihood
  • The equivalence is for the minimizers, not for the value of the objective function! (see the numerical check below)
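A minimal numerical check of that last point, with made-up data and with b and σ held fixed for simplicity (all values below are illustrative assumptions): the NLL and the sum of squared errors take different values but share the same minimizer.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=20)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=20)   # toy data with true w = 2, b = 1
    b, sigma = 1.0, 0.5                                   # hold b and sigma fixed; vary only w

    w_grid = np.linspace(0.0, 4.0, 401)
    sse = np.array([np.sum((y - w * x - b) ** 2) for w in w_grid])
    nll = 0.5 * len(x) * np.log(2 * np.pi * sigma ** 2) + sse / (2 * sigma ** 2)

    # Different objective values, same minimizing w.
    print(sse.min(), nll.min())
    print(w_grid[np.argmin(sse)], w_grid[np.argmin(nll)])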

6 of 13

Expression for the negative log likelihood

  • Is it clear to everyone how to derive this?
  • Since the variance σ² is a positive constant, this shows that the MLE is equivalent to minimizing the squared error (sum or average); the expression is written out below
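For reference, a sketch of the derivation in the notation of the previous slides, starting from the per-sample Gaussian likelihood and the independence assumption:

    -\log P(\mathbf{y} \mid \mathbf{X})
      = -\sum_{i=1}^{n} \log p\left(y^{(i)} \mid \mathbf{x}^{(i)}\right)
      = \sum_{i=1}^{n} \left[ \frac{1}{2}\log\left(2\pi\sigma^2\right)
        + \frac{1}{2\sigma^2}\left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\right)^2 \right]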

7 of 13

From Linear Regression to Deep Networks

8 of 13

Linear model as a beginning for deep learning

  • Understand the complex/unfamiliar by relating it to something simple/familiar
  • We have only talked about linear models
  • Deep learning builds much more complex models
  • However, we can think about the linear model in the language of NNs
  • To begin, let us rewrite a linear model in “layer” notation

9 of 13

  • Number of output nodes = 1
  • Number of input nodes = d
  • Number of layers = 1 (we don’t count the input layer)
  • Fully connected (or dense) layer: all inputs connect to all outputs
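A minimal sketch of this picture, assuming PyTorch (the choice d = 3 and the random mini-batch are just for illustration): linear regression is a single fully connected (dense) layer with d inputs and one output.

    import torch
    from torch import nn

    d = 3                    # number of input nodes (features)
    net = nn.Linear(d, 1)    # one dense layer: all d inputs connect to the single output node

    x = torch.randn(5, d)    # a mini-batch of 5 examples
    y_hat = net(x)           # computes x @ w.T + b; shape (5, 1)
    print(y_hat.shape)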

10 of 13

Cartoonish picture of a biological neuron

Dendrites = input terminals

Nucleus = computational unit

Axon = output wire

Axon terminal = output terminal

Axons connect to other neurons via connections called synapses (not shown)

11 of 13

Neuronal computation

  • Info 𝑥𝑖 arriving from other neurons (or sensors, e.g., retina) is received in the dendrites
  • Info is weighted by synaptic weights 𝑤𝑖 determining the effect of the inputs
    • Positive weights – activation
    • Negative weights – inhibition
  • Simple linear weighting gives the result y = Σᵢ 𝑤𝑖 𝑥𝑖
  • After applying a nonlinearity 𝜎, the result 𝜎(y) is sent to other neurons via the axon for further processing (sketched below)
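A minimal NumPy sketch of this cartoon computation (the input values, weights, and the choice of a sigmoid for 𝜎 are illustrative assumptions):

    import numpy as np

    def neuron(x, w):
        """Cartoon neuron: weighted sum of inputs, then a nonlinearity."""
        y = np.dot(w, x)                  # simple linear weighting: y = sum_i w_i * x_i
        return 1.0 / (1.0 + np.exp(-y))   # sigma(y), sent on to other neurons

    x = np.array([0.5, -1.0, 2.0])        # info x_i arriving from other neurons
    w = np.array([0.8, -0.3, 0.1])        # synaptic weights: positive = activation, negative = inhibition
    print(neuron(x, w))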

12 of 13

Neuroscience provides high-level inspiration

  • Simple units can be cobbled together to produce far more interesting and complex behavior than a single neuron
  • Need the right connectivity (DL engineers provide this)
  • Need the right learning algorithm (DL uses backprop, which is unlikely to be used by the brain)
  • As far as these high-level ideas are concerned, DL does derive its inspiration from neuroscience

13 of 13

On biological vs artificial neurons

  • The cartoonish picture is imprecise
  • There is evidence (see the paper if interested) that a single biological neuron actually needs an artificial NN with several layers to model its complexity
  • Deep learning today draws little direct inspiration from neuroscience
  • Airplanes might have been inspired by birds
  • But ornithology has not been the primary driver of aeronautics innovation
  • Deep learning inspiration comes from math, stats, and CS