1 of 30

The neural code: Lessons from machine learning

Kenneth D Harris, UCL

2 of 30

Multiple linear regression

  •  
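As a minimal sketch of multiple linear regression, the snippet below fits weights by least squares on synthetic data. The data, the variable names, and the "true" weights are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples (rows), 3 predictors (columns).
n_samples, n_predictors = 200, 3
X = rng.normal(size=(n_samples, n_predictors))
w_true = np.array([1.5, -2.0, 0.5])  # assumed ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

# Least-squares fit: choose w to minimise ||y - X w||^2.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated weights:", w_hat)
```

With many more samples than predictors, the estimate should land close to the assumed weights.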

3 of 30

Too many predictors

  •  

4 of 30

 

5 of 30

 

6 of 30

 

7 of 30

Overfitting = large weight vectors

  •  
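The link between overfitting and large weights can be illustrated with a small sketch: when the number of predictors approaches the number of samples, the least-squares weight vector tends to grow, and a ridge penalty shrinks it. The data sizes, the penalty value, and the helper `ridge_fit` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: almost as many predictors as samples.
n_samples, n_predictors = 40, 35
X = rng.normal(size=(n_samples, n_predictors))
w_true = np.zeros(n_predictors)
w_true[:3] = [1.0, -1.0, 0.5]  # only three predictors matter (assumed)
y = X @ w_true + rng.normal(size=n_samples)


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y; lam = 0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


w_ols = ridge_fit(X, y, 0.0)
w_ridge = ridge_fit(X, y, 10.0)

norm_ols = np.linalg.norm(w_ols)
norm_ridge = np.linalg.norm(w_ridge)

# Training error: least squares is never worse than ridge on the training set.
train_err_ols = np.sum((y - X @ w_ols) ** 2)
train_err_ridge = np.sum((y - X @ w_ridge) ** 2)

print("weight norm, least squares:", norm_ols)
print("weight norm, ridge:        ", norm_ridge)
```

The ridge fit trades a higher training error for a smaller weight vector, which is the bias that the penalty introduces.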

8 of 30

Example

 

 

9 of 30

Ridge regression introduces a bias

 

 

10 of 30

Equivariance

11 of 30

Equivariance of ridge and linear regression

  •  
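Equivariance can be checked numerically. In the sketch below, the predictors are transformed by a random orthogonal matrix `Q` (a rotation or reflection); the ridge weights transform by `Q.T` and the predictions are unchanged. The data, the penalty value, and the helper `ridge_fit` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 50 samples, 5 predictors.
X = rng.normal(size=(50, 5))
y = rng.normal(size=50)

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


w = ridge_fit(X, y, 1.0)          # fit in the original coordinates
w_rot = ridge_fit(X @ Q, y, 1.0)  # fit in the rotated coordinates

# The weights rotate along with the inputs, so predictions are identical.
weights_rotate = np.allclose(w_rot, Q.T @ w)
predictions_match = np.allclose(X @ w, (X @ Q) @ w_rot)

print(weights_rotate, predictions_match)
```

Setting the penalty to zero gives the same check for ordinary linear regression, provided `X.T @ X` is invertible.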

12 of 30

Delta rule

  • Learning rule for 1-layer neural networks
  • “Hebbian” coincidence detector of input with error
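A minimal sketch of the delta rule for a single linear unit follows: each weight changes in proportion to the product of its input and the output error. The learning rate, the number of passes, and the synthetic data are assumptions. Because this update is stochastic gradient descent on the squared error, the weights should end up near the least-squares solution, which the last lines check.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training set: 500 input patterns with 4 inputs each.
X = rng.normal(size=(500, 4))
w_true = np.array([0.5, -1.0, 2.0, 0.0])  # assumed target weights
y = X @ w_true + 0.1 * rng.normal(size=500)

w = np.zeros(4)
eta = 0.01  # learning rate (assumed)

for epoch in range(50):
    for x_i, y_i in zip(X, y):
        error = y_i - w @ x_i    # target minus current output
        w += eta * error * x_i   # delta rule: input times error

# Compare with the direct least-squares solution.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
max_gap = np.max(np.abs(w - w_lstsq))

print("delta rule:   ", w)
print("least squares:", w_lstsq)
```

With a constant learning rate the weights keep fluctuating slightly around the least-squares solution rather than settling exactly on it.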

 

 

 

 

 

13 of 30

Delta rule vs. linear regression

  •  

14 of 30

There are problems linear regression can’t solve

  • Not all patterns can be linearly separated

15 of 30

Solution: non-linear hidden units

Input

Representation

Output

16 of 30

“Codon” theory

Dense input

Sparse representation

17 of 30

“Untangling hypothesis”

  • Cortex forms a high-dimensional representation of its inputs
  • This allows nonlinear discriminations to be done with linear readout
  • The cortical code could be sparse, but does not need to be
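The idea that a higher-dimensional nonlinear representation allows linear readout can be sketched with XOR, a standard example of a problem that is not linearly separable. The random rectified hidden layer below is an illustrative assumption, not a model of cortex, and `linear_readout_accuracy` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)

# XOR: the two classes cannot be separated by a line in input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)


def linear_readout_accuracy(features, targets):
    """Fit a least-squares linear readout (with bias) and score its sign."""
    F = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(F, targets, rcond=None)
    return np.mean(np.sign(F @ w) == targets)


# Linear readout directly from the 2-dimensional input.
acc_raw = linear_readout_accuracy(X, y)

# Expand to 50 dimensions with random weights and a rectifying nonlinearity.
n_hidden = 50
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)
H = np.maximum(0.0, X @ W + b)

# Linear readout from the hidden representation.
acc_hidden = linear_readout_accuracy(H, y)

print("accuracy from raw input:    ", acc_raw)
print("accuracy from hidden layer: ", acc_hidden)
```

The readout from the raw input cannot get all four patterns right, while the readout from the expanded representation can.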

18 of 30

“No free lunch theorem”

  • There is no representation that is good for all problems
  • “If you can learn anything, you can’t learn anything”

  • Neural representations have evolved to provide an “inductive bias” that lets animals quickly learn tasks that they will likely encounter
  • This makes them bad at other tasks

  • E.g. humans find it hard to distinguish two white noise images (different in every pixel)
  • And easy to distinguish male and female faces (subtle differences in a few pixels)

  • All human languages share grammatical features
  • LLMs can learn artificial languages that do not share these features

19 of 30

Primate IT cortex visual code predicts human performance

  • Record primate IT responses to images
    • Including variations in pose, position, scale, etc.

  • Train a linear decoder to distinguish image pairs

  • Its performance correlates with human psychophysics

20 of 30

How to characterize a code

  •  

Equivalent to

21 of 30

Kernel matrix vs covariance matrix

  •  
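One relation between the two matrices can be checked numerically: for a response matrix `F` (stimuli by neurons), the stimulus-by-stimulus kernel matrix and the neuron-by-neuron covariance matrix share their nonzero eigenvalues. The sketch assumes mean-centred responses and a common normalisation by the number of stimuli; both conventions, and the random data, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical responses: 30 stimuli (rows) by 80 neurons (columns).
n_stimuli, n_neurons = 30, 80
F = rng.normal(size=(n_stimuli, n_neurons))
F = F - F.mean(axis=0)  # centre each neuron across stimuli

K = F @ F.T / n_stimuli  # kernel matrix: dot products between stimuli
C = F.T @ F / n_stimuli  # covariance matrix: dot products between neurons

eig_K = np.sort(np.linalg.eigvalsh(K))[::-1]
eig_C = np.sort(np.linalg.eigvalsh(C))[::-1]

# The leading eigenvalues coincide; the larger matrix only adds zeros.
n_shared = min(n_stimuli, n_neurons)
spectra_match = np.allclose(eig_K[:n_shared], eig_C[:n_shared])

print(K.shape, C.shape, spectra_match)
```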

22 of 30

Kernel matrix vs. Kernel function

  •  

23 of 30

Kernel eigenfunctions

  •  

24 of 30

Relation between eigenvalues

  • As the number of neurons and stimuli increases:

The sample kernel matrix eigenvalues converge to the kernel function eigenvalues.

  • If we measure responses to enough stimuli, we can estimate kernel function eigenvalues

  • Like PCA but for all stimuli we could have shown, not just those we did
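The convergence can be illustrated with a kernel whose eigenvalues are known in closed form. The sketch assumes stimuli drawn uniformly from a circle and a hypothetical kernel k(a, b) = sum over m of lam_m cos(m (a - b)); under the uniform distribution, its eigenfunctions are cos(m theta) and sin(m theta), each with eigenvalue lam_m / 2. The kernel, the sample size, and the choice lam_m = 1/m^2 are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stimuli: angles drawn uniformly from a circle.
n_stimuli = 1000
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_stimuli)

# Assumed kernel spectrum: lam_m = 1 / m^2 for m = 1..10.
m = np.arange(1, 11)
lam = 1.0 / m**2

# Feature map whose dot product gives the kernel function.
angles = np.outer(theta, m)
features = np.concatenate(
    [np.sqrt(lam) * np.cos(angles), np.sqrt(lam) * np.sin(angles)], axis=1
)

# Sample kernel matrix over the stimuli actually shown.
K = features @ features.T

# Eigenvalues of K / n estimate the kernel function eigenvalues.
sample_eigs = np.sort(np.linalg.eigvalsh(K / n_stimuli))[::-1][:4]
expected_eigs = np.array([lam[0] / 2, lam[0] / 2, lam[1] / 2, lam[1] / 2])

print("sample:  ", sample_eigs)
print("expected:", expected_eigs)
```

Increasing `n_stimuli` should bring the sample eigenvalues closer to the expected ones.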

V. Koltchinskii & E. Giné, Bernoulli, 2000

25 of 30

Higher eigenfunctions encode finer stimulus features

26 of 30

What the eigenspectrum means

  •  

27 of 30

 

 

 

fractal

discontinuous

28 of 30

An experimental prediction

  •  

29 of 30

Low-dimensional inputs

Low dimensional stimuli:

30 of 30

Summary

  • Cortex may form high-dimensional representations of stimuli to allow linear readout

  • Representational geometry is summarized by the kernel function: the dot product of the representations of every stimulus pair

  • For rotation-equivariant readouts, this is the only thing that matters

  • Kernel eigenvalues tell you how much weight the representation gives to fine details vs. big picture

  • V1 eigenvalues decay just slowly enough to not be pathological