1 of 30

The neural code: Lessons from machine learning

Kenneth D Harris, UCL

2 of 30

Multiple linear regression

3 of 30

Too many predictors

4 of 30

5 of 30

6 of 30

7 of 30

Overfitting = large weight vectors

8 of 30

Example

9 of 30

Ridge regression introduces a bias
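
The link between overfitting, large weight vectors and the ridge bias can be illustrated with a small NumPy sketch. Everything in it (the toy data, the sizes, the penalty `lam = 10.0`, the helper name `ridge_fit`) is an assumption made for illustration and does not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 25 predictors but only 30 samples.
n_samples, n_predictors = 30, 25
X = rng.standard_normal((n_samples, n_predictors))
w_true = np.zeros(n_predictors)
w_true[:3] = [1.0, -1.0, 0.5]          # only three predictors actually matter
y = X @ w_true + 0.5 * rng.standard_normal(n_samples)

def ridge_fit(X, y, lam):
    """Minimise ||y - X w||^2 + lam * ||w||^2 (lam = 0 is ordinary least squares)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_ols = ridge_fit(X, y, lam=0.0)       # unregularised fit
w_ridge = ridge_fit(X, y, lam=10.0)    # penalised fit (lam chosen arbitrarily)

print("||w_true|| :", np.linalg.norm(w_true))
print("||w_ols||  :", np.linalg.norm(w_ols))
print("||w_ridge||:", np.linalg.norm(w_ridge))
print("training error, OLS  :", np.sum((y - X @ w_ols) ** 2))
print("training error, ridge:", np.sum((y - X @ w_ridge) ** 2))
```

The ridge weight vector is shorter than the least-squares one, and its training error is larger: the penalty trades a worse fit to the training data (the bias) for smaller weights.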

10 of 30

Equivariance

11 of 30

Equivariance of ridge and linear regression
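
A minimal numerical check, assuming the equivariance meant here is with respect to rotations of the predictor space (as in the "rotation-equivariant readouts" of the Summary): if the predictors are rotated by an orthogonal matrix `Q`, the fitted weights rotate with them and the predictions do not change. The data, `lam` and the helper `ridge_fit` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))       # hypothetical predictors
y = rng.standard_normal(40)            # hypothetical targets
lam = 2.0                              # ridge penalty (lam = 0 gives linear regression)

def ridge_fit(X, y, lam):
    """Minimise ||y - X w||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

w = ridge_fit(X, y, lam)               # fit in the original coordinates
w_rot = ridge_fit(X @ Q, y, lam)       # fit after rotating the predictors

weights_rotate = np.allclose(w_rot, Q.T @ w)
predictions_same = np.allclose((X @ Q) @ w_rot, X @ w)
print("weights rotate with the predictors:", weights_rotate)
print("predictions unchanged:", predictions_same)
```

The same check passes with `lam = 0.0`, because the penalty `||w||^2` and the squared error are both unchanged by an orthogonal change of coordinates.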

12 of 30

Delta rule

Learning rule for 1-layer neural networks
“Hebbian” coincidence detector of input with error

13 of 30

Delta rule vs. linear regression
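
The delta rule and its relation to linear regression can be sketched as follows. This is an illustration under assumed toy data, learning rate `eta` and epoch count, not the slides' own example: each weight change is the product of the input and the output error, and with a small learning rate the weights should end up close to the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training set for a 1-layer linear network.
n_samples, n_inputs = 200, 4
X = rng.standard_normal((n_samples, n_inputs))
w_true = np.array([0.5, -1.0, 2.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

w = np.zeros(n_inputs)
eta = 0.01                              # learning rate (assumed value)
for epoch in range(200):
    for i in rng.permutation(n_samples):
        error = y[i] - X[i] @ w         # target minus network output
        w += eta * error * X[i]         # delta rule: input times error

# Closed-form linear regression on the same data, for comparison.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("delta rule   :", np.round(w, 2))
print("least squares:", np.round(w_lstsq, 2))
```

Because the updates are made one sample at a time with a fixed learning rate, the delta-rule weights keep fluctuating slightly around the least-squares solution instead of matching it exactly.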

14 of 30

There are problems linear regression can’t solve

Not all patterns can be linearly separated

15 of 30

Solution: non-linear hidden units

Input

Representation

Output

16 of 30

“Codon” theory

Dense input

Sparse representation

17 of 30

“Untangling hypothesis”

Cortex forms a high-dimensional representation of its inputs
This allows nonlinear discriminations to be done with linear readout
The cortical code could be sparse, but does not need to be
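
A toy sketch of the idea, using XOR as the standard example of a pattern set that cannot be linearly separated. The hidden layer of 50 random ReLU units is a hypothetical stand-in for a high-dimensional representation; it is not a model of cortex and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)

# XOR: four input patterns whose targets no linear readout of the raw inputs can fit.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

def linear_readout(features, targets):
    """Fit a least-squares linear readout (with bias) and return its predictions."""
    F = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(F, targets, rcond=None)
    return F @ w

# Readout directly from the 2-D input.
pred_raw = linear_readout(X, y)

# Hypothetical high-dimensional representation: 50 random ReLU hidden units.
n_hidden = 50
W = rng.standard_normal((2, n_hidden))
b = rng.standard_normal(n_hidden)
H = np.maximum(X @ W + b, 0.0)

# The same linear readout, now applied to the hidden representation.
pred_hidden = linear_readout(H, y)

print("readout from raw input     :", np.round(pred_raw, 3))
print("readout from representation:", np.round(pred_hidden, 3))
```

The readout from the raw input can do no better than predict 0.5 for every pattern, while the same readout applied to the nonlinear representation reproduces the XOR targets.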

18 of 30

“No free lunch theorem”

There is no representation that is good for all problems
“If you can learn anything, you can’t learn anything”

Neural representations have evolved to provide an “inductive bias” that lets animals quickly learn tasks that they will likely encounter
This makes them bad at other tasks

E.g. humans find it hard to distinguish two white noise images (different in every pixel)
And easy to distinguish male and female faces (subtle differences in a few pixels)

All human languages share grammatical features
LLMs can learn artificial languages that do not share these features

19 of 30

Primate IT cortex visual code predicts human performance

Record primate IT responses to images

Including variations in pose, position, scale, etc.

Train a linear decoder to distinguish image pairs

Its performance correlates with human psychophysics

20 of 30

How to characterize a code

Equivalent to

21 of 30

Kernel matrix vs. covariance matrix
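
A small sketch of how the two matrices relate, under assumptions not stated on the slide: responses are stored as a stimuli-by-neurons matrix `R`, neither matrix is mean-centred, and both are left unnormalised. With those conventions the kernel matrix (stimuli by stimuli) and the covariance matrix (neurons by neurons) have the same nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical response matrix: one row per stimulus, one column per neuron.
n_stimuli, n_neurons = 60, 40
R = rng.standard_normal((n_stimuli, n_neurons))

K = R @ R.T    # kernel matrix: dot products between stimulus representations
C = R.T @ R    # uncentred, unnormalised covariance matrix between neurons

# eigvalsh returns eigenvalues in ascending order; reverse to get descending.
eig_K = np.linalg.eigvalsh(K)[::-1]
eig_C = np.linalg.eigvalsh(C)[::-1]

n_shared = min(n_stimuli, n_neurons)
same_spectrum = np.allclose(eig_K[:n_shared], eig_C[:n_shared])
rest_is_zero = np.allclose(eig_K[n_shared:], 0.0, atol=1e-8)

print("shared eigenvalues match:", same_spectrum)
print("remaining kernel eigenvalues are zero:", rest_is_zero)
```

Dividing `K` by the number of neurons or `C` by the number of stimuli only rescales the spectra; mean-centring the responses would change both.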

22 of 30

Kernel matrix vs. kernel function

23 of 30

Kernel eigenfunctions

24 of 30

Relation between eigenvalues

As the number of neurons and stimuli increase:

The sample kernel matrix eigenvalues converge to the kernel function eigenvalues.

If we measure responses to enough stimuli, we can estimate kernel function eigenvalues

Like PCA but for all stimuli we could have shown, not just those we did

V. Koltchinskii & E. Giné, Bernoulli, 2000
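
The convergence can be illustrated with a toy code whose kernel-function eigenvalues are known exactly. All of it is hypothetical: the stimulus is an angle drawn uniformly from [0, 2π), "neuron" j responds as sqrt(2)·c_j·cos(j·x) with c_j = 1/j, and the population is kept fixed at five neurons so that only the number of stimuli varies. For this code the kernel function k(x, x') = Σ_j 2·c_j²·cos(j·x)·cos(j·x') has eigenvalues c_j².

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical population of cosine-tuned "neurons" with gains c_j = 1/j.
n_neurons = 5
freqs = np.arange(1, n_neurons + 1)
gains = 1.0 / freqs
true_eigs = gains ** 2                  # kernel-function eigenvalues of this toy code

def responses(x):
    """Population responses to stimuli x, shape (len(x), n_neurons)."""
    return np.sqrt(2.0) * gains * np.cos(np.outer(x, freqs))

def sample_kernel_eigs(n_stimuli):
    """Top eigenvalues of the sample kernel matrix for randomly drawn stimuli."""
    x = rng.uniform(0.0, 2.0 * np.pi, size=n_stimuli)
    R = responses(x)
    K = R @ R.T / n_stimuli             # sample kernel matrix, scaled by stimulus count
    eigs = np.linalg.eigvalsh(K)[::-1]  # descending order
    return eigs[:n_neurons]

print("kernel function eigenvalues:", np.round(true_eigs, 3))
for n_stimuli in (20, 200, 2000):
    eigs = sample_kernel_eigs(n_stimuli)
    err = np.max(np.abs(eigs - true_eigs))
    print(n_stimuli, "stimuli:", np.round(eigs, 3), "max error:", round(float(err), 3))
```

The sample eigenvalues are computed only from the stimuli that happened to be drawn, yet with enough stimuli they approximate a property of the code over the whole stimulus space.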

25 of 30

Higher eigenfunctions encode finer stimulus features

https://github.com/MouseLand/rastermap

26 of 30

What the eigenspectrum means

27 of 30

fractal

discontinuous

28 of 30

An experimental prediction

29 of 30

Low-dimensional inputs

Low-dimensional stimuli:

30 of 30

Summary

Cortex may form high-dimensional representations of stimuli to allow linear readout

Representational geometry is summarized by the kernel function: the dot-product of representations of every stimulus pair.

For rotation-equivariant readouts, this is the only thing that matters

Kernel eigenvalues tell you how much weight the representation gives to fine details vs. big picture

V1 eigenvalues decay just slowly enough to not be pathological