1 of 89

Hopfield Networks

Robert M. Keller

February 2016

2 of 89

Hopfield Networks

Proposed in 1982 by John Hopfield, formerly a professor at Princeton and then at Caltech, and now again at Princeton

Hopfield may have been the first to observe the connection of these networks to Ising models (or spin models) known in physics.

John Hopfield

3 of 89

Use of Hopfield Nets

Hopfield nets extend the idea of the linear associator by adding recurrent connections.

They may be used as associative memories.

Hopfield nets offer more robust error-correction and de-noising than their non-recurrent counterparts.

4 of 89

Extensions of Hopfield Nets

Hopfield nets are a gateway model for others, such as:

  • Boltzmann machines
  • Restricted Boltzmann Machines (RBMs)
  • Deep-Belief Networks

5 of 89

Successive (L to R) Images (130x180 pixels) from Hopfield’s Paper

Panel labels (L to R): Enhancement, Association, Completion, Application

6 of 89

Hopfield Net Diagram

There is a delay in the output fed back as input.

There are no input lines as such.

The inputs are the initial values in the neurons.

The outputs are “read out” from the neurons.

The diagonal weights are 0: wii = 0.

The weights are symmetric: wij = wji.

7 of 89

Thresholds/Biases

  • Per Hopfield’s article in Scholarpedia, neurons can have an added bias.

8 of 89

Learning

  • The Hopfield net does not learn dynamically.
  • Instead, the weights are pre-computed based on a set of patterns presented to the neurons, using the Hebb rule, for example.

  • These patterns are vectors over {0, 1} and the weight changes implied for each pattern are summed over all patterns.

  • (There are models based on the Hopfield net which do learn, however.)

9 of 89

Several Variations on One Model

  • Various value sets
    • {0, 1}
    • {-1, 1}
    • [0, 1] (continuous interval)
    • [-1, 1] (continuous interval)
  • Various activation functions
  • Various updating methods
  • Relaxing the symmetry assumption

10 of 89

General Update Method

  • The state of the network consists of a value for each neuron.
  • The values of all neurons determine an activation (“net”) value for each neuron.
  • The activation function acting on the activation value determines an implied output value for each neuron.
  • Depending on the activation function, a neuron may be stable or firable:
    • Stable: implied output value = current value
    • Firable: implied output value ≠ current value

11 of 89

General Update Method

  • Synchronous operation:
    • All firable neurons are simultaneously switched to their implied values.
  • Asynchronous operation:
    • One firable neuron selected at random is switched to its implied value.
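A minimal sketch of these two update modes (an illustration, not code from the slides), assuming Python, {0, 1} neuron values, a threshold of 0, and the convention that a net value of exactly 0 leaves a neuron unchanged:

import random

def net_value(W, state, i):
    # Activation ("net") value of neuron i given the current state.
    return sum(W[i][j] * state[j] for j in range(len(state)))

def implied_value(W, state, i):
    # Implied output of neuron i; a net value of exactly 0 keeps the current value.
    net = net_value(W, state, i)
    return 1 if net > 0 else 0 if net < 0 else state[i]

def synchronous_step(W, state):
    # All firable neurons switch to their implied values at once.
    return [implied_value(W, state, i) for i in range(len(state))]

def asynchronous_step(W, state):
    # One firable neuron, chosen at random, switches to its implied value.
    firable = [i for i in range(len(state))
               if implied_value(W, state, i) != state[i]]
    if not firable:
        return state            # network is stable
    i = random.choice(firable)
    new_state = list(state)
    new_state[i] = implied_value(W, state, i)
    return new_state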

12 of 89

Example Synchronous Operation

Assume activation is a threshold function with threshold 0.

Assume weight matrix is W:

 0  -1  -1   1  -1
-1   0   1  -1   1
-1   1   0  -1   1
 1  -1  -1   0  -1
-1   1   1  -1   0
13 of 89

Example Synchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 1 1 1 1 1, then the weighted sums are on the right:

The implied next state is thus 0 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-2, 0, 0, -2, 0)
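Using the synchronous_step sketch above with the example weight matrix (again an illustration, not the slides' code), this step can be checked directly:

W = [[ 0, -1, -1,  1, -1],
     [-1,  0,  1, -1,  1],
     [-1,  1,  0, -1,  1],
     [ 1, -1, -1,  0, -1],
     [-1,  1,  1, -1,  0]]

print(synchronous_step(W, [1, 1, 1, 1, 1]))   # expected [0, 1, 1, 0, 1]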

14 of 89

Example Synchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 0 1 1 0 1, then the weighted sums are on the right:

The implied next state is thus 0 1 1 0 1. Now no neuron is firable, so the network will persist in state 0 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-3, 2, 2, -3, 2)

15 of 89

Example Asynchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 1 1 1 1 1, then the weighted sums are on the right:

Suppose the first node fires, giving state 0 1 1 1 1.

Weight matrix W (as above).

Weighted sums: (-2, 0, 0, -2, 0)

16 of 89

Example Asynchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 1 1 1 1 1, then the weighted sums are on the right:

Alternatively, the fourth node could fire first, giving state 1 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-2, 0, 0, -2, 0)

17 of 89

Example Asynchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 0 1 1 1 1, then the weighted sums are on the right:

If the first node fires and then the fourth node fires, we get state 0 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-2, 1, 1, -3, 1)

18 of 89

Example Asynchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 1 1 1 0 1, then the weighted sums are on the right:

If the fourth node fires first and then the first node, we get 0 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-3, 1, 1, -2, 1)

19 of 89

Example Asynchronous Operation

Assume activation is a threshold function with threshold 0.

If the current state is 0 1 1 0 1, then the weighted sums are on the right:

Now no neuron is firable, so the network persists in state 0 1 1 0 1.

Weight matrix W (as above).

Weighted sums: (-3, 2, 2, -3, 2)

20 of 89

Partial Summary of Behavior, Asynchronous Case

1 1 1 1 1 → 0 1 1 1 1 → 0 1 1 0 1
1 1 1 1 1 → 1 1 1 0 1 → 0 1 1 0 1

(In the original diagram, firable neurons within each state are shown in red.)

21 of 89

Summary of Synchronous vs. Asynchronous So Far

We see from the above example that, although the number of steps differs between asynchronous and synchronous operation, the network ends in the same state in both cases.

Will this always be the case?

22 of 89

Diagram of All States

Asynchronous Operation

[Diagram: the full asynchronous state-transition graph over all 32 states, from [0, 0, 0, 0, 0] to [1, 1, 1, 1, 1]; the transition edges are not reproduced here.]

23 of 89

Energy and Computing the Weight Matrix

For the next part of this discussion, I will use {-1, 1} as neuron values, rather than {0, 1}.

However, I will show these as

  • O (“oh”) for -1 and
  • I (“eye”) for +1

to make the spacing look nice, i.e. no minus sign.

24 of 89

Energy and Computing the Weight Matrix

Another assumption we will make, standard with Hopfield networks:

A neuron is firable iff:

  • Its current value is -1 and the activation is >0, or
  • Its current value is 1 and the activation is <0.

Thus if its activation is 0, it is not firable, regardless of its current state.

This questionable assumption is made to make the energy argument work out.

25 of 89

The “Energy” Function for a Hopfield Network

  • Define an energy function as a mapping from states (y1, y2, …, yn) to numbers:

      E(y1, y2, …, yn) = -Σi Σj wij yi yj    (note the negative sign)

    where wij is the weight from neuron j to neuron i, and the double sum is over i and j.

  • Remember that W is symmetric (wij = wji) and the diagonal terms are 0, so the sum can equivalently be taken over i < j (giving half the value and the same minima).
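A minimal sketch of this energy function (an illustration, not the slides' code), assuming Python with states as lists over {-1, 1}, and using the full double sum over i and j, which matches the values in the worked examples that follow (the i < j sum would give half these values):

def energy(W, state):
    # E = -sum over all i and j of w[i][j] * y[i] * y[j]; the diagonal is 0.
    n = len(state)
    return -sum(W[i][j] * state[i] * state[j]
                for i in range(n) for j in range(n))

With the 5-neuron weight matrix W used in the earlier examples, energy(W, [-1, 1, 1, -1, 1]) evaluates to -20, matching the OIIOI example a few slides later.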

26 of 89

Example Energy Function

Assume activation is a threshold function with threshold 0.

If the current state is OOOOO, then the energy is 4.

Weight matrix W (as above), with the state vector shown alongside.

27 of 89

Example Energy Function

Assume activation is a threshold function with threshold 0.

If the current state is IIIII, then the energy is 4.

Weight matrix W (as above), with the state vector shown alongside.

28 of 89

Example Energy Function

Assume activation is a threshold function with threshold 0.

If the current state is OIIOI, then the energy is -20.

Weight matrix W (as above), with the state vector shown alongside.

29 of 89

Depicting Multiple Stable States

30 of 89

Energy Changes for Asynchronous Operation

  • Claim: firing any firable neuron always decreases the value of the energy function E(y1, y2, …, yn) = -Σ Σ wij yi yj.
  • Suppose yi fires. Factoring out yi, the only value that changes: E(y1, y2, …, yn) = -yi Σj≠i wij yj + (the remaining terms, none of which involve yi).
  • The corresponding change is ΔE = -Δyi Σj≠i wij yj, as yi is the only neuron changing.
  • Δyi ∈ {2, -2}, as the neuron’s values are in {-1, 1}:
    • If Δyi = 2 (i.e. -1 → 1), firing requires Σ wij yj > 0, so ΔE < 0.
    • If Δyi = -2 (i.e. 1 → -1), firing requires Σ wij yj < 0, so ΔE < 0.
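A small numerical check of this claim (an illustration using the W and energy sketches above, not the slides' code): fire the first neuron of the 5-neuron example net from state IIIII.

before = [1, 1, 1, 1, 1]     # IIIII; neuron 1 has net input -2, so it can fire to -1
after  = [-1, 1, 1, 1, 1]    # OIIII
print(energy(W, before))     # 4
print(energy(W, after))      # -4: the energy strictly decreases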

31 of 89

Energy Changes for Asynchronous Operation

  • Energy decreases with each asynchronous firing.
  • There is a lower bound on the energy for a given network.
  • Therefore an asynchronous network eventually stops firing.

Note: Here is where we make use of the assumption that if a neuron’s activation is 0, it is not firable, regardless of its current state. Without this assumption, the energy could stay the same over several successive firings, so the decreasing-energy argument would no longer apply, even if the state diagram in fact contains no cycle.

32 of 89

Listing of All States, Their Energies, and Successors

Weight matrix W (as above).

[The slide tabulates all 32 states with their energies and successors, once in alphabetic order and once sorted by decreasing energy.]

33 of 89

Partial State Diagram

Weight matrix W (as above).

IOOIO (-20)    OIIOI (-20)
IIIOI (-4)     IIOIO (-4)     OOIOI (-4)     OOOIO (-4)     . . .
IIIIO (4)      IOIOI (4)      . . .

2 states with energy -20
10 states with energy -4
20 states with energy 4

For this particular network, each state has a unique limit state.

This won’t always be the case.

34 of 89

Duality of Firing Rule and Energy Change

There are two ways of thinking about a Hopfield network firing asynchronously:

  • Pick an unstable neuron and fire it using the activation function, or
  • Pick a neuron whose firing decreases the energy, and fire it.

In other words, think of the network as a minimum-energy-seeking device.

35 of 89

Minimizing Energy Example using the {O, I} model

Which value (O or I) gives the lower energy for the neuron marked ‘?’?

Case O (-1):

-(-4*I*O + 3*O*O + 3*O*O) = -(4 + 3 + 3) = -10

Case I (+1):

-(-4*I*I + 3*O*I + 3*O*I) = -(-4 - 3 - 3) = 10

[Figure: five neurons with values I, I, O, O, ?; the ‘?’ neuron is connected to an I neuron by weight -4 and to two O neurons by weight 3 each; the remaining edge weights shown are -1, -1, 2.]

36 of 89

Minimizing Energy Example using the {O, I} model

Which value (O or I) gives the lower energy for the neuron marked ‘?’?

[Figure: the same network with the ‘?’ neuron set to O, the lower-energy choice.]
37 of 89

Synchronous Updating Example

Both upper neurons want to flip (5*I - 10*O = 15 > 0)

[Figure: two upper neurons (both O) joined by an edge of weight -10; each is also joined to a lower neuron (value I) by an edge of weight 5.]

38 of 89

Synchronous Updating Example

Both upper neurons still want to flip (5*I - 10*I = -5 < 0).

This creates a cycle of length 2�in the state diagram.

The energy changes between�100 and -100.

[Figure: the same network after the synchronous update, with all four neurons now I.]

39 of 89

Attractors and Memories

Minimum energy states are known as attractors, because a running network will be attracted toward them.

Some, but not all, attractors will correspond to stored patterns, i.e. memories, in the network.

Other attractors will be spurious.

40 of 89

Basin of Attraction Concept

41 of 89

Attractors in a Continuous Version of a Hopfield Net: Neural “Flip-Flop”

(-1, 1) is minimum energy

(1, -1) is minimum energy

42 of 89

Hebbian Weight Assignment

  • Hebb says Δwij will be positive if yi and yj are 1 simultaneously.

  • Δwij = yi yj    (Hebb rule)

[Figure: neuron yj connected to neuron yi through a synapse with weight wij.]

43 of 89

Generalized Hebbian Interpretation

  • If wij > 0, the two neurons will tend to be in the same state.

  • If wij < 0, the two neurons will tend to be in opposite states.
  • But the states generally depend on all weights, not just this one.


44 of 89

Switching Models

The {0, 1} model may be a good one for biological modeling.

(It is also used in “sparse coding”.)

However, the {-1, 1} model tends to have fewer spurious minima.

We will use {-1, 1}, but call these values {O, I} (“oh” and “eye”) for aesthetic reasons.

We can also switch back and let O be 0 and I be 1 if we want.

45 of 89

Example

Suppose there are two patterns to be learned:

[I, I, O, O]

[O, O, I, I]

46 of 89

Weight Computation using Hebb Rule

Given a pattern Y as a column vector, the weight-change contribution Δwij = yi yj for that pattern is the outer product Y Yᵀ, which is an n x n matrix.

This computation is the same as for the linear associator.

However, after the contributions of the individual patterns are summed, the diagonal weights are forced to 0.

For the example patterns [I, I, O, O], [O, O, I, I] this gives us the following weight matrix:

[[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]
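A minimal sketch of this weight computation (an illustration, not the slides' code), assuming Python with I = 1 and O = -1:

# Hebb-rule weight matrix: sum of the outer products of the patterns,
# with the diagonal forced to 0 afterward.
I, O = 1, -1
patterns = [[I, I, O, O], [O, O, I, I]]

n = len(patterns[0])
W = [[0] * n for _ in range(n)]
for p in patterns:
    for i in range(n):
        for j in range(n):
            W[i][j] += p[i] * p[j]
for i in range(n):
    W[i][i] = 0          # zero the diagonal

print(W)  # [[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]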

47 of 89

Hopfield Net as Associative Memory

  • As with the Linear Associator, the set of “stored patterns” is represented by the weight matrix.
  • As with linear associators, to be most effective, the patterns should be reasonably orthogonal.

48 of 89

Stored Patterns Correspond to Attractors

  • If the Hebb rule is used with orthogonal patterns and linear activations, the stored patterns correspond to attractors (minimum-energy states).
  • The reasoning is analogous to the case with the linear associative memory.

49 of 89

Examples

For the example patterns [I, I, O, O], [O, O, I, I] using this weight matrix:

[[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]

here are some net simulations:

simulation from IIOO

0: IIOO Energy = -24

stable

simulation from OOII

0: OOII Energy = -24

stable

States corresponding to the example patterns are stable as desired.

50 of 89

Examples

For the example patterns [I, I, O, O], [O, O, I, I] using this weight matrix:

[[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]

Examples differing from patterns in one bit:

simulation from IIIO

0: IIIO Energy = 0

1: IIOO Energy = -24

stable

simulation from IIOI

0: IIOI Energy = 0

1: IIOO Energy = -24

stable

simulation from OIII

0: OIII Energy = 0

1: OOII Energy = -24

stable

51 of 89

Examples

For the example patterns [I, I, O, O], [O, O, I, I] using this weight matrix:

[[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]

Examples differing from patterns in two bits:

simulation from IOIO

0: IOIO Energy = 8

1: IIIO Energy = 0

2: IIOO Energy = -24

stable

simulation from IOIO

0: IOIO Energy = 8

1: IIIO Energy = 0

2: IIOO Energy = -24

stable

52 of 89

Summary of this Example

For the example patterns [I, I, O, O], [O, O, I, I] using this weight matrix:

[[0, 2, -2, -2], [2, 0, -2, -2], [-2, -2, 0, 2], [-2, -2, 2, 0]]

The possible energies are 8, 0, -24 (from computation over all states).

Some states have more than one successor.

However, each state has a unique limit state.

The limit states are the stored patterns, with energy -24.

53 of 89

Analysis Code

Rather than conduct a bunch of simulations, we compute all of the limit states from a given state using depth-first search (relying on there being no cycles).

def limits(weight, state):
    """ Returns the limit states of a sequence of firings from a given state. """
    succs = successors(weight, state)
    if succs == []:
        return [state]
    # union (list union, duplicates removed) combines the limits of all successors.
    return union(map(lambda x: limits(weight, x), succs))

def successors(weight, state):
    """ Returns a list of successors of the given state under the weight matrix. """
    result = []
    i = 0
    for neuron in weight:            # each row of the weight matrix
        result += successor(neuron, state, i)
        i += 1
    return result

54 of 89

Successor defined

For any state and any neuron, there can be a unique successor if the neuron is firable, or no successor if not. (Note that if tendency is 0, there is no successor.)

import copy

def successor(neuron, state, i):
    """ Returns a list of 1 or 0 states, depending on whether the ith neuron
        (a vector of weights into neuron i) fires or not. """
    tendency = inner(neuron, state)      # inner = dot product (defined elsewhere)
    if state[i] < 0 and tendency > 0:
        succ = copy.copy(state)
        succ[i] = I                      # I = +1
        return [succ]
    elif state[i] > 0 and tendency < 0:
        succ = copy.copy(state)
        succ[i] = O                      # O = -1
        return [succ]
    return []
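The helpers inner and union (and the constants I = 1, O = -1) are not shown on the slides; minimal stand-ins might look like this (an assumption, not the author's code):

I, O = 1, -1

def inner(xs, ys):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(xs, ys))

def union(lists):
    # Union of a list of lists, with duplicates removed, preserving order.
    result = []
    for lst in lists:
        for item in lst:
            if item not in result:
                result.append(item)
    return result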

55 of 89

Checking for more than one limit

def limit_analysis(weight, states, patterns, columns, Is, Os):
    """ For each state in states, checks whether there is more than one limit for
        that state; if so, the state and its limits are printed out.
        Each printout is given a sequence number. """
    count = 0
    for state in states:
        stateLimits = limits(weight, state)
        if len(stateLimits) > 1:
            count += 1
            print str(count) + ":", len(stateLimits), "limits:"
            # show_list is a display helper (defined elsewhere) that formats the
            # states as grids using the given number of columns and symbols.
            print show_list(weight, [state] + stateLimits, columns, Is, Os)

56 of 89

Letter Storage Example

In this example, there are 3 patterns as 35-element vectors.

We view them as 7x5 grids representing letters “H”, “M”, and “C”.

We compute a 35x35 weight matrix using the Hebb rule.

The patterns with * as I and blank as O, with energies shown below, are:

Note that these patterns are definitely not orthogonal, so we do not expect this to work perfectly.

“H” (-1658):

*   *
*   *
*   *
*****
*   *
*   *
*   *

“M” (-1658):

*   *
** **
* * *
*   *
*   *
*   *
*   *

“C” (-1138):

 ***
*   *
*
*
*
*   *
 ***

57 of 89

The 35x35 Weight Matrix for HMC

[[0, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 3, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 3, -3, -3, -3, 3], �[-3, 0, 3, 3, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 3, 3, 3, -3], �[-3, 3, 0, 3, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 3, 3, 3, -3], �[-3, 3, 3, 0, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 3, 3, 3, -3], �[3, -3, -3, -3, 0, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 3, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 3, -3, -3, -3, 3], �[1, -1, -1, -1, 1, 0, -1, -3, -1, 3, 3, -3, -1, -3, 1, 3, -1, -1, -1, 1, 3, -3, -3, -3, 1, 3, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, -1, 0, 1, 3, -1, -1, 1, 3, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[-1, 1, 1, 1, -1, -3, 1, 0, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 3, 3, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[1, -1, -1, -1, 1, -1, 3, 1, 0, -1, -1, 1, 3, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 0, 3, -3, -1, -3, 1, 3, -1, -1, -1, 1, 3, -3, -3, -3, 1, 3, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 3, 0, -3, -1, -3, 1, 3, -1, -1, -1, 1, 3, -3, -3, -3, 1, 3, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 0, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 3, 3, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[1, -1, -1, -1, 1, -1, 3, 1, 3, -1, -1, 1, 0, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 0, -1, -3, 1, 1, 1, -1, -3, 3, 3, 3, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[3, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 0, 1, 1, 1, 1, 3, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 3, -3, -3, -3, 3], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 3, 3, -3, -1, -3, 1, 0, -1, -1, -1, 1, 3, -3, -3, -3, 1, 3, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 0, 3, 3, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 3, 0, 3, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 3, 3, 0, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1], �[3, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 0, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 3, -3, -3, -3, 3], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 3, 3, -3, -1, -3, 1, 3, -1, -1, -1, 1, 0, -3, -3, -3, 1, 3, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 0, 3, 3, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 0, 3, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 3, 0, -1, -3, 3, 3, 3, -3, -1, 1, 1, 1, -1], �[3, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 3, 1, -1, -1, -1, 0, 1, -1, -1, -1, 1, 3, -3, -3, -3, 3], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 3, 3, -3, -1, -3, 1, 3, -1, -1, -1, 1, 3, -3, -3, -3, 1, 0, -3, -3, -3, 3, 1, -1, -1, -1, 1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 3, 3, -1, -3, 0, 3, 3, -3, -1, 1, 1, 1, -1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 1, 1, -1, -3, 3, 3, 3, -1, -3, 3, 0, 3, -3, -1, 1, 1, 1, -1], �[-1, 1, 1, 1, -1, -3, 1, 3, 1, -3, -3, 3, 1, 3, -1, -3, 1, 
1, 1, -1, -3, 3, 3, 3, -1, -3, 3, 3, 0, -3, -1, 1, 1, 1, -1], �[1, -1, -1, -1, 1, 3, -1, -3, -1, 3, 3, -3, -1, -3, 1, 3, -1, -1, -1, 1, 3, -3, -3, -3, 1, 3, -3, -3, -3, 0, 1, -1, -1, -1, 1], �[3, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 3, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 0, -3, -3, -3, 3], �[-3, 3, 3, 3, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 0, 3, 3, -3], �[-3, 3, 3, 3, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 3, 0, 3, -3], �[-3, 3, 3, 3, -3, -1, -1, 1, -1, -1, -1, 1, -1, 1, -3, -1, -1, -1, -1, -3, -1, 1, 1, 1, -3, -1, 1, 1, 1, -1, -3, 3, 3, 0, -3], �[3, -3, -3, -3, 3, 1, 1, -1, 1, 1, 1, -1, 1, -1, 3, 1, 1, 1, 1, 3, 1, -1, -1, -1, 3, 1, -1, -1, -1, 1, 3, -3, -3, -3, 0]]

58 of 89

One-Bit Error Correction

Computation shows that with only one bit of a letter altered, the network still stabilizes to the original letter, regardless of which bit is altered.

[The stored “H”, “M”, and “C” patterns as above, with energies -1658, -1658, and -1138.]

limit analysis for no noise:

limit analysis for 1 level of noise:

(no output in either case, which is good)

limit analysis for 2 levels of noise:

(much output, see next slide)

59 of 89

Spurious Attractors

But with two bits of a letter altered, the network can stabilize in one of two different states, one of which will not be any of the original patterns.

These are called spurious attractors. Below we show some altered patterns for “H” and “M” and their possible limit states (with their energies). For the altered “C”, there is always just one limit state. (Possible explanation?)

[Figure: an altered “H” (energy -1434) with its two possible limit states (each with energy -1658), and an altered “M” (energy -1434) with its two possible limit states (each with energy -1658).]

60 of 89

Ways Spurious Attractors Arise

It can be seen that:

If a state is stable, then the negation of that state is also stable.

    • Here we compute that the encoded “H” and its inverse are both stable (and have the same energy):
    • successors(W, [I,O,O,O,I,I,O,O,O,I,I,O,O,O,I,I,I,I,I,I,I,O,O,O,I,I,O,O,O,I,I,O,O,O,I]) = []
    • energy(W, [I,O,O,O,I,I,O,O,O,I,I,O,O,O,I,I,I,I,I,I,I,O,O,O,I,I,O,O,O,I,I,O,O,O,I]) = -1658�
    • successors(W, [O,I,I,I,O,O,I,I,I,O,O,I,I,I,O,O,O,O,O,O,O,I,I,I,O,O,I,I,I,O,O,I,I,I,O]) = []
    • energy(W, [O,I,I,I,O,O,I,I,I,O,O,I,I,I,O,O,O,O,O,O,O,I,I,I,O,O,I,I,I,O,O,I,I,I,O]) = -1658

61 of 89

Ways Spurious Attractors Arise

It can be seen that:

The xor of any odd number of stable states is stable.

This generally means that there will be lots of stable states that are not original patterns.

62 of 89

Energy Explanation of Spurious Attractors (Hinton)

Two separate minima (purple and green) close to each other (patterns overlap significantly) merge to form a new minimum (red).

63 of 89

More on Minimizing Spurious Attractors

  • Hopfield et al. proposed “unlearning” as a way to get rid of spurious attractors.

  • The following paper presents a weight-setting technique for minimizing the number of spurious attractors:

    Li, Michel, and Porod, “Analysis and synthesis of a class of neural networks: linear systems operating on a closed hypercube,” IEEE Trans. on Circuits and Systems, 36(11), 1405-1422, Nov. 1989.

64 of 89

Hinton’s Explanation of Unlearning

65 of 89

Hopfield Storage Capacity

Standard wisdom (complicated proof) indicates a maximum pattern storage capacity of 0.14 times the number of neurons.

The actual capacity depends on the patterns stored and may be less than this.

66 of 89

Hopfield Storage Capacity

  • Haykin’s book indicates less: that for storage with at most .01 probability of error, asymptotically a Hopfield net can store at most N/(4 ln N) patterns with N neurons.

N          N/(4 ln N)    .14 * N
100        5             14
1000       36            140
10000      271           1400
100000     2171          14000
1000000    18096         140000
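A quick sketch (an illustration, assuming Python) that reproduces the table:

import math

for N in [100, 1000, 10000, 100000, 1000000]:
    # Capacity estimates: N/(4 ln N) vs. the 0.14 * N rule of thumb.
    print("%8d %8d %8d" % (N, int(N / (4 * math.log(N))), int(0.14 * N)))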

67 of 89

Space Efficiency Considerations

  • The “hardware” cost for N neurons in a Hopfield net scales as O(N^2), since there are N weights per neuron, whereas the storage capacity scales only as N. Thus Hopfield nets are not an efficient use of hardware.

  • A more efficient approach, ignoring the biological aspects, is nearest-neighbor lookup, i.e. finding the stored pattern at the nearest Hamming distance. The cost of storing the patterns is proportional to the number of patterns, and the nearest-neighbor computation can be done in log N time, in what is sometimes called a Hamming net.
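A minimal sketch (an illustration, assuming Python, with patterns as equal-length ±1 lists) of this nearest-Hamming-distance lookup:

def hamming(a, b):
    # Number of positions where two patterns differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def nearest_stored(probe, stored_patterns):
    # Return the stored pattern closest to the probe in Hamming distance.
    return min(stored_patterns, key=lambda p: hamming(probe, p))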

68 of 89

Hamming Net

1 pattern stored in each node

Identifies highest response in O(n) time

69 of 89

Neural “Movies” (what if the weights aren’t symmetric?)

  • See Müller, Reinhardt, and Strickland, Neural Networks -- An Introduction, Springer, 1995.

  • “… we have to study the properties of networks with asymmetric synaptic connections, because periodic activity cannot occur in the presence of thermal equilibrium, toward which all symmetric networks develop.”

70 of 89

Neural “Movies”

“One only needs to modify the Hebb rule in the following manner:

  wij = Σk pk,i * p((k+1) mod n),j    (summation over all patterns k)

to get temporal periodicity.”

“As a consequence, the network immediately makes a transition into the next pattern.”

Caveat: I haven’t gotten this to work as described.
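A sketch of that modified rule as written (an illustration only, assuming Python with ±1 patterns; per the caveat above, this is just the formula, not a verified working demonstration):

def movie_weights(patterns):
    # w[i][j] = sum over patterns k of p[k][i] * p[(k+1) mod n][j]  (asymmetric)
    n = len(patterns)
    size = len(patterns[0])
    W = [[0] * size for _ in range(size)]
    for k in range(n):
        cur, nxt = patterns[k], patterns[(k + 1) % n]
        for i in range(size):
            for j in range(size):
                W[i][j] += cur[i] * nxt[j]
    return W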

71 of 89

Possible Application: Neural Network Anatomy in the Hippocampus CA3

The hippocampus plays important roles in the consolidation of information from short-term memory to long-term memory, and in the spatial memory that enables navigation.

72 of 89

Neural Network Anatomy in the Hippocampus CA3

73 of 89

Recurrent Collateral: Neural Network Anatomy in the Hippocampus CA3

74 of 89

Coarse-Grain Model & Activity Levels of the Hippocampus

75 of 89

Diagram of Possible Auto-Associative Model of the Hippocampus

76 of 89

Commercial Successes with Hopfield Nets?

  • At least one company, Attrasoft (http://attrasoft.com/), claimed to have products based on Hopfield nets.
  • (Their products deal with image search and fingerprint search.)

77 of 89

Using Nets to Find Solutions to Constraint Satisfaction Problems

  • Constraints on a solution tell you what you cannot do.

  • Somehow represent these as inhibitory connections.

78 of 89

Finding Solutions to the TSP �(Traveling Salesperson Problem) using a Hopfield Net

  • The global minimum is not necessarily found, so it doesn’t truly solve the problem; it just finds a “low-cost” solution.
  • However, a number of papers have suggested improvements since the original paper.

79 of 89

Traveling Salesperson Problem

  • The problem is: given a set of n nodes (“cities”) with a specified cost between each pair of nodes, find a permutation (“tour”) of the nodes that minimizes the summed costs between successive nodes in the permutation sequence.
  • The costs are symmetric, and the general problem does not require that there be any Euclidean relationship among the nodes.
  • An instance of the TSP is encoded as a Hopfield net with a specific energy function so that:

      minimal cost ↔ minimal energy

80 of 89

TSP Encoding

  • Represent a given problem as a matrix:
    • Cities correspond to rows.
    • Positions on the tour correspond to columns.
    • Example:

          1  2  3  4  5
       A  0  0  1  0  0
       B  1  0  0  0  0
       C  0  0  0  0  1
       D  0  1  0  0  0
       E  0  0  0  1  0

means B occurs first on the tour, D occurs second, A third, E fourth, C fifth.
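A small sketch (an illustration, assuming Python) of reading a tour out of such a matrix:

# Rows are cities, columns are tour positions; an entry of 1 marks the city at that position.
cities = ["A", "B", "C", "D", "E"]
M = [[0, 0, 1, 0, 0],   # A
     [1, 0, 0, 0, 0],   # B
     [0, 0, 0, 0, 1],   # C
     [0, 1, 0, 0, 0],   # D
     [0, 0, 0, 1, 0]]   # E

tour = [cities[row] for col in range(5)
        for row in range(5) if M[row][col] == 1]
print(tour)   # ['B', 'D', 'A', 'E', 'C']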

81 of 89

TSP Formulation

  • Assume {0, 1} values.
  • The neurons correspond to entries in the matrix (n^2 neurons for n cities).
  • Neurons in a row have inhibitory connections from the other neurons in the same row:

If one neuron is on, then the others tend to be off, especially in a minimum-energy state.

  • Similarly for neurons in the same column.

82 of 89

TSP Formulation

  • Need to favor tours that include all n cities, as opposed to just a subset of them.

  • Need to represent costs between cities as neural weights:
    • Want to inhibit selection of adjacent cities in proportion to the cost between those cities.
    • Let X and Y be rows (cities) and i and j be columns (positions in a tour).

83 of 89

TSP Formulation

  • Using the expression for energy in a Hopfield net, -Σ Σ wij yi yj, the corresponding energy is computed to have the form (where A, B, C, D are constants):

      A ΣX Σi Σj≠i yXi yXj

    + B Σi ΣX ΣY≠X yXi yYi

    + C (ΣX Σi yXi - n)^2

    + D Σi ΣX ΣY≠X dXY yXi (yY,i+1 + yY,i-1)

  • At the energy minimum, only the last term, which represents the tour cost, is non-zero.
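A minimal sketch of evaluating this form (an illustration, assuming Python; the constants A, B, C, D, the distance matrix d, and the modular wrap-around of tour positions i+1 and i-1 are assumptions of the sketch) for a 0/1 assignment matrix y:

def tsp_energy(y, d, n, A, B, C, D):
    # y[X][i] is 1 if city X occupies tour position i, else 0.
    row_term = sum(y[X][i] * y[X][j]
                   for X in range(n) for i in range(n) for j in range(n) if j != i)
    col_term = sum(y[X][i] * y[Y][i]
                   for i in range(n) for X in range(n) for Y in range(n) if Y != X)
    count_term = (sum(y[X][i] for X in range(n) for i in range(n)) - n) ** 2
    tour_term = sum(d[X][Y] * y[X][i] * (y[Y][(i + 1) % n] + y[Y][(i - 1) % n])
                    for i in range(n) for X in range(n) for Y in range(n) if Y != X)
    return A * row_term + B * col_term + C * count_term + D * tour_term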

84 of 89

TSP Formulation

For more details consult

85 of 89

TSP Formulation

The authors built a continuous electronic version of the neural net and tried it on the TSP.

86 of 89

TSP Formulation

Comparison with an early state-of-the-art algorithm (Lin-Kernighan):

87 of 89

88 of 89

matlab demohop1 (continuous)

89 of 89

Analogy Between Hopfield Model and Ising Model of Magnetism