1 of 19

Improved Addressing in the Differentiable Neural Computer

Róbert Csordás

Jürgen Schmidhuber

2 of 19

Motivation

  • DNC is very general and can solve many tasks without any modifications:
    • Algorithmic tasks
    • bAbI
    • Few-shot learning
    • Graph processing

Image from https://deepmind.com/blog/differentiable-neural-computers/

3 of 19

Motivation

  • Task-specific methods often have better performance on specific datasets
  • Can we improve DNC's performance while keeping its universality?

Image from https://deepmind.com/blog/differentiable-neural-computers/

4 of 19

How addressing works

Three different addressing methods:

  • Content-based lookup
    • Compares every memory cell to a query vector, normalizes the scores to get a distribution
  • Temporal linkage (only for reading)
    • Follows the order in which cells were written, tracked in a so-called temporal linkage matrix
  • Allocation (only for writing)
    • Chooses where to write based on usage counters
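
A minimal NumPy sketch of how a temporal linkage matrix can be maintained, assuming the update rule of the original DNC formulation (a precedence vector remembers where the last write went, and each new write links to it). The function and variable names here are illustrative and do not come from the slides.

```python
import numpy as np

def update_linkage(L, p, w):
    """One step of a temporal link matrix update (sketch).

    L: (N, N) link matrix, L[i, j] ~ "cell i was written right after cell j"
    p: (N,) precedence weighting (where the previous write went)
    w: (N,) current write address distribution
    """
    L_new = (1.0 - w[:, None] - w[None, :]) * L + w[:, None] * p[None, :]
    np.fill_diagonal(L_new, 0.0)      # a cell never links to itself
    p_new = (1.0 - w.sum()) * p + w   # remember where this write went
    return L_new, p_new

N = 4
L, p = np.zeros((N, N)), np.zeros(N)
for cell in [2, 0, 3]:                # write to cells 2, 0, 3 in this order
    L, p = update_linkage(L, p, np.eye(N)[cell])

read_prev = np.eye(N)[0]              # assume the last read was at cell 0
forward = L @ read_prev               # mass on the cell written after cell 0
backward = L.T @ read_prev            # mass on the cell written before cell 0
```

With sharp (one-hot) writes as above, `forward` points at cell 3 and `backward` at cell 2. With soft write distributions the links become blurred.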

5 of 19

Content-based lookup

  • We want to recall unknown information based on partial knowledge
  • Memory cells are compared to the query by cosine similarity
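
As a rough illustration, content-based lookup can be sketched as cosine similarity between the query and every memory cell, followed by a softmax. The sharpness parameter `beta`, the `eps` guard, and the toy memory are assumptions made for this example.

```python
import numpy as np

def content_lookup(memory, key, beta=1.0, eps=1e-8):
    """memory: (N, W) cells, key: (W,) query. Returns an (N,) address distribution."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sim
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

memory = np.array([[1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0, 0.0]])
key = np.array([0.0, 1.0, 1.0, 0.0])
weights = content_lookup(memory, key, beta=10.0)   # most mass on cell 1
```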

6 of 19

Content-based lookup

Searching for this (known part)

7 of 19

Content-based lookup

To be retrieved (unknown part)

8 of 19

Content-based lookup

  • The problem: No key-value separation
    • The query vector is compared to the full content of the memory cell

9 of 19

Solution: masked content-based lookup

10 of 19

Masked content-based lookup

Advantages:

  • No normalization problem: the unknown part of a cell no longer affects the cosine similarity
  • Dynamic key-value separation
  • More general than key-value memories: the network does not have to decide what the key is when storing the information
  • Can be used in any kind of attention mechanism
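
A small sketch of the idea, assuming the mask is a vector in [0, 1] that is multiplied element-wise into both the query and every memory cell before the cosine comparison. The names and toy values are hypothetical. Each cell stores a two-dimensional "known" part followed by a two-dimensional "unknown" part.

```python
import numpy as np

def masked_content_lookup(memory, key, mask, beta=1.0, eps=1e-8):
    """Cosine-similarity lookup restricted to the dimensions selected by `mask`."""
    mem_m = memory * mask[None, :]
    key_m = key * mask
    sim = mem_m @ key_m / (np.linalg.norm(mem_m, axis=1) * np.linalg.norm(key_m) + eps)
    e = np.exp(beta * sim - (beta * sim).max())
    return e / e.sum()

# Cell 0 matches the query exactly on the known part but has a large unknown part.
memory = np.array([[1.0, 0.0, 5.0, 5.0],
                   [0.6, 0.8, 0.0, 0.0]])
key = np.array([1.0, 0.0, 0.0, 0.0])        # the unknown part is not available

unmasked = masked_content_lookup(memory, key, np.ones(4), beta=10.0)
masked = masked_content_lookup(memory, key, np.array([1.0, 1.0, 0.0, 0.0]), beta=10.0)
```

Without the mask, the large unknown part of cell 0 inflates its norm and the lookup prefers cell 1. With the mask, cell 0 is found.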

11 of 19

Masked content-based lookup

12 of 19

Deallocation problem

  • Allocation states are tracked by usage counters
  • Memory allocation chooses the least used address
  • Freeing memory is achieved by decreasing the usage counters of previously read cells
  • Problem:
    • The memory contents do not change
    • Content-based lookup finds the deallocated cells
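
The bookkeeping above can be sketched as follows, assuming the retention vector and the sorted-usage allocation rule of the original DNC formulation. The names are illustrative, and the usage update is simplified to the freeing step only.

```python
import numpy as np

def retention(free_gates, prev_read_weights):
    """free_gates: (R,), prev_read_weights: (R, N). How much of each cell is kept."""
    return np.prod(1.0 - free_gates[:, None] * prev_read_weights, axis=0)

def allocation_weights(usage):
    """Put the write on the least used cells first."""
    order = np.argsort(usage)                       # least used first
    sorted_u = usage[order]
    prod_before = np.concatenate(([1.0], np.cumprod(sorted_u)[:-1]))
    alloc = np.empty_like(usage)
    alloc[order] = (1.0 - sorted_u) * prod_before
    return alloc

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
usage = np.array([0.9, 0.8, 1.0])

# One read head, which previously read cell 1 and now frees it completely.
psi = retention(np.array([1.0]), np.array([[0.0, 1.0, 0.0]]))
usage = usage * psi                  # only the counter changes
alloc = allocation_weights(usage)    # cell 1 is now the allocation target
```

Note that `memory` is never touched here: the freed cell still holds its old content, which is the source of the problem described above.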

13 of 19

Deallocation problem

Which is the correct sequence?

  • BDFC?
  • ECA?

deallocated start marker found by content-based lookup

14 of 19

Deallocation problem

Solution is simple: erase the memory content
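
A minimal sketch of the fix, assuming the erase is done by scaling each memory cell with a per-cell retention value (1 = keep, 0 = fully freed). The values below are made up for illustration.

```python
import numpy as np

def cosine_scores(memory, key, eps=1e-8):
    return memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
psi = np.array([1.0, 0.0, 1.0])      # hypothetical retention: cell 1 was freed
key = np.array([0.0, 1.0])

before = cosine_scores(memory, key)          # the freed cell 1 still matches best
memory_erased = memory * psi[:, None]        # erase the content of freed cells
after = cosine_scores(memory_erased, key)    # cell 1 no longer matches
```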

15 of 19

Link sharpness control

  • Noise from the write address distributions accumulates in the link matrix
  • Forward and backward address distributions might not sum to 1
  • Solution: exponentiation and renormalization:
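
One way to sketch this step: raise each entry of the distribution to a power and divide by the sum. The exponent name `s` (a non-negative sharpness value, assumed here to be produced by the controller), the `eps` guard, and the example numbers are assumptions.

```python
import numpy as np

def sharpen(d, s, eps=1e-8):
    """Exponentiate a non-negative vector and renormalize it to sum to (almost) 1."""
    powered = np.power(d, s)
    return powered / (powered.sum() + eps)

noisy = np.array([0.05, 0.7, 0.1, 0.05])   # blurred forward distribution, sums to 0.9
sharp = sharpen(noisy, s=3.0)              # peak at index 1 becomes dominant
```

With `s = 1` the vector is only renormalized. Larger values push the result toward a one-hot distribution.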

16 of 19

Link sharpness control

17 of 19

Experimental results - bAbI

18 of 19

Experimental results - bAbI

19 of 19

Thank you for your attention