1 of 127

Technology-Aware Program Optimization for Emerging Memory Technologies

Pu (Luke) Yi*, Anjiang Wei*, and Sara Achour

2 of 127

We developed a software stack for resistive RAM (ReRAM)

3 of 127

Resistive RAM

ReRAM cells store program data as analog resistances.

Figure: ReRAM cell resistance, ranging from 7.8 kΩ to 40 kΩ.

Resistive Memory

4 of 127

Resistive RAM

The analog resistance is configured by applying voltage pulses to the cell.

Figure: ReRAM cell resistance, ranging from 7.8 kΩ to 40 kΩ.

Resistive Memory

5 of 127

Resistive RAM

To precisely write a particular resistance, practitioners apply voltage pulses until the resistance is within some write range.

Figure: ReRAM cell resistance, ranging from 7.8 kΩ to 40 kΩ.

Resistive Memory

Hardware Read-Write Algorithms

6 of 127

Resistive RAM

The write algorithm determines the schedule of pulses to tune the resistance to be within a certain range.

Figure: ReRAM cell resistance, ranging from 7.8 kΩ to 40 kΩ.

Resistive Memory

Hardware Read-Write Algorithms
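To make this concrete, here is a minimal Python sketch of a closed-loop (program-and-verify) write in the spirit of the write algorithm described above. The simulated cell and its pulse step sizes are purely illustrative, not a device model.

```python
import random

class SimulatedCell:
    """A toy cell model: SET pulses lower resistance, RESET pulses raise it."""
    def __init__(self):
        self.r = 20_000.0                        # ohms

    def read(self):
        return self.r * random.gauss(1.0, 0.01)  # small read noise

    def set_pulse(self):
        self.r *= random.uniform(0.80, 0.95)     # SET lowers resistance

    def reset_pulse(self):
        self.r *= random.uniform(1.05, 1.25)     # RESET raises resistance

def write_cell(cell, lo_ohms, hi_ohms, max_pulses=100):
    # program-and-verify: pulse, read back, and stop once the resistance
    # lands inside the target write range
    for pulses in range(1, max_pulses + 1):
        r = cell.read()
        if lo_ohms <= r <= hi_ohms:
            return pulses                        # number of pulses used
        if r < lo_ohms:
            cell.reset_pulse()
        else:
            cell.set_pulse()
    return None                                  # pulse budget exhausted

random.seed(0)
print(write_cell(SimulatedCell(), lo_ohms=9_000, hi_ohms=11_000))
```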

7 of 127

Resistive RAM

The resistance range for a given write operation is determined by the level allocation scheme.

Figure: ReRAM cell resistance, ranging from 7.8 kΩ to 40 kΩ.

Resistive Memory

Hardware Read-Write Algorithms

Level Allocation Schemes

8 of 127

Resistive RAM

The level allocation scheme divides up the resistance range of a ReRAM cell into multiple levels.

Each level stores a value – values are typically binary bit strings.

Example: 4-level allocation that stores bitstrings 00,01,11,10

Level Allocation Schemes

Figure: the cell’s resistance range (7.8 kΩ to 40 kΩ) divided into numbered resistance ranges, each storing a binary value (resistance range 0 stores binary “00”; resistance range 3 stores binary “01”).
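A minimal sketch of such a level allocation, following the 4-level 00, 01, 11, 10 example above; the boundary values and the ordering of the ranges are illustrative, not taken from the talk.

```python
# map contiguous resistance intervals (ohms) to stored bitstrings
LEVELS = [
    ((7_800, 12_000), "00"),
    ((12_000, 18_000), "01"),
    ((18_000, 27_000), "11"),
    ((27_000, 40_000), "10"),
]

def decode(resistance_ohms):
    # map a read resistance back to the bitstring of the level it falls in
    for (lo, hi), bits in LEVELS:
        if lo <= resistance_ohms < hi:
            return bits
    return None   # outside the usable resistance range

print(decode(15_000))   # "01"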

9 of 127

Resistive RAM

Errors occur when the resistance drifts from one level to another. This naturally occurs due to thermal effects, process variation, and relaxation.

The error model for a level allocation is a matrix describing the probability that the memory drifts from one level to another.

Level Allocation Schemes

Figure: the cell’s resistance range (7.8 kΩ to 40 kΩ) divided into numbered resistance ranges, each storing a binary value (resistance range 0 stores binary “00”; resistance range 3 stores binary “01”).

10 of 127

Resistive RAM

The error model for a level allocation is a matrix that describes the probability that the memory drifts from level i to level j.

Level Allocation Schemes

Figure: the cell’s resistance range (7.8 kΩ to 40 kΩ) divided into numbered resistance ranges, each storing a binary value (resistance range 0 stores binary “00”; resistance range 3 stores binary “01”).

Level Error Models
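A minimal sketch of how such a level error matrix could be estimated empirically, assuming the characterization data is available as (level written, level later read) pairs; the sample data below is illustrative.

```python
import numpy as np

def error_matrix(written, read, num_levels):
    # entry (i, j): probability that a cell written to level i is later read as level j
    counts = np.zeros((num_levels, num_levels))
    for i, j in zip(written, read):
        counts[i, j] += 1
    # normalize each row so row i is a probability distribution over read levels
    return counts / counts.sum(axis=1, keepdims=True)

written = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
read    = [0, 0, 1, 1, 1, 2, 2, 2, 3, 2]
print(error_matrix(written, read, num_levels=4))
```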

11 of 127

Resistive RAM

The error model for a level allocation is a matrix that describes the probability that the memory drifts from level i to level j.

Goal: Co-optimize level allocation scheme and high-level program.

Level Allocation Schemes

Figure: the cell’s resistance range (7.8 kΩ to 40 kΩ) divided into numbered resistance ranges, each storing a binary value (resistance range 0 stores binary “00”; resistance range 3 stores binary “01”).

Level Error Models

12 of 127

Previously, we developed a software stack for targeting numerical programs.

13 of 127

Application Domain

Last time, we introduced several approximate numerical representations that store approximable bits (e.g., mantissa bits) in MLC ReRAM.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

14 of 127

Application Domain

We architected a program autotuner that figures out the most compact approximate numerical representation for each piece of data in the program.

The autotuner dynamically checks that the approximate program delivers an end-to-end quality metric – this is validated via simulation over test inputs.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

15 of 127

Application Domain

This approach had several drawbacks…

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

16 of 127

Application Domain

Drawback 1: No accuracy guarantees.

Autotuner introduces approximation into programs by dynamically tuning numerical representations on sets of test inputs.

No guarantees that the approximate program will generally perform well on other inputs.

Can we trust the approximate program?

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

17 of 127

Application Domain

Drawback 2: Time-Consuming

The autotuner performs a search over the numerical representation parameter space.

For each parameter combination, the autotuner runs Monte Carlo simulations with injected errors to determine which parameter combinations are good.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

18 of 127

Application Domain

Drawback 3: Complex Architecture

Requires hardware support for separately storing precise and approximate data. Benefits from specialized logic that packs/unpacks custom numerical data representations.

Requires ECC or a reliable last-level storage medium for precise data.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

19 of 127

Application Domain

Question: Can we find an application domain that meets the following criteria?

Numerical Application Domain

And Software Tooling

20 of 127

Application Domain

Analysis-Amenable: We want to be able to statically (without execution) provide end-to-end accuracy guarantees, given the level error model for the ReRAM storage array.

Numerical Application Domain

And Software Tooling

Level Allocation Schemes + Level-Based Error Models

21 of 127

Application Domain

Analysis-Amenable: We want to be able to statically (without execution) provide end-to-end accuracy guarantees, given the level error model for the ReRAM storage array.

We want to be able to efficiently derive these accuracy guarantees – no extensive simulation over test inputs!

Numerical Application Domain

And Software Tooling

Level Allocation Schemes + Level-Based Error Models

22 of 127

Application Domain

We want to be able to implement and accelerate computation with a simple ReRAM-based architecture.

Numerical Application Domain

And Software Tooling

Resistive Memory

Hardware Read-Write Algorithms

Read/Write Interface

Level Allocation Schemes + Level-Based Error Models

23 of 127

Application Domain

We want to be able to implement and accelerate computation with a simple ReRAM-based architecture.

1. The computational model should be resilient to memory errors and should require little-to-no precise storage.

2. The computational operators should be amenable to acceleration and/or in-memory computation.

Numerical Application Domain

And Software Tooling

Resistive Memory

Hardware Read-Write Algorithms

Read/Write Interface

Level Allocation Schemes + Level-Based Error Models

24 of 127

Application Domain

Vector symbolic architectures meet these conditions

We want to be able to implement and accelerate computation with a simple ReRAM-based architecture.

1. The computational model should be resilient to memory errors and should require little-to-no precise storage.

2. The computational operators should be amenable to acceleration and/or in-memory computation.

Numerical Application Domain

And Software Tooling

Resistive Memory

Hardware Read-Write Algorithms

Read/Write Interface

Level Allocation Schemes + Level-Based Error Models

25 of 127

Vector Symbolic Architectures

Vector Symbolic Architectures – A “Brain-inspired” computing model from the cognitive science community.

VSA (Vector Symbolic Architecture)

26 of 127

Vector Symbolic Architectures

Vector Symbolic Architectures – A “Brain-inspired” computing model from the cognitive science community.

HD Computation is a special case of a Vector Symbolic Architecture (VSA) that has been explored by the hardware community.

VSA (Vector Symbolic Architecture)

HD Computation

27 of 127

Vector Symbolic Architectures

What can you do with Vector Symbolic Architectures?

VSA (Vector Symbolic Architecture)

HD Computation

?

28 of 127

Vector Symbolic Architectures

What can you do with Vector Symbolic Architectures?

Can perform classification, implement complex data retrieval tasks, and implement content-addressable memory systems.

VSA (Vector Symbolic Architecture)

HD Computation

Workloads:

  • Language classification and other classification tasks.
  • Complex data retrieval (e.g., the “dollar price of Mexico” query)
  • Associative Memory / Content Addressable Memory

?

29 of 127

Vector Symbolic Architectures

What can you do with Vector Symbolic Architectures?

Can perform classification, implement complex data retrieval tasks, and implement content-addressable memory systems.

The information encoding/training algorithm can potentially be a “one-shot” procedure, and doesn’t necessarily require iterative tuning.

VSA (Vector Symbolic Architecture)

HD Computation

Workloads:

  • Language classification and other classification tasks.
  • Complex data retrieval (e.g., the “dollar price of Mexico” query)
  • Associative Memory / Content Addressable Memory

?

30 of 127

Vector Symbolic Architectures

What can you do with Vector Symbolic Architectures?

Can also implement many classical data structures.

Data structures:

Sets and Lists

Multisets, histograms, frequency lists

Graphs

Binary Trees

Stack

Finite Automata

Key-Value Stores & Databases

VSA (Vector Symbolic Architecture)

HD Computation

?

31 of 127

Vector Symbolic Architectures

Why target this computational model instead of classical computation?

Let’s revisit our application domain wishlist…

  1. Can statically provide accuracy guarantees given a memory error model.
  2. Resilient to memory errors.
  3. Amenable to acceleration and/or in-memory computation.

VSA (Vector Symbolic Architecture)

HD Computation

?

32 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

This computational model is highly error resilient, and can tolerate a large number of memory errors.

VSA (Vector Symbolic Architecture)

HD Computation

33 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

Computational model uses data representations and computational operators that are amenable to acceleration.

E.g. element-wise XOR, majority

VSA (Vector Symbolic Architecture)

HD Computation

34 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

Computational model uses data representations and computational operators that are amenable to acceleration.

E.g. element-wise XOR, majority

Some of the computational operators can even be implemented in memory.

E.g. Hamming distance

VSA (Vector Symbolic Architecture)

HD Computation

35 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

One criterion left…

Are vector symbolic architectures amenable to analysis? Can we statically derive accuracy guarantees for a given VSA program?

VSA (Vector Symbolic Architecture)

HD Computation

?

36 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

Yes! In this talk, we will present an approach for statically obtaining accuracy guarantees for a VSA program.

We will also showcase what kinds of program optimizations and analyses we can do with this technique.

VSA (Vector Symbolic Architecture)

HD Computation

37 of 127

Vector Symbolic Architectures

  • Can statically provide accuracy guarantees given a memory error model.
  • Resilient to memory errors.
  • Amenable to acceleration and/or in-memory computation.

First, though, we need to introduce the VSA computational model.

VSA (Vector Symbolic Architecture)

HD Computation

38 of 127

Vector Symbolic Architectures

39 of 127

What are Vector Symbolic Architectures?

“Brain-inspired” computing model from the cognitive science community.

Data: numerical vectors called “hypervectors”

VSA (Vector Symbolic Architecture)

40 of 127

What are Vector Symbolic Architectures?

VSAs compute over hypervectors. A hypervector is an N-dimensional vector of binary, integer, or real values.

Introduction to Hypervectors

[0,1,1,1,0,.........,1,1,0]

[0,3,2,5,4,.........,1,4,0]

Binary Hypervector

Radix-6 Hypervector

41 of 127

What are Vector Symbolic Architectures?

In VSAs, we are primarily interested in the distance between hypervectors.

Small distance between hypervectors: similar

Large distance between hypervectors: different or dissimilar

Introduction to Hypervectors

[0,1,0,1,0,.........,1,1,0]

Hypervector 1

[0,1,1,1,0,.........,1,1,0]

Hypervector 2

Hamming Distance

dist(X, Y) = (1/N) ∑ᵢ XOR(Xᵢ, Yᵢ)
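A small sketch of this distance metric, assuming binary hypervectors stored as numpy arrays; the dimension is illustrative.

```python
import numpy as np

def hamming_distance(x: np.ndarray, y: np.ndarray) -> float:
    # fraction of positions where the two hypervectors disagree
    return float(np.mean(x != y))

rng = np.random.default_rng(0)
x, y = rng.integers(0, 2, 10_000), rng.integers(0, 2, 10_000)
print(hamming_distance(x, y))   # ≈ 0.5 for two unrelated random hypervectors
```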

42 of 127

What are Vector Symbolic Architectures?

Hypervectors are initially randomly generated and don’t encode any specific information.

Introduction to Hypervectors

[0,1,1,1,0,.........,1,1,0]

Binary Hypervector

The bits in the hypervector have no meaning

Random bit

43 of 127

What are Vector Symbolic Architectures?

These random hypervectors are dissimilar to one another and represent distinct concepts in the computational problem.

Example: we use a randomly generated hypervector for each letter in the alphabet. Each letter is distinct.

Introduction to Hypervectors

Random HV1

The Letter “D”

The Letter “A”

Random HV2

D is dissimilar to A

44 of 127

What are Vector Symbolic Architectures?

Codebook: Collection of distinct hypervectors that map to distinct concepts in the computation.

The VSA computation uses the hypervectors in the codebook to build data structures.

Introduction to Hypervectors

Random HV1

The Letter “D”

The Letter “A”

Random HV2

D is dissimilar to A

Codebook

45 of 127

What are Vector Symbolic Architectures?

Codebook: Collection of distinct hypervectors that map to distinct concepts in the computation.

The VSA computation uses the hypervectors in the codebook to build data structures.

Introduction to Hypervectors

Example Codebooks

Letters in alphabet (M=26+1): codebook that can be used to create text.

Base-10 Digits (M=10): codebook that can be used to create numerical values.

Vertices (M=100): codebook that can be used to create graphs with 100 vertices.

Countries (M=195): codebook that can be used to encode geographical information.
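A minimal sketch of building such a codebook, here for the letters-plus-space example; the dimension is illustrative.

```python
import numpy as np
import string

rng = np.random.default_rng(0)
d = 10_000
# 26 letters + space: the "M = 26+1" codebook above; any two entries are
# dissimilar, with an expected normalized Hamming distance of 0.5
codebook = {ch: rng.integers(0, 2, d, dtype=np.uint8)
            for ch in string.ascii_lowercase + " "}
```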

46 of 127

What are Vector Symbolic Architectures?

Information is then embedded into hypervectors by performing computational operations over hypervectors.

Operators: Permutation (p), Bundling (+), Binding (*), and Distance Operators (dist).

Introduction to Hypervectors

Compute

codebook: A,B,C

Z = A+C

47 of 127

What are Vector Symbolic Architectures?

Information is then embedded into hypervectors by performing computational operations over hypervectors.

Operators: Permutation (p), Bundling (+), Binding (*), and Distance Operators (dist).

Example: The computed hypervector Z is similar to the codebook hypervectors A and C.

Introduction to Hypervectors

Compute

codebook: A,B,C

Z = A+C

A and C are random, but the computed hypervector Z is related to both A and C

Z similar to A

Z similar to C

Z dissimilar to B

B dissimilar to A

C dissimilar to B

48 of 127

What are Vector Symbolic Architectures?

Each vector symbolic operator can be thought of as an operator that builds a data structure.

Introduction to Hypervectors

Compute

codebook: A,B,C

Z = A+C

A and C are random, but the computed hypervector Z is related to both A and C

Z similar to A

Z similar to C

Z dissimilar to B

B dissimilar to A

C dissimilar to B

49 of 127

What are Vector Symbolic Architectures?

VSA bundling operator (+) constructs a set of elements.

Bundling operators produce hypervectors that are similar to all of the input hypervectors.

Introduction to VSA Operators

codebook: A,B,C

Z = A+C

Z encodes the set {A,C}

50 of 127

What are Vector Symbolic Architectures?

After constructing a set, we can query whether an element hypervector (A) belongs to the set hypervector (Z) by computing the distance between the element and the set.

Introduction to VSA Operators

codebook: A,B,C

Z = A+C

Z encodes the set {A,C}

dist(A,Z) < thresh?

A ∈ {A, C}?

dist(A,Z) < thresh?

B ∈ {A, C}?

51 of 127

What are Vector Symbolic Architectures?

After constructing a set, we can query whether an element hypervector (A) belongs to the set hypervector (Z) by computing the distance between the element and the set.

Hypervector elements that are in the set will have a distance from the set vector that is below a certain threshold.

Introduction to VSA Operators

codebook: A,B,C

Z = A+C

Z encodes the set {A,C}

dist(A,Z) < thresh?

A ∈ {A, C}?

dist(A,Z) < thresh?

B ∈ {A, C}?

52 of 127

What are Vector Symbolic Architectures?

VSA binding operator (*) constructs a tuple of elements.

Binding operators produce hypervectors that are dissimilar to all of the input hypervectors.

Introduction to VSA Operators

codebook: A,B,C

Z = A*C

Z encodes the tuple <A,C>

53 of 127

What are Vector Symbolic Architectures?

VSA permutation operator (p) encodes positional information into an element.

The permutation operator is an invertible operator that deterministically transforms the hypervector.

Introduction to VSA Operators

codebook: A,B,C

Z = A+p(A)

Z encodes the sequence [A,A]

Sequence [A,A] can also be thought of as the set {A,p(A)}

54 of 127

What are Vector Symbolic Architectures?

So far, we conceptually described hypervectors, distance metrics, and the permutation (p), bundling (+), binding (*), and distance operators (dist).

Z = [C+p(A)]*C+C

{<[C,A],C>, C}

VSA Computation

55 of 127

What are Vector Symbolic Architectures?

So far, we conceptually described hypervectors, distance metrics, and the permutation (p), bundling (+), binding (*), and distance operators (dist).

There are many candidate VSA implementations.

Left: The HD computation implementation of vector symbolic architecture.

Z = [C+p(A)]*C+C

{<[C,A],C>, C}

VSA Computation

HD Computation Implementation

Hypervectors: Binary Vectors

Distance (dist): Hamming Distance

Bundling(+): Majority Vote

Binding(*): XOR

Permutation(p): Cyclic Rotation
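A minimal Python sketch of this HD computation implementation, plus a small demo that a bundle behaves like a set; the dimension and the demo values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                       # hypervector dimension

def random_hv():
    return rng.integers(0, 2, d, dtype=np.uint8)

def dist(x, y):
    # normalized Hamming distance
    return float(np.mean(x != y))

def bundle(*xs):
    # per-bit majority vote (assumes an odd number of inputs; an even count
    # would need an explicit tie-breaking rule)
    return (np.sum(xs, axis=0, dtype=np.int64) * 2 > len(xs)).astype(np.uint8)

def bind(x, y):
    # element-wise XOR; self-inverse, so bind(bind(x, y), y) == x
    return np.bitwise_xor(x, y)

def permute(x, k=1):
    # cyclic rotation by k positions
    return np.roll(x, k)

# demo: the bundle HV = A + B + C behaves like the set {A, B, C}
A, B, C, D = (random_hv() for _ in range(4))
HV = bundle(A, B, C)
print(dist(A, HV), dist(D, HV))          # ≈ 0.25 for members, ≈ 0.5 for non-members
print(dist(bind(bind(A, C), C), A))      # 0.0: unbinding with C recovers A exactly
```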

56 of 127

Benefits of Vector Symbolic Architectures

57 of 127

Why use VSA Computation?

Resilient to errors! All bits are equally important.

For a large hypervector, it takes many bit corruptions to meaningfully change the distance value.

Hypervector: [0,1,1,1,0,.........,1,1,0] – each bit is random and has no individual meaning.

Feature vector: [0,1,1,1,0,.........,0,1,0] – each bit encodes a specific feature (e.g., “is the word a verb?”, “is the word a proper noun?”).
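A small sketch of this error-resilience argument: corrupting a handful of bits in a large hypervector barely moves its distance to another hypervector. The dimension and corruption count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
x = rng.integers(0, 2, d, dtype=np.uint8)
y = rng.integers(0, 2, d, dtype=np.uint8)

corrupted = x.copy()
flip = rng.choice(d, size=50, replace=False)     # corrupt 50 of the 10,000 bits
corrupted[flip] ^= 1

# the two distances differ by at most 50/d = 0.005
print(np.mean(x != y), np.mean(corrupted != y))
```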

58 of 127

Why use VSA Computation?

Many hypervector representations exist. Data can potentially be stored in real-valued and arbitrary multi-level memories.

Nice fit for MLC / analog emerging memory technologies.

[0,3,2,5,4,.........,1,4,0]

Radix-6 Hypervector

Figure: the six values of a radix-6 element; nearby values are similar (distance = 1) and far-apart values are dissimilar (distance = 3).

59 of 127

Why use VSA Computation?

Amenable to analysis! Can potentially reason about error through permutation, bundling, and binding operators.

How do people currently reason about error in VSAs?

Not in Set

In Set

Hamming Distance

Probability

60 of 127

Why develop analyses?

(what do people do now)

61 of 127

Element is in a set?

Goal: we want to identify whether the elements A and D are in a hypervector HV that was constructed by bundling A, B, and C.

Codebook contains random hypervectors for A, B, C, and D

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

Question 2: Is D ∈ {A, B, C}?

62 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

We want to be able to correctly differentiate between elements in the set, and elements not in the set.

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

63 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

“Recall accuracy” - fraction of the time we correctly tell when items are in / not in a set.

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

64 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

How do we find a good threshold?

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

65 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

Approach 1: Empirically set threshold by doing a lot of simulations.

Con: Time-consuming + requires test inputs. How do we know we have selected the best threshold?

How do we find a good threshold?

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

66 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

Approach 2: Maximize the chances of finding a good threshold by using large (N=10,000) hypervectors.

Con: May unnecessarily use space.

How do we find a good threshold?

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

67 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

Approach 3: Maximize the chances of finding a good threshold by executing multiple instantiations of the HD computation with different codebooks.

Con: May unnecessarily use space.

How do we find a good threshold?

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

68 of 127

Element is in a set?

Problem: We need to set a threshold distance for distinguishing between elements in a set, and elements not in a set.

Approach 4: Winner take all – don’t set a threshold, find the lowest distance hypervector.

Con: Restrictive (works mostly in associative memory cases); cannot determine that an element is not contained in any of the hypervectors.

How do we find a good threshold?

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold
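A minimal sketch of the winner-take-all lookup from Approach 4, used here in the associative-memory setting it works best in; the codebook and the noise level are illustrative.

```python
import numpy as np

def wta_lookup(query, codebook):
    # return the codebook entry with the smallest distance to the query
    return min(codebook, key=lambda name: np.mean(codebook[name] != query))

rng = np.random.default_rng(0)
d = 10_000
codebook = {name: rng.integers(0, 2, d, dtype=np.uint8) for name in "ABCD"}

noisy_A = codebook["A"].copy()
flip = rng.choice(d, size=2_000, replace=False)   # corrupt 20% of the bits
noisy_A[flip] ^= 1
print(wta_lookup(noisy_A, codebook))              # 'A': nearest neighbor despite the noise
# limitation noted above: a WTA query always returns *some* entry, so it cannot
# report that an element is absent
```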

69 of 127

Summary

In short, standard practice improves computational robustness through empirical approaches.

As we discussed earlier, empirical methods have a lot of drawbacks…

Cons:

  • No accuracy guarantees. Difficult to “trust” approximate computation.
  • Resource-intensive; requires extensive simulation and storage.

70 of 127

Can we more intelligently optimize HD computations?

71 of 127

Recall in Vector Symbolic Architectures

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

When picking our threshold in simulation, we considered one random instantiation of the codebook…

72 of 127

Recall in Vector Symbolic Architectures

Codebook = A, B, C, D

Hypervector HV = A+B+C

HV represents the set {A,B,C}

Question 1: Is A ∈ {A, B, C}?

dist(A,HV) < threshold

Question 2: Is D ∈ {A, B, C}?

dist(D,HV) < threshold

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

It could be that, if we try more random instantiations of the codebook, the threshold doesn’t work so well.

Errors

73 of 127

Recall in Vector Symbolic Architectures

Alternate approach: we model the distribution of distances across random codebook instantiations. We model distributions for the distances of elements in the set and of elements not in the set.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

74 of 127

Recall in Vector Symbolic Architectures

We then set the threshold that minimizes misclassification error.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

False Negative

False Positive

False Positive: an element is reported as in the set when it is not in the set.

False Negative: an element is reported as not in the set when it is in the set.

75 of 127

Recall in Vector Symbolic Architectures

Symbolic normal distributions over the number of elements in the bundled hypervector (m) and the dimension of the hypervector (d).

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

76 of 127

Recall in Vector Symbolic Architectures

Analytically set threshold to minimize overlap between parametrized distributions.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold
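A hedged sketch of this style of analysis, assuming majority bundling of m random binary hypervectors and a normal approximation to the two distance distributions. This is one standard parametrization, not necessarily the exact model used here, and the numbers it produces are illustrative.

```python
from math import comb, erf, sqrt

def norm_cdf(x):
    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def distance_distributions(m, d):
    # in-set: expected distance of a stored item to a majority bundle of m items (m odd)
    mu_in = 0.5 - comb(m - 1, (m - 1) // 2) / 2 ** m
    # out-of-set: an unrelated hypervector sits at expected distance 0.5
    return (mu_in, sqrt(mu_in * (1 - mu_in) / d)), (0.5, sqrt(0.25 / d))

def pick_threshold(m, d, steps=2000):
    # scan for the threshold that minimizes false negatives + false positives
    (mu_in, s_in), (mu_out, s_out) = distance_distributions(m, d)
    best = None
    for i in range(steps + 1):
        t = mu_in + (mu_out - mu_in) * i / steps
        false_neg = 1.0 - norm_cdf((t - mu_in) / s_in)   # member lands above threshold
        false_pos = norm_cdf((t - mu_out) / s_out)       # non-member lands below threshold
        if best is None or false_neg + false_pos < best[1]:
            best = (t, false_neg + false_pos)
    return best

# threshold and total misclassification error under this sketch's assumptions
print(pick_threshold(m=3, d=334))
```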

77 of 127

Recall in Vector Symbolic Architectures

In-set recall accuracy: the area under the in-set distribution to the left of the threshold.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

78 of 127

Recall in Vector Symbolic Architectures

Okay, so what can you do with this?

Statically compute the recall accuracy for a given d, m, and threshold without running simulations. This threshold will work well across a range of codebooks.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

79 of 127

Recall in Vector Symbolic Architectures

Okay, so what can you do with this?

Automatically find a good threshold by studying the distribution. Having a threshold will enable the identification of elements that are in a set / not in the set.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

80 of 127

Recall in Vector Symbolic Architectures

Okay, so what can you do with this?

Given the number of elements in the hypervector set m and a recall accuracy x, find the smallest hypervector dimension (d) that gives you the recall accuracy x.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold
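A companion sketch for this question (and its dual on the next slide), reusing the same normal approximation as before; the midpoint threshold placement and the resulting dimension are illustrative, not the exact values our model produces.

```python
from math import comb, erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def smallest_dimension(m, target=0.99):
    # in-set mean for majority bundling of m items, out-of-set mean 0.5,
    # threshold placed midway between the two means for this sketch
    mu_in = 0.5 - comb(m - 1, (m - 1) // 2) / 2 ** m
    t = (mu_in + 0.5) / 2
    for d in range(8, 50_001):
        s_in, s_out = sqrt(mu_in * (1 - mu_in) / d), sqrt(0.25 / d)
        recall = norm_cdf((t - mu_in) / s_in)          # member correctly below threshold
        reject = 1.0 - norm_cdf((t - 0.5) / s_out)     # non-member correctly above threshold
        if min(recall, reject) >= target:
            return d
    return None

# smallest d that meets the target under this sketch; the dual question
# (largest m for a given d) can be answered by scanning m instead of d
print(smallest_dimension(m=3))
```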

81 of 127

Recall in Vector Symbolic Architectures

Okay, so what can you do with this?

Given a hypervector dimension (d) and a recall accuracy x, find the largest set you can represent with that hypervector.

In Set

Probability

Hamming Distance

dist(D,HV)

dist(A,HV)

Threshold

82 of 127

What else can you do with this model?

The model gives you probabilistic guarantees:

  • Can be used as a basis for making principled optimizations, e.g., reducing the hypervector dimension for an application or discarding part of a vector.
  • Can be extended to include memory error models and be used to inform hardware designs, e.g., figure out the accuracy requirements of multi-level emerging memory, or choose the optimal level allocation for a given problem.
  • Guarantees – confidence that the computation will, on average, deliver the desired results. No need for automatic tuning or other ad-hoc methods of setting parameters and thresholds.

Probabilistic model is computationally efficient, easy to work with.

83 of 127

Incidentals

For the remainder of the talk, we consider the following VSA implementation.

Hypervector Model: Randomly generated 0-1 vectors

Computational Operators

  • Bundling (majority)
  • Binding (XOR)
  • Distance (Hamming)
  • Permutation (Cyclic Shift)

84 of 127

Incidentals

For the remainder of the talk, we consider the following VSA implementation.

Hypervector Model: Randomly generated 0-1 vectors

Computational Operators

  • Bundling (majority)
  • Binding (XOR)
  • Distance (Hamming)
  • Permutation (Cyclic Shift)

Analytical Models: We developed the following analytical models for VSAs:

  • Set recall: recall an element from a set.
  • Multiset recall: recall an element from a multiset (a set with duplicate elements).
  • Set-Tuple recall: recall a tuple from a Cartesian product of sets.

Future Models:

  • Subset of set recall: recall if subset appears in set.

85 of 127

Model: Cartesian Product Of Sets

Problem: Given two hypervector sets s1 and s2, with a ∈ s1 and b ∈ s2:

What is the recall accuracy of <a,b> ∈ s1 × s2?

Some of these models get quite complicated…

86 of 127

Case Study 1: Dollar Price of Mexico with 99% accuracy

87 of 127

Dollar Price of Mexico Problem

Analogical query: given the following facts, what is the dollar price of Mexico?

  • Currency of USA is Dollar
  • Currency of Mexico is Peso

With VSAs, we can elegantly encode the problem of finding the equivalent of the dollar for Mexico.

The answer: the Peso!
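A sketch of one classical encoding of this query (following Kanerva’s construction), using the binary HD operators from earlier. The extra name/capital attributes are illustrative padding so that each record bundles an odd number of role–filler pairs; this is not necessarily the encoding used to derive the d=334 result on the next slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                                     # generous dimension for the demo

def hv():         return rng.integers(0, 2, d, dtype=np.uint8)
def bind(x, y):   return np.bitwise_xor(x, y)  # XOR binding (self-inverse)
def bundle(*xs):  return (np.sum(xs, axis=0, dtype=np.int64) * 2 > len(xs)).astype(np.uint8)
def dist(x, y):   return float(np.mean(x != y))

# roles and fillers (all random hypervectors from a codebook)
name, capital, currency = hv(), hv(), hv()
usa, wdc, dollar = hv(), hv(), hv()
mexico, cdmx, peso = hv(), hv(), hv()

# record hypervectors: each country is a bundle of role-filler bindings
usa_rec = bundle(bind(name, usa), bind(capital, wdc), bind(currency, dollar))
mex_rec = bundle(bind(name, mexico), bind(capital, cdmx), bind(currency, peso))

# "dollar of Mexico": bind the two records to form a mapping, then apply it to dollar
query = bind(dollar, bind(usa_rec, mex_rec))

fillers = {"usa": usa, "wdc": wdc, "dollar": dollar,
           "mexico": mexico, "cdmx": cdmx, "peso": peso}
print({k: round(dist(query, v), 3) for k, v in fillers.items()})
# 'peso' is the clear winner (≈0.375); everything else sits near the 0.5 chance level
```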

88 of 127

Dollar Price of Mexico Problem

How big of a hypervector do we need to return “Peso” with 99% recall accuracy?

With the analytical model, we deduce that we need a hypervector with d=334, and we also obtain the threshold thresh=0.437 to use for the comparison.

89 of 127

Dollar Price of Mexico Problem

How big of a hypervector do we need to return “Peso” with 99% recall accuracy?

With the analytical model, we deduce that we need a hypervector with d=334, and we also obtain the threshold thresh=0.437 to use for the comparison.

Significantly smaller than d=10,000!

90 of 127

Case Study 2: Testing if an element is in a set

91 of 127

How long should a hypervector be?

For a given recall of 0.99, the maximum number of supported elements m grows approximately linearly with the dimension of the hypervector.

Implications of this analysis:

  • Storing a set in two hypervectors is the same as storing the set in a single hypervector that is 2x the size.
  • Program optimizations that partition sets into multiple hypervectors will perform the same as optimizations that store data as a single, large hypervector.

92 of 127

Case Study 3: HD computation of finite automata with 99.7% accuracy

93 of 127

Finite Automata with HD Vectors

The HD representation of a finite automaton maintains a set of states the automaton may enter. What should the size of this hypervector be if we want a 99% string matching accuracy?

94 of 127

Finite Automata with HD Vectors

To obtain a 99% string matching accuracy, we analytically determine we need a 99.7% recall accuracy.

95 of 127

Finite Automata with HD Vectors

The HD representation of a finite automaton maintains a set of states the automaton may enter. What hypervector dimension + threshold do we need to obtain a 99.7% recall accuracy?

96 of 127

Finite Automata with HD Vectors

The HD representation of a finite automaton maintains a set of states the automaton may enter. What hypervector dimension + threshold do we need to obtain a 99.7% recall accuracy?

Initially, we need a hypervector with d=2126.

We also derive a threshold of 0.471

97 of 127

Finite Automata with HD Vectors

Initially, we need a hypervector with d=2126.

After one transition, we need a hypervector with d=820.

We also derive a threshold of 0.453

The HD representation of a finite automaton maintains a set of states the automaton may enter. What hypervector dimension + threshold do we need to obtain a 99.7% recall accuracy?

98 of 127

Finite Automata with HD Vectors

Initially, we need a hypervector with d=2126.

After one transition, we need a hypervector with d=820.

After two transitions, we need a hypervector with d=193.

We also derive a threshold of 0.404

The HD representation of a finite automaton maintains a set of states the automaton may enter. What hypervector dimension + threshold do we need to obtain a 99.7% recall accuracy?

99 of 127

Finite Automata with HD Vectors

Initially, we need a hypervector with d=2126.

After one transition, we need a hypervector with d=820.

After two transitions, we need a hypervector with d=193.

We can discard/partially compute over the state hypervector as the computation progresses!

The HD representation of a finite automaton maintains a set of states the automaton may enter. What hypervector dimension + threshold do we need to obtain a 99.7% recall accuracy?
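A small sketch of the data-discarding idea above: the Hamming distance computed over a truncated prefix of the dimensions is an unbiased (but noisier) estimate of the full distance, which is the basic fact that makes discarding part of the hypervector possible as accuracy requirements shrink. The dimensions mirror the ones above, but the vectors themselves are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 2126)
y = rng.integers(0, 2, 2126)

for d in (2126, 820, 193):
    # all estimates ≈ 0.5 for unrelated vectors, with growing spread as d shrinks
    print(d, float(np.mean(x[:d] != y[:d])))
```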

100 of 127

Conclusions

Our recall accuracy model enables practitioners to analytically compute the recall accuracy for a given VSA computation. (subject to constraints over the implementation)

101 of 127

Conclusions

Our recall accuracy model enables practitioners to analytically compute the recall accuracy for a given VSA computation. (subject to constraints over the implementation)

With this model we can intelligently parametrize the VSA computation:

102 of 127

Conclusions

Our recall accuracy model enables practitioners to analytically compute the recall accuracy for a given VSA computation. (subject to constraints over the implementation)

With this model we can intelligently parametrize the VSA computation:

  • We can drastically reduce the memory footprint of VSA computations without compromising the accuracy of the recall operations.
  • We can automatically derive in-set/not-in-set thresholds from our models. These thresholds are usually empirically tuned by performing Monte Carlo simulations.

103 of 127

Conclusions

Our recall accuracy model enables practitioners to analytically compute the recall accuracy for a given VSA computation. (subject to constraints over the implementation)

With this model we can craft new program optimizations:

  • We can devise new partial computation and data discarding optimizations using this model. These optimizations reduce memory/compute without sacrificing accuracy.
  • We can study whether certain optimizations are worthwhile (e.g., vector partitioning) using our model.

104 of 127

These are some initial steps to identify if this direction is worth exploring.

105 of 127

These are some initial steps to identify if this direction is worth exploring.

What’s the goal here?

106 of 127

Goal: develop a memory-aware VSA computation compiler – optimize VSA computations for accelerators that use emerging memory technologies.

Memory-Aware Vector Symbolic Architecture Compiler

Level Error Models and Memory Array Architecture (#rows,#cols)

Vector Symbolic Architecture Computation

Recall Accuracy

107 of 127

Accept as input a VSA computation (which uses the binding, bundling, and permutation operators), a target recall accuracy, and an error model of the emerging memory architecture.

Memory-Aware Vector Symbolic Architecture Compiler

Level Error Models and Memory Array Architecture (#rows,#cols)

Vector Symbolic Architecture Computation

Recall Accuracy

108 of 127

Statically (at compile-time) select VSA operator implementations, a hypervector representation (e.g., binary/k-radix) + dimension, and a level allocation that (1) meets the target recall accuracy and (2) maximizes performance / minimizes memory utilization.

Optimization-Aware Vector Symbolic Architecture Compiler

Level Error Models and Memory Array Architecture (#rows,#cols)

Vector Symbolic Architecture Computation

VSA Implementation

+,*,dist operator implementations

Hypervector Representation

+,*,dist operator implementations

Recall Accuracy

Level Allocation

109 of 127

Automatically insert compute- and memory-optimizing program transformations (e.g., data discard) when appropriate, while still delivering desired recall accuracy.

Optimization-Aware Vector Symbolic Architecture Compiler

Level Error Models and Memory Array Architecture (#rows,#cols)

Vector Symbolic Architecture Computation

VSA Implementation

+,*,dist operator implementations

Hypervector Representation

+,*,dist operator implementations

Recall Accuracy

Level Allocation

110 of 127

What’s the broader vision?

111 of 127

Broad Vision

With this system, we will be able to effectively map VSA computations to approximate accelerators while still providing accuracy guarantees.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Optimization-Aware Vector Symbolic Architecture Compiler

112 of 127

Broad Vision

With this system, we will be able to effectively map VSA computations to approximate accelerators while still providing accuracy guarantees.

  1. Enables full evaluation of the potential of the VSA computational model.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Optimization-Aware Vector Symbolic Architecture Compiler

113 of 127

Broad Vision

With this system, we will be able to effectively map VSA computations to approximate accelerators while still providing accuracy guarantees.

2. Can productively target accelerators that make use of promising, but unreliable, emerging technologies.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Optimization-Aware Vector Symbolic Architecture Compiler

114 of 127

Broad Vision

With this system, we will be able to effectively map VSA computations to approximate accelerators while still providing accuracy guarantees.

3. Can inspire new hardware optimization algorithms for VSA accelerators.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Optimization-Aware Vector Symbolic Architecture Compiler

115 of 127

A Quick Update on the Level Allocation Algorithm from Last Time

116 of 127

Software Stack

Last time, we implemented a new level allocation algorithm, QAlloc.

Given K (# of levels) and hardware characterization data, QAlloc finds the lowest-error level allocation scheme for the storage medium.

Resistive Memory

Hardware Characterization

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

Read/Write Interface

Numerical Application Domain

And Software Tooling

117 of 127

Software Stack

Level allocation algorithm produces level allocation schemes by analyzing hardware characterization data.

Hardware characterization data captures resistance drift, process variation-induced resistance variations, and other effects.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

118 of 127

Level Allocation

Our level allocation approach confers two key benefits over existing level allocation approaches.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

119 of 127

Level Allocation

Distribution-Agnostic: Doesn’t assume any particular distribution. Works with non-normally distributed analog resistance errors.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

120 of 127

Level Allocation

Distribution-Agnostic: Doesn’t assume any particular distribution. Works with non-normally distributed analog resistance errors.

In contrast, existing techniques typically fit a distribution to the data and then work with the distribution parameters (e.g., variance).

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

121 of 127

Level Allocation

We employ a search algorithm that intelligently explores a larger space of level allocations than existing techniques.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation
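For intuition, here is a hedged sketch of a distribution-agnostic level allocation search in this spirit; it is not QAlloc itself. It assumes the characterization data is a list of (written resistance, later read resistance) pairs, that levels are contiguous resistance intervals, and that each level is written equally often; a dynamic program then picks boundaries that minimize the average empirical drift probability.

```python
import random

def allocate_levels(samples, K, num_candidates=32):
    # candidate boundaries: quantiles of the written resistances, taken directly
    # from the measurements (no distributional assumption)
    writes = sorted(w for w, _ in samples)
    cands = sorted({writes[i * (len(writes) - 1) // num_candidates]
                    for i in range(num_candidates + 1)})
    n = len(cands)

    def drift_error(lo, hi):
        # empirical P(read outside [lo, hi) | written inside [lo, hi))
        inside = [(w, r) for w, r in samples if lo <= w < hi]
        if not inside:
            return 1.0
        return sum(not (lo <= r < hi) for _, r in inside) / len(inside)

    err = [[drift_error(cands[i], cands[j]) for j in range(n)] for i in range(n)]

    # dynamic program: best[k][j] = lowest total error for k levels ending at cands[j]
    INF = float("inf")
    best = [[INF] * n for _ in range(K + 1)]
    back = [[None] * n for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(1, n):
            for i in range(j):
                cost = best[k - 1][i] + err[i][j]
                if cost < best[k][j]:
                    best[k][j], back[k][j] = cost, i

    # recover the chosen boundaries and report the average drift probability per level
    bounds, j = [cands[n - 1]], n - 1
    for k in range(K, 0, -1):
        j = back[k][j]
        bounds.append(cands[j])
    return sorted(bounds), best[K][n - 1] / K

# tiny synthetic demo: written values drift multiplicatively with 5% noise (illustrative)
random.seed(0)
samples = [(w, w * random.gauss(1.0, 0.05))
           for w in (random.uniform(7_800, 40_000) for _ in range(2_000))]
print(allocate_levels(samples, K=8))
```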

122 of 127

Level Allocation

Last year, we obtained preliminary results that showed our level allocation approach produces lower-error level allocations than competing approaches.

SOTA (SBA): 11.25%

8-Level Allocation results

Average Level Drift Probability

Our Approach: 6.24%

123 of 127

Level Allocation

Last year, we obtained preliminary results that showed our level allocation approach produces lower-error level allocations than competing approaches.

This year, we performed an ablation study to understand why.

SOTA (SBA): 11.25%

8-Level Allocation results

Average Level Drift Probability

Our Approach: 6.24%

N(0,sigma): 8.74% (Same Distribution as SBA)

Ablation study

N(mu,sigma): 10.35%

Distribution-Agnostic: 6.24%

124 of 127

Level Allocation

Our allocation algorithm searches over a larger space of level allocations.

As a result, it is sometimes able to find better level allocations than SBA, even when working with the same analog resistance model as SBA.

Conclusion 1

SOTA (SBA): 11.25%

8-Level Allocation results

Average Level Drift Probability

Our Approach: 6.24%

N(0,sigma): 8.74% (Same Distribution as SBA)

Ablation study

N(mu,sigma): 10.35%

Distribution-Agnostic: 6.24%

125 of 127

Level Allocation

Our level allocation algorithm’s ability to model non-normal distributions is critical to finding better performing allocations.

We find this is true for all the level allocations it produces.

Conclusion 2

SOTA (SBA): 11.25%

8-Level Allocation results

Average Level Drift Probability

Our Approach: 6.24%

N(0,sigma): 8.74% (Same Distribution as SBA)

Ablation study

N(mu,sigma): 10.35%

Distribution-Agnostic: 6.24%

126 of 127

Level Allocation

Summary: We have developed a distribution-agnostic level allocation technique that works directly with characterization data.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation

127 of 127

Level Allocation

This level allocation algorithm is generic, and can be applied (or adapted) to any resistance-based memory device that supports MLC storage.

Resistive Memory

Hardware Characterization Data

Hardware Read-Write Algorithms

Level Allocation Schemes + Level-Based Error Models

Level Allocation