1 of 36

Systolic Arrays & Their Applications

DR.SANGEETA ARORA

(HOD, P.G DEPT OF COMPUTER SCIENCE & IT)

2 of 36

Overview

  • What it is
  • N-body problem
  • Matrix multiplication
  • Cannon’s method
  • Other applications
  • Conclusion

3 of 36

What Is a Systolic Array?

A systolic array is an arrangement of processors in an array

where data flows synchronously across the array between

neighbors, usually with different data flowing in different directions

Each processor at each step takes in data from one or more

neighbors (e.g. North and West), processes it and, in the next

step, outputs results in the opposite direction (South and East).

H. T. Kung and Charles Leiserson were the first to publish

a paper on systolic arrays in 1978, and coined the name.

4 of 36

What Is a Systolic Array?

  • A specialized form of parallel computing.
  • Multiple processors connected by short wires.
  • Unlike many forms of parallelism which lose speed through their connection.
  • Cells(processors), compute data and store it independently of each other.

5 of 36

Systolic Unit(cell)

  • Each unit is an independent processor.
  • Every processor has some registers and an ALU.
  • The cells share information with their neighbors, after performing the needed operations on the data.

6 of 36

Some simple examples of systolic array models.

7 of 36

N-body Problem

  • Conventional method: N2
  • Systolic method: N
  • We will need one processing element for each body, lets call them planets.

8 of 36

B1

B2

B3

B6

B4

B5

Six planets in orbit with different masses, and different forces

acting on each other, so are systolic array will look like this.

9 of 36

Each processor will hold the newtonian force formula:

Fij = kmi mj /d2ij , k being the gravitational constant.

Then load the array with values.

B1

B2

B3

B4

B5

B6

Now that the array has the mass and the coordinates of

Each body we can begin are computation.

10 of 36

B1m

B1c

B2m

B2c

B5m

B5c

B4m

B4c

B6m

B6c

B3m

B3c

B6, B5, B4, B3, B2, B1

Fij = kmimj/d2ij

11 of 36

B1m

B1c

B2m

B2c

B5m

B5c

B4m

B4c

B6*B1

B3m

B3c

B6, B5, B4, B3, B2

Fij = kmimj/d2ij

12 of 36

B1m

B1c

B2m

B2c

B5*B1

B4m

B4c

B6*B2

B3m

B3c

B6, B5, B4, B3

Fij = kmimj/d2ij

13 of 36

B1m

B1c

B2m

B2c

B5*B2

B4*B1

B6*B3

B3m

B3c

B6, B5, B4

Fij = kmimj/d2ij

14 of 36

B1m

B1c

B2m

B2c

B5*B3

B4*B2

B6*B4

B3*B1

B6, B5

Fij = kmimj/d2ij

15 of 36

B1m

B1c

B2*B1

B5*B4

B4*B3

B6*B5

B3*B2

B6

Fij = kmimj/d2ij

16 of 36

Done

Done

Done

Done

Done

Done

Fij = kmimj/d2ij

17 of 36

Matrix Multiplication

a11 a12 a13

a21 a22 a23

a31 a32 a33

*

b11 b12 b13

b21 b22 b23

b31 b32 b33

=

c11 c12 c13

c21 c22 c23

c31 c32 c33

Conventional Method: N3

For I = 1 to N

For J = 1 to N

For K = 1 to N

C[I,J] = C[I,J] + A[J,K] * B[K,J];

18 of 36

Systolic Method

This will run in O(n) time!

To run in N time we need N x N processing units, in this case

we need 9.

P9

P8

P7

P6

P5

P4

P1

P2

P3

19 of 36

We need to modify the input data, like so:

a13 a12 a11

a23 a22 a21

a33 a32 a31

b31 b32 b33

b21 b22 b23

b11 b12 b13

Flip columns 1 & 3

Flip rows 1 & 3

and finally stagger the data sets for input.

20 of 36

P9

P8

P7

P6

P5

P4

P1

P2

P3

a13 a12 a11

a23 a22 a21

a33 a32 a31

b31

b21

b11

b32

b22

b12

b33

b23

b13

At every tick of the global system clock data is passed to each

processor from two different directions, then it is multiplied

and the result is saved in a register.

21 of 36

3 4 2

2 5 3

3 2 5

*

=

3 4 2

2 5 3

3 2 5

23 36 28

25 39 34

28 32 37

Lets try this using a systolic array.

P9

P8

P7

P6

P5

P4

P1

P2

P3

2 4 3

3 5 2

3

2

3

5 2 3

5

3

2

2

5

4

22 of 36

3*3

2 4

3 5 2

3

2

5 2 3

5

3

2

2

5

4

Clock tick: 1

9

0

0

0

0

0

0

0

0

P1

P2

P3

P4

P6

P5

P7

P8

P9

23 of 36

2*3

4*2

3*4

2

3 5

3

5 2 3

5

3

2

2

5

Clock tick: 2

17

12

0

6

0

0

0

0

0

P1

P2

P3

P4

P6

P5

P7

P8

P9

24 of 36

3*3

2*4

5*2

2*3

4*5

3*2

3

5 2

5

3

2

Clock tick: 3

23

32

6

16

8

0

9

0

0

P1

P2

P3

P4

P6

P5

P7

P8

P9

25 of 36

3*4

2*2

2*2

5*5

3*3

2*2

4*3

5

5

Clock tick: 4

23

36

18

25

33

4

13

12

0

P1

P2

P3

P4

P6

P5

P7

P8

P9

26 of 36

3*2

5*2

5*3

5*3

3*2

2*5

Clock tick: 5

23

36

28

25

39

19

28

22

6

P1

P2

P3

P4

P6

P5

P7

P8

P9

27 of 36

2*3

5*2

3*5

Clock tick: 6

23

36

28

25

39

34

28

32

12

P1

P2

P3

P4

P6

P5

P7

P8

P9

28 of 36

5*5

Clock tick: 7

23

36

28

25

39

34

28

32

37

P1

P2

P3

P4

P6

P5

P7

P8

P9

29 of 36

37

32

28

34

39

25

23

36

28

23

36

28

25

39

34

28

32

37

P1

P2

P3

P4

P6

P5

P7

P8

P9

Same answer! In 2n + 1 time, can we do better?

The answer is yes, there is an optimization.

30 of 36

Cannons Technique

  • No processor is idle.
  • Instead of feeding the data, it is cycled.
  • Data is staggered, but slightly different than before.
  • Not including the loading which can be done in parallel for a time of 1, this technique is effectively N.

31 of 36

Other Applications

  • Signal processing
  • Image processing
  • Solutions of differential equations
  • Graph algorithms
  • Biological sequence comparison
  • Other computationally intensive tasks

32 of 36

Samba: Systolic Accelerator for Molecular Biological Applications

This systolic array contains 128 processors shared into 32 full custom

VLSI chips. One chip houses 4 processors, and one processor performs

10 millions matrix cells per second.

33 of 36

Why Systolic?

  • Extremely fast.
  • Easily scalable architecture.
  • Can do many tasks single processor machines cannot attain.
  • Turns some exponential problems into linear or polynomial time.

34 of 36

Why Not Systolic?

  • Expensive.
  • Not needed on most applications, they are a highly specialized processor type.
  • Difficult to implement and build.

35 of 36

Summary

  • What a systolic array is.
  • Step through of matrix multiplication.
  • Cannon’s optimization.
  • Other applications including samba.
  • Positives and negatives of systolic arrays.

36 of 36

THANK YOU