
Vector and Array Processing


krraju.in

What you’ll learn

  • What is Vector Processing? How is it different from scalar processing?
  • What are SIMD computers?
  • Vector Processors
  • Array Processors
  • Difference between Vector and Array processors

Unit-5

Parallel Processing, Pipelining, Arithmetic and Instruction Pipelines, RISC Pipeline, Vector Processing, Array Processors, Multiprocessors, Interconnection structures, Cache coherence.


SIMD Computers

Parallel computers appear as either SIMD or MIMD configurations.

  • SIMD computers appeal more to special-purpose applications.
    • SIMD includes both array and vector processors

Advantages:

  • Simplicity of concept and programming
  • Regularity of structure
  • Easy scalability of size and performance
  • Straightforward applicability in a number of fields which demand parallelism to achieve necessary performance


Vector Processing Application Areas

  • Long-range weather forecasting
  • Aerodynamics and space flight simulations
  • Artificial Intelligence and Expert systems
  • Mapping the human genome
  • Petroleum exploration
  • Seismic data analysis
  • Medical diagnosis
  • Image processing


Vector

  • A vector is a one-dimensional array of numbers.
  • Many scientific/commercial programs use vectors

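Scalar (1 operation): add r3, r1, r2 computes r3 = r1 + r2.

Vector (n operations): add.vv v3, v1, v2 computes v3(i) = v1(i) + v2(i) for every element i, up to the vector length.

Equivalent scalar loop (100 elements):

for (i = 0; i < 100; i++)
    C[i] = A[i] + B[i];

Figure: a single scalar add (r1 + r2 → r3) beside a stack of element-wise adds (v1 + v2 → v3), one per vector element.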

Vector processors have high-level operations that work on linear arrays of numbers
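One way to make the scalar/vector contrast concrete is to count the instructions each form issues. This Python sketch is a hypothetical model and not a real ISA. `scalar_add` issues one add per element and ignores loop-control instructions. `vector_add` covers `vlen` elements with a single instruction.

```python
def scalar_add(a, b):
    """Scalar style: one add instruction per element, issued inside a loop."""
    c = [0] * len(a)
    issued = 0
    for i in range(len(a)):
        c[i] = a[i] + b[i]   # models "add r3, r1, r2" (loop control not counted)
        issued += 1
    return c, issued


def vector_add(v1, v2, vlen):
    """Vector style: models "add.vv v3, v1, v2" covering vlen elements."""
    v3 = [v1[i] + v2[i] for i in range(vlen)]
    return v3, 1   # a single instruction was issued


a = list(range(100))
b = [2 * x for x in a]
c_scalar, n_scalar = scalar_add(a, b)
c_vector, n_vector = vector_add(a, b, len(a))
assert c_scalar == c_vector
print(n_scalar, n_vector)  # 100 1
```

Both forms compute the same 100 results. The vector form issues one instruction where the scalar loop issues 100 adds.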


Basic Requirements for Vector Processing

  • Need to load/store vectors
    • Vector registers (contain vectors)
  • Need to operate on vectors of different lengths
    • Vector length register (VLEN)
  • Elements of a vector might be stored apart from each other in memory
    • Vector stride register (VSTR)
      • Stride: Distance in memory between two elements of a vector
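The VLEN and VSTR registers can be modeled as two parameters of a strided load. This minimal Python sketch assumes a flat, word-addressed memory held in a list and a hypothetical `vector_load` helper. It shows that stride 1 reads a row of a row-major matrix, while a stride equal to the row length reads a column.

```python
def vector_load(memory, base, vstr, vlen):
    """Gather vlen elements starting at `base`, vstr words apart.

    vlen and vstr stand in for the VLEN and VSTR registers.
    """
    return [memory[base + i * vstr] for i in range(vlen)]


# A 3x4 matrix stored row-major in flat memory: element (r, c) is at r*4 + c.
ROWS, COLS = 3, 4
memory = [10 * r + c for r in range(ROWS) for c in range(COLS)]

row1 = vector_load(memory, base=1 * COLS, vstr=1, vlen=COLS)  # stride 1
col2 = vector_load(memory, base=2, vstr=COLS, vlen=ROWS)      # stride = row length
print(row1)  # [10, 11, 12, 13]
print(col2)  # [2, 12, 22]
```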


Vector Processing

Instructions operate on vectors rather than scalar (single data) values. A vector instruction performs an operation on each element in consecutive cycles.

  • Vector functional units are pipelined.
    • Each pipeline stage operates on a different data element
  • Vector instructions allow deeper pipelines
    • No intra-vector dependencies
      • No hardware interlocking needed within a vector.
    • No control flow within a vector.
  • Known stride allows easy address calculation for all vector elements.
    • Enables prefetching of vectors into registers/cache/memory


Stride: Distance separating elements that are to be merged into a single vector
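The payoff of a pipelined vector unit can be estimated with the standard pipeline timing formula. Assume k stages of one cycle each. The first result appears after k cycles, and one more result completes every cycle after that. So n elements take k + n - 1 cycles instead of k·n. The sketch below assumes a 6-stage unit purely for illustration.

```python
def pipelined_cycles(n, k):
    """Cycles for n elements through a k-stage pipeline (one stage per cycle)."""
    return k + (n - 1)  # first result after k cycles, then one per cycle


def unpipelined_cycles(n, k):
    """Cycles if each element must finish all k stages before the next starts."""
    return k * n


STAGES = 6  # assumed stage count, for illustration only
for n in (1, 4, 64):
    p, u = pipelined_cycles(n, STAGES), unpipelined_cycles(n, STAGES)
    print(n, p, u, round(u / p, 2))
```

Under these assumptions, 64 elements take 69 cycles rather than 384, a speedup of about 5.6. As n grows, the speedup approaches k.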


Vector Processor

A single processor element operates in sequence on many data elements.

  • Same regularity of action as an array processor but on smaller data sets.
  • Each result is independent of the previous result
    • long pipeline, compiler ensures no dependencies
    • high clock rate
  • Vector instructions access memory with a known pattern
    • highly interleaved memory
    • amortize memory latency over ~64 elements
    • no data caches required (an instruction cache is still used)
  • Reduces branches and branch problems in pipelines
  • Single vector instruction implies lots of work (loop)
    • fewer instruction fetches
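A rough count of instruction fetches shows why one vector instruction per strip of elements saves fetch bandwidth. Every count here is an assumption:

  • Scalar loop: 5 instructions per element, matching the five-step machine-language loop for C(I) = A(I) + B(I) in this unit.
  • Vector loop: 5 instructions per strip (set length, two vector loads, vector add, vector store).
  • Maximum vector length: 64.
  • Loop control for the strip loop is ignored.

Real counts depend on the ISA.

```python
import math

SCALAR_BODY = 5  # assumed: load, load, add, store, increment-and-branch
VECTOR_BODY = 5  # assumed: set length, vload, vload, vadd, vstore
MVL = 64         # assumed maximum vector length (elements per vector register)


def scalar_fetches(n):
    """Instruction fetches for an n-iteration scalar loop."""
    return n * SCALAR_BODY


def vector_fetches(n, mvl=MVL):
    """Fetches when the loop is strip-mined into chunks of at most mvl elements.

    Loop control for the strip loop itself is ignored in this model.
    """
    strips = math.ceil(n / mvl)  # the last strip may be shorter than mvl
    return strips * VECTOR_BODY


for n in (100, 1000):
    print(n, scalar_fetches(n), vector_fetches(n))
```

With these assumed counts, 100 elements need 500 scalar fetches but only 10 vector fetches (2 strips). 1000 elements need 5000 versus 80 (16 strips).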


Vector Instruction

Instruction Format

  Opcode | Base Address Source1 | Base Address Source2 | Base Address Destination | Vector length

A vector instruction specifies the initial (base) addresses of the operands, the length of the vectors, and the operation to be performed.

Fortran

      DO 20 I = 1, 100
   20 C(I) = A(I) + B(I)

Machine Language

      Initialize I = 1
   20 Read A(I)
      Read B(I)
      Store C(I) = A(I) + B(I)
      Increment I = I + 1
      If I ≤ 100 go to 20
      Continue

Single Vector Instruction

C(1:100) = A(1:100) + B(1:100)
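The instruction format can be read as a small record that a memory-to-memory vector unit interprets. This Python sketch is a hypothetical model, not a real ISA. The `VectorInstruction` field names, the opcode strings `ADD` and `MUL`, and the memory layout are assumptions chosen to mirror C(1:100) = A(1:100) + B(1:100).

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    """Fields of a memory-to-memory vector instruction (hypothetical names)."""
    opcode: str
    src1_base: int  # base address of source 1
    src2_base: int  # base address of source 2
    dst_base: int   # base address of destination
    length: int     # vector length


OPS = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}


def execute(instr, memory):
    """Apply instr.opcode element by element over `length` consecutive words."""
    op = OPS[instr.opcode]
    for i in range(instr.length):
        memory[instr.dst_base + i] = op(memory[instr.src1_base + i],
                                        memory[instr.src2_base + i])


# Place A at addresses 0..99, B at 100..199 and reserve C at 200..299.
memory = list(range(1, 101)) + [10] * 100 + [0] * 100
execute(VectorInstruction("ADD", src1_base=0, src2_base=100,
                          dst_base=200, length=100), memory)
print(memory[200:203], memory[299])  # [11, 12, 13] 110
```

The single decoded instruction replaces the whole read/read/store/increment/branch loop. The loop over `length` here stands in for the vector unit's pipelined sweep.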


Vector Processing

Advantages

  • No dependencies within a vector
    • Pipelining and parallelisation work very well.
    • Very deep pipelines are possible, since there are no dependencies to stall them.
  • Each instruction execution generates a lot of work
    • Reduces instruction fetch bandwidth requirements
  • Highly regular memory access pattern
  • No need to explicitly code loops
    • Fewer branches in the instruction sequence


Vector Processing

Disadvantages

  • Works (only) if parallelism is regular (data/SIMD parallelism)
    • Very inefficient if parallelism is irregular.
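To see why irregular parallelism hurts, consider a loop in which only some elements satisfy a condition. The sketch assumes a vector mask with one flag per element, a mechanism many vector instruction sets provide. The unit still sweeps every element, so useful work drops to the fraction of active elements. `masked_add` and `masked_lane_utilization` are hypothetical names.

```python
def masked_lane_utilization(mask):
    """Fraction of element slots doing useful work under a condition mask.

    Assumes the vector unit still processes every element; slots whose
    mask bit is False produce no useful result.
    """
    return sum(mask) / len(mask)


def masked_add(v1, v2, mask):
    """Element-wise add where the mask is True; keep v1's value elsewhere."""
    return [x + y if m else x for x, y, m in zip(v1, v2, mask)]


a = list(range(8))
b = [100] * 8
regular = [True] * 8                 # every element needs the operation
irregular = [x % 4 == 0 for x in a]  # only 2 of 8 elements need it
print(masked_add(a, b, irregular))   # [100, 1, 2, 3, 104, 5, 6, 7]
print(masked_lane_utilization(regular), masked_lane_utilization(irregular))  # 1.0 0.25
```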

Limitations

Memory (bandwidth) can easily become a bottleneck, especially if

  • Compute/memory operation balance is not maintained
  • Data is not mapped appropriately to memory banks
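Bank mapping determines whether a strided vector access can stream at full speed. This sketch assumes low-order interleaving, where word address a lives in bank a mod m, with a hypothetical m = 8 banks. It counts how many distinct banks a given stride touches, which equals m / gcd(stride, m).

```python
from math import gcd


def banks_touched(stride, num_banks, vlen):
    """Banks hit by vlen strided accesses under low-order interleaving.

    Word address a lives in bank a % num_banks, so element i of a vector
    starting at address 0 lives in bank (i * stride) % num_banks.
    """
    return sorted({(i * stride) % num_banks for i in range(vlen)})


NUM_BANKS = 8  # assumed number of memory banks
for stride in (1, 3, 4, 8):
    banks = banks_touched(stride, NUM_BANKS, vlen=64)
    # Distinct banks used equals num_banks / gcd(stride, num_banks).
    assert len(banks) == NUM_BANKS // gcd(stride, NUM_BANKS)
    print(stride, len(banks))
```

Strides 1 and 3 spread accesses over all 8 banks, and stride 4 uses only 2. Stride 8 sends every element to the same bank, so those accesses serialize on one bank's cycle time.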


Vector vs Array Processors

Array Processor (Processing Elements PE0 to PE3): same ops @ same time, different ops @ same space

  Time   PE0   PE1   PE2   PE3
  1      LD0   LD1   LD2   LD3
  2      AD0   AD1   AD2   AD3
  3      MU0   MU1   MU2   MU3
  4      ST0   ST1   ST2   ST3

Vector Processor (Functional Units LD, AD, MU, ST): different ops @ same time, same ops @ same space

  Time   LD    AD    MU    ST
  1      LD0
  2      LD1   AD0
  3      LD2   AD1   MU0
  4      LD3   AD2   MU1   ST0
  5            AD3   MU2   ST1
  6                  MU3   ST2
  7                        ST3

Instruction Stream

LD VR ← A[3:0]

ADD VR ← VR,1

MUL VR ← VR,2

ST A[3:0] ← VR
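The two time-space diagrams can be regenerated from a simple rule. This Python sketch assumes every operation takes one cycle. The array processor applies one operation to all elements per step. The vector processor starts element e in functional unit u at cycle e + u. The function names are illustrative.

```python
STAGES = ["LD", "AD", "MU", "ST"]  # load, add, multiply, store


def array_schedule(n_elems):
    """Array processor: at each step, every PE performs the same op on its own element."""
    return [[f"{op}{e}" for e in range(n_elems)] for op in STAGES]


def vector_schedule(n_elems):
    """Vector processor: element e occupies functional unit u at cycle e + u."""
    cycles = n_elems + len(STAGES) - 1
    table = [["" for _ in STAGES] for _ in range(cycles)]
    for e in range(n_elems):
        for u, op in enumerate(STAGES):
            table[e + u][u] = f"{op}{e}"
    return table


for t, row in enumerate(array_schedule(4), start=1):
    print("array ", t, row)
for t, row in enumerate(vector_schedule(4), start=1):
    print("vector", t, row)
```

For four elements the array processor needs 4 steps and the vector processor 7 cycles (4 + 4 - 1), matching the tables. The vector processor reaches that with one set of functional units rather than one PE per element.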


Array Processors

A synchronous array of parallel processors is called an array processor. They have many processor elements operating in parallel on many data elements.

  • Appear in two basic architectural organizations:
    • Array processors, using random access memory
    • Associative processors, using content addressable (or associative) memory.
  • A single instruction performs the same operation on many data elements.
  • They depend on the massive size of their data sets to achieve efficiency; a typical array processor consists of hundreds to tens of thousands of relatively simple processors operating together.
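An associative processor compares a search key against every stored word at once. The sketch below models content-addressable search with a key and a mask register, so only the masked bit positions are compared. The loop stands in for the parallel comparison hardware, and `associative_search` is a hypothetical name.

```python
def associative_search(words, key, mask):
    """Return a match flag for every stored word, compared against key under mask.

    Only bit positions set in `mask` take part in the comparison. Hardware
    compares all words in parallel; this comprehension models it sequentially.
    """
    return [(w & mask) == (key & mask) for w in words]


memory = [0b1011_0001, 0b1011_1110, 0b0100_0001, 0b1011_0101]
# Match on the upper four bits only: find every word whose high nibble is 1011.
flags = associative_search(memory, key=0b1011_0000, mask=0b1111_0000)
print(flags)  # [True, True, False, True]
```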


Array Processors


Array processors perform computations on a vast array of data. They execute one instruction at a time on an array of data.

  • Attached Array Processors
  • SIMD Array Processors
    • Array processors use RAM (Dedicated Memory Organisation)
      • ILLIAC-IV
      • CM-2
      • MP-1
    • Associative processors use associative memory (Global Memory Organisation).
      • BSP


Attached Array Processor


  • An auxiliary processor connected to a general-purpose computer to enhance the machine's performance.
    • Has an input/output interface to the common (host) processor and an interface to its own local memory.
    • Main memory and the local memory are linked.

Figure: general-purpose computer connected through an I/O interface to the attached array processor; main memory and the attached processor's local memory are connected by a high-speed memory-to-memory bus.
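Data must cross the high-speed bus to the attached processor's local memory, so offloading pays off only for large enough jobs. This rough Python model assumes three things: a fixed setup cost, a per-element transfer cost, and no overlap between transfer and computation. All cost values are hypothetical.

```python
import math


def host_time(n, t_host):
    """Time for the host CPU to process n elements itself."""
    return n * t_host


def attached_time(n, t_setup, t_xfer, t_ap):
    """Time to ship n elements over the bus, process them, and return results.

    Assumes transfer and computation do not overlap.
    """
    return t_setup + n * (t_xfer + t_ap)


def break_even(t_host, t_setup, t_xfer, t_ap):
    """Smallest n for which offloading is strictly faster, or None if never."""
    gain = t_host - (t_xfer + t_ap)  # time saved per element
    if gain <= 0:
        return None
    return math.floor(t_setup / gain) + 1


# Hypothetical costs in microseconds.
print(break_even(t_host=1.0, t_setup=500.0, t_xfer=0.2, t_ap=0.05))
```

With these made-up costs, offloading wins from 667 elements upward. Overlapping transfers with computation, which asynchronous operation allows, would lower that threshold.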


SIMD Array Processors


The processing units are designed to work together under the supervision of a single control unit, resulting in a single instruction stream and multiple data streams.

  • The program is stored in the main memory.
  • The control unit retrieves the instructions.
  • Vector instructions are sent to all PEs simultaneously, and the results are stored in memory.

Only suited to numerical problems that can be expressed as vector or matrix operations;

  • Not suitable for other kinds of computations.


SIMD Array Processor


  • It comprises several identical processing elements (PEs), each with its local memory.
    • Each PE includes an ALU and registers.
  • Master control unit controls the processing elements' actions.
    • Decodes instructions and determines how they should be carried out.

Figure: master control unit and main memory above a row of processing elements PE1, PE2, PE3, …, PEn, each paired with its local memory M1, M2, M3, …, Mn.
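The master-control-unit organization can be modeled as one instruction stream broadcast to PEs that each hold their own local memory. This Python sketch is illustrative only. The `LOAD`/`ADD`/`STORE` accumulator instructions and the class names are assumptions. The inner loop serializes what the hardware does in lockstep.

```python
class ProcessingElement:
    """A PE with an ALU register (acc) and its own local memory."""

    def __init__(self, local_memory):
        self.local = local_memory
        self.acc = 0

    def execute(self, op, addr):
        if op == "LOAD":
            self.acc = self.local[addr]
        elif op == "ADD":
            self.acc += self.local[addr]
        elif op == "STORE":
            self.local[addr] = self.acc
        else:
            raise ValueError(f"unknown op {op}")


class MasterControlUnit:
    """Fetches and decodes each instruction once, then broadcasts it to every PE."""

    def __init__(self, pes):
        self.pes = pes

    def run(self, program):
        for op, addr in program:  # single instruction stream
            for pe in self.pes:   # same instruction, each PE's own data (lockstep in hardware)
                pe.execute(op, addr)


# Each PE holds A(i) at local address 0, B(i) at address 1, and C(i) goes to address 2.
pes = [ProcessingElement([i, 10 * i, 0]) for i in range(1, 5)]
MasterControlUnit(pes).run([("LOAD", 0), ("ADD", 1), ("STORE", 2)])
print([pe.local[2] for pe in pes])  # [11, 22, 33, 44]
```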


Usage of Array Processors


  • Array processors enhance the total speed of instruction processing.
  • Most array processors are designed to optimize repetitive arithmetic operations, which makes them faster at vector arithmetic than the host CPU.
    • Because most array processors run asynchronously from the host CPU, the host can continue with other work, which improves the system's overall capacity.
  • Array processors have their own local memory, which gives systems with limited memory additional capacity.
    • This is an essential consideration for systems with limited physical memory or address space.


Recap


ILLIAC IV: the first massively parallel computer.

Vector and array processors are both designed for vector computations, but they differ in how they process data.

  • Vector processors use a single processor to perform the same operation on multiple data items, streaming them through pipelined functional units.
    • Sometimes called array processors; they are pipelined SIMD computers.

  • Array processors use multiple processing elements to work on individual array elements in parallel.
    • Sometimes loosely called multiprocessors, but they are simultaneous SIMD processors driven by a single control unit.
