1 of 27

Parallel Processing


krraju.in


What you’ll learn

  • What are the drawbacks of von Neumann computers?
  • How can parallel techniques enhance system performance?
  • Why increase the computational speed of a computer system?


Parallel Processing, Pipelining, Arithmetic and Instruction Pipelines, RISC Pipeline, Vector Processing, Array Processors, Multiprocessors, Interconnection structures, Cache coherence.

Unit-5


Operating System Approaches for Concurrency


Parallel Processing

  • Emphasizes the exploitation of concurrent events in the computing process.
  • An extensive redesign of algorithms and data structures is needed.
  • The overall processing speed depends on:
    • computing speed of the processors
    • communication speed of their interconnection structure.


Major issues in Parallel Architectures

  • Hardware: The hardware structure should scale up to a large number of processors while supporting fast computation and communication.
    • With advances in hardware technology, it is now practical to build such structures.
  • Algorithms: In general, an algorithm that is efficient for serial computation need not be efficient for parallel implementation.
    • Development of parallel algorithms has been an active area of research.


Major issues in Parallel Architectures Contd..

  • Languages: They should allow programming of parallel algorithms.
    • New languages are being developed, and sequential languages are being modified to accommodate parallel programming constructs.
  • Compilers and other programming tools: Compilers that translate parallel language programs into object code while retaining the parallelism expressed in the programs are being developed.
    • Parallelizing compilers that extract parallelism from serial programs are also being developed.
    • Several programming environments and tools, such as simulators and debuggers, are being developed.


Major issues in Parallel Architectures Contd..

  • Operating systems: The multiplicity of processors makes the control of parallel processing architecture more complex.
    • Existing operating systems are being extended, and newer operating systems are being developed.
  • Performance evaluation: Methods that allow the evaluation of the speedup obtained, the scale-up characteristics, the algorithm efficiency and resource utilization are being developed.


Parallel Processing Mechanisms

  • Multiplicity of functional units
  • Parallelism and pipelining within the CPU
    • Pipelining of instruction fetch, decode, operand fetch, arithmetic and logic execution, and store result.
  • Overlapped CPU and I/O operations
    • DMA
  • Use of a hierarchical memory system
  • Balancing of subsystem bandwidths
  • Multiprogramming and time sharing
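The gain from pipelining within the CPU can be quantified: a k-stage pipeline completes n instructions in k + (n − 1) cycles instead of the n·k cycles a purely sequential machine needs. A minimal sketch (function names are illustrative, assuming one cycle per stage and no stalls):

```python
def pipeline_cycles(tasks, stages):
    """Cycles for a k-stage pipeline: the first task fills the
    pipe in `stages` cycles; each later task finishes one cycle
    after the previous one."""
    return stages + (tasks - 1)

def sequential_cycles(tasks, stages):
    """Cycles with no overlap: every task runs all stages alone."""
    return tasks * stages

# 100 instructions through the 5-stage pipe listed above
# (fetch, decode, operand fetch, execute, store result):
print(pipeline_cycles(100, 5))    # 104
print(sequential_cycles(100, 5))  # 500
```

As n grows, the ratio n·k / (k + n − 1) approaches k, the ideal k-fold speedup.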


Parallelism

  • Types of Parallelism
    • Functional
    • Data
  • Levels of Parallelism
    • Job or program level
      • Algorithmically
    • Task or procedure level
    • Inter-instruction level
    • Intra-instruction level


Applications

Predictive modelling and simulations

  • Numerical weather forecasting
  • Oceanography and astrophysics
    • Climate predictive analysis
    • Fishery management
    • Ocean resource exploration
    • Ocean modelling: Large-scale simulation of ocean activities and heat exchange with atmospheric flows
    • Coastal dynamics and tides
  • Socioeconomic and government use
    • Economic simulations
    • Crime control
    • National census


Applications Contd..

Engineering design and automation

  • Finite element analysis in structural design
  • Computational aerodynamics
  • Catalysis: Computer modelling of biomimetic catalysis to analyze enzymatic reactions in the manufacturing process.
  • Fuel combustion: Designing better engine models via chemical kinetics calculations to reveal fluid mechanical effects.
  • Artificial intelligence and automation
    • Image processing, Pattern recognition, Computer vision, Speech understanding, Machine inference
    • CAD/CAM/Computer-assisted instruction/Office automation
    • Intelligent robotics, Expert computer systems, Knowledge engineering
  • Remote sensing applications


Applications Contd..

Energy resource exploration

  • Seismic exploration
  • Reservoir modelling
  • Plasma fusion power
  • Nuclear reactor safety
    • Online analysis of reactor conditions
    • Automatic control for normal and abnormal operations
    • Simulation for operator training
    • A quick assessment of potential accident mitigation procedures


Applications Contd..

Medical research

  • Computer-Assisted Tomography
    • Digital anatomy: Real-time clinical imaging, computed tomography, and magnetic resonance imaging with computers.
  • Genetic engineering
    • Rational drug design: To develop drugs to cure cancer or AIDS by blocking the action of HIV protease.
    • Protein structure design: 3D structural study of protein formation using a massively parallel processing (MPP) system to perform computational simulations.


Applications Contd..

Military and Basic research

  • Weapon research and defence
  • Image understanding: Use of large supercomputers to produce rendered images or animations in real time.
  • Basic research problems
    • Ozone depletion: To study chemical and dynamical mechanisms controlling the ozone depletion process.
    • Air pollution: Simulated air quality models running on supercomputers to provide more understanding of atmospheric systems.


India’s Supercomputers in Top500.org

C-DAC (Center for Development of Advanced Computing), Pune

75- AIRAWAT (13.17 PF*): AMD EPYC 7742 64C 2.25GHz processor with 81,344 cores.

131- PARAM Siddhi (5.267 PF*): NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband, Atos.

IITM (Indian Institute of Tropical Meteorology), Pune

169- Pratyush (4.00 PF*): Cray XC40, Xeon E5-2695v4 18C 2.1GHz, Aries interconnect, HPE.

NCMRWF (National Centre for Medium Range Weather Forecasting), Noida

316- Mihir (2.8 PF*): Cray XC40, powered by Xeon E5-2695v4 18C 2.1GHz, Aries interconnect, HPE.

*PF = petaflops


Classification of Computers

Taxonomy based on what drives the computational flow of the architecture:

1. Control driven (Control-flow) architectures

a. Reduced instruction set computers (RISC)

b. Complex instruction set computers (CISC)

c. High-level language architectures (HLL)

2. Data-driven (Data-flow) architectures

3. Demand-driven (Reduction) architectures


Control Flow (Control Driven)

  • Conventional Evaluation
    • Execution sequence is implicit in the order of the instructions and can be changed by explicit instructions.
    • Token of control indicates when a statement should be executed

Advantages:

  • Full control
  • Complex data and control structures are easily implemented.

Disadvantages:

  • Less efficient
  • Difficulty in programming and in preventing run-time errors.


Data Flow (Data Driven)

  • Eager evaluation
    • An operation is activated as soon as all the needed input data is available

Advantages:

  • Very high potential for parallelism
  • High throughput and free from side effects

Disadvantages:

  • High control overhead, complex hardware
  • Difficulty in manipulating data structures
  • Overall performance/efficiency is still worse than modern processors.
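Eager, data-driven firing can be sketched in ordinary Python: in (a+b)·(c+d), the two additions have no data dependence, so a dataflow machine would fire both as soon as their operands arrive. Here threads stand in for processing elements (a conceptual sketch, not a real dataflow implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

# Both adds are "enabled" the moment their input tokens exist,
# so they run concurrently; the multiply fires when both
# result tokens are available.
with ThreadPoolExecutor(max_workers=2) as pool:
    left = pool.submit(add, 1, 2)    # token (a+b)
    right = pool.submit(add, 3, 4)   # token (c+d)
    result = left.result() * right.result()

print(result)  # 21
```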


Reduction (Demand Driven)

  • Lazy Evaluation
    • An operation is activated only when its result is needed by another computation

Advantages:

  • Only required instructions are executed
  • High degree of parallelism
  • Easy manipulation of data structures

Disadvantages:

  • Does not support sharing of objects with changing local state.
  • Time needed to propagate demand tokens


Flynn’s Classification

It is based on the number of instructions and data items that are manipulated simultaneously.

  • Instruction stream: Sequence of instructions read from memory
  • Data stream: Sequence of data (operands) on which the instructions operate in the processor

Computers are divided into

SISD: Single Instruction, Single Data stream

SIMD: Single Instruction, Multiple Data stream

MISD: Multiple Instruction, Single Data stream

MIMD: Multiple Instruction, Multiple Data stream


SISD

Single Instruction, Single Data stream

  • Traditional uniprocessor
    • One functional unit: IBM 701, DEC VAX-11/780
    • Multiple functional units: IBM 360/91, CDC 6600, TI-ASC, Cray-1, CDC Cyber 205, Fujitsu VP-200

Von Neumann computers belong to this category


SIMD

Single Instruction, Multiple Data stream

  • Vector processors as well as massively parallel processors
    • Word slice processing: Illiac-IV, PEPE, BSP
    • Bit slice processing: STARAN, MPP, DAP
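The SIMD model can be pictured as one control unit broadcasting a single instruction to many processing elements, each holding its own data element and executing in lock-step. A conceptual sketch (`simd_execute` is an illustrative name, not a real API):

```python
def simd_execute(op, vectors):
    """Broadcast one instruction `op` to N processing elements;
    element i of each input vector belongs to PE i, and all PEs
    apply the same operation in lock-step."""
    return [op(*operands) for operands in zip(*vectors)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
# One instruction (add), multiple data elements:
print(simd_execute(lambda x, y: x + y, [a, b]))  # [11, 22, 33, 44]
```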


MISD

Multiple Instruction, Single Data stream

  • Systolic arrays


MIMD

Multiple Instruction, Multiple Data stream

  • Traditional multiprocessors and network of workstations
    • Loosely coupled: IBM 370/168 MP, Univac 1100/80, Tandem/16, Cm*
    • Tightly coupled: Burroughs D-825, C.mmp, Cray-2, Cray X-MP, Denelcor HEP


Amdahl’s Law

  • Expresses a fundamental relation that exists between an improvement to some part of a computer and the resultant improvement to the overall execution time of a program
  • System speedup is maximised when the performance of the most frequently used component of the system is maximised.

Overall system speedup: S = 1 / ((1 - f) + f/k)

f represents the fraction of the work performed by the enhanced component

k is the speedup of the enhanced component
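The formula can be checked numerically; note how the unenhanced fraction (1 − f) caps the overall gain no matter how large k grows. A minimal sketch:

```python
def speedup(f, k):
    """Amdahl's law: overall speedup when a component doing
    fraction f of the work is made k times faster."""
    return 1.0 / ((1.0 - f) + f / k)

# Even a 10x faster component covering 90% of the work
# yields only about 5.26x overall:
print(round(speedup(0.9, 10), 2))  # 5.26

# With k -> infinity the limit is 1/(1 - f) = 10x here:
print(round(speedup(0.9, 1e9), 2))  # 10.0
```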


Gene Amdahl


Recap


Video Links
