1 of 31

What is Parallel Computing?

  • Traditionally, software has been written for serial computation:

    • To be run on a single computer having a single Central Processing Unit (CPU);

    • A problem is broken into a discrete series of instructions.

    • Instructions are executed one after another.

    • Only one instruction may execute at any moment in time.


2 of 31


3 of 31

  • In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
    • To be run using multiple CPUs

    • A problem is broken into discrete parts that can be solved concurrently

    • Each part is further broken down to a series of instructions

    • Instructions from each part execute simultaneously on different CPUs


4 of 31


5 of 31

Why Use Parallel Computing?


  • Save time and/or money
  • Solve larger problems
  • Provide concurrency

6 of 31

Parallel Computer Memory Architectures


Shared memory (UMA)

Shared memory (NUMA)

7 of 31

Continued…


Distributed Memory

8 of 31


Hybrid Distributed Shared Memory

9 of 31

Taxonomy of Architectures

  • Simple classification by Flynn:

(No. of instruction and data streams)

    • SISD - conventional serial computers
    • SIMD - data parallel, vector computing (a type of parallel computer)
    • MISD - systolic arrays (pipeline architectures)
    • MIMD - very general; multiple approaches: shared-memory multiprocessors, distributed shared memory
  • Current focus is on the MIMD model, using general-purpose processors or multicomputers.


10 of 31

Single Instruction, Multiple Data (SIMD):


Single Instruction: All processing units execute the same instruction at any given clock cycle.

Multiple Data: Each processing unit can operate on a different data element.

11 of 31

Multiple Instruction, Single Data (MISD):


Multiple Instruction: Each processing unit operates on the data independently via separate instruction streams.

Single Data: A single data stream is fed into multiple processing units.

12 of 31

Multiple Instruction, Multiple Data (MIMD):


Multiple Instruction: Every processor may be executing a different instruction stream

Multiple Data: Every processor may be working with a different data stream

13 of 31

Main HPC Architectures..1a

  • SISD - mainframes, workstations, PCs.
  • SIMD Shared Memory - Vector machines, Cray...
  • MIMD Shared Memory - Sequent, KSR, Tera, SGI, SUN.
  • SIMD Distributed Memory - DAP, TMC CM-2...
  • MIMD Distributed Memory - Cray T3D, Intel, Transputers, TMC CM-5, plus recent workstation clusters (IBM SP2, DEC, Sun, HP).


14 of 31

Cluster Computer Architecture


15 of 31

Definition of cluster computing

  • A collection of computers on a network that can function as a single computing resource through the use of additional system management software.

  • Can any group of Linux machines dedicated to a single purpose be called a cluster?

  • Dedicated or non-dedicated? Homogeneous or heterogeneous? Tightly packed or geographically distributed?


16 of 31

Clusters Classification

Based on Focus (in Market)

    • High performance (HP) clusters
      • Grand challenge applications
    • High availability (HA) clusters
      • Mission critical applications


17 of 31

  • Based on Workstation/PC Ownership
    • Dedicated clusters
    • Non-dedicated clusters
      • Adaptive parallel computing
      • Can be used for CPU cycle stealing


18 of 31

  • Non-dedicated clusters:
    • Network of workstations (NOW)
    • Use spare computation cycles of nodes
    • Background job distribution
    • Individual owners of workstations

  • Dedicated clusters:
    • Joint ownership
    • Dedicated nodes
    • Parallel computing


19 of 31

  • Based on Node Architecture
    • Clusters of PCs (CoPs)
    • Clusters of Workstations (COWs)
    • Clusters of SMPs (CLUMPs)


20 of 31

Relationship among Middleware Modules


21 of 31

What is a filesystem?

  • A system that permanently stores data, usually layered on top of a lower-level physical storage medium.
  • Data is divided into logical units called “files”.


22 of 31

Parallelization Idea

  • Parallelization is “easy” if processing can be cleanly split into n units:


[Diagram: the work is split among worker units Wk1, Wk2, and Wk3.]

23 of 31

[Diagram: a master thread spawns worker threads, one thread for each of Wk1, Wk2, and Wk3.]

24 of 31

  • Multiple threads must communicate with one another, or access a shared resource.

  • Any memory that can be used by multiple threads must have an associated synchronization system.

Thread 1:

    void foo() {
        x++;
        y = x;
    }


Thread 2:

    void bar() {
        y++;
        x += 3;
    }

25 of 31

Cluster configuration

  • Passive standby
  • Active secondary
  • Separate servers
  • Servers connected to disks
  • Servers with shared disks


26 of 31

Operating system design issues

  • Failure management
  • Load balancing
  • Parallelizing computation
    • Parallelizing compiler
    • Parallelizing application
    • Parametric computation


27 of 31

Two basic types of clusters

  • Non-dedicated clusters:
    • Network of workstations (NOW)
    • Use spare computation cycles of nodes
    • Background job distribution
    • Individual owners of workstations

  • Dedicated clusters:
    • Joint ownership
    • Dedicated nodes
    • Parallel computing


28 of 31

Taxonomy of clusters

  • Network of workstations (e.g. SUN)

  • Beowulf clusters: commodity off-the-shelf (COTS) PCs with a system area network (SAN)

  • Cluster farms: existing PCs on a LAN which when idle can perform work.

  • Supercluster: cluster of clusters (within a campus)


29 of 31

NOWs


[Diagram: the NOW software stack. Sequential and parallel applications run over communication layers such as Sockets, MPI, and HPF, on top of GLUnix (Global Layer Unix: resource management, network RAM, distributed files, process migration). Each node is a Unix workstation with Active Messages (AM) and network interface hardware, interconnected by Myrinet.]

30 of 31

Beowulf clusters


31 of 31


  • A 1000-node Beowulf cluster system
  • Used for genetic algorithm research by John Koza, Stanford University