NSF_TCPP - extended
NSF/IEEE-TCPP Parallel and Distributed Computing Curricular Recommendations
Architecture Topics
Topics | Bloom# | Hours | Where Covered | Learning Outcome
Classes
Taxonomy | C | 0.5 | Systems | Flynn's taxonomy, data vs. control parallelism, shared/distributed memory
Data vs. control parallelism
Superscalar (ILP) | K | 0.25 to 1 | Systems | Describe opportunities for multiple instruction issue and execution (different instructions on different data)
SIMD/Vector (e.g., SSE, Cray) | K | 0.1 to 0.5 | Systems | Describe uses of SIMD/Vector (same operation on multiple data items), e.g., accelerating graphics for games.
Pipelines
● Single vs. multicycle | K | 1 to 2 | Systems | Describe basic pipelining process (multiple instructions can execute at the same time), describe stages of instruction execution
● Data and control hazards | N | | Compilers (A), Arch 2 (C) | Understand how one pipe stage can depend on a result from another, or delayed branch resolution can start the wrong instructions in a pipe, requiring forwarding, stalling, or restarting
● OoO execution | N | | Arch 2 (K) | Understand how independent instructions can be rescheduled for better pipeline utilization, and that various tables are needed to ensure RAW, WAR, and WAW hazards are avoided.
Streams (e.g., GPU) | K | 0.1 to 0.5 | Systems | Know that stream-based architecture exists in GPUs for graphics
Dataflow | N | | Arch 2 (K) | Be aware of this alternative execution paradigm
MIMD | K | 0.1 to 0.5 | Systems | Identify MIMD instances in practice (e.g., multicore, cluster), and know the difference between execution of tasks and threads
Simultaneous Multi-Threading | K | 0.2 to 0.5 | Systems | Distinguish SMT from multicore (based on which resources are shared)
Highly Multithreaded (e.g., MTA) | N | | Arch 2 (K) | Have an awareness of the potential and limitations of thread-level parallelism in different kinds of applications
Multicore | C | 0.5 to 1 | Systems | Describe how cores share resources (cache, memory) and resolve conflicts
Heterogeneous (e.g., Cell, on-chip GPU) | K | 0.1 to 0.5 | Systems | Recognize that the cores in a multicore processor may not all be the same kind of core.
Shared vs. distributed memory
SMP | N | | Arch 2 (C) | Understand the concept of a uniform-access shared memory architecture
● Buses | C | 0.5 to 1 | Systems | Single resource, limited bandwidth and latency, snooping, scalability issues
NUMA (Shared Memory) | N
● CC-NUMA | N | | Arch 2 (K) | Be aware that caches in the context of shared memory depend on coherence protocols
● Directory-based CC-NUMA | N | | Arch 2 (K) | Be aware that bus-based sharing doesn't scale, and that directories offer an alternative
Message passing (no shared memory) | N | | Arch 2 (K) | Shared memory architecture breaks down when scaled due to physical limitations (latency, bandwidth), resulting in message passing architectures
● Topologies | N | | Algo 2 (C) | Various graph topologies: linear, ring, mesh/torus, tree, hypercube, clique, crossbar
● Diameter | N | | Algo 2 (C) | Appreciate differences in the diameters of various graph topologies
● Latency | K | 0.2 to 0.5 | Systems | Know the concept, implications for scaling, impact on the work/communication ratio needed to achieve speedup (see the sketch after this group)
● Bandwidth | K | 0.1 to 0.5 | Systems | Know the concept, how it limits sharing, and considerations of data movement cost
● Circuit switching | N | | Arch 2 or Networking | Know that interprocessor communication can be managed using switches in networks of wires to establish different point-to-point connections, that the topology of the network affects efficiency, and that some connections may block others
● Packet switching | N | | Arch 2 or Networking | Know that interprocessor communications can be broken into packets that are redirected at switch nodes in a network, based on header info
● Routing | N | | Arch 2 or Networking | Know that messages in a network must follow an algorithm that ensures progress toward their destinations, and be familiar with common techniques such as store-and-forward or wormhole routing
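One common way to make the latency and bandwidth rows above concrete is a simple cost model in which sending n bytes takes roughly latency + n/bandwidth. The C sketch below uses this model with invented constants (the 1 microsecond latency, 10 GB/s bandwidth, and the msg_time helper are illustrative assumptions, not measured values or a prescribed formula):

    #include <stdio.h>

    /* Illustrative cost model for a point-to-point message:
     *   time = latency + bytes / bandwidth
     * The constants below are made-up round numbers, not measurements. */
    static double msg_time(double n_bytes, double latency_s, double bw_bytes_per_s) {
        return latency_s + n_bytes / bw_bytes_per_s;
    }

    int main(void) {
        const double latency = 1e-6;   /* 1 microsecond per message (assumed) */
        const double bw      = 1e10;   /* 10 GB/s link bandwidth (assumed)    */
        double one_big    = msg_time(1e6, latency, bw);          /* one 1 MB message      */
        double many_small = 1000 * msg_time(1e3, latency, bw);   /* 1000 x 1 KB messages  */
        printf("one 1 MB message:     %.1f us\n", one_big * 1e6);
        printf("1000 x 1 KB messages: %.1f us\n", many_small * 1e6);
        return 0;
    }

With these assumed numbers the 1000 small messages cost roughly ten times the single large one, because the per-message latency term dominates; that is the work/communication ratio concern noted in the latency row.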
Memory Hierarchy
● Cache organization | C | 0.2 to 1 | Systems | Know the cache hierarchies; shared caches (as opposed to private caches) result in coherency and performance issues for software
● Atomicity | N | | Arch 2 (K) | Need for indivisible operations can be covered in programming, OS, or database context
● Consistency | N | | Arch 2 (K) | Models for consistent views of data in sharing can be covered in programming, OS, or database context
● Coherence | N | | Arch 2 (C) | Describe how cores share cache and resolve conflicts; may be covered in programming, OS, or database context
● False sharing | N | | Arch 2 (K) / ParProg (K) | Awareness, examples of how it originates (see the sketch below)
● Impact on software | N | | Arch 2 (C) / ParProg (A) | Issues of cache line length, memory blocks, patterns of array access, compiler optimization levels
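The false sharing and impact-on-software rows can be demonstrated with a short experiment. The OpenMP sketch below is a minimal illustration under stated assumptions: the 64-byte cache line size, thread count, iteration count, and the padded struct are all choices made for this example. At high optimization levels the compiler may simplify the loops, which itself illustrates the "compiler optimization levels" point.

    #include <stdio.h>
    #include <omp.h>

    #define N_THREADS 4
    #define ITERS 100000000L

    /* A 64-byte cache line is assumed; adjust for the target machine. */
    struct padded { long value; char pad[64 - sizeof(long)]; };

    long shared_line[N_THREADS];               /* counters packed into one line: false sharing */
    struct padded padded_counters[N_THREADS];  /* one counter per cache line: no false sharing */

    int main(void) {
        double t0 = omp_get_wtime();
        #pragma omp parallel num_threads(N_THREADS)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++) shared_line[id]++;     /* lines ping-pong between cores */
        }
        double t1 = omp_get_wtime();
        #pragma omp parallel num_threads(N_THREADS)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++) padded_counters[id].value++;
        }
        double t2 = omp_get_wtime();
        printf("packed (false sharing): %.2f s, padded: %.2f s\n", t1 - t0, t2 - t1);
        return 0;
    }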
Floating point representation (these topics are supposed to be in the ACM/IEEE core curriculum already; they are included here to emphasize their importance, especially in the context of PDC)
Range | K | | CS1/CS2/Systems | Understand that range is limited, implications of infinities
Precision | K | 0.1 to 0.5 | CS1/CS2/Systems | How single and double precision floating point numbers impact software performance
Rounding issues | N | | Arch 2 (K) / Algo 2 (A) | Understand rounding modes, accumulation of error, and loss of precision (see the sketch below)
Error propagation | K | 0.1 to 0.5 | CS2 | Understand NaN and Infinity values and how they affect computations and exception handling
IEEE 754 standard | K | 0.5 to 1 | CS1/CS2/Systems | Representation, range, precision, rounding, NaN, infinities, subnormals, comparison, effects of casting to other types
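A small illustration of the rows above (limited precision, accumulated rounding error, and Infinity/NaN propagation). This is only a sketch; the particular values and loop count are arbitrary choices for demonstration:

    #include <stdio.h>

    int main(void) {
        /* Limited precision: 0.1 is not exactly representable in binary. */
        float sum = 0.0f;
        for (int i = 0; i < 1000000; i++) sum += 0.1f;   /* exact answer would be 100000 */
        printf("sum of 1e6 additions of 0.1f = %f (rounding error accumulates)\n", sum);

        /* Single vs. double precision store different approximations of 0.1. */
        printf("float:  %.10f\ndouble: %.10f\n", 0.1f, 0.1);

        /* Infinities and NaN propagate through computations (IEEE 754). */
        double zero = 0.0;
        double inf = 1.0 / zero;           /* +Infinity */
        double not_a_number = inf - inf;   /* NaN */
        printf("1/0 = %f, inf - inf = %f, NaN == NaN is %d\n",
               inf, not_a_number, not_a_number == not_a_number);
        return 0;
    }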
Performance metrics
Cycles per instruction (CPI) | C | 0.25 to 1 | Systems | Number of clock cycles for instructions, understand the performance of processor implementation, various pipelined implementations
Benchmarks | K | 0.25 to 0.5 | Systems | Awareness of various benchmarks and how they test different aspects of performance
● SPECmark | K | 0.25 to 0.5 | Systems | Awareness of pitfalls in relying on averages (different averages can alter perception of which architecture is faster)
● Bandwidth benchmarks | N | | Arch 2 (K) | Be aware that there are benchmarks focusing on data movement instead of computation
Peak performance | C | 0.1 to 0.5 | Systems | Understand peak performance, how it is rarely valid for estimating real performance, illustrate fallacies (see the worked example after this group)
● MIPS/FLOPS | K | 0.1 | Systems | Understand the meaning of the terms
Sustained performance | C | 0.1 to 0.5 | Systems | Know the difference between peak and sustained performance, how to define and measure it, different benchmarks
● LINPACK | N | | ParProg (K) | Be aware of the existence of parallel benchmarks
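A short worked example for the CPI and peak vs. sustained performance rows. Every number below (instruction mix, cycle counts, core count, clock rate, FLOPs per cycle, and the 25% efficiency figure) is invented for illustration, not taken from any benchmark or vendor specification:

    #include <stdio.h>

    int main(void) {
        /* CPI from an assumed instruction mix on an assumed pipelined implementation. */
        double frac[]   = { 0.5, 0.3, 0.2 };   /* ALU, load/store, branch fractions (assumed) */
        double cycles[] = { 1.0, 2.0, 1.5 };   /* average cycles for each class (assumed)     */
        double cpi = 0.0;
        for (int i = 0; i < 3; i++) cpi += frac[i] * cycles[i];
        printf("CPI = %.2f\n", cpi);                        /* 0.5*1 + 0.3*2 + 0.2*1.5 = 1.40 */

        /* Peak floating point rate: cores x clock x FLOPs per cycle per core. */
        double cores = 8, ghz = 3.0, flops_per_cycle = 16;  /* hypothetical machine */
        double peak_gflops = cores * ghz * flops_per_cycle;
        printf("peak = %.0f GFLOP/s\n", peak_gflops);       /* 8 * 3 * 16 = 384 GFLOP/s */

        /* Sustained performance measured by a benchmark is usually a fraction of peak. */
        double sustained = 0.25 * peak_gflops;              /* 25% efficiency, purely illustrative */
        printf("sustained (illustrative) = %.0f GFLOP/s\n", sustained);
        return 0;
    }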
Programming Topics
Topics | Bloom# | Hours | Where Covered | Learning Outcome
Parallel Programming Paradigms and Notations
By the target machine model
SIMD | K | 0.5 | CS2; Systems | Understand common vector operations including element-by-element operations and reductions.
· Processor vector extensions | K | | Systems | Know examples: SSE/AltiVec macros (see the sketch after this group)
· Array language extensions | N | | ParProg (A) | Know how to write parallel array code in some language (e.g., Fortran 95, Intel's C/C++ Array Extension [CEAN])
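A minimal sketch of the SIMD rows above, using SSE intrinsics for an element-by-element add. It assumes an x86 compiler with SSE support, and for brevity the vec_add helper assumes the array length is a multiple of 4:

    #include <immintrin.h>  /* SSE intrinsics */
    #include <stdio.h>

    /* c[i] = a[i] + b[i], four floats per instruction; n must be a multiple of 4 here. */
    void vec_add(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(c + i, _mm_add_ps(va, vb));   /* same operation on 4 data items */
        }
    }

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];
        vec_add(a, b, c, 8);
        for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);  /* prints 9 eight times */
        printf("\n");
        return 0;
    }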
Shared memory | A | 2 | CS2; DS/A; Lang | Be able to write correct thread-based programs (protecting shared data) and understand how to obtain speedup.
· Language extensions | K | | | Know about language extensions for parallel programming. Illustrations from Cilk (spawn/join) and Java (Java threads)
· Compiler directives/pragmas | C | | | Understand what simple directives, such as those of OpenMP, mean (parallel for, concurrent section); show examples (see the sketch after this group)
· Libraries | C | | | Know one in detail, and know of the existence of some other example libraries such as Pthreads, Pfunc, Intel's TBB (Threading Building Blocks), Microsoft's TPL (Task Parallel Library), etc.
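The compiler-directive row above asks for examples. A minimal OpenMP sketch of a parallel for with a reduction (the loop body and bound are arbitrary; the reduction clause gives each thread a private partial sum, which also protects the shared variable):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const int n = 1000000;
        double sum = 0.0;
        /* The directive splits iterations across threads; the reduction clause
         * combines the per-thread partial sums when the loop finishes. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= n; i++)
            sum += 1.0 / i;
        printf("harmonic sum with up to %d threads: %f\n", omp_get_max_threads(), sum);
        return 0;
    }

Compile with an OpenMP-enabled compiler (e.g., gcc -fopenmp); the same loop without the pragma runs serially, which makes a simple speedup exercise.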
Distributed memory | C | 1 | DS/A; Systems | Know basic notions of messaging among processes, different ways of message passing, collective operations
· Message passing | N | | ParProg (C) | Know about the overall organization of a message passing program as well as point-to-point and collective communication primitives (e.g., MPI); see the sketch after this group
· PGAS languages | N | | ParProg (C) | Know about partitioned address spaces, other parallel constructs (UPC, CoArray Fortran, X10, Chapel)
● Client Server | C | 1 | DS/A; Systems | Know notions of invoking and providing services (e.g., RPC, RMI, web services); understand these as concurrent processes
Hybrid | K | 0.5 | Systems | Know the notion of programming over multiple classes of machines simultaneously (CPU, GPU, etc.)
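A minimal message-passing sketch in MPI illustrating the distributed-memory rows above: one point-to-point send/receive plus one collective. The value 42 and the rank-sum reduction are arbitrary choices for the example:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Point-to-point: rank 0 sends a value to rank 1 (if it exists). */
        if (rank == 0 && size > 1) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        /* Collective: every rank contributes its rank number; rank 0 gets the sum. */
        int sum = 0;
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("sum of ranks = %d\n", sum);

        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 4 ./a.out; every process executes the same program and branches on its rank, which is the SPMD style referenced below.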
By the control statement
Task/thread spawning | A | 1 | CS2; DS/A | Be able to write correct programs with threads, synchronize (fork-join, producer/consumer, etc.), and use dynamic thread creation (varying in number, possibly recursive) (e.g., Pthreads, Java threads, etc.); builds on the shared memory topic above (see the sketch after this group)
SPMD | C | 1 | CS2; DS/A | Understand how an SPMD program is written and how it executes
· SPMD notations | C | | | Know of the existence of highly threaded data parallel notations (e.g., CUDA, OpenCL), message passing (e.g., MPI), and some others (e.g., Global Arrays, BSP library)
Data parallel | A | 1 | CS2; DS/A; Lang | Be able to write a correct data-parallel program for shared-memory machines and get speedup; should do an exercise. Understand the relation between different data-parallel notations: array notations, SPMD, and parallel loops. Builds on the shared memory topic above.
· Parallel loops for shared memory | A | | CS2; DS/A; Lang | Know, through an example, one way to implement parallel loops, and understand collisions/dependencies across iterations (e.g., OpenMP, Intel's TBB)
· Data parallel for distributed memory | N | | ParProg (K) | Know data parallel notations for distributed memory (e.g., High Performance Fortran)
Functional/logic languages | N | | ParProg (K) | Understand the advantages and disadvantages of very different programming styles (e.g., Parallel Haskell, Parlog, Erlang)
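For the task/thread spawning row above, a minimal Pthreads fork-join sketch. The thread count, problem size, and the worker function's block partitioning are arbitrary choices for illustration:

    #include <pthread.h>
    #include <stdio.h>

    #define N_THREADS 4
    #define N 1000000L

    /* Each thread sums its own slice of 0..N-1 into a private slot (no sharing, so no locks). */
    static long partial[N_THREADS];

    static void *worker(void *arg) {
        long id = (long)arg;
        long lo = id * (N / N_THREADS), hi = (id + 1) * (N / N_THREADS);
        for (long i = lo; i < hi; i++) partial[id] += i;
        return NULL;
    }

    int main(void) {
        pthread_t tid[N_THREADS];
        for (long t = 0; t < N_THREADS; t++)          /* fork: spawn the workers */
            pthread_create(&tid[t], NULL, worker, (void *)t);
        long total = 0;
        for (long t = 0; t < N_THREADS; t++) {        /* join, then combine the results */
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("sum 0..%ld = %ld\n", N - 1, total);
        return 0;
    }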
Semantics and correctness issues
Tasks and threads | K | 0.5 | CS2; DS/A; Systems; Lang | Understand what it means to create and assign work to threads/processes in a parallel program, and know of at least one way to do that (e.g., OpenMP, Intel TBB, etc.)
Synchronization | A | 1.5 | CS2; DS/A; Systems | Be able to write shared memory programs with critical regions and producer-consumer communication, and get speedup; know the notions of mechanisms for concurrency (monitors, semaphores, etc. [from ACM 2008])
· Critical regions | A | | | Be able to write shared memory programs that use critical regions for synchronization
· Producer-consumer | A | | | Be able to write shared memory programs that use the producer-consumer pattern to share data and synchronize threads (see the sketch below)
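A minimal producer-consumer sketch with a Pthreads mutex and condition variables, matching the last two rows. A single-slot buffer and a fixed count of five items are used for brevity; a realistic version would use a bounded queue:

    #include <pthread.h>
    #include <stdio.h>

    /* One-slot buffer shared by a producer and a consumer. */
    static int buffer, have_item = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

    static void *producer(void *arg) {
        (void)arg;
        for (int i = 1; i <= 5; i++) {
            pthread_mutex_lock(&lock);                /* critical region around shared state */
            while (have_item) pthread_cond_wait(&not_full, &lock);
            buffer = i;
            have_item = 1;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        for (int i = 1; i <= 5; i++) {
            pthread_mutex_lock(&lock);
            while (!have_item) pthread_cond_wait(&not_empty, &lock);
            printf("consumed %d\n", buffer);
            have_item = 0;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }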