CS 4230: Parallel Programming

School of Computing, University of Utah, Fall 2014

Instructor: Ganesh Gopalakrishnan, Office: 3428 MEB, Ofc Hours: Email Appt

Co-Instructors: Mo, Simone (Mohammed Saeed Al-Mahfoudh and Simone Atzeni), Office: 3163 MEB, Ofc Hours: Email Appt

Class Location and Timings

WEB L 112, Tue and Thu 9:10am - 10:30am

Email to contact Teaching Staff

Ganesh Gopalakrishnan: ganesh AT cs.utah.ed

Mohammed Al-Mahfoudh: mahfoudh AT cs.utah.edu

Simone Atzeni: simone AT cs.utah.edu

Homework Submission

All assignments should be submitted to cs4230f14 AT gmail.com unless instructed otherwise.

TA Office Hours:

Mohammed Al-Mahfoudh: Mon and Wed 2:00pm - 3.30pm

Simone Atzeni: Tue and Thu 3:30pm - 5:00pm

(Send an email for different time arrangement)

Discussion Group (join)

cs4230f14 AT gmail.com

Discussions (“office hours”)

(1) We will hold weekly Google hangout sessions so that we can dispense with questions quickly. Discussions between students is encouraged except ONLY when said otherwise.

(2) Please email one of the instructors. We read email frequently, and will be generous with slots, provided you send me a summary of the purpose of the meeting.

 

LECTURE SCHEDULE, HOMEWORKS and READINGS

LECTURE 1

8/26/14

Lec1-cs4230.pdf      Lec1-audio   Real-Lec1-audio     hw1-new-version.pdf

hw1-new-version.zip

Go to site http://tinyurl.com/cs4230-HW1-new-version

 for a link to the WriteLatex Project Site.  But I’ve kept the zip file handy for you, above.

LECTURE 2

8/28/14

Lec2-cs4230.pdf      Lec2-audio   bPc.c (see README as well as the class email about this).

See the CADE machine listing also.

Amdahl’s Law in the Multicore Era - Online Calculator + Paper

LECTURE 3

9/2/14

Lec3-cs4230.pdf      Lec3-audio

Homework 2 : WriteLatex link

Homework 2 : zip file

Homework 2 : pdf file

See bPc.c listed against Lecture 2

File prodcons-bug2-big.c is here

LECTURE 4

9/4/14

KEPLER machine address: kepler.cs.utah.edu

More practice of thread programming. Here are the example files to be used: InspectExs.zip . These illustrate data races, deadlocks, and the futility of “testing by running a random number of times.”  See a PDF of most of these examples here.

Also see Dr. Yu Yang’s PhD defense slides to understand Inspect’s working

[.pptx]  and [.pdf]

Inspect Usage Instructions are HERE.

Lec4-cs4230.pdf  (Slides) and AUDIO Files for audiophiles

LECTURE 5

9/9/14

Lec5-cs4230.pdf  (Slides) and AUDIO Files for audiophiles

Hans Boehm’s paper on races

Some cool demos using Thread Sanitizer:

  * iP.c  -- test file

  * iPnl.c -- another test file

  * RaceSessions.txt -- explanation of what races we observe

  * TypeScript.txt -- the real evidence running Thread Sanitizer

  * Example of a Switch you may buy for your project group … the rest of the Pi parts are as in the user manual… (details in class on Tue)

LECTURE 6

9/11/14

This lecture presents some more PThread examples (e.g.  Dining Philosophers). Then it switches to an excellent collection of examples by Bogaerts and Stough (SC’12 tutorial)

Lec6-cs4230.pdf (Slides)  AUDIO (tbd)

parallelQuicksort.py from Stough and Bogaerts

parallelMergesort.c  from Stough and Bogaerts

Paper in Computer Magazine on pydron for astronomy (made available for class-room only). But this shows how important Python is emerging to be in HPC and supporting science research in general.

LECTURE 7

(no lecture notes; the TAs will demo TAU and help you set up access to TAU and TSan)

9/16/14

1. Homework-3.pdf

2. Homework-3.tex

3. erickson-freund-musuvathi-rv12.pdf

4. TAU performance evaluation tutorial

5. ThreadSanitizer Instructions

LECTURE 8

9/18/14

Lec8-cs4230.pdf

Lec8-cs4230-TAU-Notes-Mo.pdf

Makefile for TAU profiling.

Setting up TAU instructions.

LECTURE 9

9/23/14

These notes will last thru LECTURE 10 also

Lec9-cs4230.pdf

Lec9-cs4230.pptx

Audio Files for Lecture 9

Lecture 10

9/25/14

 I pretty much continued Lec9-cs4230.pptx

 (Also gave out Chapters 15 and 16 of Goetz’s book

 “Java Concurrency in Practice”)

LECTURE 11

9/30/14

HOMEWORK 4      and if you wanted homework-4.tex

These are demos of Java volatiles in action that I managed to create:

VolatileTest , Test1, Test2, Test3, Test4, Test5, TestGood

If you change the declaration of req and ack to not have volatiles, then the code will hang (by not finishing a four-phase handshake).

LECTURE 12

10/2/14

Paul Kelly’s excellent slides (thanks Paul)

Material to try: mm_Aikj.c and mm_Aijk.c

Java Volatile Mystery solved (or explained, at least; HW4)

LECTURE 13

10/7/14

Setting up your Pi clusters

LECTURE 14

10/9/14

Setting up your Pi clusters

BREAK

10/14/14

homework-5.pdf   cder-paper.pdf

LECTURE 15

10/21/14

Lecture 15 Slides and Other Material are HERE

OpenMP - material from

LECTURE 16

10/23/14

Homework-6 has been posted - Due 10/28 and 10/30

LECTURE 17

10/28/14

Project Suggestions [ pptx ]  [ pdf ]

LECTURE 18

10/30/14

Scala Basic Getting Started Tutorial

LECTURE 19

11/04/14

Scala Concurrency Models Tutorial

LECTURE 20

11/06/14

Scala+Concurrency+Akka all lectures in one.

The Pi calculation (needed for homework)

Homework.

LECTURE 21

11/11/14

Lec21-cs4230.pptx   Lec21-cs4230.pdf

Examples : NonBlockingSynchronization.tar.gz

LECTURE 22

11/13/14

LECTURE 25

11/20/14

Lec25-cs4230.pptx  Lec25-cs4230.pdf

Akka bare bone Sample Remote deployment (Programmatic approach mixed with config.) -- you need a scala-ide and to add your akka jars to dependencies to the project.

PROJECT DEADLINES

12/11/14

AND

12/18/14

  • 12/11/14 : Class presentation : For 3-member projects, I’d like to hear from each member for approx 5 mins (you can vary the timing; some may speak more/less). For a 1-member project, I’d like to hear a presentation of about 10 mins.

  • Plan on roughly 5 slides per project member.  Please send a link to your slides by 12/10/14 5pm,  putting the slides on Google Doc as a PPT. Also, send a code repo by this date.  

  • 12/18/14 : Please submit a project writeup on Google Doc of about 3 pages per project member, and a PPT presentation on Google Docs explaining your work. Please send these Google Doc links on 12/17/14.

  • The above two deadlines are the only ones remaining in this course. I plan to go thru the reports and assign grades. I really enjoyed having you all in class !! Thanks a ton to Mo and Simone for everything!

Course work: 

Machines and Resources:  Multicore machines at CADE are documented at

http://tinyurl.com/CADE-Multicore-Machines

Main Project: A “main project” is also required. Ideas + help will be provided in selecting and refining a project. Your main project will be presented via your posters and fetch you a significant percentage of your course grade.

Important Reading Material: We are providing you all the reading material via slides, audio accompanying slides, and a draft book under construction. So these are your most important reading material! Besides saving you $$, here is mainly why: There isn’t anything out there that addresses the GOALS outlined below.

That said, we are in the process of converting our slides to a draft book. Here is that book draft PracticalParallelProgramming.pdf  that will definitely be useful to read.

GOALS for the lectures:

We believe strongly in teaching you concepts as well as modern practice.  Concepts stand the test of time, and forms the framework for understanding practice at any given moment.

Important conceptual items include Sequential Consistency, Atomicity, Data Races, Linearizability, Happens-Before, Active Testing Methods through Schedule Control, etc.

Important items of practice include some basic ideas about these 10 topics :

(1) Practice of Parallel Programming, including Hardware and Software (2) Thread Programming (3) Data Race Checking, including learning about PThreads as well as practical tools such as the Thread Sanitizer (4) Performance Evaluation, including learning practical tools such as the Tau framework (5) Message Passing, including learning MPI (6) Annotation based Parallel Programming, including learning OpenMP (7) Lock-free Methods, including some simple Lock-free data structures such as stacks and queues (8) Hardware and Software Transactions (9) Actor-based programming, including an exposure to Scala (10) Emerging issues, including bit-flips and how to make systems more resilient.  

GOALS for Project-1: Our first hands-on project teaches you to build a Raspberry Pi cluster as well as administer it.

GOALS for Project-2: Allow you to pursue a few topics in-depth (out of our 10 topics).

ADA Policies:  The University of Utah is mandated by law (the Americans with Disabilities Act), and by policy, to provide reasonable accommodation to all qualified students who request an accommodation. The Center for Disability Services (CDS) is the only department that is authorized to determine whether or not a student is qualified for accommodation either based upon law or upon University policy. Upon request for accommodation by a student, the CDS obtains appropriate medical documentation, determines whether or not a student is qualified, and if so, what accommodation should be provided. For details, please visit  http://www.oeo.utah.edu/ada/guide/faculty/

COOL ARTIFACT: Take a look at the Raspberry Pi cluster built by your Co-Instructors here.

Prelim instructions on its assembly and administration are in their respective links (updated).  You’ll be building one such cluster as part of this course, and also administering it.

ADDITIONAL MATERIAL: Additional material include books, and reference material linked to various lecture slides.

  1. Norm Matloff’s book : Thanks to Prof. Norm Matloff, we have an excellent free book http://heather.cs.ucdavis.edu/~matloff/158/PLN/ParProcBook.pdf that you may refer to. Since Prof. Matloff plans to update this book online, please refer to this online copy even though you may stash away a PDF for convenience.
  2. How to Survive the Multicore Software Revolution (or at least Survive the Hype) : No longer a hype, this book lays out some of the foundational ideas, has a nice glossary of terms, and otherwise excellent in many ways. 
  3. Of course, our own draft book PracticalParallelProgramming.pdf (this book is bound to lag our slides, but will expand on that material).

 

ROUGH COURSE CONTENTS (this was the initially posted outline ; now, it might be the outline of our draft book -- soon to make its online appearance):

  1. Why parallel, Why a Practical Approach, and Why Now?
  2. A Brief Tour Through Existing Parallel Systems
  1. Familiarization with XSEDE Resources
  1. Let’s Build a Pi(co) Cluster
  2. Let’s Administer our Pi(co) Cluster
  3. Shared Memory in the Raw
  1. Threads, Locks, Atomics, CAS
  2. “Lock-free” data structures
  3. Fences, Barriers, Memory Consistency Models
  4. Data races, Happens-before
  5. How scheduling may affect progress
  6. Tools for Correctness and Performance
  1. Message Passing in the Raw
  1. Blocking versus Non-blocking calls
  2. Collectives
  3. Overlapping computation and communication
  4. Tools for Correctness and Performance
  1. Shared Memory: Safe, Sealed, and Sanitized
  1. Loop parallelization and use of Pragmas
  2. Structured parallelism
  3. Transactional Programming
  1. Message Passing: Safe, Sealed, and Sanitized
  1. Higher level messaging primitives
  2. Actors
  1. Projects and Poster Presentation

 

EXTENDED INTRO:

Parallel programming is in its infancy. Many parallel programming languages already exist, and many more will evolve based on what already exist. The number of parallel applications are skyrocketing! This course is an introduction to the exciting things happening and yet to come. It is designed to give you a practical hands-on introduction, in many cases going far beyond traditional books and online material. We will emphasize enduring concepts for sure!

Parallel programs are already controlling life-critical and resource-critical tasks. This means that we must strive to eliminate bugs from these programs. The almost total elimination of bugs is possible in many situations, but the educational and training ladder one must climb is arduous and long. In this basic course, we will strive to forewarn you against the kinds of parallel programming bugs that recur. Unfortunately, even the basics of avoiding these bugs are not being taught widely, as is evident from concurrency bugs found in existing works created by experts. We believe that the main reason for this situation is an excessive preoccupation with performance tuning - so much so that it takes the “Oxygen out of the air” in terms of time left to discuss correctness. We shall try to take a “basics of correctness first” approach.

Performance goals are essentially why we need to program things in parallel - else we might as well have written things sequentially. There are enduring principles behind performance improvement and scalability that we shall definitely address. We shall not have the time to address more advanced performance techniques.

It is important to say that the main performance metric being sought is reduced energy consumption and power consumption. Energy is akin to the amount of fuel in your tank and power akin to the burn rate. Reducing both is of importance. It so happens that if you do so, the overall computation time can also be reduced in most practical cases.

Correctness concepts covered will include non-determinism, data races, deadlocks, livelocks, safety, liveness, and progress. Performance concepts covered will include locality and caching, Amdahl’s law (strong scaling), Gustafson’s law (weak scaling), and the basics of latency and throughput. As silicon area and efficiency are important, we shall also address Moore’s law and Dennard’s scaling law.

The course will begin with a survey of the area and attempt to set up a glossary. We shall begin with building a simple parallel machine from scratch, and then look at MPI, OpenMP, GPU programming (CUDA and OpenCL), Java, Habanero Java, Scala, Erlang, Go, Transactional programming, Non-blocking data structures, and Work Stealing.

Societal Impact: All of us dream of impacting the society, especially the underprivileged in the US and all across the world. It occurred to the three of us that in order to learn 70% of the concepts mentioned above and to gain practical experience, we don’t need anything more than a $50 computer. It just is a different and uplifting experience when your computer is smaller than a mouse, does not have an “all sealed” appearance (you can see all its parts), and yet, is capable of elucidating most of the material in a course such as this. Yes, we are talking about the Raspberry Pi, Model B.  Each student will be required to have one for use in this course. We shall come up with innovative ideas to program them individually as well as to “gang them up” to do interesting new things.

It is my hope that we can all publish our projects using the Pi, thus providing students around the world great ideas for active learning. This part of this course will be required of all of you, and the credits will flow into some of your homeworks.