CS 4230: Parallel Programming
School of Computing, University of Utah, Fall 2014
Instructor: Ganesh Gopalakrishnan, Office: 3428 MEB, Ofc Hours: Email Appt
Co-Instructors: Mo, Simone (Mohammed Saeed Al-Mahfoudh and Simone Atzeni), Office: 3163 MEB, Ofc Hours: Email Appt
Class Location and Timings
WEB L 112, Tue and Thu 9:10am - 10:30am
Email to contact Teaching Staff
Ganesh Gopalakrishnan: ganesh AT cs.utah.ed
Mohammed Al-Mahfoudh: mahfoudh AT cs.utah.edu
Simone Atzeni: simone AT cs.utah.edu
All assignments should be submitted to cs4230f14 AT gmail.com unless instructed otherwise.
TA Office Hours:
Mohammed Al-Mahfoudh: Mon and Wed 2:00pm - 3.30pm
Simone Atzeni: Tue and Thu 3:30pm - 5:00pm
(Send an email for different time arrangement)
Discussion Group (join)
cs4230f14 AT gmail.com
Discussions (“office hours”)
(1) We will hold weekly Google hangout sessions so that we can dispense with questions quickly. Discussions between students is encouraged except ONLY when said otherwise.
(2) Please email one of the instructors. We read email frequently, and will be generous with slots, provided you send me a summary of the purpose of the meeting.
LECTURE SCHEDULE, HOMEWORKS and READINGS
Go to site http://tinyurl.com/cs4230-HW1-new-version
for a link to the WriteLatex Project Site. But I’ve kept the zip file handy for you, above.
See the CADE machine listing also.
Homework 2 : WriteLatex link
Homework 2 : zip file
Homework 2 : pdf file
See bPc.c listed against Lecture 2
File prodcons-bug2-big.c is here
KEPLER machine address: kepler.cs.utah.edu
More practice of thread programming. Here are the example files to be used: InspectExs.zip . These illustrate data races, deadlocks, and the futility of “testing by running a random number of times.” See a PDF of most of these examples here.
Also see Dr. Yu Yang’s PhD defense slides to understand Inspect’s working
Inspect Usage Instructions are HERE.
Hans Boehm’s paper on races
Some cool demos using Thread Sanitizer:
* iP.c -- test file
* iPnl.c -- another test file
* RaceSessions.txt -- explanation of what races we observe
* TypeScript.txt -- the real evidence running Thread Sanitizer
* Example of a Switch you may buy for your project group … the rest of the Pi parts are as in the user manual… (details in class on Tue)
This lecture presents some more PThread examples (e.g. Dining Philosophers). Then it switches to an excellent collection of examples by Bogaerts and Stough (SC’12 tutorial)
Lec6-cs4230.pdf (Slides) AUDIO (tbd)
parallelQuicksort.py from Stough and Bogaerts
parallelMergesort.c from Stough and Bogaerts
Paper in Computer Magazine on pydron for astronomy (made available for class-room only). But this shows how important Python is emerging to be in HPC and supporting science research in general.
(no lecture notes; the TAs will demo TAU and help you set up access to TAU and TSan)
Makefile for TAU profiling.
These notes will last thru LECTURE 10 also
I pretty much continued Lec9-cs4230.pptx
(Also gave out Chapters 15 and 16 of Goetz’s book
“Java Concurrency in Practice”)
These are demos of Java volatiles in action that I managed to create:
If you change the declaration of req and ack to not have volatiles, then the code will hang (by not finishing a four-phase handshake).
Paul Kelly’s excellent slides (thanks Paul)
Java Volatile Mystery solved (or explained, at least; HW4)
Setting up your Pi clusters
Setting up your Pi clusters
Lecture 15 Slides and Other Material are HERE
OpenMP - material from
Homework-6 has been posted - Due 10/28 and 10/30
Examples : NonBlockingSynchronization.tar.gz
Akka bare bone Sample Remote deployment (Programmatic approach mixed with config.) -- you need a scala-ide and to add your akka jars to dependencies to the project.
Machines and Resources: Multicore machines at CADE are documented at
Main Project: A “main project” is also required. Ideas + help will be provided in selecting and refining a project. Your main project will be presented via your posters and fetch you a significant percentage of your course grade.
Important Reading Material: We are providing you all the reading material via slides, audio accompanying slides, and a draft book under construction. So these are your most important reading material! Besides saving you $$, here is mainly why: There isn’t anything out there that addresses the GOALS outlined below.
That said, we are in the process of converting our slides to a draft book. Here is that book draft PracticalParallelProgramming.pdf that will definitely be useful to read.
GOALS for the lectures:
We believe strongly in teaching you concepts as well as modern practice. Concepts stand the test of time, and forms the framework for understanding practice at any given moment.
Important conceptual items include Sequential Consistency, Atomicity, Data Races, Linearizability, Happens-Before, Active Testing Methods through Schedule Control, etc.
Important items of practice include some basic ideas about these 10 topics :
(1) Practice of Parallel Programming, including Hardware and Software (2) Thread Programming (3) Data Race Checking, including learning about PThreads as well as practical tools such as the Thread Sanitizer (4) Performance Evaluation, including learning practical tools such as the Tau framework (5) Message Passing, including learning MPI (6) Annotation based Parallel Programming, including learning OpenMP (7) Lock-free Methods, including some simple Lock-free data structures such as stacks and queues (8) Hardware and Software Transactions (9) Actor-based programming, including an exposure to Scala (10) Emerging issues, including bit-flips and how to make systems more resilient.
GOALS for Project-1: Our first hands-on project teaches you to build a Raspberry Pi cluster as well as administer it.
GOALS for Project-2: Allow you to pursue a few topics in-depth (out of our 10 topics).
ADA Policies: The University of Utah is mandated by law (the Americans with Disabilities Act), and by policy, to provide reasonable accommodation to all qualified students who request an accommodation. The Center for Disability Services (CDS) is the only department that is authorized to determine whether or not a student is qualified for accommodation either based upon law or upon University policy. Upon request for accommodation by a student, the CDS obtains appropriate medical documentation, determines whether or not a student is qualified, and if so, what accommodation should be provided. For details, please visit http://www.oeo.utah.edu/ada/guide/faculty/
COOL ARTIFACT: Take a look at the Raspberry Pi cluster built by your Co-Instructors here.
Prelim instructions on its assembly and administration are in their respective links (updated). You’ll be building one such cluster as part of this course, and also administering it.
ADDITIONAL MATERIAL: Additional material include books, and reference material linked to various lecture slides.
ROUGH COURSE CONTENTS (this was the initially posted outline ; now, it might be the outline of our draft book -- soon to make its online appearance):
Parallel programming is in its infancy. Many parallel programming languages already exist, and many more will evolve based on what already exist. The number of parallel applications are skyrocketing! This course is an introduction to the exciting things happening and yet to come. It is designed to give you a practical hands-on introduction, in many cases going far beyond traditional books and online material. We will emphasize enduring concepts for sure!
Parallel programs are already controlling life-critical and resource-critical tasks. This means that we must strive to eliminate bugs from these programs. The almost total elimination of bugs is possible in many situations, but the educational and training ladder one must climb is arduous and long. In this basic course, we will strive to forewarn you against the kinds of parallel programming bugs that recur. Unfortunately, even the basics of avoiding these bugs are not being taught widely, as is evident from concurrency bugs found in existing works created by experts. We believe that the main reason for this situation is an excessive preoccupation with performance tuning - so much so that it takes the “Oxygen out of the air” in terms of time left to discuss correctness. We shall try to take a “basics of correctness first” approach.
Performance goals are essentially why we need to program things in parallel - else we might as well have written things sequentially. There are enduring principles behind performance improvement and scalability that we shall definitely address. We shall not have the time to address more advanced performance techniques.
It is important to say that the main performance metric being sought is reduced energy consumption and power consumption. Energy is akin to the amount of fuel in your tank and power akin to the burn rate. Reducing both is of importance. It so happens that if you do so, the overall computation time can also be reduced in most practical cases.
Correctness concepts covered will include non-determinism, data races, deadlocks, livelocks, safety, liveness, and progress. Performance concepts covered will include locality and caching, Amdahl’s law (strong scaling), Gustafson’s law (weak scaling), and the basics of latency and throughput. As silicon area and efficiency are important, we shall also address Moore’s law and Dennard’s scaling law.
The course will begin with a survey of the area and attempt to set up a glossary. We shall begin with building a simple parallel machine from scratch, and then look at MPI, OpenMP, GPU programming (CUDA and OpenCL), Java, Habanero Java, Scala, Erlang, Go, Transactional programming, Non-blocking data structures, and Work Stealing.
Societal Impact: All of us dream of impacting the society, especially the underprivileged in the US and all across the world. It occurred to the three of us that in order to learn 70% of the concepts mentioned above and to gain practical experience, we don’t need anything more than a $50 computer. It just is a different and uplifting experience when your computer is smaller than a mouse, does not have an “all sealed” appearance (you can see all its parts), and yet, is capable of elucidating most of the material in a course such as this. Yes, we are talking about the Raspberry Pi, Model B. Each student will be required to have one for use in this course. We shall come up with innovative ideas to program them individually as well as to “gang them up” to do interesting new things.
It is my hope that we can all publish our projects using the Pi, thus providing students around the world great ideas for active learning. This part of this course will be required of all of you, and the credits will flow into some of your homeworks.