Summary
This course is intended for graduate students who would like to learn about core programming tools and their theoretical underpinnings.
Description
Programming is difficult - some of the problem developers face include
-
How can a project be structured so that several developers can work on it concurrently?
-
How can the building of a project be automated?
-
How can a program be written to make it portable?
-
How can a program be prototyped efficiently?
-
How can a program be tested and debugged efficiently?
-
How can the performance of a program be increased?
Using the right tools can solve these problems. Examples include tools for version control, documentation, program building and configuration, automatic testing, program analysis, and integrated development.
Our approach will be to introduce a specific problem, show how a tool can solve the problem, and then develop the technical principles underlying the tool. We will have written homework problems as well as coding exercises for each concept.
The class will have a major design project that will begin at the start of the term. The design project will be to implement a complete web service - e.g., a geo-aware blogging site, a quote server, a price history tracker - on Google
AppEngine. Use of the tools will be a required part of the project.
We will use open-source tools to illustrate these concepts. The specific tool stack is described in the lectures section of this document. I selected these tools based on my experience at Google; they also power many state-of-the-art commercial projects.Rational for the course
A number of our courses (e.g., in CAD, and Networking) have a significant programming component, but tools are not their focus. As a result, our students tend to learn tools in an ad hoc fashion, and lack a deep understanding of their capabilities and how they work. The tools we will cover are used on a daily basis by developers, so it's important that they have real mastery over them.
There are relatively few classes at the graduate level which focus exclusively on programming.
EPL is one example - it's primary focus is object-oriented programming, rather than tools; it is an extremely popular class. Distributed computing is another - the emphasis there is on the theoretical foundations of concurreny programming, and less on tools/implementation.
Text
The course will be largely based on the text
Essential Open Source Toolset, by Andreas Zeller and Jens Krinke. It is short, to-the-point, and well-written. Each chapter includes examples, exercises, and further reading.
The course will include a series of guest lectures on tools flows in industrial settings and conclude with student project presentations.
Lectures
| Topic | Tools | Theory |
|
Version Control
|
svn, patch, diff
|
Edit distance - Miller-Myers algorithm; semantic diffs |
| Parsing
|
antlr |
LR parsing |
|
Building
|
ant |
Dependency analysis |
|
Documentation
|
Javadoc, doxygen |
Unified Markup Language, Literate programming |
|
Prototyping
|
Perl
|
Regular expressions & finite automata |
|
Testing & regression
|
Junit, Bugzilla |
Specification logics, test prioritization |
|
Debugging
|
Eclipse Java debugger, Findbugs, valgrind |
Static analysis, Delta debugging |
|
Performance analysis
|
Eclipse memeory analyzer, Visualvm |
Java collections, statistical sampling |
|
Coverage
|
Hansel, Jcoverage |
Contrast with formal methods, code inspections |
| Web Services |
Tomcat, MySQL, Google Web Toolkit |
Document Object Model, SQL, model-view-controller |
|
Guest lecturers
|
To Be Determined |
|
|
Student projects
|
N.A. |
|
(More details are given in the reading list below.)
Prerequisites
You should be very comfortable with Java and the Unix shell, and should have written some C/C++ code. At a minimum, you should have written programs in Java in which your contribution is at least 1000s of lines long. There will be no programming language review in this class. Almost all development will be in Java, with a few exceptions, e.g., Perl to illustrate interpreted languages, and C to illustrate memory checking with Valgrind.
To help you better gauge your preparedness, I will make available
several programming problems - if you can solve them with ease, you are ready for this class; if not, take
360C or
EPL first.
Rougly, these problems will be:
- Given a text file containing a name and a salary, write in one line a shell script which ouputs the names in sorted order, and has a running total on salaries
- Implement a program which gets stock quotes from finance.yahoo.com every 5 minutes and sends email to the address specified on the command line if the current quote is 5% greater than the average of the previous 10 quotes
Format/Evaluation
Assignments will consist of a written component as well as implementation; these will constitute 35% of your grade. One in-class midterm will be worth 25%. A final project will make up the remainder of your grade. There is no final exam.
Homeworks
There will be roughly 10 homeworks. Each will have both a written component as well as implementation. A written question might be "draw the DFA accepting the tokens specified by the following lex rules" or "how would you extend diff to take language semantics into account". Implementation questions will focus on getting you up to speed with the tool we covered that week, e.g., you may be asked to carry out program maintanence using SVN, or write a set of tests in Junit.
A sample homework is provided at the end of this document.
Project
You will design and implement a solution to a problem that I will specify at a fairly abstract level at the start of term. Examples of potential projects are
- Implement a personal blog using Google Appengine and Google Web Toolkit. The blog should allow for users to post messages on their own blogs, and comments on others blogs. Users should also be able to post images to their blog. On the client side, postings have the option of being annotated with GPS coordinates, derived automatically from the OS if possible (iPhone OS 3.0 has this option). The server should be able to call out to Google Maps to create a spatial and temporal history of the posts.
- Implement a web-based auction house. Inputs are offers to buy and offers to sell. Each offer includes a price and a quantity. The auction house has to match buyers and sellers, and ensure funds are updated.
Projects will be done in pairs; I may allow single- or three-person projects if appropriate.
Reading List
- Version Control - ET Chapter 2-4
- "Impact of software engineering research on the practice of software configuration management", J. Estublier, et al., ACM Trans. Softw. Eng. Methodoly, 2005
- Building - ET Chapter 8
- "A framework for selective recompilation in the presence of complex intermodule dependencies", C. Chambers, et al., ICSE 1995
- Documentation - ET Chapter 10
- Literate Programming, D. Knuth, 1992
- Testing and Regression - ET Chapters 12-14
- "Effectively prioritizing tests in development environment", A Srivastava and J. Thiagarajan, ISSTA 2002
- Prototyping
- "Scripting: higher level programming for the 21st century", J. Ousterhout, IEEE Computer 1998
- Debugging - ET Chapter 15
- "Yesterday, my program worked. Today, it does not. Why?", A. Zeller, SIGSOFT Softw. Eng. Notes, 1999
- "A Comparison of Bug Finding Tools for Java", N. Rutar, et al., ISSRE 2004
- Coverage - ET Chapter 16
- "An empirical investigation of program spectra", M. Harold, et al., PASTE 1998
- "A survey of coverage based testing tools", Q. Yang, et al., AST 2006
- Parsing - ET Chapters 5-7
- Programming Language Pragmatics, M. Scott, 3rd ed., 2009 - Chapter 2 (Programming Language Syntax)
Instructor
Adnan Aziz
http://www.ece.utexas.edu/~adnanACE 6.120, The University of Texas, Austin TX 78712Comments from earlier students
"It is a very useful class. For a wide variety of tools, it forces the student past that first learning hurdle so that he/she understands the tool's benefits and challenges."
"A working familiarity with the powerful tools available to programmers (which go way beyond the point-and-click interfaces many are used to) can dramatically boost performance, not to mention the sheer enjoyment of the job. This course is a great introduction to many of those tools, from an open source angle (because, let's face it, that's usually where the best tools are found). Some of the topics aren't for the faint of heart, like parsing with lex and yacc, but all will be useful building blocks in any serious programmer's toolbox."
"It was one of my favorites. And it's good to see I'm still on the email list! :-)"
Sample homework: version control
Written exercises
- List pros and cons for textual, syntactical and semantical integration. How can disadvantages of textual integration be avoided? (For example, think of comments in source code, or different formatting styles.)
- Why are deltas in the branching in the revision tree stored out forward and not backward? What advantages and disadvantages would a backward delta have?
- Your company would like to bring out the system you have run up to now under UNIX in a new WINDOWS variant. Four alternatives are available for organizing the new variant:
- Set up two separate RCS archives for both variants.
- Set up a common RCS archive, where a branch is used for the management of the WINDOWS variant.
- Merge both variants into a common file, controlled by RCS. One of these two variants is selected at compilation time via the C preprocessor (CPP) by ”#ifdef … #endif”.
- Encapsulate all system-relevant parts into RCS-controllsed subsystems selected for UNIX or WINDOWS when creating the program.
Consider the advantages and disadvantages of both approaches. To what extent can the schemes be extended by further variants? For example, think about the several UNIX variations or a variant for .NET as a further variant of WINDOWS.
- After nearly being eaten by a carnivorous plant, Seymour wants to rename the file plantshop.h into horrorshop.h. There are two ways of doing this: Audrey suggests to add a new file horrorshop.h with the latest contents of plantshop.h and to delete plantshop.h: “This way, we can always recreate any configuration.” His friend Arthur suggests to rename the file within the CVS archive from plantshopt.h,v to horrorshop.h,v “This way, the version history gets preserved”. What should Seymour do? Why?
Implemention
- Implement the Miller-Myers and LCD algorithms for diff and experiment on the provided benchmarks.
- Get up to speed with SVN
- Create an SVN repository for the given source code
- Change the source code by translating printed strings from English to German
- Use svn diff to investigate the differences between the version - investigate the effects of different options
- Commit the changes to the archive
- Create a patch for the changes, and confirm it works
- Create a new patch with translation into French
- Rename one of the source files in the English version