CSE 402: Design and Implementation of Domain-Specific Languages

Winter 2020, MWF 8:30am-9:20am, CSE2 G001

Why domain-specific languages

No revolution in computing occurred the help from a programming language.  For example, Unix was implemented with C, databases were programmed with SQL, and the Web was built on JavaScript.

Programming languages help programmers by placing a layer of abstraction above the messy hardware. Languages are most effective when they are tailored to a programming task:

  • One such task is compiling a project (here we may want to use make or Ant).  
  • Another task is creating a hardware design (here we may use VHDL or Verilog).

We call such specialized languages domain-specific, and we often choose them over general-purpose languages because they are easier to use or because their programs run faster, or both.

It's hard to find an area of computing that doesn't critically rely on a DSL or two. You will find DSLs in

  • machine learning (PyTorch and TensorFlow),
  • data analytics (SQL and LINQ),
  • parallel computing (Hadoop and Spark),
  • graphics and vision (Cuda and Halide),
  • scripting (bash and perl),
  • web programming (jquery and React),
  • document preparation (latex and postscript),
  • data visualization (d3 and vega),
  • and many other areas.

Some DSLs are compiled, others come with an interpreter, while others are implemented as libraries, frameworks, or marketed as an API.  No matter what their implementation, all DSLs have in common that they offer the programmer convenient programming constructs tailored to the domain.

What you will learn

You will learn how to design and implement a domain-specific language. We will cover several strategies for implementing a DSL, and practice most of them in programming assignments.  All told, you will implement a compiler, an interpreter, a parser, and an optimizer.

Like CSE401, this course is unique because it spans from foundations to their applications. In other words, you will learn about beautiful algorithms and apply them in programming assignments.

The course material is split into five thematic parts, each describing some DSL technology.  The lectures and homework are motivated by a real-world application.  For example, optimization (Part 2) is motivated by query processing, and parsing (part 3) is motivated by templating, which is the process of translating a data set into a formatted document.

While the programming assignments will focus on the domain-specific languages, you will learn principles useful in implementing a traditional compiler (for example, you will learn how to use the ANTLR parser).

Students interested in a course project that implements a general-purpose compiler should consider taking CSE401, typically taught by Prof. Perkins. Please note that students are allowed to take both CSE401 and CSE402.

Course format

Lectures and sections: The course will have three lectures per week. You will practice the material in recitation sections and in programming homeworks.

Exams: Instead of the two midterms, we will give four shorter in-class quizzes.  There will be roughly one quiz for each of the five thematic parts of the course. We will distribute practice questions from past quizzes.

Homeworks: Five programming homeworks will help you understand the material and acquire the skills needed for the final project. Except for the first homework, you will work on your homework in groups.

The final project: You will acquire the hands-on skills of designing a DSL in a group project. You will implement a language based on your own design, or based on a design offered by the course staff.  (See the gallery of past projects.) You will present your project in a demo/poster session, which replaces the final exam.

Homeworks

Quizzes

Final project

Participation

40%

30%

25%

5%

Late days: you can have up to three late days on homeworks HW2 to HW4. There is a 15% penalty for each additional late day.  There are no late days on HW5 (the final project).  

Syllabus and lecture materials

The following syllabus is preliminary. We will fine tune the schedule add add links to Winter 2020 slides as the course progresses. You can always download the up-to-date slides from these Dropbox folders: 20wi, 19sp.

Part I: Implementation Strategies / Regular Expressions (problem set)

Jan 6: Introduction

  • The revolutionary role of PLs
  • How to implement a language

Jan 8: cancelled

Jan 9: Section 1

  • fluent regexes (HW1)

Jan 10: designing DSL abstractions (assigned reading)

  • top-down and bottom-up design of DSL abstractions
  • embedding DSLs in a host language
  • the rake DSL for project builds

Part II: Optimizations / Query Languages (problem set)

Jan 13: cancelled (winter storm)

Jan 16: Constructing and optimizing ASTs 

  • Constructing the AST with call chaining
  • Optimization

Jan 17: Section 2: Optimizing ASTs, Homework 2 information

  • Handout 
  • Optimization with global dataflow analysis

Part III: Parsing / Templating (problem set and past quizzes)

  • Listeners and visitors
  • eliminating left recursion with grammar predicates
  • templating
  • how to make a DSL extensible
  • error checking
  • compile-time and run-time data structures
  • late binding  
  • Regular expressions are not regexes
  • syntax-directed translation of REs to NFAs

Part IV: Reactive Languages, Laziness / Working with Data  (problems)

  • Rx, a reactive language
  • continuation-passing style (CPS)
  • case study: call chaining for reactivity in JS
  • bootstrapping a compiler
  • tombstone diagrams
  • hiding an exploit in a compiler
  • links to optional reading are included in the slides
  • semantics and applications of PPLs
  • assigned reading is included in the slides  

Part V: Types and Objects, Garbage Collection, Staging (problems)

  • Implementing objects with meta-programming (read Ch.16)
  • Objects in a statically typed language
  • V8 JIT compiler
  • Garbage Collection
  • What to keep learning about DSLs

Sections

Two sections are offered:

Here you will find links to section materials.  You need to log in with your UW CSE google id (not your UW id).  Currently, the section topics are

Homeworks and the final project

There will be four programming homeworks. The first one will be individual, while the rest will be done in pairs.  

The final project is also known as the “fifth homework”.  You will design and implement a small language of your choice. Typically, this design will extend the four homeworks.

The homework handouts for HW1--4 will be “live” google docs on which students can ask clarification questions by adding comments to the text. The discussion about the homework will thus take place in the doc, not on Piazza. Use your <username>@cs.washington.edu google login to get to these documents.

  • HW5.1 (proposal, due 1/28/2020)
  • HW5.2 (final proposal, due 2/11/2020)
  • HW5.3 (implementation design, due 2/25/2020)
  • HW5.4 (report, due 3/16/2020)
  • HW5.5 (demo session, 3/17/2020, 8:30am, the G01 classroom)
  • HW5 checklist

Week

Mon

Tue

Wed

Thu

Fri

1

Jan 6
First Lecture

2

Jan 13


Jan 15
HW1 Due

3

Jan 20

Jan 22

Jan 24
HW2 Due

4

Jan 27

Jan 28
HW5.1 Due

Jan 29

Quiz 1

5

Feb 3

HW3.1 Due


Feb 7

Quiz 2

6

Feb 10

Feb 11
HW5.2 Due

Feb 14
HW3.2 Due

7

Feb 17
Presidents Day

Feb 19
Quiz 3

8

Feb 24

Feb 25

HW5.3 Due

9

Mar 2
HW4 Due

10

Mar 9

Mar 11
Quiz 4

Mar 13
Last lecture

11
Finals week

Mar 16
HW5.4 Due

Mar 17
demos
8:30-10:20am


Exams and key dates

Instead of midterms, we will test your knowledge with four quizzes.  They will be 25-minutes long and will be held during lectures. Quizzes will be given about a week after the relevant HW is due.  You should use this week to work on your problem sets and review the feedback for your HW.

The poster and demo session will replace the final exam and will be held during the assigned final exam slot (Tuesday March 17, 8:30-10:20am). Location: The lecture room (CSE2 G001).

Recommended Books

These five books are available on reserve for you in the Engineering Library:

JavaScript, the good parts

JavaScript, the Definitive Guide

  • JS is our implementation language

Mastering Regular Expressions

The Definitive ANTLR 4 Reference

Programming in Lua

  • how to implement objects; how to use coroutines

These online resources will also be useful:

Where to ask questions?

Questions about lectures?  Ask them on the Piazza 402 page.  

Questions about homeworks: Add comments to the handout docs.

Questions on anything else:  Also on Piazza.

A private message to the instructors:  Send it via Piazza.

Any of the above: Come to our office hours.

Where are 402 announcements?

You can find all announcements on Piazza, in the Piazza tab called Resources / Course Information. Check that page often or subscribe to email notifications from piazza.

The course staff and office hours

Ras Bodik:

  • M/W/F right after the lecture 9:20am.
  • by appointment, contact me by email.

Krzysztof Drewniak,

Email: krzysd@cs.washington.edu

Office hours: Wednesday, 1:30 PM - 2:30 PM, CSE2 150

Office hours also available by appointment

Academic Misconduct

This course follows the Allen School academic misconduct rules. Students are expected to be familiar with these rules and principles, and follow them.

https://www.cs.washington.edu/academics/misconduct