1 of 55

CoursePlan Requirement Computation

dev-sam

2 of 55

Some history

In 2019 FA, every old subteam present their architecture and interesting design decisions in DevSeshes.

CoursePlan and Carriage were still new and their codebases have not stabilized yet…

Now it’s a good time to reflect on 1.5 years of CoursePlan development.

3 of 55

Why you should care about the topic?

It’s usually good to know what other subteams are doing, if you are considering switching teams.

The journey of designing CoursePlan requirement computation algorithm teaches you something important about system design.

4 of 55

Your imagination

5 of 55

The reality: problem statement

Given

  • college, major, minor
  • user provided courses
  • AP/IB/Transfer credits

determine the requirement fulfillment progress.

function computeRequirementProgress(

college: College,

majors: readonly Major[],

minors: readonly Minor[],

coursesTaken: readonly Course[],

examCredits: readonly ExamCredit[],

transferClasses: readonly Course[],

): readonly FulfilmentProgress[] {

// ...

}

🤯

6 of 55

Act I: Start Simple

7 of 55

Observe the CS core requirements

Some requirements are easy to check:

Hardcode a list of classes that can satisfy it.

Actually we can even do better!

8 of 55

Include/Exclude Requirement JSON

9 of 55

Some requirements are not about class code!

Class Roster API to the rescue! ⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒

10 of 55

“search” field in requirement JSON

Direct us what kind of data to look for in the roster API.

11 of 55

Sub-requirements

Some requirements have multiple options to fulfill them.

Without this sub-array, if you take CS 3410 and CS 3420, your progress will be 2/5 instead of 1/5.

12 of 55

and we happily solved all problems 🎉

13 of 55

Sign-in Link

14 of 55

and we happily solved all problems 🎉

Fact check:

The above statement is COMPLETELY FALSE.

15 of 55

Act II: Oof

16 of 55

Oof 1: DDoS Roster API

Requirement checking happens on frontend.

To check whether a class satisfies a requirement (more complicated one like liberal studies), we fetch roster API.

With some additional bad code, we make O(m*n) calls to the roster API every time we check requirements.

17 of 55

Oof 2: More complicated requirements

Some requirements have different sub-requirements depending on your strategy.

e.g. in Biological Science, you can do CHEM 2070 + 2080 or just CHEM 2150.

Refactoring it means changing the 2000+ json.

(BTW, JSON doesn’t support comments)

18 of 55

Oof 3: No double counting detection

Many requirements can’t be double counted, but some can.

If we falsely report some requirements are done but actually they are not, and caused students to pay tuition for an extra semester, then …

The infra is really not ready!

19 of 55

Act III: Pre-computation

20 of 55

We really need to prevent DDoS-ing roster API

C1

C2

C3

R1

R2

R3

Total roster fetch: 9

Complexity: O(mn)

Every pair of C-R check requires a fetch.

Can we do better?

C1

C2

C3

R1

R2

R3

roster

Total roster fetch: 3

Complexity: O(n)

21 of 55

Now we have O(n) fetches? Can be do better?

  • Fetching info from roster has nothing to do with user courses!
  • We maintain a hardcoded list of requirements

We should pre-fetch all the info from roster!

Problem: All the roster info combined is > 100MB.

Solution: we should pre-compute a list of satisfying courses!

C1

C2

C3

R1

R2

R3

roster

Look at this picture more closely 👀

22 of 55

Pre-computing satisfying course list

R1

R2

R3

roster

R1

R2

R3

C1, C2

C3

C1, C2, C3

R1

R2

R3

You start with requirements.

Using the requirement spec to fetch relevant data from roster

Using fetched data from roster to decide a list of satisfying courses.

Fact: this computed courses list with requirements is less than 800K!

23 of 55

Act IV: Scaling requirement specs

24 of 55

Storing the spec in JSON doesn’t scale

v1: initial

include: [“CS 211*”, “CS 3110”, ...], exclude: [...]

v2: lookup data with “search”

search: [“code”], include: [...], exclude: [...]

search: [“description”], include: [“history”]

V3: sub-requirements

include: [[“CS 211*”], [“CS 3110”], ...], exclude: [...]

v4: ??? 🤯

v5: ??? 💀

Observation:

As we gradually make the JSON more expressive, it becomes closer and closer to a programming language, where the code that processes the JSON acts as an interpreter

25 of 55

Think beyond the current codebase

Now we start to pre-compute the requirements.

All we care about is a list of courses that can satisfy a requirement. i.e.

function getSatisfyingCourses(requirement: Requirement): readonly Course[] { /* … */ }

The inner workings of how you produce such list is irrelevant.

Instead of recording “search”, “includes”, “excludes” and other helper data, we only record:

checker”: (course: Course): boolean { /* … */ }

26 of 55

Examples of pre-computation under new setup

{ name: “Intro to programming”, checker: (c) => [“CS 1110”, “CS 1112”].includes(c.code) }

{ name: “Intro to programming”, courses: [“CS 1110”, “CS 1112”] }

{ name: “FWS”, checker: (c) => c.category.includes(“FWS”) }

{ name: “FWS”, courses: [“HIST 1200”, “ENGL 1234”, ...] }

Now we have a much stabler requirement JSON interface for the rest of the system!

27 of 55

and we happily solved all problems 🎉

28 of 55

and we happily solved all problems 🎉

Fact check:

The above statement is partially false.

29 of 55

Act V: Scaling requirement checking

30 of 55

Recap

We avoided DDoS-ing Cornell Roster API

We make requirement specification much more principled and maintainable.

Huge success!

But we only solved the problem for half of the system: requirement data generation part.

The requirement checking part is still in a big mess.

31 of 55

Hard problems with requirement checking

  • Detecting illegally double-counted courses
  • Requirements with multiple fulfillment strategies
  • AP/IB/Transfer Credits
  • Crosslisted courses

A simple structure like {“req1”: [“course1”, “course2”], “req2”: [“course2”], … } doesn’t scale.

We need better abstractions!

32 of 55

Problem of double counting

Suppose we already build some requirement to course mapping like

{“req1”: [“course1”, “course2”], “req2”: [“course2”], … }

But imagine req1 and req2 don’t allow double-counting. So we need to report that course2 is illegally double counted.

Simple solution: build a reversed map!

{“course1”: [“req1”], “course2”: [“req1”, “req2”], … }

Now we can clearly see course2 is double-counted!

33 of 55

Let’s look more closely to this structure...

{“req1”: [“course1”, “course2”], “req2”: [“course2”], … }

{“course1”: [“req1”], “course2”: [“req1”, “req2”], … }

Does this remind you some data structure you learned in CS classes?

e.g. CS 2110/2

This is a graph! More specifically, a bipartite graph between requirements and courses!

course1

course2

req1

req2

34 of 55

Graph is a good abstraction for our problem

An edge between requirement R and course C means that C can be used to satisfy requirement R.

We can first build a coarse graph, and gradually refine it.

35 of 55

Setup

We have three requirements to consider:

  • CS3410/CS3420, which can be satisfied by two strategies:
    • Strategy 1: [CS 3410]
    • Strategy 2: [CS 3420]
  • Probability
    • Can be satisfied by MATH 4710
  • Elective
    • Can be satisfied by everything
  • NO DOUBLE COUNTING

User takes: CS 3410, CS 3420, MATH 4710

36 of 55

Example Walkthrough

37 of 55

Phase 1: Building the graph naively

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

Now we completed phase 1

38 of 55

Phase 2: Consider User’s Fulfillment Strategy Choice

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.

39 of 55

Phase 2: Consider User’s Fulfillment Strategy Choice

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.

40 of 55

Phase 2: Consider User’s Fulfillment Strategy Choice

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.

x

41 of 55

Phase 2: Consider User’s Fulfillment Strategy Choice

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

Now we complete phase 2.

42 of 55

Phase 3: Detect Illegal Double Counting

CS 3410/CS3420

Probability

Elective

CS 3410

CS 3420

MATH 4710

User’s double-counting breaking choice:

(cs 3410, cs 3410/cs 3420)

Illegal double-counting detected!

43 of 55

Act VI: Remaining Problems

44 of 55

Representing the graph

You might do something like this in Java:

class Graph {

private final Map<Requirement, Set<Course>> req2CourseMap = new HashMap<>();

private final Map<Course, Set<Requirement>> course2ReqMap = new HashMap<>();

}

Potential problem: how to implement equals and hashCode for course and requirements?

Aka: Define the the notion of equality between two requirements/courses.

45 of 55

Equality of requirements

Easy answer: every field of two requirement object must be completely equal.

Concern: expensive to check equality and compute hashCode, bad for performance.

Better idea: give every requirement an unique ID, and compare ID directly.

46 of 55

Equality of courses

Give every course an unique ID?

Actually course roster already has it for us.

What’s more, crosslisted courses share the same course ID!

If we use course ID from the roster, the problem of accounting for cross listed courses is automatically solved!

For AP/IB/Transfer credits, just generate a dummy course object with the same course ID as an equivalent course!

47 of 55

Recap

48 of 55

Recap in words

  1. Write down the specification of each requirement using checker
  2. Run the checkers on fetched roster data to get a list of satisfying courses
  3. Use the satisfying courses data to build a initial coarse requirement fulfillment graph
  4. Use user’s choices to refine the requirement fulfillment graph
  5. Now we have all the requirement fulfillment status, and a list of double counted courses!

49 of 55

Recap in picture

R1

R2

R3

R1

R2

R3

roster

R1

R2

R3

C1, C2

C3

C1, C2, C3

R1

R2

R3

C1

C3

R1 is being satisfied by C1

R2 is being satisfied by C1

R3 is being satisfied by C3

C1 is being double counted!

R1

R2

R3

C1

C3

50 of 55

and we happily solved all problems 🎉

51 of 55

and we happily solved all problems 🎉

Fact check:

The above statement is still partially false.

52 of 55

Unsolved problems

  • How to check all aspects of liberal arts requirement in engineering
    • 18 credits, 6 courses, 3 categories
  • How to handle information science concentration requirement
  • How to make self-check requirements more useful
  • etc

53 of 55

Most important takeaway

54 of 55

Don’t just add more if-else branches to hack around problems.

Instead, find better abstractions!

55 of 55

Sign-in Link