CoursePlan Requirement Computation
dev-sam
Some history
In 2019 FA, every old subteam present their architecture and interesting design decisions in DevSeshes.
CoursePlan and Carriage were still new and their codebases have not stabilized yet…
Now it’s a good time to reflect on 1.5 years of CoursePlan development.
Why you should care about the topic?
It’s usually good to know what other subteams are doing, if you are considering switching teams.
The journey of designing CoursePlan requirement computation algorithm teaches you something important about system design.
Your imagination
The reality: problem statement
Given
determine the requirement fulfillment progress.
function computeRequirementProgress(
college: College,
majors: readonly Major[],
minors: readonly Minor[],
coursesTaken: readonly Course[],
examCredits: readonly ExamCredit[],
transferClasses: readonly Course[],
): readonly FulfilmentProgress[] {
// ...
}
🤯
Act I: Start Simple
Observe the CS core requirements
Some requirements are easy to check:
Hardcode a list of classes that can satisfy it.
Actually we can even do better!
Include/Exclude Requirement JSON
Some requirements are not about class code!
Class Roster API to the rescue! ⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒
“search” field in requirement JSON
Direct us what kind of data to look for in the roster API.
Sub-requirements
Some requirements have multiple options to fulfill them.
Without this sub-array, if you take CS 3410 and CS 3420, your progress will be 2/5 instead of 1/5.
and we happily solved all problems 🎉
Sign-in Link
and we happily solved all problems 🎉
Fact check:
The above statement is COMPLETELY FALSE.
Act II: Oof
Oof 1: DDoS Roster API
Requirement checking happens on frontend.
To check whether a class satisfies a requirement (more complicated one like liberal studies), we fetch roster API.
With some additional bad code, we make O(m*n) calls to the roster API every time we check requirements.
Oof 2: More complicated requirements
Some requirements have different sub-requirements depending on your strategy.
e.g. in Biological Science, you can do CHEM 2070 + 2080 or just CHEM 2150.
Refactoring it means changing the 2000+ json.
(BTW, JSON doesn’t support comments)
Oof 3: No double counting detection
Many requirements can’t be double counted, but some can.
If we falsely report some requirements are done but actually they are not, and caused students to pay tuition for an extra semester, then …
The infra is really not ready!
Act III: Pre-computation
We really need to prevent DDoS-ing roster API
C1
C2
C3
R1
R2
R3
Total roster fetch: 9
Complexity: O(mn)
Every pair of C-R check requires a fetch.
Can we do better?
C1
C2
C3
R1
R2
R3
roster
Total roster fetch: 3
Complexity: O(n)
Now we have O(n) fetches? Can be do better?
We should pre-fetch all the info from roster!
Problem: All the roster info combined is > 100MB.
Solution: we should pre-compute a list of satisfying courses!
C1
C2
C3
R1
R2
R3
roster
Look at this picture more closely 👀
Pre-computing satisfying course list
R1
R2
R3
roster
R1
R2
R3
C1, C2
C3
C1, C2, C3
R1
R2
R3
You start with requirements.
Using the requirement spec to fetch relevant data from roster
Using fetched data from roster to decide a list of satisfying courses.
Fact: this computed courses list with requirements is less than 800K!
Act IV: Scaling requirement specs
Storing the spec in JSON doesn’t scale
v1: initial
include: [“CS 211*”, “CS 3110”, ...], exclude: [...]
v2: lookup data with “search”
search: [“code”], include: [...], exclude: [...]
search: [“description”], include: [“history”]
V3: sub-requirements
include: [[“CS 211*”], [“CS 3110”], ...], exclude: [...]
v4: ??? 🤯
v5: ??? 💀
Observation:
As we gradually make the JSON more expressive, it becomes closer and closer to a programming language, where the code that processes the JSON acts as an interpreter
Think beyond the current codebase
Now we start to pre-compute the requirements.
All we care about is a list of courses that can satisfy a requirement. i.e.
function getSatisfyingCourses(requirement: Requirement): readonly Course[] { /* … */ }
The inner workings of how you produce such list is irrelevant.
Instead of recording “search”, “includes”, “excludes” and other helper data, we only record:
“checker”: (course: Course): boolean { /* … */ }
Examples of pre-computation under new setup
{ name: “Intro to programming”, checker: (c) => [“CS 1110”, “CS 1112”].includes(c.code) }
⇒
{ name: “Intro to programming”, courses: [“CS 1110”, “CS 1112”] }
{ name: “FWS”, checker: (c) => c.category.includes(“FWS”) }
⇒
{ name: “FWS”, courses: [“HIST 1200”, “ENGL 1234”, ...] }
Now we have a much stabler requirement JSON interface for the rest of the system!
and we happily solved all problems 🎉
and we happily solved all problems 🎉
Fact check:
The above statement is partially false.
Act V: Scaling requirement checking
Recap
We avoided DDoS-ing Cornell Roster API
We make requirement specification much more principled and maintainable.
Huge success!
But we only solved the problem for half of the system: requirement data generation part.
The requirement checking part is still in a big mess.
Hard problems with requirement checking
A simple structure like {“req1”: [“course1”, “course2”], “req2”: [“course2”], … } doesn’t scale.
We need better abstractions!
Problem of double counting
Suppose we already build some requirement to course mapping like
{“req1”: [“course1”, “course2”], “req2”: [“course2”], … }
But imagine req1 and req2 don’t allow double-counting. So we need to report that course2 is illegally double counted.
Simple solution: build a reversed map!
{“course1”: [“req1”], “course2”: [“req1”, “req2”], … }
Now we can clearly see course2 is double-counted!
Let’s look more closely to this structure...
{“req1”: [“course1”, “course2”], “req2”: [“course2”], … }
{“course1”: [“req1”], “course2”: [“req1”, “req2”], … }
Does this remind you some data structure you learned in CS classes?
e.g. CS 2110/2
This is a graph! More specifically, a bipartite graph between requirements and courses!
course1
course2
req1
req2
Graph is a good abstraction for our problem
An edge between requirement R and course C means that C can be used to satisfy requirement R.
We can first build a coarse graph, and gradually refine it.
Setup
We have three requirements to consider:
User takes: CS 3410, CS 3420, MATH 4710
Example Walkthrough
Phase 1: Building the graph naively
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
Now we completed phase 1
Phase 2: Consider User’s Fulfillment Strategy Choice
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.
Phase 2: Consider User’s Fulfillment Strategy Choice
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.
Phase 2: Consider User’s Fulfillment Strategy Choice
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
Suppose user chooses the [CS 3410] strategy for CS 3410/CS3420.
x
Phase 2: Consider User’s Fulfillment Strategy Choice
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
Now we complete phase 2.
Phase 3: Detect Illegal Double Counting
CS 3410/CS3420
Probability
Elective
CS 3410
CS 3420
MATH 4710
User’s double-counting breaking choice:
(cs 3410, cs 3410/cs 3420)
Illegal double-counting detected!
Act VI: Remaining Problems
Representing the graph
You might do something like this in Java:
class Graph {
private final Map<Requirement, Set<Course>> req2CourseMap = new HashMap<>();
private final Map<Course, Set<Requirement>> course2ReqMap = new HashMap<>();
}
Potential problem: how to implement equals and hashCode for course and requirements?
Aka: Define the the notion of equality between two requirements/courses.
Equality of requirements
Easy answer: every field of two requirement object must be completely equal.
Concern: expensive to check equality and compute hashCode, bad for performance.
Better idea: give every requirement an unique ID, and compare ID directly.
Equality of courses
Give every course an unique ID?
Actually course roster already has it for us.
What’s more, crosslisted courses share the same course ID!
If we use course ID from the roster, the problem of accounting for cross listed courses is automatically solved!
For AP/IB/Transfer credits, just generate a dummy course object with the same course ID as an equivalent course!
Recap
Recap in words
Recap in picture
R1
R2
R3
R1
R2
R3
roster
R1
R2
R3
C1, C2
C3
C1, C2, C3
R1
R2
R3
C1
C3
R1 is being satisfied by C1
R2 is being satisfied by C1
R3 is being satisfied by C3
C1 is being double counted!
R1
R2
R3
C1
C3
and we happily solved all problems 🎉
and we happily solved all problems 🎉
Fact check:
The above statement is still partially false.
Unsolved problems
Most important takeaway
Don’t just add more if-else branches to hack around problems.
Instead, find better abstractions!
Sign-in Link