1
CS61B, Fall 2023 @ UC Berkeley
Software Engineering I
Lecture 27
Background
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
Background
I’m the guest lecturer for today, Aniruth Narayanan.
Q & A will be at the very end.
Disclaimers
Some important disclaimers:
Scale
In 61A, you were focused on the correctness of a program.
In 61B, you are focused on larger scale projects, but still have to abide by many of our requirements and specifications (e.g. functions, runtime, etc.).
Working on small projects isn’t the same as working on large scale, design-oriented programs where the task is defined by you.
The best way to learn these differences is through experience; this lecture will have some examples and some light theory as a sampling.
Project 3 allows you to embrace the challenge of large scale yourself.
Complexity
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
Restrictions of Engineering
In other disciplines, we’re limited by imperfect materials.
Objects can’t be infinitely dense, or infinitely light.
Many fields are constrained:
In computer science, we’ve solved most of the underlying material constraints decades ago.
The Power of Software
Early computers were used for limited tasks.
This is a picture of ENIAC, the first general-purpose programmable computer that was 1,000 times faster than anything else at the time.
The Power of Software
Rollercoaster Tycoon was an early game where players would design amusement parks.
99% of the code was written in Assembly, with 1% written in C.
This is very hard for programmers to reason about.
The game was designed to run on Intel Pentium CPUs, which were 66 MHz (66 million operations per second).
The Intel i9 14th Gen of today can run at up to 6GHz (6 billion operations per second). 100x more
The Power of Software
Most video games today require at least 3.5 GHz and 8 GB of RAM.
In 1996, the average computer had only 8 MB or 16 MB of RAM.
The average software product doesn’t require the computing power we have available today.
The limitation comes from the creative ways we plan and design what we’re building.
An individual programmer is no longer able to effectively manage the entire software system for a large project.
Any one programmer should only need to understand a fraction of the codebase.
Empirical Proof
Here’s an example from Spotify’s Engineering Blog of their approach to software. They have >1 billion lines of code, 60 million used in production.
They must carefully target and orchestrate changes using internal mechanisms and thorough test automation.
A Definition of Complexity
“Anything related to the structure of a software system that makes it hard to understand and modify it” - John Ousterhout, “A Philosophy of Software Design”
As programs become more feature-rich, their complexity increases. Our goal is to keep software simple.
Why? Complex systems require a lot of effort to make small improvements.
The Runtime of Complexity
Complexity scales exponentially with respect to functionality.
Each new piece of functionality has to interact with all the existing functionality in various ways, for all possible combinations.
Managing Complexity
There are two kinds of complexity:
In response to avoidable complexity, we can:
Strategic vs Tactical Programming
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
Tactical Programming
The focus is on getting something working quickly; workarounds are leveraged.
Ex: Code that contains many if statements to handle many separate cases that is hard to explain
Prototypes, proof-of-concepts leverage tactical programming. The goal is to show that something could theoretically work.
However:
Strategic Programming
A different form of programming that emphasizes long term strategy over quick fixes. The objective is to write code that works elegantly - at the cost of planning time.
Code should be:
If the strategy is insufficient, go back to the drawing board before continuing work.
Helper method strategy is key to leverage throughout projects, especially when we have written comprehensive tests to ensure that these methods are correct.
Retool
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
Retool
Retool is a platform to help build internal tools much faster.
I worked as a software engineer intern on the Connect team in Fall 2022.
One of the projects I worked on was incorporating missing fields into the resource creation system.
Building a Retool App
Image from Retool’s Documentation
Adding a Query
Image from Retool’s Documentation
Picking a Resource
Image from Retool’s Resources Documentation
Configuring a Resource
Image from Retool’s Resources Documentation
Designing Resource Configurations
Disclaimer: the method to implement resources was decided before I joined, but I needed to both understand and upgrade functionality.
The goal is to design a system for users to specify configuration information that allows Retool to connect to and retrieve information from a resource on the user’s behalf.
The naive implementation is to build a new page for each resource in its entirety.
Problems with this solution:
Designing Resource Configurations
The ultimate solution was to build a ResourceConfig - a way to describe properties of a resource with a few parameters using common patterns to generate inputs.
Not all resources were valid.
Missing Fields
When users put in their credentials, there’s an option to Test Connection with a sample request. If the validation function failed, this was grayed out.
My objective was to add a way to track missing fields and incorporate some validation for fields.
This way, a user can be informed what needs to be changed when making a resource fails.
A strategy question: How can we expand functionality to include this? What do we need to change?
Solution and Refactoring
The solution was to rethink validation as passing the missing fields test.
The fields were upgraded to integrate and generalize validation.
Testing
Testing in the real world is very important! There is no autograder to leverage to gain confidence in a solution.
Would the right missing fields be displayed?
This pattern is powerful - updating the utility updates the tests for all resources.
Resources (and libraries/dependencies) update all the time, sometimes with breaking changes. This is future-proof code.
Commit History Traversal
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
The Problem
As a hypothetical: let’s say I want to find which versions of my project contain a certain change. Some of these versions (represented by git commits) are special - they were deployed into the real world.
It’s a common practice in industry - it makes more sense to do predictable rollouts instead of deploying every commit.
Would you want to have 5 or 6 updates to your phone every day?
Apple and Google release new versions of their OS, browsers, apps, etc. in a cycle, with minor changes in between. They don’t release every single time a new commit is pushed internally.
There are many applications of this: rollbacks, adoption rate, metrics comparison, etc.
Some Constraints
For simplicity:
Manually splitting the commits into two sets by hand is slow.
The Git Graph
Consider the following graph of commits, where arrows are drawn to a commit’s parents. Some of these are the deployed commits, marked in green.
Which deployed commits contain the change in commit C?
Time of Commit
A
B
D
E
C
F
H
G
The Git Graph
Consider the following graph of commits, where arrows are drawn to a commit’s parents. Some of these are the deployed commits, marked in green.
Which deployed commits contain the change in commit C?
Time of Commit
A
B
D
E
C
F
H
G
The Git Graph
Consider the following graph of commits, where arrows are drawn to a commit’s parents. Some of these are the deployed commits, marked in green.
Which deployed commits contain the change in commit C?
C, D, and G!
Time of Commit
A
B
D
E
C
F
H
G
An Approach
Consider two approaches:
Approach 2
This approach requires us to work backwards from all commits after the time of the given commit.
It’s also a bit more challenging to implement: we need timestamps and there might be a lot of branches (each with its own ending commit).
Time of Commit
A
B
D
E
C
F
H
G
The Solution
The solution comes in the way that we store data!
We normally think about Git in a commit/parent relationship. For example:
What if the direction of the edges was reversed? That way, a BFS from the commit can be done forward in time to find deployments that contain it.
This could be implemented using an adjacency list as opposed to storing just one (or two) parent(s) since a commit can only have one parent (two if it’s a merge commit), but it could have infinite children.
The Solution, Continued
We can incorporate more time-based optimizations to restrict the search, but here’s the general idea:
This is much faster! No more manual binary search! Obvious and modular!
This problem itself is a subproblem of a larger project which is simpler to solve.
Summary
Lecture 27, CS61B, Fall 2023
Background
Complexity
Strategic vs Tactical Programming
Retool
Commit History Traversal
Summary
Summary
Good code is more than just working code. Code (and complexity) scales with functionality and poses more challenges to effectively maintain.
Practice good design principles in your classes!
The real world will give you ambiguous problems with loosely defined constraints; it is your duty to make sense of these and then leverage your skills to select the best solutions given constraints while considering tradeoffs.
If this kind of content sounded interesting to you - check out engineering blogs! I also write about similar topics.