1 of 40

CS61B, Fall 2023 @ UC Berkeley

Software Engineering I

Lecture 27

2 of 40

Background

Lecture 27, CS61B, Fall 2023

Background

Complexity

Strategic vs Tactical Programming

Retool

Commit History Traversal

Summary

3 of 40

Background

I’m the guest lecturer for today, Aniruth Narayanan.

  • Senior, majoring in EECS and Business
  • Teaching Assistant for 61B
    • First taught 61B in Summer 2021
  • From Florida
  • Took a gap year to work in industry
    • The subject of this lecture!
    • Later, I’ll talk about some of the problems I’ve worked on

Q & A will be at the very end.

4 of 40

Disclaimers

Some important disclaimers:

  • Nothing I say should be interpreted as anyone’s opinion but my own.
  • Nothing contained in these slides is confidential or proprietary information of any company.
  • The sections on complexity and strategic/tactical programming are inspired by Josh Hug’s slides (think of these as less detailed versions).

5 of 40

Scale

In 61A, you were focused on the correctness of a program.

In 61B, you are focused on larger scale projects, but still have to abide by many of our requirements and specifications (e.g. functions, runtime, etc.).

Working on small projects isn’t the same as working on large scale, design-oriented programs where the task is defined by you.

The best way to learn these differences is through experience; this lecture will have some examples and some light theory as a sampling.

Project 3 allows you to embrace the challenge of large scale yourself.

6 of 40

Complexity


7 of 40

Restrictions of Engineering

In other disciplines, we’re limited by imperfect materials.

Objects can’t be infinitely dense, or infinitely light.

Many fields are constrained:

  • Chemical engineers have to worry about temperature.
  • Material scientists have to worry about how brittle a material is.
  • Civil engineers have to worry about the strength of concrete.

In computer science, we solved most of the underlying material constraints decades ago.

  • The combined computing power of all the Apollo missions is less than that of your phone.

8 of 40

The Power of Software

Early computers were used for limited tasks.

This is a picture of ENIAC, the first general-purpose programmable computer that was 1,000 times faster than anything else at the time.

9 of 40

The Power of Software

Rollercoaster Tycoon was an early game where players would design amusement parks.

99% of the code was written in Assembly, with 1% written in C.

Assembly is very hard for programmers to reason about.

The game was designed to run on Intel Pentium CPUs, which were 66 MHz (66 million operations per second).

The Intel i9 14th Gen of today can run at up to 6 GHz (6 billion operations per second) - nearly 100x faster.

10 of 40

The Power of Software

Most video games today require at least 3.5 GHz and 8 GB of RAM.

In 1996, the average computer had only 8 MB or 16 MB of RAM.

The average software product doesn’t require the computing power we have available today.

The limitation comes from the creative ways we plan and design what we’re building.

An individual programmer is no longer able to effectively manage the entire software system for a large project.

Any one programmer should only need to understand a fraction of the codebase.

11 of 40

Empirical Proof

Here’s an example from Spotify’s Engineering Blog of their approach to software. They have >1 billion lines of code, 60 million used in production.

They must carefully target and orchestrate changes using internal mechanisms and thorough test automation.

12 of 40

A Definition of Complexity

“Anything related to the structure of a software system that makes it hard to understand and modify it” - John Ousterhout, “A Philosophy of Software Design”

As programs become more feature-rich, their complexity increases. Our goal is to keep software simple.

Why? Complex systems require a lot of effort to make small improvements.

  • It takes longer to understand how code works
  • It is more difficult to fix bugs independently and with confidence
  • It is harder to find what needs to change to modify functionality
    • Unknown unknowns: it’s not clear what you need to know to make modifications
    • Common in large codebases

13 of 40

The Runtime of Complexity

Complexity scales exponentially with respect to functionality.

Each new piece of functionality can interact with all of the existing functionality, so the number of possible combinations to reason about grows with every addition.

14 of 40

Managing Complexity

There are two kinds of complexity:

  • Unavoidable (Essential) Complexity
    • Inherent complexity caused by the underlying functionality
  • Avoidable Complexity
    • Complexity that we can address with our choices

In response to avoidable complexity, we can:

  • Make code simpler and more obvious
    • Using sentinel nodes in Project 1
  • Modules
    • Abstraction: the ability to use a piece of code based on its specification, without understanding how it works
    • Interfaces are an example - HashMap, BSTMap both are Maps
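The interface idea can be sketched in Java: the helper below is written against the Map interface, so it works unchanged with any implementation (the standard library's TreeMap stands in here for 61B's BSTMap, which is not in the standard library).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapAbstraction {
    // Works for ANY Map implementation -- the caller relies only on the
    // Map interface's specification, not on hashing or tree details.
    public static int countAbove(Map<String, Integer> scores, int threshold) {
        int count = 0;
        for (int value : scores.values()) {
            if (value > threshold) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>();
        Map<String, Integer> treed = new TreeMap<>();  // stands in for 61B's BSTMap
        for (Map<String, Integer> m : List.of(hashed, treed)) {
            m.put("proj1", 85);
            m.put("proj2", 95);
        }
        // Same answer regardless of the underlying data structure.
        System.out.println(countAbove(hashed, 90));  // 1
        System.out.println(countAbove(treed, 90));   // 1
    }
}
```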

15 of 40

Strategic vs Tactical Programming


16 of 40

Tactical Programming

The focus is on getting something working quickly; workarounds are leveraged.

Ex: Code with many if statements handling many separate cases, which is hard to explain

Prototypes, proof-of-concepts leverage tactical programming. The goal is to show that something could theoretically work.

However:

  • There’s no time spent on overall design
  • Things are complicated with workarounds to get things working
  • Refactoring takes time and potentially means restarting
    • Consider Project 2 runtime requirements and planning the constructor
  • Proofs of concept get deployed in the real world due to lack of time

17 of 40

Strategic Programming

A different form of programming that emphasizes long term strategy over quick fixes. The objective is to write code that works elegantly - at the cost of planning time.

Code should be:

  • Maintainable
  • Simple
  • Future-proof
    • 61B projects have deadlines; afterwards, you can throw it away

If the strategy is insufficient, go back to the drawing board before continuing work.

The helper method strategy is key to leverage throughout the projects, especially when comprehensive tests have been written to ensure those methods are correct.
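A minimal sketch of the helper-method strategy (a toy example, not from any project): a small helper is tested once on its own, and the larger method is then built on top of it with confidence.

```java
public class HelperDemo {
    // Small, independently testable helper.
    public static boolean isVowel(char c) {
        return "aeiouAEIOU".indexOf(c) >= 0;
    }

    // Larger method built confidently on top of the tested helper.
    public static int countVowels(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("strategic"));  // 3
    }
}
```

If a bug ever appears in countVowels, a passing test for isVowel immediately rules out half of the code.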

18 of 40

Retool


19 of 40

Retool

Retool is a platform to help build internal tools much faster.

  • Instead of building a frontend and a backend from scratch, you can write queries linked to drag-and-drop components
  • Much of this power comes from being able to connect to nearly any API or database
    • API: an interface for working with data; we don’t care about how it works under the hood, just that we can use the specification.
    • We can connect different sources of data and code together to make building easier.

I worked as a software engineer intern on the Connect team in Fall 2022.

One of the projects I worked on was incorporating missing fields into the resource creation system.

20 of 40

Building a Retool App

21 of 40

Adding a Query

22 of 40

Picking a Resource

23 of 40

Configuring a Resource

24 of 40

Designing Resource Configurations

Disclaimer: the method to implement resources was decided before I joined, but I needed to both understand and upgrade functionality.

The goal is to design a system for users to specify configuration information that allows Retool to connect to and retrieve information from a resource on the user’s behalf.

The naive implementation is to build a new page for each resource in its entirety.

Problems with this solution:

  • Making a modification (such as wording) would require updating every page
  • Breaking changes from updated requirements requires updating every page
    • What if the List interface changed from size() to .length?
  • Adding a new resource is complicated (and requires a lot of copy-paste)

25 of 40

Designing Resource Configurations

The ultimate solution was to build a ResourceConfig - a way to describe properties of a resource with a few parameters using common patterns to generate inputs.

  • This leverages modularity and obviousness when designing a config.
  • Most databases have similar requirements: a host, a port, a name, and sometimes connection options.
  • For non-database resources (e.g., Elasticsearch), a similar pattern of using inputs can be used, with custom elements for the properties to accept.

Not all resources were valid.

  • If a user did not submit the necessary information for a resource, they wouldn’t be allowed to make the resource.
    • Analogy: Java’s compiler errors prevent you from writing knowingly bad code.
  • This was specified in a validation function included in the resource config.
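A toy sketch of such a validation function (all field names here are hypothetical, and Retool's actual implementation is not shown): a resource is valid only if every required connection field is present and non-empty.

```java
import java.util.Map;

public class ResourceConfigDemo {
    // Hypothetical validation function for a database-style resource:
    // valid only if the required connection fields are present and non-empty.
    public static boolean isValid(Map<String, String> config) {
        for (String field : new String[] {"host", "port", "name"}) {
            String value = config.get(field);
            if (value == null || value.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> db = Map.of("host", "db.example.com", "port", "5432", "name", "prod");
        System.out.println(isValid(db));                               // true
        System.out.println(isValid(Map.of("host", "db.example.com"))); // false
    }
}
```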

26 of 40

Missing Fields

When users put in their credentials, there’s an option to Test Connection with a sample request. If the validation function failed, this was grayed out.

My objective was to add a way to track missing fields and incorporate some validation for fields.

This way, a user can be informed what needs to be changed when making a resource fails.

A strategy question: How can we expand functionality to include this? What do we need to change?

27 of 40

Solution and Refactoring

The solution was to rethink validation as passing a missing-fields check.

  • Parameters would be cycled through, and if one was missing, the accompanying “missing field” is added to the list.
  • If the list of missing fields is empty, then the resource can be tested/created!

The fields were upgraded to integrate and generalize validation.

  • For example, every port number for a database should be a number.
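The refactoring above can be sketched as follows (again with hypothetical field names, not Retool's actual code): validation now returns the list of missing or invalid fields, and an empty list means the resource can be tested or created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MissingFieldsDemo {
    // Hypothetical reframing: instead of a yes/no validation, collect every
    // problem so the user can be told exactly what to fix.
    public static List<String> missingFields(Map<String, String> config) {
        List<String> missing = new ArrayList<>();
        for (String field : new String[] {"host", "port", "name"}) {
            String value = config.get(field);
            if (value == null || value.isEmpty()) {
                missing.add(field);
            }
        }
        // Generalized per-field validation: a port, if present, must be a number.
        String port = config.get("port");
        if (port != null && !port.isEmpty() && !port.matches("\\d+")) {
            missing.add("port (must be a number)");
        }
        return missing;
    }

    public static void main(String[] args) {
        // An empty list means the resource can be tested/created.
        System.out.println(missingFields(Map.of("host", "h", "port", "5432", "name", "db")));
        System.out.println(missingFields(Map.of("host", "h", "port", "abc")));
    }
}
```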

28 of 40

Testing

Testing in the real world is very important! There is no autograder to leverage to gain confidence in a solution.

Would the right missing fields be displayed?

  • For the “patterned” resources, such as a database, I wrote a utility that tested every possible combination of inputs and outputs.
    • If there were mock resources, this could be expanded into queries.
  • For the other resources, select test cases were chosen and manually written.

This pattern is powerful - updating the utility updates the tests for all resources.
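A toy version of such a utility (hypothetical fields, not the real test code): enumerate every subset of required fields and verify that the validator reports exactly the omitted ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinationTester {
    static final String[] FIELDS = {"host", "port", "name"};

    // Validator under test: reports which required fields are absent.
    public static List<String> missingFields(Map<String, String> config) {
        List<String> missing = new ArrayList<>();
        for (String field : FIELDS) {
            if (!config.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    // Utility: try every subset of fields (2^n combinations) and verify the
    // validator flags exactly the omitted ones.
    public static boolean testAllCombinations() {
        int n = FIELDS.length;
        for (int mask = 0; mask < (1 << n); mask++) {
            Map<String, String> config = new HashMap<>();
            List<String> expectedMissing = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    config.put(FIELDS[i], "value");
                } else {
                    expectedMissing.add(FIELDS[i]);
                }
            }
            if (!missingFields(config).equals(expectedMissing)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(testAllCombinations());  // true
    }
}
```

Adding a field to FIELDS automatically extends the test to all new combinations, which is the future-proofing the slide describes.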

Resources (and libraries/dependencies) update all the time, sometimes with breaking changes. This is future-proof code.

29 of 40

Commit History Traversal


30 of 40

The Problem

As a hypothetical: let’s say I want to find which versions of my project contain a certain change. Some of these versions (represented by git commits) are special - they were deployed into the real world.

It’s a common practice in industry - it makes more sense to do predictable rollouts instead of deploying every commit.

Would you want to have 5 or 6 updates to your phone every day?

Apple and Google release new versions of their OS, browsers, apps, etc. in a cycle, with minor changes in between. They don’t release every single time a new commit is pushed internally.

There are many applications of this: rollbacks, adoption rate, metrics comparison, etc.

31 of 40

Some Constraints

For simplicity:

  • Each change is a git commit - the same git commits generated when you use git commit -m "submitting lab1" and pushed using git push origin main.
  • There is easy access to all of the git commits that can be loaded on any computer in a reasonable amount of time.
  • There are a lot of commits - caching the results (i.e. storing which commits are in every single deployment) is infeasible.
  • Once a change is made, we don’t undo or modify that change (in practice, we can use git blame).

Manually splitting the commits into two sets by hand is slow.

32 of 40

The Git Graph

Consider the following graph of commits, where arrows are drawn to a commit’s parents. Some of these are the deployed commits, marked in green.

Which deployed commits contain the change in commit C?

[Figure: commit graph of commits A through H, arranged left to right by time of commit, with arrows pointing to each commit's parents and deployed commits highlighted in green.]


34 of 40

The Git Graph

Consider the following graph of commits, where arrows are drawn to a commit’s parents. Some of these are the deployed commits, marked in green.

Which deployed commits contain the change in commit C?

C, D, and G!

[Figure: the same commit graph of A through H, ordered by time of commit, with the deployed commits containing C's change highlighted.]

35 of 40

An Approach

Consider two approaches:

  1. Work forward in time, against the direction of the stored parent pointers
    • We don’t have access to this information; git is designed to tell us the parents of commits (such as through git log), not their children.
    • This is what we did on the previous slides!
  2. Work backwards in time, following each commit’s stored parents
    • We have to grab every single “ending commit” and then work through all the parents until we hit the change we’re checking for in commit C.
    • If we don’t hit the change, we can only terminate once we have worked backwards far enough that every branch has reached commits with timestamps before commit C.

36 of 40

Approach 2

This approach requires us to work backwards from all commits after the time of the given commit.

It’s also a bit more challenging to implement: we need timestamps and there might be a lot of branches (each with its own ending commit).

[Figure: the same commit graph of A through H, ordered by time of commit, with deployed commits in green.]

37 of 40

The Solution

The solution comes in the way that we store data!

We normally think about Git in a commit-to-parent relationship: each commit stores pointers to its parent(s).

What if the direction of the edges was reversed? That way, a BFS from the commit can be done forward in time to find deployments that contain it.

This could be implemented using an adjacency list: a commit stores only one parent (or two, if it’s a merge commit), but it can have arbitrarily many children, so we instead map each commit to a list of its children.

38 of 40

The Solution, Continued

We can incorporate more time-based optimizations to restrict the search, but here’s the general idea:

  1. Build a git commit graph in reverse order, storing mappings from commits to a list of children
  2. Generate a set of the deployed commits (separate from the graph)
    1. Why might I want to use a set instead of a list?
    2. Runtime! A HashSet gives amortized constant-time membership checks, as opposed to linear time with a LinkedList or an ArrayList.
  3. Do a BFS from the input commit and check if commits traversed are in the deployed commits.
    • We could take multiple input commits here and compute only step 3.
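The three steps above can be sketched in Java on a small made-up commit graph (the letters loosely follow the earlier slides, but the edges here are illustrative, not the exact figure):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeployedCommitSearch {
    // Step 1: a reversed commit graph -- each commit maps to its children.
    // Step 2: deployed commits kept in a HashSet for constant-time membership checks.
    // Step 3: BFS forward in time from the input commit, collecting every
    //         deployed commit that is reachable (i.e., contains the change).
    public static Set<String> deploymentsContaining(
            Map<String, List<String>> children, Set<String> deployed, String start) {
        Set<String> found = new HashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            String commit = queue.poll();
            if (deployed.contains(commit)) {
                found.add(commit);
            }
            for (String child : children.getOrDefault(commit, List.of())) {
                if (visited.add(child)) {  // add() is false if already visited
                    queue.add(child);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A small hypothetical graph, stored commit -> children.
        Map<String, List<String>> children = new HashMap<>();
        children.put("A", List.of("B"));
        children.put("B", List.of("C", "E"));
        children.put("C", List.of("D"));
        children.put("D", List.of("G"));
        children.put("E", List.of("F"));
        children.put("F", List.of("H"));
        Set<String> deployed = Set.of("C", "D", "G", "H");
        // H is deployed but does not descend from C, so it is excluded.
        System.out.println(deploymentsContaining(children, deployed, "C"));  // contains C, D, and G
    }
}
```

With the graph and the deployed set built once, step 3 can be repeated cheaply for many different input commits.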

This is much faster! No more manual binary search! Obvious and modular!

This problem is itself a subproblem of a larger project, and is simpler to solve on its own.

39 of 40

Summary


40 of 40

Summary

Good code is more than just working code. Code (and complexity) scales with functionality and poses more challenges to effectively maintain.

Practice good design principles in your classes!

The real world will give you ambiguous problems with loosely defined constraints; it is your duty to make sense of them and then leverage your skills to select the best solutions, weighing tradeoffs under those constraints.

If this kind of content sounded interesting to you - check out engineering blogs! I also write about similar topics.