[Edit: 2023-06-04: The post is now up at https://explog.in/notes/elephants/index.html.

This was a draft blog post for explog.in – I experimented with writing & publishing on Google Docs first and typesetting with Emacs & Org-mode later.]

Eating Elephants

Ramping up on large software projects

As a software engineer, you'll almost certainly ramp up on large projects: anything from massive proprietary software stacks to popular open source projects. This note covers the tricks I’ve accumulated over the years to onboard quickly[1]: mechanical programming tactics, recommendations on quickly building and refining your mental model, and strategies on learning effectively from others.

To set expectations, a “large” project in this context means one where you cannot realistically expect to read all the code, particularly not in time to deliver anything meaningful with it. I should also warn you that there will be several bad puns along the way to make this post easier to digest.

And just in case you were wondering: no elephants—virtual, or otherwise—were harmed in the making of this post.

> How do you eat an elephant? One bite at a time.

First, maintain your sense of Purpose

There’s generally a reason you’re eating your elephant: and if there’s just one thing to take away from this note – always keep your purpose in mind along the way. It’s easy to get distracted by noise while tackling large and seemingly intractable work and fall down unproductive rabbit holes.

The context in which you’re working defines the constraints you deal with and the approaches you should take. For example:

Of course, nothing is really as black and white and there’s a full spectrum of approaches you could take: potentially starting out with emergency “patch” fixes and follow up with real solutions once you understand the system deeply.

Ideally, you can set some reasonable goals to achieve in the near and long term: precise, tactical goals for your near future; and a general direction for what you broadly want to achieve over a larger period of time. These goals will be your primary compass as you explore.

Orient yourself: construct a mental model

With your goals giving you a sense of direction, it’s time to orient yourself in the system: and that means building a mental model of how things work, where the code is, how it happens to be running in production, and building mechanical sympathy with the system in general.

Build a first approximation by thinking through how you would build it

To get started, I take whatever I understand of the project, and imagine how I would have built it myself. There are generally a lot of assumptions I’ll have to make, as someone completely new to the area; but at least I’ll have taken a stab at reasoning through the consequences of those assumptions. Most of us do this implicitly anyways as we make assumptions on how things work; I just like to make it a little more explicit.

Reading about the architectures of different systems including their corresponding tradeoffs, conducting – and preparing for – system design interviews, and simply experiencing different software stacks will help build your muscles here. I highly recommend looking up the AOSA series of books, Beautiful Code, and engineering blogs by different companies (or their summaries).

Then see what is, and contrast it with what you expected

The risk with building the first approximation is confirmation bias: you might start exploring looking for proof that your first approximation is correct. Instead, I want you to do the exact opposite: look for places where your mental model does not match reality at all. The more surprising the difference, the more you have to learn in that area, and the more carefully you want to test your assumptions.

I’ll remind you again later in this note, but you must respect Chesterton’s fence and understand why things are the way they are, and how they got to the point they’re at.

For example, your solution to the problem might have assumed that certain data could be served from cache; but you find that the actual code simply recomputes it each time – digging deeper you might find that the cache hit rate was remarkably low and the added cost & complexity of the cache simply weren’t worth it. Or you might find that the original authors had planned to add a cache, but simply never got around to it.

Only believe empirical evidence

> Anybody who thinks “Just read the code and think about it” – that's an insane statement – you can't even read all the code in a big system, you have to do experiments on the system.

(John Carmack: Best programming setup and IDE | Lex Fridman Podcast Clips)

One of my most frequent mistakes – also repeated constantly by candidates I’ve interviewed – is to assume the program we wrote (or read) does what we think it does without actually running it. You need to take a look at how the system is actually running, preferably in production, before believing you understand it in any meaningful way.

The best antidote to confirmation bias is to actively look for counter-examples; and looking for real metrics, numbers, core-dumps and logs is easily the best way to identify where things just don’t match your expectations. Though if I’m honest, I generally distrust instrumentation till I’ve had a chance to validate it for myself.

Making sure you’re looking at behavior in production is also very important: “it runs on my computer” is necessary, but nowhere near sufficient for claiming understanding of your software. For something like a mobile app, the complexity isn’t necessarily in the code within the app itself but in how it interacts with the surrounding operating system. Unit tests can help, but integration tests and production runs are much more truthful.

 

Test for conscious competence

I feel comfortable with my mental model of a system when I can generally zero in on bugs simply based on a description of where they happen and the symptoms observed; at that point I have some confidence in being consciously competent and start believing my intuition. A little.

Further reading: Copy Construct has written extensively on effective mental models for programmers.

Do the work: Mechanical tactics

The previous section was somewhat philosophical and possibly disappointing in its abstractness: in this section we’ll go in the opposite way and talk through some of the more tactical approaches that have worked for me.

Hack it up any which way

A.k.a try to swallow the elephant in a single bite. I realize this sounds counter-intuitive (and impossible), but trying to quickly prototype a working end-to-end solution can generally pay outsized rewards. There are a couple of important parts I’d like to highlight:

There are a lot of valuable outcomes from this approach:

Make small changes to build confidence

This is the part where you start nibbling at the edges of your (virtual) elephant to build up confidence that you’ll actually be able to eat it all some day. Learn how to work in your new project: the build and release process, how tests work, and how to get through code review and the other processes needed to get your changes deployed.

I recommend making small changes and simply tidying up your workplace. Clean up existing lints, add more test coverage, improve the quality of the tests that do exist; fix spellings; add instrumentation; improve the documentation – particularly by adding examples. Make simple, easy to understand – preferably non functional – changes that will run in production. Make sure you can find out if they work correctly once they reach customers.

Along the way, you'll gain goodwill from your team, have something to show for all the time you've been spending ramping up on the system, and get that glow of satisfaction at actually landing something instead of rotating in place for months.

Take small steps from a known good state to a known good state

This part is a reminder to take small bites that you can chew (in complete contrast to the very first piece of advice I gave you). Because there are so many unknowns in this system, once you’re actually working on a solution – move carefully. Make sure the system runs as you expect it to; make a small change, validate that it works again (similar to – but not necessarily the same as – test driven development).

If the results diverge from your mental model at any point you can immediately backtrack and identify what changed in the small step you took, instead of having to bisect a large swathe of changes (and possibly having to determine that two changes overlapped to cause issues). This simple mechanical behavior can save you hours of debugging and working backwards to identify unintentional consequences.

At times, it can take a little bit of humility to constrain yourself to small changes instead of giant leaps; you're most likely to learn the same way I did – which is to fail often enough till you learn to make small but confident steps. (Hat tip to Kent, who helped me work through this several times at the start of my career.)

Look around the boundaries

(Credit for this section goes to Ivan Savov who pointed out that I never covered the importance of looking at the data.)

Your project almost certainly interacts with several other systems, some of which may be significantly easier to learn from. Look at the inputs, outputs, and persisted state of your program to see what’s going on.

Data

Accessing the databases, flat file storage and looking at what’s saved can make the details of the system significantly more concrete. If you don’t have access to the raw data, try to look up the schema of the different tables to see what’s available where, and you can work backwards.

In certain types of applications, e.g. browser applications, client side data storage is extremely standardized and very inspectable (look through the Chrome DevTools tab labeled “Application” if you haven’t).

RPCs

On a similar note, any remote calls – particularly those that involve structured code-generation or serialization like Thrift and protocol buffers – can be another, more easily accessible set of entry points.

This works both ways as well: you should sneak a peek at the calls your system makes on others, and the calls other systems make on yours to get a sense of the inputs & outputs.

Reading code is insufficient

From a tactical perspective, simply reading code is not empirical evidence: you need to run it, and see exactly what it’s doing. Simply reading code can be very misleading – you can never be sure what’s overriding behavior in production. It could be anything from something as cartoonish as a `#define false true` to something more realistic like compiler optimizations eliding code paths you care about.

Reaching for an example from something I've been working on recently: PyTorch supports several transforms to make your easy-to-modify but slow-to-run model into something that's extremely fast. But if you didn't know that it's going to be traced, you're going to have completely broken assumptions on what runs. Here’s an example from torch.fx where my beloved debugging mechanism – a print statement – gets elided:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.rand(3, 4))
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        print(f"A wild print appeared! {x=}")
        return self.linear(x + self.param).clamp(min=0.0, max=1.0)
```

The traced equivalent that will run (notebook for proof):

```python
def forward(self, x):
    param = self.param
    add = x + param; x = param = None
    linear = self.linear(add); add = None
    clamp = linear.clamp(min = 0.0, max = 1.0); linear = None
    return clamp
```

It’s not just PyTorch: I’ve run into similar surprises across ART (while running the debugger), C optimizations, and other brilliant magic.

Look for fingerprints

While ideally you’d be able to navigate a project simply based on a good folder structure, consistent naming schemes and obvious class naming patterns, reality tends to be significantly messier. It can be much easier to find an entry point by looking up fingerprints of other engineers and pulling on a thread starting from that point.

Some common fingerprints I use to quickly find a starting point to navigate:

Looking for these generally takes me directly to a string or a constant that I can then follow through all the way.

Of course, this is more an art than a science: any of these strings could be generated dynamically; or you might be extremely unlucky and the phrase might end up being split up across multiple lines instead.
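When no fast search tool is at hand, a fingerprint hunt can be sketched in a few lines of Python. This is only an illustration: the function name `find_fingerprint`, the default file suffixes and the example phrase are all hypothetical, and it only does plain substring matching on single lines (so it will miss the dynamically generated or multi-line strings mentioned above).

```python
from pathlib import Path


def find_fingerprint(root, needle, suffixes=(".py", ".js", ".java", ".xml")):
    """Yield (path, line_number, stripped_line) for every line containing `needle`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


# Example usage (the phrase is a made-up user-facing error message):
# for path, number, line in find_fingerprint(".", "Unable to connect"):
#     print(f"{path}:{number}: {line}")
```

Each hit gives a file and line number to start pulling the thread from.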

Pay close attention to the logs and program outputs

Be extremely conscientious while reading logs: it's too easy to miss something obvious right in front of you. Most of the logs correspond to something someone else thought was interesting about the program's behavior; you’re likely to benefit by paying a little bit of attention.

Sadly, logs tend to be spammy more often than not, and occasionally out of order (if parallelism or concurrency is involved): don’t hesitate to pull them together into an editor for quick manipulation. You’d be surprised at just how readable logs can be when formatted and highlighted correctly.

One of my favorite tactics is to collect all logs or compiler output into a file that I then manipulate in Vim: I delete everything extraneous, and reformat the bits I care about. Versioning these abbreviated log files makes changes much easier to observe and dissect. In extreme cases, consider post-processing the logs and visualizing them manually – a few lines of Python in a notebook could save you hours of spelunking.
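As a rough sketch of what those few lines of Python might look like: the snippet below parses raw lines, drops a noisy level, and restores chronological order. The log format (ISO timestamp, bracketed thread name, level, message) and the helper names `parse_logs` and `tidy` are assumptions made up for this example; real logs will need their own regular expression.

```python
import re
from datetime import datetime

# Hypothetical log format, e.g.
# "2023-06-04T10:00:02.500 [worker-2] INFO cache miss key=42"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_logs(lines):
    """Parse raw lines into dicts, dropping lines that do not match the format."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        record = match.groupdict()
        record["ts"] = datetime.fromisoformat(record["ts"])
        records.append(record)
    return records


def tidy(lines, drop_levels=("DEBUG",)):
    """Remove noisy levels and sort what is left by timestamp."""
    records = [r for r in parse_logs(lines) if r["level"] not in drop_levels]
    return sorted(records, key=lambda r: r["ts"])
```

Sorting by the parsed timestamp is what stitches interleaved output from several threads back into one readable sequence.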

Look out for any instrumentation

Any live instrumentation that captures the system's execution is the last set of data I'd recommend pulling up. Instrumentation should help you build empathy with how the system executes: do requests take milliseconds, seconds, or minutes? What do p99, p99.9 and p99.99 latencies look like? How many resources are consumed? Is this code CPU bound, I/O bound, or neither?

I almost never trust instrumentation and metrics that are handed to me: at the very least, I’d like to trace a single data point (that I generated manually) through the pipelines, and then I can accept that it’s at least partially working. If you can triangulate metrics through multiple different sources, it’s worth the investment. If you run across complex metrics that are hard to explain, don’t rely on them for anything but sanity checks: look for simpler, direct metrics to actually guide your decisions as you explore the code.
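If you can get hold of raw samples, recomputing a percentile yourself is one cheap way to sanity check a dashboard. The sketch below uses the nearest-rank definition, which is only one of several conventions; your metrics system may interpolate and report slightly different numbers. The function name `percentile` is just for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # round() guards against floating point noise such as 999.0000000000001.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Example usage with hypothetical latencies in milliseconds:
# latencies = [12, 15, 11, 240, 13, 14, 12, 16, 13, 900]
# print(percentile(latencies, 50), percentile(latencies, 99))
```

Comparing the median against the high percentiles quickly shows whether a few slow outliers are hiding behind a healthy-looking average.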

Learn to speed read code

Even though relying solely on reading code is misleading, you still need to be able to skim it really quickly. The same rules for reading books fast also apply to reading code: focus on the interfaces instead of digging deeply into the implementations, spend your time on critical pieces.

Incrementally building your mental model (as described above) and taking thorough notes along the way (as described near the end) should help you move faster and remember what’s going on.

You should figure out ways to quickly navigate your code base for this to work really well. Some editors also offer bookmarks to be able to jump back and forth between interesting points quickly.

Read the commit history

Less obviously, see how the code evolved: go spelunking into the commit history and look for major decisions for the parts of the code base you're interested in. I like to jump to the commit that introduced that particular file/class/function to see the original intent for adding it, without the cruft that might have grown up around it over the years. Then you can skip along to any major refactors or structural changes that render it unrecognizable.

Commit messages should ideally include links to other discussions, tasks, comments and reviews that you might also find incredibly valuable to gain context.
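For a Git repository, the "jump to the commit that introduced it" step can be scripted. The sketch below wraps two `git log` invocations in Python: one filtered to commits that added a file, and one using the `-S` pickaxe to find commits that changed how often a snippet occurs. The helper names are hypothetical, and other version control systems will need different commands.

```python
import subprocess


def introduced_by_args(path):
    """Build the git command that lists the commit(s) which added `path`."""
    return [
        "git", "log",
        "--diff-filter=A",        # only commits in which the file was added
        "--follow",               # keep following the file across renames
        "--date=short",
        "--format=%h %ad %an %s",
        "--", path,
    ]


def pickaxe_args(snippet, path="."):
    """Build the git command that lists commits adding or removing `snippet`."""
    return ["git", "log", f"-S{snippet}", "--oneline", "--", path]


def run_git(args, repo="."):
    """Run a command inside `repo` and return its output as a list of lines."""
    result = subprocess.run(args, cwd=repo, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Example usage (the path and function name are made up):
# print(run_git(introduced_by_args("src/cache.py")))
# print(run_git(pickaxe_args("def evict_stale_entries")))
```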

Look for design documents & discussions

Look for any old design documents or discussions that might be hanging around: this can also act as an excellent reference to find more people to talk to. One of the most important pieces of context you should gather is the set of problems that this piece of code was meant to solve, and then to determine which of those problems is still applicable or if there are new ones that must be addressed.

Lean on your tools

As you ramp up, don’t ignore the tools available: you’ll probably bring some along based on your past experience, but also take the time to learn the ones built for your current problem. For example, Android Studio encodes (and automates away) a tremendous amount of knowledge and conventions you’d otherwise need to manually look up and follow to build anything with Android – I couldn’t recommend using Emacs for it.

Ideally, you should understand the strengths and weaknesses of the tools you have available, and build some intuition on when to use what – this knowledge inevitably compounds over time. One path is to use them on a trivial problem and identify how the tool works. Understanding how – and why – they work will help you be that much more effective with them.

I optimistically started writing this section and later realized that there was no realistic way to cover all the tools (or types of tools) available; I’m just covering some of the tools you might find useful to apply the tactics covered in the previous sections.

Tools to understand programs while building them

The humble print statement

> Seibel: When you’re debugging, what tools do you use?

> Thompson: Mostly I just print values. [...] Whatever I need; whatever is dragging along. Invariants. But mostly I just print while I’m developing it.

The very first tool I’ll reach for while exploring and debugging is still a print statement. I generally use stacked commits at work and will keep an unpublished commit at the top of the stack that simply adds log statements everywhere.

Most of the time you’ll find that print statements generally don’t change the nature of the program’s execution, slow things down too much or carry unpleasant surprises; and given how fundamental input/output are to any programming environment, they’re much less likely to be broken.

Sometimes you might run into scenarios where you can’t safely write to stdout/stderr: either because they’ve been co-opted for other purposes (like the actual point of the program), or are so full of spam that you can’t do anything meaningful with them (more on this later). In that case, I’ll simply open a custom log file in a fixed folder (generally prefixed with a uuid or timestamp to distinguish different runs) and write to it directly.
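A minimal sketch of that custom log file, assuming a Python program: the folder name `debug-logs` and the helper names are hypothetical, and each run gets its own file named with a timestamp plus a short random suffix.

```python
import os
import tempfile
import time
import uuid


def open_debug_log(directory=None):
    """Open a fresh, uniquely named log file for this run and return it."""
    # The folder name is an arbitrary choice; any fixed, writable location works.
    directory = directory or os.path.join(tempfile.gettempdir(), "debug-logs")
    os.makedirs(directory, exist_ok=True)
    run_id = f"{time.strftime('%Y%m%dT%H%M%S')}-{uuid.uuid4().hex[:8]}"
    path = os.path.join(directory, f"{run_id}.log")
    # Line buffering, so entries reach the file even if the program crashes soon after.
    return open(path, "a", buffering=1, encoding="utf-8")


def debug_log(log_file, message):
    """Write one timestamped line without touching stdout or stderr."""
    log_file.write(f"{time.time():.6f} {message}\n")


# Example usage:
# log = open_debug_log()
# debug_log(log, "entered handle_request")
# print("logging to", log.name, file=sys.stderr)  # needs `import sys`
```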

Print-based approaches also work remarkably well for me while dealing with concurrency or distributed environments: once the logs are stitched together reasonably well things become far more readable and understandable.

Another trick I like to rely on is to log stack traces explicitly from interesting points just to see how the function I care about is getting called. If you’re using Python, you can also include arguments along all the frames, or use something even more sophisticated like Panopticon.
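In Python, the standard library's `traceback` module is enough for the basic version of this trick. The sketch below is one possible shape, with made-up function names; it prints the call chain leading to the function of interest without raising an exception.

```python
import traceback


def log_callers(label, limit=5):
    """Print and return the call chain that led to the caller of this helper."""
    # format_stack() ends with the frame of this helper itself, so drop the last entry.
    frames = traceback.format_stack(limit=limit + 1)[:-1]
    text = f"--- {label} ---\n" + "".join(frames)
    print(text)
    return text


def interesting_function(value):
    log_callers("interesting_function called")
    return value * 2


def handler():
    return interesting_function(21)


# Example usage: calling handler() prints a short stack showing that
# interesting_function was reached via handler.
```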

Use a debugger

If you have a good enough setup to easily attach a debugger and run the program fast enough, step through it, save your breakpoints, and try playing with the program as it runs. Being able to quickly walk through the full state of the program without having to choose what to log upfront can speed up changes a lot.

Sometimes, though, debuggers can be flaky, misleading (by disabling optimizations) or simply slow down execution enough to be impractical. Ideally, you should have some idea of how your particular debugger works – and how it causes the observed program to change.

I generally dislike the ephemerality of a debugger: my breakpoints and state get lost across program restarts, crashes, disconnects and other banalities of life. Wherever possible, I prefer to insert breakpoint calls directly into my code so I can maintain them across debugging sessions. For example:

Python

Python lets you insert the snippet `breakpoint()` to trigger PDB; you can also use `import ipdb; ipdb.set_trace()` if you have ipdb as a dependency. There’s also the option to use the environment variable `PYTHONBREAKPOINT` to customize the function called through `breakpoint()`.
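To illustrate the customization point: `breakpoint()` calls `sys.breakpointhook()`, and the default hook is what consults `PYTHONBREAKPOINT`. The sketch below replaces the hook in-process with a hypothetical `snapshot_hook` that records local variables instead of opening PDB, which is roughly what pointing `PYTHONBREAKPOINT` at your own callable achieves. It relies on `sys._getframe`, which is CPython-specific.

```python
import sys

captured = []


def snapshot_hook(*args, **kwargs):
    """Stand-in for a debugger: record where breakpoint() fired and the locals there."""
    frame = sys._getframe(1)  # the frame that called breakpoint()
    captured.append((frame.f_code.co_name, dict(frame.f_locals)))


def normalise(scores):
    total = sum(scores)
    breakpoint()  # lives in the code, so it survives restarts and disconnects
    return [score / total for score in scores]


sys.breakpointhook = snapshot_hook  # undo with: sys.breakpointhook = sys.__breakpointhook__
result = normalise([1, 3])
print(captured)
```

Setting `PYTHONBREAKPOINT=0` in the environment is the quick way to turn all such calls into no-ops when you want an uninterrupted run.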

JavaScript

Directly use the `debugger;` statement to trigger debugging.

Use a REPL

Even more than a debugger, I particularly enjoy access to a REPL to load and execute code quickly with behavior that seems even closer to how the program would actually run. This works particularly well for Python code – it is the basis for Notebooks – and leads to incredible interactive experiences.

Try to get a REPL that you can easily hot-load code into: `%autoreload` in IPython is incredibly valuable when combined with `pip install -e` (development mode), which means you can edit code and run it live instantaneously. The other approach is to `eval` the code you care about to redefine functions, though that feels somewhat more kludgy to me.

If you're lucky (and nearby security engineers maybe not so much) you might get a REPL into a live production system to poke around in and explore.

Tools to navigate your codebase

Index based navigation

The ideal is always a system that lets you jump to the precise function or class you care about by name, or even by regular expression. A search engine or IDE that understands the code well is always amazing – IntelliJ shines, and VSCode generally does a good job too with its LSPs. Emacs and Vim users can rely on TAGS.

GitHub's new code search is another great example, and with a similar URL but different implementation so is Android’s repo hosted by Google. Meta recently open sourced Glean if you’d like to host and run your own.

Unfortunately, I should also point out that sometimes index based tools can be misleading – because your codebase will inevitably have code generation, dynamic dependencies, surprising build procedures or some strange edge cases; trust but also verify at runtime. I’ll often spin up a REPL in Python and then use Jupyter’s ?? magic command to look up the source code that’s actually being run.
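Outside a notebook, the standard library's `inspect` module offers a similar lookup. The sketch below uses `json.dumps` purely as a stand-in for whatever function you are chasing; `where_is` is a hypothetical helper name. It reports the file that was actually imported, which may not be the checkout you were reading.

```python
import inspect
import json


def where_is(obj):
    """Return (file, first_line_number) of the source Python actually loaded for `obj`."""
    source_file = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return source_file, first_line


source_file, first_line = where_is(json.dumps)
print(f"json.dumps is defined in {source_file}:{first_line}")
print(inspect.getsource(json.dumps)[:200])  # start of the live source text
```

This only works for code with Python source available; builtins and compiled extensions will raise a `TypeError` or `OSError` instead.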

String based navigation

At the same time (particularly to navigate using fingerprints) I find simple text (or regex) based search invaluable. Ideally you have a system that lets you grep through your code base quickly and painlessly.

At its simplest, hopefully you have a checkout that you can `find` and `grep` your way through with ease; pair it with `sed` and `xargs` to quickly perform simple code-mods. For larger systems that you can’t simply grep quickly, ideally you have something like OpenGrok that can let you search for simple strings quickly.

Specific turns of phrase in logs or user facing messages, style names, colors are just some examples of strings that can be surprisingly valuable to pattern match for to effectively look for source code – and generally don’t show up in indexes.

Tools to observe your program

Log readers

There are several programs that can help you navigate logs more easily: while there are some excellent UIs and CLIs like The LogFile Navigator, I often find myself dropping into Vim with a copy of the logs. That gives me the ability to search and delete anything I don't care to see, reformat some logs; occasionally copy out a few specific instances and explicitly diff them.

I haven’t found anything as convenient as Pandas DataFrames for converting logs for post-processing: starting with a column that simply contains the raw text, it’s easy enough to add columns with values parsed out of the raw logs, and then drop them into matplotlib for easy custom visualizations.

Instrumentation: metrics, traces, etc.

Speaking of empirical evidence, there’s nothing that can inspire as much confidence – or horror – as seeing live metrics of your code running in production. I’ve been lucky to use Scuba at work, and I can’t overstate the value of being able to quickly aggregate data live as I explore raw samples: hopefully you can arrange for similar flexibility.

The final pillar of observability is traces: these can be very valuable in seeing what’s running, and occasionally how fast it’s running. Depending on the sophistication of the tracing tools you have available, you might be able to get interesting stack traces extremely quickly. I’ll also plug the `@probe` mechanism in Panopticon to trace functions you’re interested in learning more about.

Working with other people

Happily enough, you'll generally work with people who've already consumed this particular elephant and can significantly speed you up along the way. There are several tactics to learn effectively from others (to the horror of the elephant community).

Ask questions well

While there are no stupid questions, you can get much more from your colleagues by asking good questions well.

You want to show that you value their time: 1) demonstrate that you did your research; 2) show what worked and what didn't; and 3) minimize the time they need to answer your question – while still getting the most information you can.

If someone hands you a fish, also follow up on how to fish – "How did you solve this?", and try to never ask the same question twice. If you must ask the same question again, explain what you understood the first time around.

Of course, there's also a cost to spending too much time digging before asking questions. A rule of thumb along the lines of "one hour of digging" before reaching out can be very valuable instead.

I’ll also bring up a pet peeve: if you’re reaching out asynchronously, please include the question as part of the initial message to save on a lot of unnecessary back and forth.

Communicate your progress!

Setting and updating expectations constantly as you ramp up can be one of the most valuable things you can do. Explain how far you've gotten, how long you think it'll take and why.

Writing down and sharing (*much* more on this in a minute) what you've found will also help validate that you're building a good understanding of the system and act as a bridge for whoever happens to come after you.

Regular communication on an automatic cadence (as opposed to self-identifying specific milestones) also significantly reduces the pressure of sharing updates: if you do it every week, people can rely on you to push the information regularly, instead of having to pull it when they get worried.

More recently I’ve been experimenting with a single status doc anyone can use to quickly catch up on the status of the projects I’m driving directly, and it seems to work well. A lot of color helps.

(This section is also an example of Do as I say, not as I do because I regularly forget to make timely updates.)

Set expectations that things will break

One of the obvious – but not always acknowledged – aspects of onboarding onto a working system is that things will break as you go around making changes (possibly without helpful mentors nearby to de-risk your work). Try to manage up and set appropriate expectations up front: it’s entirely likely that some parts of the system might go down, but with everyone on board try to make sure you can bring it back up quickly.

Ship small wins as soon as you can

Never under-estimate the value of shipping several small wins quickly: you'll build momentum and confidence in yourself, and inspire confidence in others at the same time.

You're also actively fixing and improving the project you're working on, doing good work sooner along the way.

Respect Chesterton's fence

As satisfying as it is to rant about past design decisions – and they may definitely be horrifying – the engineers before you did the best they could with the constraints they were working under.

You need to figure out which of those constraints still apply and which ones are obsolete, and do this without running headlong into them. Try to understand the rationale behind questionable design decisions to make sure they weren't solving a problem you just haven't faced yet.

But don't be afraid to break it

Once you're sure you have context on past decisions, you should feel empowered to go and change them. It can be just as damaging to blindly accept past decisions as it is to change them without thinking through the consequences.

As someone new to the system, you’re more likely than most to have an opinion on what doesn’t work at all.

RTFM

Learning from the engineers who came before you isn't necessarily restricted to directly *talking* to people. There's a lot of information available in any reasonable code base.

Most obviously, you should quickly navigate and understand the documentation that's available: particularly valuable is documentation that explains the why. I should emphasize that I generally recommend reading or running the code itself to understand the how; documentation describing the how tends to bit-rot the fastest and is often the least reliable.

Pair program, or shadow engineers

It can be both inspiring and extremely informative to simply shadow others who are comfortable with the system. You can see what it's going to be like once you build mastery, and ask live questions along the way. Pay attention to how they think, the tools they use, and the order in which they explore different avenues. Ask for historical context – particularly for systems that don’t yet match your mental model.

If pair programming isn’t feasible – try simply shadowing experts at their work. Take notes along the way, and try to follow along on your own computer if you can. There’s also the pleasure of watching masters at work.

This may or may not be possible depending on your setup, but can be one of the fastest ways to get better at ramping up and solving problems.

Test yourself: Do it as others have done it

A good test of whether you understand things is whether you can do them in a way that fits in with the rest of the system. I try to write code that blends in comfortably with all the other decisions taken along the way: following naming schemes, unit tests, but also the look & feel of the code (respecting Chesterton’s fence).

Once you have a good feel for how things are done – and can demonstrate your familiarity with them – you’ll also be able to make a much stronger case if you plan to change them.

Write a lot

The single most valuable tool I've found for maintaining my orientation and sense of progress as I work through something complex is to have a written log of everything I'm trying, what's worked, what's not, links and screenshots. Which is why writing gets a full section just for itself.

There's a certain structure to take notes that works fairly well for me; I recommend adapting it to one that works well for you:

What's the goal?

Start by writing out the problem you're solving at the top: why are you dealing with this? What do you hope to achieve? Scrolling past the note every morning is extremely useful for avoiding rabbit holes and focusing on the most valuable ways to spend your time.

Open Questions

Second, I try to have a list of major open questions I want answers to that I haven't found yet: for example, why a certain subsystem works a certain way, or where I should make certain changes. This reminds me to look out for signals I would otherwise ignore.

Daily Log

A daily log to take notes on progress: I tend to write out how much time I expect to have today to work on this, then the actual work I hope to accomplish as headings. I fill this in as I go through the day, including stack traces, observations, other TODOs that pop up (particularly second-order tools I could build to make all of this painless, and why-oh-why has no one else implemented it yet).

The daily log tends to be incredibly valuable as an excellent replacement for my memory: I can easily return to the project and start right where I left off. It also helps maintain momentum and understand my own progress – without concrete results or code it can be hard to remember the sheer amount of work that goes into ramping up on something complex or brand new.

If you’re unfortunate enough to have a manager’s schedule instead of a maker’s schedule, your daily log acts as an excellent way to regain context and continue from where you left off.

References

Then I keep quick links to references I need to access frequently that are related to that project. This list can become extremely large so I'd recommend keeping it to only the most important references. The rest can live in the daily log.

Diagrams!

A complementary skill to build out is to build mind maps of the code base: I strongly recommend using something electronic like Kinopio or Scapple. That allows you to paste in links and code instead of writing them down and can help navigate the most confusingly named of code bases (Android, I'm looking at you!).

Be selective about how you structure your diagram: resist the urge to include every single class (which is something that can be automated) – prioritize what's important and build a map into the code base that highlights the parts that you actually care about.

I used to lean on Scapple a lot while I was working on Android. As an experiment I'd written a web-based Scapple renderer to be able to share these: you can also pan around a full Scapple I’d made while exploring how ANRs work in AOSP.

Bon Appetit!

Once you're done with your project, come back to your notes and diagrams and use them to document the work you’ve done and blaze a trail for others to come after you. Help fix the pieces that were the hardest to learn, and now that you have significant context, confidently adjust the system.

I hope you find great success and build valuable things!


Thanks to Ivan Savov, Aditya Athalye and Manuel Odendahl for their reviews and valuable feedback; without which this post would be significantly more jumbled, incomprehensible and badly written.



[1] To establish my own credentials: over the course of my career, I’ve switched stacks and domains several times as a professional engineer; from server side reliability to Android to Jupyter notebooks to ML tools, including deep dives into PyTorch.