1 of 26

A visual guide to Diff Delta

And how fusing commits together reduces tech debt & code review time

LineImpacts


2 of 26

GitClear is an engineering insights tool built to create incentives to reduce tech debt and ship faster. We do it by identifying the rate at which code evolves (using our proven, proprietary metric Diff Delta) and weaving it together with a tapestry of Google DORA, pull request, and related stats.

By making code easier to read, and progress easier to follow, we help teams prevent the tech debt that saps developer enthusiasm on so many large projects.

GitClear went live to customers in 2020 and has been used by thousands of teams to improve their code quality & velocity.

Watch a 3 minute GitClear explainer video.

3 of 26

The Average Git Repo: Branches

A development team might have 100+ commits made across their git repos every day. Nobody can keep track of everything. The average repo is a black box that periodically kicks out PRs when it’s working right.

At best, a couple of developers know how the big picture connects.

4 of 26

Critical Moments Lost in the Jumble

A new bug is being introduced

A developer is refactoring legacy code that will prevent tremendous headaches down the road

A new dependency is being added that will create “app size” and security problems

When you’re lucky, you find out about these issues before your customers do. But just as often, critical moments happen and nobody knows about them until weeks or months later, if at all.

Somewhere in this commit jumble...

5 of 26

Where Do We Look for Critical Commits?

Certainly not in the commit list

You’re not going to catch critical moments in a flat list of hundreds of commits, separated by repo. Nobody in their right mind will use this to browse commit activity.

6 of 26

Where Do We Look for Critical Commits?

Not in PRs

PRs are supposed to clean up the mess that happens in development, but you’ve seen how that goes...

  • The change is too overwhelming. Unless your team always keeps its PRs below 20 commits, they become unwieldy and get glossed over.
  • Bugs and tech debt abound by the time the PR stage is reached. This is especially true for new team members and Junior Developers, for whom weeks may pass between PRs.
  • The top people don’t have time to read them. Senior Architects are too busy to read PRs. If it’s small, expect a quick glance. If it’s more than 20 commits, you don’t want to know...
  • On small teams, much work happens outside of the PR process. Such work usually gets reviewed by no one.

7 of 26

GitClear Opening the “Repo Black Box”

Let’s zoom into a single commit to see how GitClear’s rich commit parsing engine uses Diff Delta to simplify even the most intricate commit activity

8 of 26

The Old Way: Binary Diff

By treating every change as either an “add” or “delete,” other git tools put the onus on code reviewers to piece together the real story. Is an apparent block of "added code" truly new, or just some legacy code moved in from another file?

Since about 30% of all changed lines are moved code, developers could understand code changes up to 30% faster by using a more refined diff tool.

  • When git tools like GitHub interpret a commit, they present a list of green and red changes. Everything is binary.

Other diff viewers classify every code change in File 1 and File 2 as either an add or a delete.
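To illustrate why a binary add/delete diff overstates change, here is a minimal sketch of how moved lines could be separated from truly new or truly deleted lines. This is not GitClear's parser; the function name `split_moved_lines` and the whitespace-insensitive matching rule are assumptions made for illustration.

```python
from collections import Counter


def split_moved_lines(removed, added):
    """Separate moved lines from lines that were truly added or deleted.

    `removed` and `added` are lists of line contents taken from a plain
    add/delete diff, possibly spanning several files. Matching ignores
    leading and trailing whitespace (an assumption of this sketch).
    """
    removed_counts = Counter(line.strip() for line in removed if line.strip())
    added_counts = Counter(line.strip() for line in added if line.strip())

    # A line that appears on both sides is treated as moved, not rewritten.
    moved = removed_counts & added_counts
    truly_added = added_counts - moved
    truly_deleted = removed_counts - moved
    return moved, truly_added, truly_deleted


removed = ["def helper():", "    return 42", "old_call()"]
added = ["def helper():", "    return 42", "new_call()"]
moved, truly_added, truly_deleted = split_moved_lines(removed, added)
```

In this toy input, a binary diff would report three deletions and three additions, while the sketch reports two moved lines, one real addition, and one real deletion.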

9 of 26

The GitClear Way: Diff Like a Developer

File 1

File 2

Commit ABCD

Meanwhile, GitClear recognizes...

RICH DIFF PARSING DIFFERENTIATES SUBSTANTIVE CHANGES

By classifying code using the full set of code operations that developers recognize, we can extract semantic meaning to differentiate between big vs trivial changes

10 of 26

Technical Details Optional In-Depth Reading

Operations Recognized by Diff Delta

Here are seven types of operations we recognize in commits. Each operation is accompanied by a screenshot of how the operation looks in the GitClear diff viewer.

Each added line of code counts for up to 10 points.

Each deleted line of code can count for up to 25 points.

By default, Diff Delta prizes code deletion most of all, for its role in reducing long-term tech debt.

Moved code (about 30% of all changed lines) is assigned no Diff Delta.

One of the most common types of code change is the "no-op." This encompasses all changes to white space, blank lines added, and lines whose only change was their line number.
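As a rough illustration of the "no-op" idea, the sketch below flags a changed line as a no-op when the old and new versions differ only in whitespace, and flags added lines that are blank. The helper names are hypothetical, and stripping all whitespace is a simplification (it would also ignore whitespace changes inside string literals).

```python
def is_no_op(old_line, new_line):
    """Return True when two versions of a line differ only in whitespace."""
    return "".join(old_line.split()) == "".join(new_line.split())


def is_blank_line_added(new_line):
    """Return True when an added line contains nothing but whitespace."""
    return new_line.strip() == ""
```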

1 of 3

11 of 26

Technical Details Optional In-Depth Reading

Operations Recognized by Diff Delta

When part of a line changes, we label this an "update." Updates usually earn around 10 points, but can collect up to 30 points when they implement durable (i.e., not subsequently churned) changes to legacy code.

When a developer repeatedly adds ("pastes") the same line or block in multiple locations, across one or more commits, we label this "copy/paste." Copy/pasted code is assigned no Diff Delta, since it implies code duplication that’s likely to vex maintainers.

When a developer applies the same change to several lines en masse, this is detected as "Find & replace." Such lines are worth up to 3 points.
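The per-line ceilings quoted for each operation can be collected into a small lookup table. The sketch below is only an illustration of those published maximums, not the actual scoring engine: the names `MAX_POINTS`, `capped_score`, and `provisional_commit_score` are hypothetical, and the per-line estimates passed in are assumed to come from some other step.

```python
# Maximum Diff Delta per changed line, as stated in this guide.
MAX_POINTS = {
    "add": 10,
    "delete": 25,
    "update": 30,        # usually around 10, up to 30 for durable legacy changes
    "find_replace": 3,
    "move": 0,
    "copy_paste": 0,
    "no_op": 0,
}


def capped_score(operation, estimated_points):
    """Clamp an estimated per-line score to the ceiling for its operation."""
    return max(0, min(estimated_points, MAX_POINTS[operation]))


def provisional_commit_score(lines):
    """Sum capped scores over (operation, estimated_points) pairs."""
    return sum(capped_score(op, points) for op, points in lines)
```

For example, `provisional_commit_score([("add", 8), ("update", 12), ("no_op", 3)])` adds 8 and 12 and ignores the no-op line.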

2 of 3

12 of 26

Technical Details Optional In-Depth Reading

Language Idioms Recognized by Diff Delta

Assuming you're using one of the 50 programming languages we support (including every modern language), Diff Delta will know a few more tricks out of the box. Below is a subset of the language-specific idioms we recognize.

About 10% of changed lines are language keywords: these are transparent to Diff Delta.

Multi-line statements have their value assigned to the first line.

Comments are assigned negligible Diff Delta.

“Include statements” can comprise up to 5% of changed lines in React repos. Diff Delta treats them as trivial changes.

Function definitions are not given special treatment for Diff Delta, but they are recognized to show documentation & hints in the diff viewer.

3 of 3
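A minimal sketch of how language idioms like these might be recognized, using Python source lines as the example language. The categories mirror the idioms listed in this guide, but the function `classify_line`, the `KEYWORD_ONLY_LINES` set, and the matching rules are assumptions for illustration, not GitClear's implementation.

```python
import re

# Hypothetical subset of lines that consist of nothing but a keyword.
KEYWORD_ONLY_LINES = {"else:", "try:", "finally:", "pass", "break", "continue"}


def classify_line(line):
    """Label a Python source line as blank, comment, include, keyword, or code."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith("#"):
        return "comment"      # negligible value
    if re.match(r"(import|from)\s+\w", text):
        return "include"      # treated as a trivial change
    if text in KEYWORD_ONLY_LINES:
        return "keyword"      # transparent to scoring
    return "code"
```

Only lines labeled `"code"` would go on to receive a substantive per-line estimate in this sketch.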

13 of 26

Diff Delta ≈ Cognitive energy to make a change

Initial estimation phase

File 1

File 2

Commit ABCD

Once commit parsing is complete, we assign a provisional score based on the estimated cognitive energy needed to produce the changes in the commit [1].

Scores range from 0-30 per line, with the highest scores reserved for updates and deletions to legacy code, where tech debt tends to accumulate and only experts dare tread.

Changes like no-ops, moved code, and copy/paste get no Diff Delta. Most non-trivial changes earn 5-10 per changed line, less for short lines. New code earns relatively low value since it implies perpetual maintenance costs. Upgrading production-tested code (by removing and rewriting it) is how Senior Devs maximize their Diff Delta.

[1] See Diff Delta Factors or Calculating Diff Delta from First Principles

+100 Diff Delta

Initial Estimate

14 of 26

Diff Delta: Incorporating Churn

Secondary refinement phase

Commit Group ABCD + EFGH (File 1 and File 2 are changed in both Commit ABCD and Commit EFGH)

Work rendered obsolete by Commit EFGH: 100 - 50 (obsolete) Diff Delta = 50 New Delta (+50 Diff Delta)

Work rendered obsolete by Commit ABCD: 30 - 10 (obsolete) Diff Delta = 20 New Delta (+20 Diff Delta)
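The churn arithmetic on this slide can be written out as a short worked example. It assumes that the 100-point figure is Commit ABCD's initial estimate and the 30-point figure is Commit EFGH's, which is how the diagram appears to read; the function name `net_diff_delta` is hypothetical.

```python
def net_diff_delta(initial, obsolete):
    """Subtract work rendered obsolete by a related commit from the initial estimate."""
    if not 0 <= obsolete <= initial:
        raise ValueError("obsolete work must be between 0 and the initial estimate")
    return initial - obsolete


# Commit ABCD: 100 initial, 50 rendered obsolete by Commit EFGH.
abcd_delta = net_diff_delta(100, 50)

# Commit EFGH: 30 initial, 10 rendered obsolete by Commit ABCD.
efgh_delta = net_diff_delta(30, 10)
```

With these inputs, `abcd_delta` is 50 and `efgh_delta` is 20, matching the "New Delta" values in the diagram.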

15 of 26

Can Diff Delta be Trusted?

To evaluate whether Diff Delta could empirically outperform other common git metrics, GitClear collected 2,800 data points to compare how well Diff Delta correlated to “developer effort.” We found that Diff Delta correlated with effort by up to 61% in large repos. Read what academics had to say about our research.


At right, the blue bar indicates the degree of correlation between Diff Delta and the effort spent by the development team. Diff Delta matches effort better than the conventional metrics “commits” or “lines of code.”

16 of 26

Commit Activity Browser: 2d Visual Map

GitClear rearranges the flat list of commits you’re used to into a colorful, 2d browsable map of commits

Commits are grouped by issue and condensed to hide irrelevant changes

Learn more about Commit Activity Browser

17 of 26

Commit Groups Expose Critical Moments

What Happens by Fusing Commits ABCD + EFGH?

In the real world, commits don't happen in isolation. Diff Delta is built to get more accurate as more commits are added.

By fusing together related commits into Commit Groups, we cancel out noise from churn and get the ground truth about what precisely changed over the course of an issue’s implementation.

Fusing commits into commit groups is key to spotting bugs before they reach production.

File 1

File 1

Commit ABCD

Commit EFGH

File 2

File 2

18 of 26

Technical Details

Making a Commit Group

Commits ABCD and EFGH (File 1 and File 2 in each commit): +130 Diff Delta

Commit Group ABCD + EFGH (File 1 and File 2): +70 Diff Delta
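The difference between the two totals can be sketched as a grouping step: commits that share an issue are fused, and work that was churned within the group is cancelled out. The `ScoredCommit` type, the `ISSUE-1` key, and the `group_totals` function are hypothetical names used only for this illustration; the per-commit numbers are the ones shown in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredCommit:
    sha: str
    issue: str            # hypothetical issue key used for grouping
    initial_delta: int    # provisional score for the commit on its own
    obsolete_delta: int   # portion later rendered obsolete within the group


def group_totals(commits):
    """Total Diff Delta per issue, scored separately and as a fused group."""
    totals = defaultdict(lambda: {"separate": 0, "fused": 0})
    for commit in commits:
        totals[commit.issue]["separate"] += commit.initial_delta
        totals[commit.issue]["fused"] += commit.initial_delta - commit.obsolete_delta
    return dict(totals)


commits = [
    ScoredCommit("ABCD", "ISSUE-1", initial_delta=100, obsolete_delta=50),
    ScoredCommit("EFGH", "ISSUE-1", initial_delta=30, obsolete_delta=10),
]
totals = group_totals(commits)
```

Here the separate total is 100 + 30 = 130, while the fused total is 50 + 20 = 70, which is the gap the two figures on this slide illustrate.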

19 of 26

The Difference is (Git)Clear

Without GitClear:

  • Bugs are created and nobody knows about it
  • Developers interrupted to get updates
  • Junior Developers repeat same mistakes

With GitClear:

  • Code and Jira together from first commit
  • Real-time developer status, no interruption
  • Team crafts more durable code together

20 of 26

Thousands of startup and enterprise teams trust GitClear to make sense of their development activity

Don’t Take Our Word for It...

Trusted By


21 of 26

Competitive Pricing for Every Size Team

22 of 26

Bonus Content

Encore reports for overachievers. This is a sliver of GitClear’s 100+ reports available to subscribers.

23 of 26

Historic Stats: Long-Term Diff Delta

Historic stats show how temporal factors impact developer output

24 of 26

Historic Stats: Diff Delta by Team

How are various teams at the company progressing on their tasks?

25 of 26

Issue Stats: How Much Work Comes From Jira?

Issue stats make clear when bugs increase, or when the team is working on undocumented work

26 of 26

Your Team Has Experts, So Who Are They?

A customer favorite, our Domain Experts report uses git data to identify who in the company has the most experience writing every type of code across your projects.

These people can form the interview team if you need to hire more experts. They’re also prime candidates if you need to mentor aspiring Junior Developers.

Actual developer data, aggregated across 20 of the largest open source repos over the past 12 months.