A Visual Guide to Diff Delta
And how fusing commits together reduces tech debt & code review time
GitClear is an engineering insights tool built to create incentives to reduce tech debt and ship faster. We do it by identifying the rate at which code evolves (using our proven, proprietary metric Diff Delta) and weaving it together with a tapestry of Google DORA, pull request, and related stats.
By making code easier to read, and progress easier to follow, we help teams prevent the tech debt that saps developer enthusiasm on so many large projects.
GitClear went live to customers in 2020 and has been used by thousands of teams to improve their code quality & velocity.
The Average Git Repo: Branches
A development team might have 100+ commits made across their git repos every day. Nobody can keep track of everything. The average repo is a black box that periodically kicks out PRs when it’s working right.
At best, a couple of developers know how the big picture connects.
Critical Moments Lost in the Jumble
A new bug is being introduced
A developer is refactoring legacy code that will prevent tremendous headaches down the road
A new dependency is being added that will create “app size” and security problems
When you’re lucky, you find out about these issues before your customers do. But just as often, critical moments happen and nobody knows about them until weeks or months later, if at all.
Somewhere in this commit jumble...
Certainly not in the commit list
Where Do We Look for Critical Commits?
You’re not going to catch critical moments in a flat list of hundreds of commits, separated by repo. Nobody in their right mind will use this to browse commit activity.
Not in PRs
Where Do We Look for Critical Commits?
PRs are supposed to clean up the mess that happens in development, but you’ve seen how that goes...
GitClear Opening the “Repo Black Box”
Let’s zoom into a single commit to see how GitClear’s rich commit parsing engine uses Diff Delta to simplify even the most intricate commit activity
The Old Way: Binary Diff
By treating every change as either an “add” or “delete,” other git tools put the onus on code reviewers to piece together the real story. Is an apparent block of "added code" truly new, or just some legacy code moved in from another file?
Since 30% of all line changes are moved code, developers could be 30% faster understanding code by using a more refined diff tool.
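To make the "moved code" problem concrete, here is a minimal Python sketch of how lines that a binary diff reports as one delete plus one add could be re-labeled as moves. It is an illustrative exact-match heuristic, not GitClear's actual parsing engine; the function name `split_moved` and its inputs are assumptions made for this example.

```python
from collections import Counter

def split_moved(removed, added):
    """Separate moved lines from genuine adds and deletes.

    `removed` and `added` are lists of line text taken from a conventional
    add/delete diff, pooled across every file in the commit.
    Hypothetical heuristic: a line counts as "moved" when identical text
    (ignoring indentation) was removed somewhere else in the same commit.
    """
    removed_pool = Counter(line.strip() for line in removed if line.strip())
    moved, truly_added = [], []
    for line in added:
        key = line.strip()
        if not key:
            continue  # blank lines are no-ops, so they are ignored here
        if removed_pool[key] > 0:
            removed_pool[key] -= 1  # pair this add with one matching delete
            moved.append(key)
        else:
            truly_added.append(key)
    truly_removed = list(removed_pool.elements())  # deletes with no matching add
    return moved, truly_added, truly_removed

removed = ["def total(items):", "    return sum(items)", "print('old')"]
added = ["def total(items):", "    return sum(items)", "print('new')"]
moved, truly_added, truly_removed = split_moved(removed, added)
```

In this example the two-line function is reported as moved, leaving a reviewer only one genuinely added line and one genuinely deleted line to read.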
Other diff viewers classify every code change as either an “add” or a “delete.”
[Figure: File 1 and File 2]
The GitClear Way: Diff Like a Developer
Meanwhile, GitClear recognizes...
[Figure: Commit ABCD, touching File 1 and File 2]
RICH DIFF PARSING DIFFERENTIATES SUBSTANTIVE CHANGES
By classifying code using the full set of code operations that developers recognize, we can extract semantic meaning to differentiate between big vs trivial changes
Here are seven types of operations we recognize in commits. Each operation is accompanied by a screenshot of how the operation looks in the GitClear diff viewer.
Technical Details: Optional In-Depth Reading
Operations Recognized by Diff Delta
Each added line of code counts for up to 10 points.
Each deleted line of code can count for up to 25 points.
By default, Diff Delta prizes code deletion most of all, for its role in reducing long-term tech debt.
Moved code (about 30% of all changed lines) is assigned no Diff Delta.
One of the most common types of code change is the "no-op." This encompasses all changes to white space, blank lines added, and lines whose only change was their line number.
When part of a line changes, we label this an "update." Updates usually earn around 10 points, but can collect up to 30 points when they implement durable (i.e., not subsequently churned) changes to legacy code.
When a developer repeatedly adds ("pastes") the same line or block in multiple locations, across one or more commits, we label this "copy/paste." Copy/pasted code is assigned no Diff Delta, since it implies code duplication that’s likely to vex maintainers.
When a developer applies the same change to several lines en masse, this is detected as "Find & replace." Such lines are worth up to 3 points.
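The per-operation point ceilings described in these slides can be collected into a small lookup table. The Python sketch below only encodes the stated maximums (real scores are usually lower and depend on factors this deck does not detail); the `Op` enum, `MAX_POINTS` table, and `max_diff_delta` helper are hypothetical names introduced for illustration.

```python
from enum import Enum

class Op(Enum):
    ADD = "add"
    DELETE = "delete"
    UPDATE = "update"
    MOVE = "move"
    NO_OP = "no-op"
    COPY_PASTE = "copy/paste"
    FIND_REPLACE = "find & replace"

# Upper bound on Diff Delta per changed line, as stated in the slides.
MAX_POINTS = {
    Op.ADD: 10,
    Op.DELETE: 25,
    Op.UPDATE: 30,       # usually around 10; up to 30 for durable legacy changes
    Op.MOVE: 0,
    Op.NO_OP: 0,
    Op.COPY_PASTE: 0,
    Op.FIND_REPLACE: 3,
}

def max_diff_delta(ops):
    """Return the highest Diff Delta a list of line operations could earn."""
    return sum(MAX_POINTS[op] for op in ops)

# A commit with 3 added lines, 2 deleted lines, 4 moved lines, and 1 find & replace.
ops = [Op.ADD] * 3 + [Op.DELETE] * 2 + [Op.MOVE] * 4 + [Op.FIND_REPLACE]
ceiling = max_diff_delta(ops)  # 3*10 + 2*25 + 4*0 + 1*3
```

Note how the four moved lines contribute nothing to the ceiling, while the two deletions contribute more than the three additions.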
Language Idioms Recognized by Diff Delta
Assuming you're using one of the 50 supported programming languages (including every modern language), Diff Delta will know a few more tricks out of the box. Below is a subset of the language-specific idioms we recognize.
About 10% of line changes are language keywords: these are transparent to Diff Delta.
Multi-line statements have their value assigned to the first line.
Comments are assigned negligible Diff Delta.
“Include statements” can comprise up to 5% of changed lines in React repos. Diff Delta treats them as trivial changes.
Function definitions are not given special treatment for Diff Delta, but they are recognized to show documentation & hints in the diff viewer.
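As a rough illustration of idiom recognition, the Python sketch below labels JavaScript-style lines as comments, include statements, keyword-only lines, or substantive code. The regular expressions, the keyword set, and the `classify_line` function are simplified assumptions for this example; they do not reflect how Diff Delta parses any particular language, and multi-line statements are not modeled.

```python
import re

# Hypothetical, deliberately simplified rules for JavaScript-style source.
COMMENT = re.compile(r"^\s*(//|/\*)")
INCLUDE = re.compile(r"^\s*(import\b|.*\brequire\()")
PUNCTUATION = re.compile(r"[{}();]")
KEYWORDS = {"else", "try", "finally", "do", "return", "break", "continue"}

def classify_line(line: str) -> str:
    """Label one changed line by the idiom it represents."""
    if not line.strip():
        return "blank"
    if COMMENT.match(line):
        return "comment"
    if INCLUDE.match(line):
        return "include"
    # A line made up only of keywords and brackets carries little meaning.
    tokens = PUNCTUATION.sub(" ", line).split()
    if all(token in KEYWORDS for token in tokens):
        return "keyword"
    return "substantive"

labels = [classify_line(text) for text in (
    "// TODO: tidy up",
    "import React from 'react';",
    "} else {",
    "return total;",
)]
```

Only the last line in the example would be treated as substantive; the first three fall into the trivial categories listed above.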
Diff Delta ≈ Cognitive energy to make a change
Initial estimation phase
[Figure: Commit ABCD, touching File 1 and File 2]
Once commit parsing is complete, we assign a provisional score based on the estimated cognitive energy needed to produce the changes in the commit [1].
Scores range from 0-30 per line, with the highest scores reserved for updates and deletions to legacy code, where tech debt tends to accumulate and only experts dare tread.
Changes like no-ops, moved code, and copy/paste get no Diff Delta. Most non-trivial changes earn 5-10 per changed line, less for short lines. New code earns relatively low value since it implies perpetual maintenance costs. Upgrading production-tested code (by removing and rewriting it) is how Senior Devs maximize their Diff Delta.
[1] See Diff Delta Factors or Calculating Diff Delta from First Principles
Initial estimate for Commit ABCD: +100 Diff Delta
Diff Delta: Incorporating Churn
Secondary refinement phase
[Figure: Commit Group ABCD + EFGH, with each commit touching File 1 and File 2]
Commit ABCD: 100 - 50 (work rendered obsolete by Commit EFGH) = +50 new Diff Delta
Commit EFGH: 30 - 10 (work rendered obsolete by Commit ABCD) = +20 new Diff Delta
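The refinement arithmetic on this slide can be written out as a short Python sketch. It assumes the 100-point provisional estimate belongs to Commit ABCD and the 30-point estimate to Commit EFGH, which is how the figure labels read; the `CommitScore` class and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

@dataclass
class CommitScore:
    sha: str
    provisional: int  # initial estimate from the first phase
    obsolete: int     # points for work later rendered obsolete (churn)

    @property
    def refined(self) -> int:
        """Diff Delta that survives once churn is subtracted."""
        return self.provisional - self.obsolete

commits = [
    CommitScore("ABCD", provisional=100, obsolete=50),
    CommitScore("EFGH", provisional=30, obsolete=10),
]

separate_total = sum(c.provisional for c in commits)  # commits scored in isolation
group_total = sum(c.refined for c in commits)         # commits fused into one group
```

Scored in isolation the two commits total 130 Diff Delta, while the fused group totals 70, because 60 points of churned work cancel out.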
Can Diff Delta be Trusted?
To evaluate whether Diff Delta could empirically outperform other common git metrics, GitClear collected 2,800 data points to compare how well Diff Delta correlated to “developer effort.” We found “Diff Delta” correlated up to 61% in large repos with effort. Read what academics had to say about our research.
At right, the blue bar indicates the degree of correlation between Diff Delta and effort spent by the development team. Diff Delta matches effort better than the conventional metrics “commits” or “lines of code.”
Commit Activity Browser: 2D Visual Map
GitClear rearranges the flat list of commits you’re used to into a colorful, browsable 2D map of commits
Commits are grouped by issue and condensed to hide irrelevant changes
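One simple way to picture grouping commits by issue is to look for an issue key in each commit message. The Python sketch below assumes Jira-style keys such as `PROJ-12` appear in commit messages; the deck does not say how GitClear links commits to issues, so the regular expression, the `group_by_issue` function, and the sample messages are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed convention: Jira-style keys like "PROJ-12" in the commit message.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def group_by_issue(commits):
    """Group (sha, message) pairs by the first issue key found in the message."""
    groups = defaultdict(list)
    for sha, message in commits:
        match = ISSUE_KEY.search(message)
        key = match.group(0) if match else "(no issue)"
        groups[key].append(sha)
    return dict(groups)

commits = [
    ("ABCD", "PROJ-12 add export endpoint"),
    ("EFGH", "PROJ-12 fix export edge case"),
    ("IJKL", "bump dependencies"),
]
groups = group_by_issue(commits)
```

Here commits ABCD and EFGH land in the same group because they reference the same issue, while the unlabeled commit is kept apart.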
Commit Groups Expose Critical Moments
What Happens by Fusing Commits ABCD + EFGH?
In the real world, commits don't happen in isolation. Diff Delta is built to get more accurate as more commits are added.
By fusing together related commits into Commit Groups, we cancel out noise from churn and get the ground truth about what precisely changed over the course of an issue’s implementation.
Fusing commits into commit groups is key to spotting bugs before they reach production.
[Figure: Commits ABCD and EFGH, each touching File 1 and File 2]
Technical Details: Making a Commit Group
Commits ABCD and EFGH, scored separately: +130 Diff Delta
Commit Group ABCD + EFGH: +70 Diff Delta
The Difference is (Git)Clear
Without GitClear:
With GitClear:
Thousands of startup and enterprise teams trust GitClear to make sense of their development activity
Don’t Take Our Word for It...
Trusted By
Competitive Pricing for Every Size Team
Bonus Content
Encore reports for overachievers. This is a sliver of GitClear’s 100+ reports available to subscribers.
Historic Stats: Long-Term Diff Delta
Historic stats show how temporal factors impact developer output
Historic Stats: Diff Delta by Team
How are various teams at the company progressing on their tasks?
Issue Stats: How Much Work Comes From Jira?
Issue stats make clear when bugs increase, or when the team is doing work that isn't tracked in an issue
Your Team Has Experts, So Who Are They?
A customer favorite, our Domain Experts report uses git data to identify who in the company has the most experience writing every type of code across your projects.
These people can form the interview team if you need to hire more experts. They’re also prime candidates if you need to mentor aspiring Junior Developers.
Actual developer data, aggregated across 20 of the largest open source repos over the past 12 months