git-exfiltrate: rescue large branches
Transform large pull-requests into smaller manageable ones.
Small pull requests of around 200 lines ought to be the default and most common mode of development. Many companies and researchers have reported evidence for this: small PRs improve development velocity, code quality, and the quality of the final product, and they give the business the agility to change direction without significant waste or over-investment.
Unfortunately, we don’t all live in the world of small pull requests. Yes, you may know some individual developers who work this way, or perhaps you work in one of the few companies that effectively work in small PRs, but the reality is that the “industry default mode” of development is submitting PRs for review that change thousands of lines across multiple areas of an application. Here are some examples I’ve collected from real life.
Ways to break up a large PR
Traditionally this has been an exhausting process of removing files, creating more branches, adding files back, and doing a lot of cut-and-paste work. This is error-prone and can lead to missing code, or even to copying over outdated code that was not “final”. To be frank, it’s a massive pain, and the developers who create large PRs don’t like breaking them up after they are written either.
I’ve created a tool for you called “the git exfiltrator” (exfiltrate means the opposite of infiltrate). It will break your massive branch apart into smaller branches based on file paths.
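To make the idea of splitting by file path concrete, here is a minimal sketch of the manual equivalent using plain git commands. This is not how the git exfiltrator is implemented; the function name plan_split and the branch names are hypothetical, and the sketch only builds the commands so you can inspect them before running anything.

```python
import shlex
from typing import List


def plan_split(source: str, base: str, new_branch: str, paths: List[str]) -> List[List[str]]:
    """Return the git commands that copy `paths` from `source` onto a new branch cut from `base`.

    Hypothetical helper for illustration only; it does not execute anything.
    """
    if not paths:
        raise ValueError("at least one path is required")
    return [
        # Create the smaller branch from the shared base (for example, main).
        ["git", "checkout", "-b", new_branch, base],
        # Copy the chosen paths, as they exist on the large branch, into the index and working tree.
        ["git", "checkout", source, "--", *paths],
        # Record them as one commit on the smaller branch.
        ["git", "commit", "-m", f"Extract {', '.join(paths)} from {source}"],
    ]


if __name__ == "__main__":
    # Assumed branch names and paths, purely for demonstration.
    for cmd in plan_split("big-feature", "main", "big-feature-docs", ["README.md", "docs/"]):
        print(shlex.join(cmd))
```

One limitation of this approach: checking out paths from another branch only copies files that exist there, so files the large branch deleted are not removed on the new branch and need to be handled separately.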
Who is working with large PRs?
A typical development process that causes large PRs can look like this:
If this process were pushing fine-grained tasks (around 200 LOC), it might be manageable, but that’s less common than we would like. We have this problem for any number of reasons, and most of those reasons require a long cultural transformation to fix, the kind of thing you can’t change with a single email to the team.
We know that in large PRs:
Detailed code reviews aren’t happening
Adding reviewers actually reduces quality
Unrelated changes are easily overlooked
It takes longer to ship the complete feature
It demands a longer work backlog
It reduces the company’s ability to quickly change goals
It reduces the possibility for rapid learning and innovation
If 9 out of 10 parts of a PR are perfect, but a single flaw exists in 1 part, then all 10 parts are held back while that single part gets fixed.
Possible ways to break up a large PR
Move out all the noise: readme, meta files, dependencies, etc.
Align on the functionality: if you break up functionality by folder, this could be trivial.
Align on the high-level design: a module’s API interfaces, database schema, controller paths and URLs, cross-domain-calls, etc.
Focus on tests: review the tests first. If they’re incorrect or missing, you can start here and require development to incrementally fulfill the tests (a kind of strange, wormhole-style TDD).
Check out this concept piloted at Microsoft a few years ago: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ClusterChages-ICSE2015.pdf and https://www.microsoft.com/en-us/research/publication/helping-developers-help-themselves-automatic-decomposition-of-code-review-changesets/
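As a rough illustration of the first few strategies above, the sketch below sorts a list of changed file paths into buckets: noise, tests, and one bucket per top-level folder. The file names in NOISE_NAMES, the directory names in TEST_DIRS, and the function bucket_paths are all assumptions made for this example; adjust them to your own repository layout.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed examples of "noise" files; not an exhaustive or authoritative list.
NOISE_NAMES = {"readme.md", "license", "changelog.md", ".gitignore",
               "package-lock.json", "yarn.lock"}
# Assumed directory names that mark test code.
TEST_DIRS = {"test", "tests", "__tests__"}


def bucket_paths(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group changed paths into candidate smaller PRs (hypothetical helper)."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        parts = path.parts
        if path.name.lower() in NOISE_NAMES:
            key = "noise"      # readme, meta files, dependency lock files
        elif TEST_DIRS.intersection(parts[:-1]):
            key = "tests"      # anything under a test directory
        elif len(parts) > 1:
            key = parts[0]     # functionality aligned by top-level folder
        else:
            key = "root"       # remaining files at the repository root
        buckets[key].append(raw)
    return dict(buckets)


if __name__ == "__main__":
    changed = ["README.md", "api/users.py", "api/tests/test_users.py", "web/app.js"]
    for name, files in bucket_paths(changed).items():
        print(name, files)
```

Each bucket is a candidate for its own branch and PR. Splitting along high-level design boundaries, such as API interfaces or database schema, usually needs human judgment rather than path matching.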
Further reading about why long PRs are bad for everyone
https://google.github.io/eng-practices/review/developer/small-cls.html
https://www.slideshare.net/RodrigoMiguel14/the-size-of-the-pull-request-is-more-important-than-you-think-2
https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/
https://www.process.st/littles-law/
https://codeclimate.com/blog/most-impactful-software-metrics/
https://www.atlassian.com/blog/archives/creating_optimal_reviews
https://smallbusinessprogramming.com/optimal-pull-request-size/
https://egoless.tech/code-review-essentials/
https://buttercms.com/blog/5-things-your-team-should-do-to-make-pull-requests-less-painful