Maintaining linear commit history in git
Merging is one of git’s most powerful abilities, but with great power comes great responsibility. I use merging very sparingly, as I strongly prefer having linear history in my repositories.
Here’s how (and why).
Pulling with rebase
It’s easy to create unnecessary merge commits accidentally, because the default behavior of git pull is to merge. You can specify git pull --rebase on the command line, but this is quite easy to forget, and let’s not kid ourselves, most people use git with a frontend.
You can change this default globally, or per-repository:
git config --global pull.rebase true
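The per-repository variant is the same command without --global. Here is a minimal sketch, run in a throwaway repository (the temporary directory is only there for illustration) so it doesn’t touch any real project:

```shell
set -e

# Scratch repository, created only for this example.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Without --global the setting lands in this repository's .git/config only.
git config pull.rebase true

# Read back the value that git pull will use in this repository.
pull_rebase=$(git config --get pull.rebase)
echo "pull.rebase=$pull_rebase"
```

A local setting takes precedence over the global one, so you can also use it to opt a single repository out with git config pull.rebase false.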
If your frontend overrides these defaults (e.g. specifies some flags directly, or has a custom implementation of git that ignores your config), you can usually configure it separately.
Avoiding merge commits when incorporating branches
You can include all (or some!) commits from a different branch by using git rebase. My favourite thing in git is rebase with the --interactive flag, which will pop up a text editor (or work with your git frontend) to let you choose which commits you want to take in, and in what order. It will also allow you to squash, edit (for amending), or reword commits, so it’s a very useful tool for keeping your own history clean too.
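To make this concrete, here is a rough sketch of the whole flow in a throwaway repository. The branch name feature and the file names are made up for the example. Normally the todo list opens in your editor; to keep the script non-interactive, it assumes a sed one-liner set through GIT_SEQUENCE_EDITOR can stand in for the manual edit, turning the second pick into a fixup.

```shell
set -e

# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false   # in case signing is enabled globally
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo base > base.txt; git add base.txt; git commit -q -m "base"

# Two commits on a feature branch: the real change and a small follow-up fix.
git checkout -q -b feature
echo "a" > a.txt; git add a.txt; git commit -q -m "feature: add a"
echo "a, fixed" > a.txt; git add a.txt; git commit -q -m "fix typo in a"

# Meanwhile, the main branch moves on.
git checkout -q "$main"
echo c > c.txt; git add c.txt; git commit -q -m "main: add c"

# Replay the feature branch on top of main. The scripted "editor" rewrites
# line 2 of the todo list from "pick" to "fixup", folding the typo fix into
# the commit it belongs to.
git checkout -q feature
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i "$main"

# The branch now sits directly on top of main, so no merge commit is needed.
git checkout -q "$main"
git merge -q --ff-only feature

commits=$(git rev-list --count HEAD)
merges=$(git rev-list --merges --count HEAD)
subject=$(git log -1 --format=%s)
echo "$commits commits, $merges merge commits, tip: $subject"
```

The --ff-only flag is a safety net here: if the branch hadn’t been rebased first, git would refuse instead of quietly creating a merge commit.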
If you’re interested in only one commit, you can also use git cherry-pick instead.
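As a small illustration, the sketch below takes only the newest commit from a hypothetical experiment branch and leaves the rest of that branch behind. The repository, branch and file names are invented for the example.

```shell
set -e

# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false   # in case signing is enabled globally
main=$(git symbolic-ref --short HEAD)

echo base > base.txt; git add base.txt; git commit -q -m "base"

# An experimental branch with two independent commits.
git checkout -q -b experiment
echo x > x.txt; git add x.txt; git commit -q -m "experiment: add x"
echo y > y.txt; git add y.txt; git commit -q -m "experiment: add y"

# Copy just the tip commit of the branch ("add y") onto main.
git checkout -q "$main"
git cherry-pick experiment >/dev/null

commits=$(git rev-list --count HEAD)
merges=$(git rev-list --merges --count HEAD)
echo "$commits commits on $main, $merges merge commits"
```

Passing a branch name to git cherry-pick picks the commit at its tip; for anything else you would pass the commit hash instead.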
These are good strategies, but only if the branch history is relatively clean (though you could still just take all the commits, or squash them), and if the branch hasn’t diverged too far.
However, long-lived branches that just keep diverging are a bad sign anyway: they increase the burden on the reviewers (and, by extension, on whoever authored the branch, as the feedback comes in) once the work is actually ready for merging. I see the situation as a variant of the “release early, release often” principle; you don’t have to actually ship a feature until it’s ready - feature flags give you much better control of when and how to roll out!
Sometimes it’s not feasible to sync the branches early. Take, for example, Sam Gross’ work on removing the Python GIL, which required Herculean effort (and which, in turn, has put it two major releases behind), but is still considered too experimental to merge (and too complex to put behind a flag).
In these kinds of situations, it’s perfectly reasonable to resolve any remaining merge conflicts through a merge commit.
Bug? Bisect!
If you’ve ever hunted a regression (aka “at what point did this thing break”), you know how daunting it can be to find the exact line of code or commit that caused it.
With git bisect, the VCS can do the boring half of the work for you - literally, it will keep splitting your commit history in halves, until you find the culprit. You just need to test for the bug, and keep telling it git bisect good, git bisect bad or git bisect skip (in case the revision is untestable). If you have a shell script to reproduce the issue, it can even do this for you 100% automatically!
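But it works best if the particular range of commits (where the bug was likely introduced) is linear. Bisect can walk through merges, but when the first bad commit turns out to be a merge, you still have to work out which side of it, or which conflict resolution, caused the problem - a good reason to avoid merge commits wherever they’re not necessary.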
Git revision history is not history - it’s a series of changes
The purpose of the git revision history is NOT to record snapshots of your work as you arrive at a solution, but to manage changesets. A changeset (a commit, or a series of commits) is a first-class concept that git allows you to work with, the same way a text editor allows you to work with text files. In much the same manner as you can keep a separate file with a shopping list for your next trip, you can keep a changeset that adds rounded corners to all of your buttons. You don’t have to go on that trip this week, and you don’t have to ship rounded corners in this release.
Being able to edit the branch history makes it easy to include (and revert) these changesets as necessary, and retain the readability of the resulting history - so that when you decide the rounded corners are actually ugly, you don’t even have to mention their existence, much less litter the development branch with apply, revert, apply, revert.
One exception that I maintain to all of these rules is the master branch. On every project I work on, I’ve configured GitHub to reject force-pushes to master, so whatever goes to master becomes permanent record. This way I can always feel free to mess with the history, until I decide it’s good enough.