Limbo: Why users are more error-prone with git than other VCSes
Limbo is a term I use but VCS authors don't. They tend to ignore a
certain state that exists in all major VCSes, and so give it no name,
despite the fact that this state seems to be the largest source of
user errors. I call this state limbo.
How to make git behave like other VCSes
Most potential git users probably don't want to read this whole page,
and would like their knowledge from other VCSes to apply without
learning how the index and limbo differ in git from their previous
VCS (despite the really cool extra functionality that difference
brings). This can be done by:
- Always using git diff HEAD instead of git diff
and
- Always using git commit -a instead of git commit
Either make sure you always remember those extra arguments, or
come back and read this page when you get a nasty surprise.
The concept of Limbo
VCS users are accustomed to thinking of using their VCS in terms of
two states -- a working copy where local changes are made, and the
repository where the changes are saved. However, the working copy is
split into three sets (see also VCS
concepts):
- (explicitly) ignored -- files inside your working copy
that you explicitly told the VCS not to track
- index -- the content in your working copy that you asked
the VCS to track; this is the portion of your working copy that
will be saved when you commit (in CVS, this is done using the
CVS/Entries files)
- limbo -- not explicitly ignored, and not explicitly
added. This is stuff in your working copy that won't be
checked in when you commit but you haven't told the VCS to
ignore, which typically includes newly created files.
The first state is identical across all major VCSes. The other two
states are identical across cvs, svn, bzr, hg, and likely others. But
git draws the line between the index and limbo differently.
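To make the three sets concrete, here is a minimal sketch that builds a throwaway git repository and asks git to classify each file. The file names (tracked.txt, build.log, scratch.txt) and the ignore pattern are made up for illustration; the two-character prefixes are the ones git's porcelain status format uses, with "A " for content added to the index, "??" for untracked files (limbo), and "!!" for ignored files.

```shell
#!/bin/sh
# Sketch: show the ignored / index / limbo split in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.log' > .gitignore      # explicitly ignored: anything ending in .log
echo 'tracked' > tracked.txt
echo 'noise' > build.log       # matches the ignore pattern above
echo 'new' > scratch.txt       # never added, never ignored: limbo

git add .gitignore tracked.txt # these two are now in the index

# --porcelain gives stable, script-friendly output;
# --ignored asks git to list ignored files as well.
status=$(git status --porcelain --ignored)
echo "$status"
```

Note that this only shows the file-level split. In git, later edits to tracked.txt would land in limbo too until they are added again.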
One could imagine a VCS which just automatically saves all
changes that aren't in an explicitly ignored file (including newly
created files) whenever a developer commits, i.e. a VCS where there is
no limbo state. None of the major VCSes do this, however. There are
various rationales for the existence of limbo: maybe developers are
too lazy to add new files to the ignored list, perhaps they are
unaware of some autogenerated files, or perhaps the VCS only has one
ignore list and developers want to share it but not include their own
temporary files in such a shared list. Whatever the reason, limbo is
there in all major VCSes.
Changes in limbo are a large source of user error
The problem with limbo is that changes in this state are, in my
experience, the cause of most user errors. If you create a
new file and forget to explicitly add it, then it won't be included in
your commit (happens with all the major VCSes). Naturally, even those
familiar with their VCS forget to do that from time to time. This
always seems to happen when other changes were committed that depend
on the new files, and it always happens just before the relevant
developers go on vacation...leaving things in a broken state for me to
deal with. (And sure, I return the favor on occasion when I simply
forget to add new files.)
A powerful feature of git
Unlike other VCSes, git only commits what you explicitly tell it to.
This means that without taking additional steps, the command "git
commit" will commit nothing (in this particular case it typically
complains that there's nothing to commit and aborts). git also gives
you a lot of fine-grained control over what to commit, more than most
other VCSes. In particular, you can mark all the changes of a given
file for subsequent committing, but unlike other VCSes this only means
that you are marking the current contents of that file for
commit; any further changes to the same file will not be included in
subsequent commits unless separately added. Additionally, recent
versions of git allow the developer to mark subsets of changes in an
existing file for commit (pulling a handy feature from darcs). The
choose-what-to-commit functionality is practical because git can
generate three different kinds of diffs: (1) just the changes marked
for commit (git diff --cached), (2) just the changes you've made to
files beyond what has been marked for commit (git diff), or (3) all
the changes since the last commit (git diff HEAD).
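The three diffs are easiest to tell apart on a file that has been edited, added, and then edited again. The following is a small sketch in a scratch repository; the file name foo, its contents, and the user identity are placeholders I picked for the example.

```shell
#!/bin/sh
# Sketch: compare the three kinds of diff on one file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'line 1' > foo
git add foo
git commit -q -m 'initial version of foo'

echo 'line 2' >> foo   # first edit
git add foo            # mark the first edit for commit
echo 'line 3' >> foo   # second edit, not marked

staged=$(git diff --cached)   # (1) only the marked change: adds "line 2"
unstaged=$(git diff)          # (2) only the change beyond it: adds "line 3"
all=$(git diff HEAD)          # (3) everything since the last commit: both

echo "--- git diff --cached ---"; echo "$staged"
echo "--- git diff ---";          echo "$unstaged"
echo "--- git diff HEAD ---";     echo "$all"
```

Users coming from other VCSes usually expect the third output when they type the second command.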
This fine-grained control can come in handy in a variety of special
cases:
- When doing conflict resolution from large merges (or even just
reviewing a largish patch from a new contributor), hunks of
changes can be categorized into known-to-be-good and
still-needs-review subsets.
- It makes it easier to keep "dirty" changes in your working copy
for a long time without committing them.
- When adding a feature or refactoring (or otherwise making
changes to several different sections of the code), you can mark
some changes as known-to-be-good and then continue making
further changes or even adding temporary debugging snippets.
These are features that would have helped me considerably in some
GNOME development tasks I've done in the past.
How git is more problematic
This decision to only commit changes that are explicitly added, and
doing so at content boundaries rather than file boundaries, amounts to
a shift in the boundary between the index and limbo. With limbo being
much larger in git, there is also more room for user error. In
particular, while this shift enables the powerful feature noted
above, it also comes with some nasty gotchas in common use cases, as
can be seen in the following scenarios:
- Only new files included in the commit
  1. Edit bar
  2. Create foo
  3. Run git add foo
  4. Run git commit
In this set of steps, users of other VCSes will be surprised
that after step 4 the changes to bar were not included in the
commit. git only commits changes when explicitly asked. (This
can be avoided by either running git add bar before
committing, or running git commit -a. The -a flag to
commit means "Act like other VCSes -- commit all changes in any
files included in the previous commit".)
- Missing changes in the commit
  1. Create/edit the file foo
  2. Run git add foo
  3. Edit foo some more
  4. Run git commit
In this set of steps, users of other VCSes will be surprised
that after step 4 the version of foo that was committed was the
version that existed at the time step 2 was run, not the version
that existed when step 4 was run. That's because step 2 means
"mark the changes currently in the file foo for commit", not
"track foo from now on". (This can be avoided by running git add
foo again before committing, or running git commit -a
for step 4.)
- Missing edits in the generated patch
  1. Edit the tracked file foo
  2. Run git add foo
  3. Edit foo some more
  4. Run git diff
In this set of steps, users of other VCSes will be surprised
that at step 4 they only get a list of changes to foo made in
step 3. To get a list of changes to foo made since the last
commit, run git diff HEAD instead.
- Missing file in the generated patch
  1. Create a new file called foo
  2. Run git add foo
  3. Run git diff
In this set of steps, users of other VCSes will be surprised
that at step 3 the file foo is not included in the diff (unless
changes have been made to foo since step 2, but then only those
additional changes will be shown). To get foo included in the
diff, run git diff HEAD instead.
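The scenarios above are easy to reproduce for yourself. Here is a rough sketch that replays the two commit scenarios and the missing-file diff scenario in a scratch repository. The file contents, commit messages, user identity, and the extra file baz (used so the third scenario starts from a genuinely new file) are my own placeholders.

```shell
#!/bin/sh
# Sketch: replay three of the gotchas in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'bar v1' > bar
git add bar
git commit -q -m 'add bar'

# "Only new files included in the commit"
echo 'bar v2' > bar                 # 1. edit bar
echo 'foo v1' > foo                 # 2. create foo
git add foo                         # 3. mark foo for commit
git commit -q -m 'add foo'          # 4. commit
committed_bar=$(git show HEAD:bar)  # still "bar v1"; the edit was never added

# "Missing changes in the commit"
echo 'foo v2' > foo                 # 1. edit foo
git add foo                         # 2. mark the current contents of foo
echo 'foo v3' > foo                 # 3. edit foo some more
git commit -q -m 'update foo'       # 4. commit
committed_foo=$(git show HEAD:foo)  # "foo v2", the contents as of step 2

# "Missing file in the generated patch"
echo 'baz v1' > baz                 # 1. create a new file
git add baz                         # 2. mark it for commit
plain_diff=$(git diff --name-only -- baz)      # 3. empty: baz is not listed
head_diff=$(git diff --name-only HEAD -- baz)  # lists baz

echo "bar in HEAD: $committed_bar"
echo "foo in HEAD: $committed_foo"
echo "git diff lists: '$plain_diff'"
echo "git diff HEAD lists: '$head_diff'"
```

Swapping git commit for git commit -a in either of the first two scenarios makes the committed contents match the working copy.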
These gotchas are there in addition to the standard gotcha exhibited
in all the major VCSes:
- Missing file in the commit
  1. Edit bar
  2. Create a new file called foo
  3. Run vcs commit (where vcs is cvs, svn, hg, bzr, or likely
     most any other VCS; with git, run git commit -a, since a
     plain git commit would skip the edits to bar as well)
In this set of steps, the edits in step 1 will be included in
the commit, but the file foo will not be. The user must first
run vcs add foo (again, replacing vcs with the
relevant VCS being used) before committing in order to get foo
included in the commit.
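For git, the same steps can be sketched as follows. I use git commit -a here so that git behaves like the other VCSes with respect to bar; the file contents, commit messages, and user identity are placeholders.

```shell
#!/bin/sh
# Sketch: the classic "forgot to add the new file" gotcha, git flavour.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'bar v1' > bar
git add bar
git commit -q -m 'add bar'

echo 'bar v2' > bar                # 1. edit bar
echo 'foo v1' > foo                # 2. create foo, but never add it
git commit -q -a -m 'edit bar'     # 3. commit all changes to tracked files

files_in_head=$(git ls-tree --name-only HEAD)          # only "bar"
untracked=$(git ls-files --others --exclude-standard)  # "foo" is still in limbo

echo "committed files: $files_in_head"
echo "left in limbo:   $untracked"
```

Running git status before committing would have listed foo as untracked, which is the cheapest way I know to catch this mistake.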