Git Workflows, Part 2: Crafting Commits

Posted in Git

permalink

Summary

  • Make your commits small and atomic, and recombine them into larger commits later; it's easier to combine smaller commits than to split large commits.

  • Make use of git add -p and git add -e to stage changes selectively and atomically.

  • Make use of git rebase and git cherry-pick to edit your commits and assemble them in the order you want.

  • Once commits have been combine and the history is satisfactory, push to a remote to share the work.

  • Think about ordering your commits to "tell a story". (What that means will depend on the people you are collaborating with!)

What is a commit

Before we get into the good stuff, let's talk about the anatomy of a git commit.

When you add files to your git repository, it's a two-step process: git add and git commit. The first step stages your changes, the second step memorializes those staged changes into a commit that can now be shared with others by pushing it to git remotes.

git add

It is important to know that git does not keep track of changes at the file level, it keeps track of changes at the character/line level.

What that means is, when you modify a line in a file that is in your git repository, and run git add to stage your change, git has created an object under the hood called a blob to represent that one line change.

If you change two lines in two different parts of a file, and stage those changes using git add, git will treat this as two separate changes, and represent the changes with two different blobs.

git commit

As you use git add to prepare your changes, the changes are added to a staging area. Think of this staging area as a draft commit. Each change being added to the staging area changes how the commit will look. When the changes are complete and the user runs git commit, it turns the staging area into a real commit, creates the metadata, and calculates hashes.

When a commit is created, it receives a name, which is the hash of the contents of the commit. The hash is computed from the contents of the blobs, plus the metadata about the commit, plus the hash of the prior commits. Changing a commit changes its hash, and will change the hashes of all subsequent commits.

Commits in your local repository can be easily rewritten and edited, and their hashes changed. A common workflow is to make many small commits, and recombine them later.

Because the commit hash is how the commit is named, modifying commits after you've shared them is bad practice and will create extra work for your collaborators. For that reason, don't git push until you're ready to share your work.

git rebase

The git rebase command allows you to edit your commit history. We will cover some usage patterns in the sections below.

git fetch and git pull

Before pushing changes to the remote, first check if there have been any commits since you began your branch.

rebase, merge, branch, pass

If a feature branch is created off of the master branch, and some time passes, the feature branch base commit may grow far out of sync with the master branch. (Note that master indicates the primary branch.)

This leaves the developer of the feature branch (which is out of sync with master) with a few choices:

  • rebase - continue to rebase all commits on the feature branch from the (old) original feature branch base commit onto the (new) head commit of the master branch.

    • Pros: clean history, easy for one-branch-one-developer workflow
    • Cons: requires continual force-pushes, requires coordination between developers to prevent squashing others' work, not scalable, some people hate this method
  • merge - occasionally merge work from the master branch into the feature branch.

    • Pros: simple to understand, simple to carry out, low cognitive load
    • Cons: any changes added to the branch via the merge commit will show up in the PR as new code, cluttering PR reviews by mixing features with merged changes; can also make the commit history messy and harder to understand.
  • branch - by making heavy use of throwaway branches and integration branches, it is easier to test out how the integration of a feature branch based on an old commit on master will do when merging it in with a newer version of master. Use throwaway integration branches to test out merging the two branches together, testing its functionality, etc. You can also rebase or cherry pick commits onto the throwaway integration branch, and figure out how to arrange the commits on a branch to "rebuild" it into a working, mergeable branch.

    • Pros: easy to do, encourages local use of throwaway branches
    • Cons: clutters branches, integration process has to be repeated (can be mitigated with git rerere), merge commits must wait until PR is approved
  • pass - best combined with the branch approach mentioned above, the pass approach is to leave the branch history clean, avoid force-pushes, and rely on throwaway branches to test out merge strategies once the inevitable PR merge needs to happen. It can also be useful to wait for code reviews to finish, then create a merge commit to make the merge happen smoothly.

    • Pros: easy to do
    • Cons: merge commits must wait until PR is approved

git push

Once you run git push, all of the commits on the branch that you pushed will end up on the remote, where others can access them. The purpose of a git push is to share commits, so generally you don't push branches until they are ready to share. This also allows more flexibility in crafting, rewriting, and combining commits.

force pushing

If you pushed a branch (which is a collection of commits) to a remote, and then you have edited those commits, you will run into a problem when you try and git push the new, edited versions of the commits to the same remote. The remote will detect that there are conflicting versions of the branch and will reject the changes.

That's where git push --force comes in. The --force flag tells the remote to discard its version of the branch and use the version of the branch that you are pushing.

We will cover more about force pushing - when to do it, when not to, and why some people hate it - in a later post. For now, we will only say that you should not force push often, since you can risk deleting others' work and creating additional confusion and work for all of your collaborators.

Commit Workflow

Principles

Here are some principles for your git commit workflow:

  • Commit small changes often.

  • Don't sweat the commit messages - they can be fixed up later.

  • Related - nobody will see your commits until you push your branch, so think of your branch as a scratch space. You have the ultimate freedom to use it however you want.

  • Branches are easy to create, so make liberal use of branches!

  • Be wary of force pushing, and of rewriting history.

Making Small Commits

Two essential git commands to help with making small commits are git add (patch mode) and git add (interactive mode).

git add patch mode

How to use:

git add -p <name-of-file>

The git add -p command allows the user to interactively stage individual changes made (in what is called patch mode). This means users can stage certain changes for one commit, then stage other changes for a different commit.

This solves the problem of making a long sequence of changes to a single file that should be logically separated into different steps. (For example, changing the import statements versus changing the name of a variable throughout a file).

For example, suppose we have the following changes to a file named doit.sh:

$ git diff doit.sh
diff --git a/doit.sh b/doit.sh
index 3b938a1..6c1aec8 100644
--- a/doit.sh
+++ b/doit.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# This script lists the 40 largest files in the git repo history
+# This script lists the 50 largest files in the git repo history

 $ git rev-list --all --objects | \
      sed -n $(git rev-list --objects --all | \
@@ -9,9 +9,9 @@ $ git rev-list --all --objects | \
      grep blob | \
      sort -n -k 3 | \
      \
-     tail -n40 | \
+     tail -n50 | \
      \
      while read hash type size; do
           echo -n "-e s/$hash/$size/p ";
      done) | \
-     sort -n -r -k1
+     sort -nru -k1

There are two related changes and one unrelated change, respectively: the two related changes are the change to the comment and the change to the tail command; the unrelated change is adding the -u flag to the sort command.

We can split these changes into two commits using git add -p doit.sh, which will walk through each change in the file and ask if we want to stage it:

$ git add -p doit.sh
diff --git a/doit.sh b/doit.sh
index 3b938a1..6c1aec8 100644
--- a/doit.sh
+++ b/doit.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# This script lists the 40 largest files in the git repo history
+# This script lists the 50 largest files in the git repo history

 $ git rev-list --all --objects | \
      sed -n $(git rev-list --objects --all | \
Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? y

@@ -9,9 +9,9 @@ $ git rev-list --all --objects | \
      grep blob | \
      sort -n -k 3 | \
      \
-     tail -n40 | \
+     tail -n50 | \
      \
      while read hash type size; do
Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? y

@@ -14,14 +14,14 @@        echo -n "-e s/$hash/$size/p ";
      done) | \
-     sort -n -r -k1
+     sort -nru -k1
Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? n

Now the two related changes are staged, and the unrelated change is not staged. This is reflected in git status:

$ git status
On branch master
Your branch is ahead of 'gh/master' by 2 commits.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   doit.sh

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   doit.sh

Now git commit will commit only the staged portions.

Do not provide any filenames to git commit, so that git will only commit the staged changes.

To use this in your workflow, think about how you can group different changes together into different commits. If you get a portion of a feature working, you can commit the changes in groups so that related changes get committed together.

Also remember that if your commit history ends up being excessively long or overly detailed, you can always examine what changes different commits made with git diff, and reorder them with git cherry-pick or modify/combine them with git rebase.

git add editor mode

How to use:

git add -e <name-of-file>

Like the interactive patch mode, git add -e allows you to selectively stage certain changes in a file. But it's much better for keyboard jockeys and those that love their text editor, because you can choose which changes to stage or not using the text editor.

A sidebar:

If you have not yet set the text editor that git uses, you should do that now. Modify your git configuration with this command:

git config --global core.editor vim

Alternatively, put the following in your ~/.gitconfig:

[core]
    editor = vim

(Or, you know, whatever your text editor of choice is.)

End of sidebar.

When you pass the -e flag to git add, it will open a new editor window with the full diff:

diff --git a/doit.sh b/doit.sh
index 326273c..14e4059 100644
--- a/doit.sh
+++ b/doit.sh
@@ -1,17 +1,17 @@
 #!/bin/bash
 #
-# This script lists the 50 largest files in the git repo history
+# This script lists the 10 largest files in the git repo history

 $ git rev-list --all --objects | \
      sed -n $(git rev-list --objects --all | \
      cut -f1 -d' ' | \
      git cat-file --batch-check | \
      grep blob | \
      sort -n -k 3 | \
      \
-     tail -n50 | \
+     tail -n10 | \
      \
      while read hash type size; do
           echo -n "-e s/$hash/$size/p ";
      done) | \
-     sort -nru -k1
+     sort -nr -k1

Editing this file requires some care!

Fortunately there is a section in the documentation for git add called Editing Patches.

Two things to remember:

  • Lines starting with + indicate new, added content. To prevent this content from being added, delete the line.

  • Lines starting with - indicate removed content. To keep this content, replace - with a space ().

Once you are finished, make sure you review the changes that are staged, particularly if this is the first time seeing patch files or the diff syntax.

Modifying Commits

There is always some reason or another to modify the commit history of a repository - perhaps someone's work was lost, or the wrong issue or pull request number was referenced, or a username was misspelled.

You can always modify a commit, but it will also modify every commit that came after it. Think of it like replaying the changes recorded in each commit onto the new branch. The contents of each commit changes slightly, so the hash (the name) of every commit changes.

git rebase

To do a git rebase, an interactive rebase (the -i flag) is recommended.

The rebase action takes two commits, and will replay the commits.

IMPORTANT: The first commit given (the start commit) is not included in the rebase. To include it, add ~1 to the start commit. (For example, 0a1b2c3d~1 refers to the commit before commit 0a1b2c3d.

rebasing a range of commits

To rebase from the start commit hash to the end commit hash, and include the start commit in the rebase, the rebase command is:

git rebase -i START_COMMIT_HASH~1 END_COMMIT_HASH

This does not indicate a destination branch. The default behavior is for the branch to move and the new pile of commits to retain the same branch name.

rebasing onto another branch

To rebase a range of commits onto a different branch (for example, onto a master branch that has the latest changes from the remote), use the --onto flag:

git rebase -i START_COMMIT_HASH END_COMMIT_HASH --onto TARGET_BRANCH

IMPORTANT: The above rebase commands will leave your repo in a headless state - unlike the behavior of the prior command, the branch label will not move with you to the new pile of commits.

Run git checkout -b <branchname> to give your new rebased branch a meaningful name. This creates a branch wherever HEAD is, which is pointing to the top of the pile of rebased commits.

If you want the old branch label to move to the new pile of commits, it requires a bit of branch housekeeping - you have to delete the old branch, then create a new branch from where HEAD is (the end of the rebase), then check out that branch.

git branch -D <branchname> && git checkout -b <branchname>

Rearranging Commits

Where rebasing allows for editing commits en masse, cherry picking allows the changes made in individual commits to be applied anywhere - including other branches. This makes the atomic commit principle from the beginning of this post much easier - groups of related commits that happened out of order can be rearranged by cherry picking them onto a new branch, and the new branch is a better "story".

Combining Commits

The cherry pick operation can also be combined with a rebase - once multiple small commits are arranged together chronologically, a git rebase operation enables squashing those tiny commits into a small number of larger commits, all carrying related changes.

Tags:    git    rebase    cherry-pick    branching    version control