charlesreid1.com blog

Incorporating Terraform Commands into Makefiles

Posted in Terraform

Summary

This blog post covers a useful pattern for incorporating terraform commands into a Makefile.

This is useful for cases where terraform is being used to manage infrastructure. In the end you will be able to run a command like

make plan-infra
make deploy-infra

and have these call the corresponding terraform commands to plan and deploy your terraform-managed cloud infrastructure.

The post is divided into a few steps:

  • Directory and file layout - how we lay out the files for this tutorial
  • Top level Makefile - make commands to add to the top level Makefile
  • Infra level Makefile - make commands to add to the infra level Makefile
  • Writing Terraform component - how to write a configurable component that is ready to terraform
  • Initializing Terraform component - script to initialize terraform components
  • Workflow - plan, deploy, update, destroy

Step 0: Directory and File Layout

This tutorial presumes you have a top level directory corresponding to a git repository. We will use the following directory structure for this example:

my-project/
    Readme.md
    environment
    Makefile
    infra/
        Makefile
        component-1/
            variables.tf
            main.tf

Environment Variables

In order to keep track of environment variables used in the terraform process, we use a file named environment in the top-level project directory to keep all environment variable values under version control:

SOURCE="${BASH_SOURCE[0]}"
while [ -h "$SOURCE" ] ; do SOURCE="$(readlink "$SOURCE")"; done
export PROJECT_HOME="$(cd -P "$(dirname "$SOURCE")" && pwd)"

set -a
PROJECT_DEPLOYMENT_STAGE="dev"

# bucket name
PROJECT_S3_BUCKET="my-organization-my-project-my-bucket"

# aws tags
PROJECT_INFRA_TAG_PROJECT="my-project"
PROJECT_INFRA_TAG_SERVICE="my-service"
PROJECT_INFRA_TAG_OWNER="whoami@email.com"

# aws settings
AWS_DEFAULT_OUTPUT=json
AWS_DEFAULT_REGION=us-east-1
set +a

Optionally, a local file environment.local can hold values that are sensitive or that should not be kept under version control. To support this, add the following to the bottom of the environment file:

if [[ -f "${PROJECT_HOME}/environment.local" ]]; then
    source "${PROJECT_HOME}/environment.local"
fi
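
For example, a typical session from the top-level my-project/ directory might look like this (the echo is just a sanity check that the variables are loaded):

source environment
echo "$PROJECT_S3_BUCKET"
make plan-infra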

Step 1: Top Level Makefile

Start by creating the plan-infra and deploy-infra commands in your top-level Makefile. These commands will, in turn, call make commands defined in infra/Makefile:

plan-infra:
    $(MAKE) -C infra plan-all

deploy-infra:
    $(MAKE) -C infra apply-all

The -C infra flag tells make to change into the infra/ subdirectory before reading the Makefile there.
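
For example, the following two invocations are equivalent:

make -C infra plan-all
( cd infra && make plan-all )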

Step 2: Infra Level Makefile

Next we define infra/Makefile. This Makefile will have two parts:

  • terraform commands for a single component (example: init, plan, apply, destroy)
  • wrapper commands to run the above commands for every component (example: for each component, run the plan terraform command)

We cover the Makefile from the bottom up.

Fake Targets

Start by declaring the "phony" targets, that is, make rules whose names do not correspond to actual files:

.PHONY: init-all plan-all apply-all clean-all plan apply destroy init clean

Commands for Single Components

Next, above that, we define terraform commands for a single component:

init:
    rm -rf $(COMPONENT)/.terraform/*.tfstate
    ./build_deploy_config.py $(COMPONENT)
    cd $(COMPONENT); terraform init;

plan: init
    cd $(COMPONENT); terraform plan -detailed-exitcode

apply: init
    cd $(COMPONENT); terraform apply

destroy: init
    cd $(COMPONENT); terraform destroy

clean:
    cd $(COMPONENT); rm -rf .terraform

Note that the init command runs a build_deploy_config.py script, which we will cover in a moment. This script generates the component's variables.tf (along with backend.tf and providers.tf, covered in Step 4) and populates the variable values from environment variables.
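
For example, for the component-1 component from Step 0, the generated variables.tf might look like the following sketch, using values from the environment file above:

variables.tf:

# Auto-generated during infra build process.
# Please edit infra/build_deploy_config.py directly.

variable "PROJECT_DEPLOYMENT_STAGE" {
  default = "dev"
}

variable "PROJECT_S3_BUCKET" {
  default = "my-organization-my-project-my-bucket"
}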

Commands for All Components

Above that, we have commands to perform each action on all components:

all: init-all

init-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) init COMPONENT=$$c || exit 1; \
    done

plan-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) plan COMPONENT=$$c || exit 1; \
    done

apply-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) apply COMPONENT=$$c || exit 1; \
    done

destroy-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) destroy COMPONENT=$$c || exit 1; \
    done

clean-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) clean COMPONENT=$$c || exit 1; \
    done

Variables

Last but not least, we define a few variables at the top of the Makefile: most importantly, the list of infrastructure components. This is created by extracting the names of subdirectories in infra/ containing *.tf files:

DIRS=${shell find . -name "*.tf" -exec dirname {} \; | sort --unique}
COMPONENTS=${shell for d in $(DIRS); do basename $$d; done}
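
With the example layout from Step 0, these shell snippets expand as follows (run from inside infra/):

find . -name "*.tf" -exec dirname {} \; | sort --unique
# => ./component-1

for d in ./component-1; do basename $d; done
# => component-1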

Final Makefile

Here is the final infra/Makefile:

infra/Makefile:

DIRS=${shell find . -name "*.tf" -exec dirname {} \; | sort --unique}
COMPONENTS=${shell for d in $(DIRS); do basename $$d; done}

all: init-all

init-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) init COMPONENT=$$c || exit 1; \
    done

plan-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) plan COMPONENT=$$c || exit 1; \
    done

apply-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) apply COMPONENT=$$c || exit 1; \
    done

destroy-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) destroy COMPONENT=$$c || exit 1; \
    done

clean-all:
    @for c in $(COMPONENTS); do \
        $(MAKE) clean COMPONENT=$$c || exit 1; \
    done

plan: init
    cd $(COMPONENT); terraform plan -detailed-exitcode

apply: init
    cd $(COMPONENT); terraform apply

destroy: init
    cd $(COMPONENT); terraform destroy

init:
    rm -rf $(COMPONENT)/.terraform/*.tfstate
    ./build_deploy_config.py $(COMPONENT)
    cd $(COMPONENT); terraform init;

clean:
    cd $(COMPONENT); rm -rf .terraform

.PHONY: init-all plan-all apply-all clean-all plan apply destroy init clean

Step 3: Writing Terraform Components

As an example, we will consider terraform-managed S3 buckets.

Start by creating a directory called infra/buckets/ to store terraform files for creating and managing the buckets.

We can create one file per cloud provider. As an example, here is s3.tf:

s3.tf:

data "aws_caller_identity" "current" {}

locals {
  common_tags = "${map(
    "project"   , "${var.PROJECT_INFRA_TAG_PROJECT}",
    "env"       , "${var.PROJECT_DEPLOYMENT_STAGE}",
    "service"   , "${var.PROJECT_INFRA_TAG_SERVICE}"
  )}"
  aws_tags = "${map(
  "Name"      , "${var.PROJECT_INFRA_TAG_SERVICE}-s3-storage",
  "owner"     , "${var.PROJECT_INFRA_TAG_OWNER}",
  "managedBy" , "terraform"
  )}"
}

resource "aws_s3_bucket" "dss_s3_bucket" {
  count = length(var.PROJECT_S3_BUCKET) > 0 ? 1 : 0
  bucket = var.PROJECT_S3_BUCKET
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
  tags = merge(local.common_tags, local.aws_tags)
}

Note that this requires several environment variables to be defined in environment and requires the operator to run:

source environment
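
With the environment loaded, a single component can also be planned on its own using the single-component targets from the infra Makefile:

source environment
make -C infra COMPONENT=buckets plan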

Step 4: Initializing Terraform Components

The following script will automatically generate terraform files for our component that are populated with the correct environment variable values.

It is called build_deploy_config.py.

Start with the needed imports, a variable pointing at the infra/ directory, and a simple argument parser that accepts a single argument, the component to generate terraform files for:

import os, argparse, boto3

infra_root = os.path.dirname(os.path.abspath(__file__))  # the infra/ directory
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("component")
args = parser.parse_args()

Next, we define several terraform file templates using Python's brace-based str.format() template syntax. Start with a template for defining a terraform variable:

terraform_variable_template = """
variable "{name}" {{
  default = "{val}"
}}
"""

Next, define a template for the terraform backend bucket:

terraform_backend_template = """# Auto-generated during infra build process.
# Please edit infra/build_deploy_config.py directly.
terraform {{
  backend "s3" {{
    bucket = "{bucket}"
    key = "{comp}-{stage}.tfstate"
    region = "{region}"
    {profile_setting}
  }}
}}
"""

Next, define a template for the terraform cloud provider configuration:

terraform_providers_template = """# Auto-generated during infra build process.
# Please edit infra/build_deploy_config.py directly.
provider aws {{
  region = "{aws_region}"
}}
"""

Provide a list of environment variables that should also be defined as terraform variables:

env_vars_to_infra = [
    "AWS_DEFAULT_REGION",
    "PROJECT_DEPLOYMENT_STAGE",
    "PROJECT_S3_BUCKET",
    "PROJECT_INFRA_TAG_PROJECT",
    "PROJECT_INFRA_TAG_SERVICE",
    "PROJECT_INFRA_TAG_OWNER",
]

Finally, substitute environment variable values into the templates, and write the templated content to the appropriate *.tf files. First, the backend. The terraform state bucket name comes from one more environment variable, PROJECT_TERRAFORM_BACKEND_BUCKET_TEMPLATE (its value may contain an {account_id} placeholder, filled in with the AWS account ID), so add that variable to the environment file as well:

# Write backend.tf
with open(os.path.join(infra_root, args.component, "backend.tf"), "w") as fp:
    caller_info = boto3.client("sts").get_caller_identity()
    if os.environ.get('AWS_PROFILE'):
        profile = os.environ['AWS_PROFILE']
        profile_setting = f'profile = "{profile}"'
    else:
        profile_setting = ''
    fp.write(terraform_backend_template.format(
        bucket=os.environ['PROJECT_TERRAFORM_BACKEND_BUCKET_TEMPLATE'].format(account_id=caller_info['Account']),
        comp=args.component,
        stage=os.environ['PROJECT_DEPLOYMENT_STAGE'],
        region=os.environ['AWS_DEFAULT_REGION'],
        profile_setting=profile_setting,
    ))

Next, the variables.tf for the component:

# Write variables.tf
with open(os.path.join(infra_root, args.component, "variables.tf"), "w") as fp:
    fp.write("# Auto-generated during infra build process." + os.linesep)
    fp.write("# Please edit infra/build_deploy_config.py directly." + os.linesep)
    for key in env_vars_to_infra:
        val = os.environ[key]
        fp.write(terraform_variable_template.format(name=key, val=val))

Finally, the cloud providers file providers.tf:

with open(os.path.join(infra_root, args.component, "providers.tf"), "w") as fp:
    fp.write(terraform_providers_template.format(
        aws_region=os.environ['AWS_DEFAULT_REGION'],
        gcp_project_id=GCP_PROJECT_ID,
    ))
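
The Makefile's init target runs this script for each component, but it can also be run by hand. A typical invocation (assuming the script is executable) looks like:

source environment
cd infra
./build_deploy_config.py buckets
# writes buckets/backend.tf, buckets/variables.tf, and buckets/providers.tf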

Workflow

We now have a top-level Makefile that wraps the plan and apply commands directly, and we have an infra-level Makefile with additional commands for managing infrastructure (plan, apply, destroy).

Plan

The plan step (make plan-infra) calls the build_deploy_config.py script (detailed above) to regenerate the templated terraform files, substituting the current environment variable values into them.

make plan-infra

This command iterates over each cloud infrastructure component in infra/, uses terraform to plan the changes it would make to cloud resources, and prints a summary of those changes to the screen.

The make plan-infra command does not change any cloud infra.

Deploy

The deploy step (make deploy-infra) makes the changes summarized in the plan step. It automates the underlying terraform commands, but terraform still asks for an interactive "yes" confirmation before applying each component's changes.
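
As with the plan step, this is run from the top-level Makefile:

make deploy-infra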

Update

Using the make plan-infra command will remake the terraform files using environment variable values, and will display any changes that will be made to cloud infra. This includes updates to existing infrastructure.

When you finish deploying infrastructure, store the version of your environment file in version control and tag it as the current deployed infra. This will make it easier to delete infra later.

If you need to rename infra, use the following workflow:

  1. Source the old environment file

  2. Destroy the old infra with the old names using:

    make -C infra COMPONENT=buckets destroy

    Or destroy all infra with the destroy-all command:

    make -C infra destroy-all

  3. Update the environment file with the new names, and source the new environment file

  4. Plan the new infra with

    make plan-infra

  5. Deploy the new infra with

    make deploy-infra

Destroy

As seen above, individual infrastructure components can be destroyed with the destroy command in the infra Makefile, and all infrastructure components can be destroyed with the destroy-all command.

To delete a particular component:

make -C infra COMPONENT=buckets destroy

To destroy all infra:

make -C infra destroy-all

Tags:    terraform    makefile    make    python   

Automatically Generating Up-To-Date requirements.txt for Python Projects

Posted in Python

Summary

In this post, we cover a pattern for automatically generating a requirements.txt file that has the latest compatible versions of required software, and that specifies the full and exact version of each package to make the Python environment reproducible.

This will turn a requirements input file (called requirements.txt.in for example) that looks like

numpy

into a requirements file that specifies the exact version of numpy and all dependencies, like

numpy==1.18.1

By the end of this post, you'll be able to do this to refresh and update the versions of all the software your project depends on:

make requirements.txt

All of this code comes from the Human Cell Atlas data-store project!

What is requirements.txt?

When developing a Python project, requirements.txt is a plain text file listing the Python packages that must be installed for the project to work. These packages can be installed using the command

pip install -r requirements.txt

For example, if a package foobar has import numpy at the top of a Python file in the project, the numpy package must be installed before importing foobar. In this case, the requirements.txt could just contain

numpy

or it could specify a particular version of numpy, or a minimum version of numpy:

numpy >= 1.10

Start by creating a requirements.txt.in, which looks like a normal requirements.txt file: a list of packages for pip to install, with optional version constraints. This file is a looser specification of software versions.

Example requirements.txt.in:

numpy
pandas > 0.22
sphinx

Converting requirements.txt.in to requirements.txt

Next, we use the requirements.txt.in file to install the latest versions of each software package (and all dependent software packages) into a virtual environment.

From that virtual environment, we can use pip freeze to output the names of each software package installed in the virtual environment, along with its exact version. This can be used to make a requirements.txt file.

The manual steps are

virtualenv -p $(which python3) venv
venv/bin/pip install -r requirements.txt
venv/bin/pip install -r requirements.txt.in
venv/bin/pip freeze > requirements.txt
rm -fr venv

Using pip freeze means the resulting requirements.txt contains detailed version numbers:

alabaster==0.7.12
Babel==2.7.0
certifi==2019.11.28
chardet==3.0.4
docutils==0.15.2
idna==2.8
imagesize==1.1.0
Jinja2==2.10.3
MarkupSafe==1.1.1
numpy==1.17.4
packaging==19.2
pandas==0.25.3
Pygments==2.5.2
pyparsing==2.4.5
python-dateutil==2.8.1
pytz==2019.3
requests==2.22.0
six==1.13.0
snowballstemmer==2.0.0
Sphinx==2.2.2
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
urllib3==1.25.7

This is automated with a make rule next.

Automating the step with a make rule

We have a nice make rule that can be dropped into any Makefile, allowing users to run

make requirements.txt

and it will use requirements.txt.in, perform the above steps, and output an updated requirements.txt with the latest compatible versions of software.

Here is the Makefile rule:

requirements.txt: %.txt : %.txt.in
    [ ! -e .$<-env ] || exit 1
    virtualenv -p $(shell which python3) .$<-env
    .$<-env/bin/pip install -r $@
    .$<-env/bin/pip install -r $<
    echo "# You should not edit this file directly.  Instead, you should edit $<." >| $@
    .$<-env/bin/pip freeze >> $@
    rm -rf .$<-env

Summary of the make rule:

  • The first line aborts if a leftover virtual environment already exists; the second creates a fresh virtual environment (named after the input file) using virtualenv

  • The next two lines run pip install, first on requirements.txt (the existing version), then requirements.txt.in (which installs/updates any software packages in requirements.txt.in)

  • A comment is added to the top of the requirements.txt file to help give users a hint about where to update software requirements.

  • The pip freeze command is used to create a requirements.txt file from the current virtual environment

Refreshing requirements

To refresh the requirements periodically, add a refresh_all_requirements rule to the Makefile:

refresh_all_requirements:
    @cat /dev/null > requirements.txt
    @if [ $$(uname -s) == "Darwin" ]; then sleep 1; fi  # required because Darwin HFS+ only has second-resolution timestamps
    @touch requirements.txt.in
    @$(MAKE) requirements.txt

Now requirements.txt can be updated with

make refresh_all_requirements

This can be done periodically, and the new requirements.txt updated in the version control system.
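
A periodic refresh might look like this (the commit message is just an example):

make refresh_all_requirements
git add requirements.txt
git commit -m "Refresh pinned requirements"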

Tags:    python    pip    version control    make    makefile   

Git Workflows, Part 3: Refactoring Large Branches and Pull Requests

Posted in Git

Summary

  • If a feature branch or pull request gets too complicated and should be refactored into simpler pieces:
    • Create a new feature branch from the original destination branch
    • Turn commits into patches, or cherry-pick commits (leaving changes unstaged)
    • Apply patches or cherry-picks to the feature branch
    • Use git add --patch or git add --edit to selectively split out changes into separate commits

This post contains many common patterns applied to different workflows.

Managing Complexity

When collaborating on software, especially large software with people who are not the primary developers, it is important to limit the complexity of features and proposed changes. Why is it bad practice to propose large, complex changes?

  • It is harder to review the proposed changes
  • Bugs become more likely, and their likelihood grows far faster than the amount of code
  • Complex changes usually combine several unrelated changes, which are harder to review, test, and revert independently

Refactoring Large Branches

Consider the case of a large feature branch that is suffering from feature creep (trying to cram too many changes into one branch). For example, in the process of implementing a feature, you may also implement significant fixups, refactoring of functions, and code cleanup that is in the same file but not entirely related. While writing tests for the new feature, you may also refactor tests to be cleaner, to use the foobar context manager, and so on.

To illustrate: suppose you are on a branch called feature (created off of master) that consists of three sets of changes, D, E, and F:

A - B - C (master)
    \
     D1 - E1 - D2 - F1 - E2 - F2 - D3 - F3 - E3 (feature)

  • D corresponds to implementing the new feature and writing tests for it
  • E corresponds to fixups to the same file that was changed to implement the feature
  • F corresponds to fixups to tests unrelated to the new feature

Now, if things were really so clean, and if you had a time machine or the patience to rebase commits one at a time, splitting them into atomic changes scoped to a single feature (which would be super easy, because of course your git logs are filled with helpful, concise commit messages), you could use git cherry-pick to replay commits D1, D2, D3 onto a new D branch, and so on.

But in reality, commit F1 contains a little bit of E1 and D2, and vice versa, and so on. It's much easier to navigate a diff and select pieces from it. That's where git add -e (or --edit) will help.

We also have to turn a set of commits into a single set of unstaged changes (that is, replay the changes each commit made without replaying the commits themselves). There are a few ways to do this; below we cover turning a set of commits into patch files, cherry-picking commits without committing, and squashing and rolling back a set of commits with a soft reset.

Once the commits have been rolled back and unstaged, particular changes can be staged for each split commit using git add -e and using the editor to select which changes to include or exclude from the commit. As each commit is created, branches can be created that are linked to the group of changes in each new commit.

Converting a Set of Commits to Unstaged Changes

We are trying to untangle a set of unrelated changes into separate commits that group related changes together. For the example, we want to convert this:

A - B - C (master)
    \
     D1 - E1 - D2 - F1 - E2 - F2 - D3 - F3 - E3 (feature)

to this:

A - B - C (master)
    \
     D - E - F (feature)

so that the changes in commits D, E, and F are simpler, more limited in scope, and easier to review.

We cover three strategies for turning a sequence of commits like D1-E1-...-E3 into a set of unstaged changes. Then, particular changes can be selectively added to commits using git add -e (--edit) or git add -p (--patch).

git format-patch

To create a set of patches, one per commit, so that you can edit them or apply them in various orders, use git format-patch with a commit range (B..E3 means every commit after B, up to and including E3) and an output directory:

git format-patch -o patches B..E3

This will create a series of patches like

patches/0001-the-D1-commit-message.patch
patches/0002-the-E1-commit-message.patch
patches/0003-the-D2-commit-message.patch
patches/0004-the-F1-commit-message.patch
patches/0005-the-E2-commit-message.patch
patches/0006-the-F2-commit-message.patch
patches/0007-the-D3-commit-message.patch
patches/0008-the-F3-commit-message.patch
patches/0009-the-E3-commit-message.patch

Patches can be further split or modified, and can be applied in the desired order (although changes in line numbers happening out of order may confuse the program applying the patch).

Start by creating a branch from the desired commit (commit B in the diagram above):

git checkout B

(where B should be either the commit hash for commit B, or a tag or branch that is associated with commit B). Now create a branch that will start from that commit (we'll start with our branch for feature D here):

git checkout -b feature-d

Now apply patches to the new branch, which will start from commit B.

To apply a patch, use patch -p1:

patch -p1 < patches/0001-the-D1-commit-message.patch

The -p1 option strips one leading directory level from the paths inside the patch (git prefixes paths with a/ and b/), which is necessary for patches created by git. We use patch rather than git am to apply the patch, because we want to apply the changes independently of git, and only stage the changes we want into our next commit.

If you have a series of commits that you want to squash, that's also easy to do by applying each patch for those commits, then staging all the changes from those patches into a new commit.
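
For example, to squash the three E commits from the diagram into a single commit, apply their patches and stage everything into one commit (a sketch; the patch file names follow the numbering above, and the commit message is arbitrary):

patch -p1 < patches/0002-the-E1-commit-message.patch
patch -p1 < patches/0005-the-E2-commit-message.patch
patch -p1 < patches/0009-the-E3-commit-message.patch
git add --all
git commit -m "Fixups to the feature file (E)"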

As patches are applied, particular changes can be staged and commits can be crafted. Use the --edit or --patch flags of git add:

git add --edit <filename>
git add --patch <filename>

This allows selective filtering of particular edits into the next commit, so that one patch (or any number of patches) can be applied, and selective changes can be staged into a commit.

Once you are ready, just run

git commit

without specifying the filename. (If you specify the filename, it will stage all changes, and ignore the crafting you've done.)

As you create a commit or a set of commits specific to changeset D, you can work on the feature-d branch. When you finish all commits related to D, you can start a new branch with

git checkout -b feature-e

which will start a new branch from where the feature-d branch left off. Chaining your changes together into several small branches that build on each other will help keep pull requests simpler too.

The advantages of this approach include:

  • Commits can be split by applying the patch and staging particular edits
  • The ability to split single commits into more commits, or combine/squash commits together, means this approach has a lot of flexibility
  • Works well in situations where a long series of commits contains many small commits that should be squashed together and a few large commits that should be split apart

The disadvantages of this approach include:

  • Patches applied out of order can confuse the program applying the patches

cherry-pick and unstage

An alternative to the above workflow is to use git cherry-pick to apply the changes from particular commits, but to leave those changes unstaged using the --no-commit or -n flag:

git cherry-pick --no-commit <commit-hash>
git cherry-pick -n <commit-hash>

Alternatively, a range of commits can be used instead (note that the first commit in an A..B range is excluded, so append ^ to the start commit to include it):

git cherry-pick -n <commit-hash-start>^..<commit-hash-end>

This can help achieve a similar level of flexibility to the patch approach.
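
For the running example, building the feature-d branch with cherry-pick might look like this (B, D1, D2, and D3 stand in for the actual commit hashes):

git checkout -b feature-d B
git cherry-pick -n D1 D2 D3    # apply the D commits without committing
git restore --staged .         # unstage so changes can be re-staged selectively
git add --patch                # stage only the changes that belong in the next commit
git commit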

soft reset and commit

Suppose the commit history is simple enough that you can squash all of the commits together into a single diff set, and pick the changes to split into commits D, E, and F.

In that case, the easiest way might be to roll back all of the commits made, but preserve the changes that each commit made. This is precisely what a soft reset will do.

For the git commit history

A - B - C (master)
    \
     D1 - E1 - D2 - F1 - E2 - F2 - D3 - F3 - E3 (feature)

Run the command

git reset --soft B

to move the HEAD pointer to commit B, while also preserving all changes made from the start of the feature branch D1 to the tip of the feature branch E3, all added as staged changes (as though they had been git add-ed).

The changes will be staged, but changes to files can be unstaged using

git restore --staged <filename>

Now add changes selectively using the --edit or --patch flags

git add --edit <filename>
git add --patch <filename>

If desired, those changes can be unstaged, and then re-staged using git add --edit or git add --patch to selectively add changes to particular commits.

When done, run

git commit

with no arguments to commit the changes you made.

Refactoring Large Pull Requests

The approaches above can be useful for refactoring branches. The end result will look something like this:

A - B - C (master)
    \
     D (feature-d)
      \ 
       E (feature-e)
        \
         F (feature-f)

Now 3 pull requests can be made, one for each feature. Thanks to the refactoring above, each branch should be a more isolated set of changes that are all related, and therefore easier to review.

Chaining Pull Requests

The three D E F branches should be merged in together, since they are all related. But their changes should be kept separate to make reviewing each branch easier. To accomplish this, chain the pull requests together like so:

Pull Request 1: merge feature-d into master

Pull Request 2: merge feature-e into feature-d

Pull Request 3: merge feature-f into feature-e

In this way, each pull request only shows the changes specific to that branch.

(If each pull request were made against master, then later branches (F) would also incorporate changes from prior branches (D), resulting in messy and hard-to-review pull requests.)

Pull requests are reviewed and discussed, and new commits will probably be added to fix things or incorporate feedback:

A - B - C (master)
    \
     D - DA - DB (feature-d)
      \ 
       E - EA - EB (feature-e)
        \
         F - FA - FB (feature-f)

Preparing to Merge a Large Pull Request

All of your pull requests are approved and ready to merge. Now what?

Pull requests will need to be merged in reverse order (last PR is merged first - f into e, e into d, d into master). To test that things go smoothly with the first pull request (feature-f into feature-e), we should create a local E-F integration branch.

The local integration branch will have new commits if changes are needed to resolve merge conflicts or fix broken tests. Any changes made can be added to the feature-f branch and pushed to the remote, so that they are part of the pull request, making the merge into feature-e go smoothly.

To create a throwaway E-F integration branch, we start by creating a test integration branch from the tip of the feature-f branch, and we will merge branch feature-e into branch feature-f.

git checkout feature-f

Now we create a local E-F integration branch:

git checkout -b integration-e-f

Now we merge feature-e into integration-e-f (which currently points at the same commit as feature-f):

git merge --no-ff feature-e

The --no-ff flag creates a separate merge commit, which is useful here to keep our commit history clean.

If merge conflicts are encountered, those can be resolved in the usual manner, and the (conflict-free) new versions of each file, reflecting changes from feature-f and feature-e, will all be present after the merge commit.

Further commits can also be made to make tests pass, with a resulting git diagram:

A - B - C (master)
    \
     D - DA - DB (feature-d)
      \ 
       E - EA - EB ----
        \              \
         F - FA - FB - EF1 - EF2 (integration-e-f)
                              ^
                             HEAD

Once the integration-e-f branch is polished and passing tests, we can re-label it as feature-f and push the new commits to the remote. To re-label integration-e-f as feature-f, assuming we're at the tip of the integration-e-f branch (where we left off above):

git branch -D feature-f
git checkout -b feature-f

and push the new commits to the remote's feature-f branch, before you merge in the pull request (feature-f into feature-e):

git push origin feature-f

Now you are ready to merge pull request 3 (F into E).

Rinse and Repeat

Rinse and repeat for pull requests 2 and 1.

For Pull Request 2, we start by creating a new integration-d-e-f branch from the tip of the integration-e-f branch, like so:

git checkout integration-e-f
git checkout -b integration-d-e-f

and use the same approach of merging in the feature-d branch with an explicit merge commit:

git merge --no-ff feature-d

Work out any merge conflicts that result, and add any additional changes needed to get tests passing, and you should now have a git commit history like this:

A - B - C (master)
    \
     D - DA - DB ----------------
      \                          \
       E - EA - EB ----           \
        \              \           \
         F - FA - FB - EF1 - EF2 - DEF1 - DEF2 (integration-d-e-f)
                                            ^
                                           HEAD

Now re-label the integration-d-e-f branch as feature-e:

git branch -D feature-e && git checkout -b feature-e

Finally, push all new commits to the remote, including the new merge commit, which will make sure the pull request can be merged without any conflicts:

git push origin feature-e

Now PR 2 (E into D) can be merged.

Final Merge into Master

The last and final PR, D into master, will merge all combined feature branches into the master branch. We start with a feature-d branch that has several commits related to feature D, then several commits from merging the feature-e branch in (pull request 2, E into D), and the feature-e branch also had feature-f merged into it.

A - B - C (master)
     \
      D - D2 - DEF1 - DEF2 (feature-d)

Now we will create one more commit on the feature-d branch that is merging master into feature-d, which will help the merge happen smoothly for pull request 1 (D into master).

But first we switch to an integration branch, in case things don't go smoothly and we want to throw away the merge commit:

git checkout -b integration-def-master

Create an explicit merge commit to merge master into integration-def-master:

git merge --no-ff master

Work out any merge conflicts that result, and add any additional changes needed to get tests passing, and you should now have a git commit history like this:

A - B - C (master)
     \   \---------------------
      \                        \
       D - D2 - DEF1 - DEF2 - DEF3 (integration-def-master)

where commit DEF3 is the merge commit created with the --no-ff flag.

The merge commit will resolve any conflicts. When you're satisfied with the merge commit, you can switch out the integration-def-master branch with the feature-d branch like so:

git branch -D feature-d
git checkout -b feature-d

Now you can push the merge commit to the remote:

git push origin feature-d

and you're now ready to merge your (conflict-free) pull request!

Tags:    git    rebase    cherry-pick    branching    version control   
