Building Snakemake Command Line Wrappers for Workflows

Posted in Snakemake

NOTE: These ideas are implemented in the repository charlesreid1/2019-snakemake-cli.

Basic Idea: Wrapping Snakemake API Calls

2018-snakemake-cli

This blog post covers the implementation of an idea that was originally explored in a blog post from Titus Brown, "Pydoit, snakemake, and workflows-as-applications".

That blog post implemented a basic command line wrapper around the Snakemake API to demonstrate how a Snakemake workflow could be turned into an executable.

Relevant code is in ctb/2018-snakemake-cli, but the basic idea is to implement a command line utility that takes two orthogonal sets of inputs: a workflow configuration file, and a parameter set.

./run <workflow-config> <workflow-params>

The run script is a Python executable file that parses arguments from the user.

Here is the main entrypoint of run:

#! /usr/bin/env python
"""
Execution script for snakemake workflows.
"""
import argparse
import os.path
import snakemake
import sys
import pprint
import json

thisdir = os.path.abspath(os.path.dirname(__file__))

def main(args):
    # 
    # ...see below...
    #

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='run snakemake workflows', usage='''run <workflow> <parameters> [<target>]
Run snakemake workflows, using the given workflow name & parameters file.
''')

    parser.add_argument('workflowfile')
    parser.add_argument('paramsfile')
    parser.add_argument('-n', '--dry-run', action='store_true')
    args = parser.parse_args()

    sys.exit(main(args))

The main() function uses the os module to locate the Snakefile, the config file, and the params file, then calls the Snakemake API:

def main(args):
    #
    # ...find the snakefile...
    # ...find the config file...
    # ...find the params file...
    # 

    target = workflow_info['workflow_target']
    config = dict()

    print('--------')
    print('details!')
    print('\tsnakefile: {}'.format(snakefile))
    print('\tconfig: {}'.format(workflowfile))
    print('\tparams: {}'.format(paramsfile))
    print('\ttarget: {}'.format(target))
    print('--------')

    # run!!
    status = snakemake.snakemake(snakefile, 
                                 configfile=paramsfile,
                                 targets=[target], 
                                 printshellcmds=True,
                                 dryrun=args.dry_run, 
                                 config=config)

    if status: # translate "success" into shell exit code of 0
       return 0
    return 1
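
The lookup steps elided above are not shown in this post. As a rough illustration only, one way such a lookup could work is to accept either an explicit path or a bare name, and resolve the bare name against a list of directories and suffixes. The helper name resolve_input and the .json suffix are assumptions made for this sketch, not code taken from either repository.

```python
import os.path


def resolve_input(name, search_dirs, suffixes=("", ".json")):
    """Return the first existing file matching name, or None.

    Hypothetical helper: tries `name` as given (with each suffix),
    then the same names under each directory in `search_dirs`.
    """
    candidates = [name + s for s in suffixes]
    for directory in search_dirs:
        candidates += [os.path.join(directory, name + s) for s in suffixes]
    for path in candidates:
        # isfile() is False for directories and for missing paths
        if os.path.isfile(path):
            return path
    return None
```

With a helper like this, main() could call resolve_input(args.workflowfile, [thisdir]) and exit with an error message when the result is None.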

This call uses the provided parameters file to populate the Snakemake configuration dictionary; individual values can then be overridden through the config dictionary. Additional argparse flags can be added, and the contents of the config dictionary modified based on those flags.
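
Here is a minimal sketch of that last idea. The --name flag is hypothetical (it is not part of the run script); it is chosen because the example workflow reads config['name']. Only values the user actually sets end up in the override dictionary, so the parameters file remains the default source.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("workflowfile")
    parser.add_argument("paramsfile")
    parser.add_argument("-n", "--dry-run", action="store_true")
    # Hypothetical flag: override the 'name' parameter from the command line.
    parser.add_argument("--name", default=None)
    return parser


def config_overrides(args):
    """Collect only the values the user set explicitly on the command line."""
    config = dict()
    if args.name is not None:
        config["name"] = args.name
    return config


args = build_parser().parse_args(["workflow-hello", "params-amy", "--name", "carol"])
config = config_overrides(args)
print(config)  # {'name': 'carol'}
```

The resulting dictionary would be passed as config=config in the snakemake.snakemake() call shown above.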

2019-snakemake-cli

We wanted to take this demo a step further, and add a few things to it:

  • Bundle the Snakefile and command line utility as an installable Python package with a setup.py.

  • Implement Travis CI tests of the Snakemake workflow.

We implemented a bundled Snakemake workflow as a command line tool called bananas.

Turning Executables into Packages

We began with an executable script run and wished to turn it into an installable command line utility called bananas.

To do this, we moved the contents of run into a new file, command.py, in a new Python package called cli:

cli/
├── Snakefile
├── __init__.py
└── command.py

The Snakefile will contain the workflow. Here is the very simple workflow from ctb/2018-snakemake-cli. The named rules are specified by the workflow configuration file, while the parameters in {} are provided through the parameters file (or via command line flags).

cli/Snakefile:

name = config['name']

rule rulename1:
     input:
        "hello.txt"

rule target1:
     output:
        "hello.txt"
     shell:
        "echo hello {name} > {output}"

rule target2:
     output:
        "goodbye.txt"
     shell:
        "echo goodbye {name} > {output}"
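
To make the split between the two input files concrete, the sketch below writes one example of each and reads them back the way the wrapper would. The file contents are assumptions for illustration: the workflow_target key comes from the wrapper code and the name key from the Snakefile above, but the exact files in the repository may differ.

```python
import json
import os
import tempfile

# Assumed contents, for illustration only.
workflow_hello = {"workflow_target": "target1"}  # which rule to run
params_amy = {"name": "amy"}                     # values for config[...]

workdir = tempfile.mkdtemp()
workflow_path = os.path.join(workdir, "workflow-hello.json")
params_path = os.path.join(workdir, "params-amy.json")
for path, contents in [(workflow_path, workflow_hello), (params_path, params_amy)]:
    with open(path, "w") as f:
        json.dump(contents, f)

# The wrapper reads the target rule from the workflow file ...
with open(workflow_path) as f:
    workflow_info = json.load(f)
target = workflow_info["workflow_target"]

# ... while the params file is handed to Snakemake as configfile=,
# which is how config['name'] gets its value inside the Snakefile.
with open(params_path) as f:
    params = json.load(f)
```

Under these assumptions, a goodbye workflow file would differ only in naming target2 as its workflow_target, and either workflow can be combined with either params file.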

NOTE: In this case we are bundling the Snakefile with the command line wrapper, and writing the command line wrapper to expect the Snakefile to be in the package. But we can modify the command line wrapper function (below) to look for the Snakefile in a local directory, allowing the user to provide Snakefiles and workflows to the command line wrapper.
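
One possible shape for that modification is sketched below: look for a Snakefile in the user's working directory first, and fall back to the bundled one. The function name find_snakefile and the search order are assumptions for this sketch, not behavior of the bananas package.

```python
import os
import os.path


def find_snakefile(package_dir, working_dir=None):
    """Prefer a Snakefile in the working directory; else use the bundled one."""
    working_dir = working_dir or os.getcwd()
    for directory in (working_dir, package_dir):
        candidate = os.path.join(directory, "Snakefile")
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        "no Snakefile found in {} or {}".format(working_dir, package_dir))
```

In command.py this could be called as find_snakefile(thisdir, cwd), using the two module-level variables defined there.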

The __init__.py file sets two important parameters: the name of the command line utility, and the version number:

cli/__init__.py:

_program = "bananas"
__version__ = "0.1.0"
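
Keeping both values in one place also makes it easy to reuse them elsewhere. As a small sketch, a --version flag (hypothetical, not part of the wrapper shown in this post) could be built from them with argparse's built-in version action. The two assignments below stand in for "from . import _program, __version__".

```python
import argparse

# Stand-ins for the values defined in cli/__init__.py.
_program = "bananas"
__version__ = "0.1.0"

parser = argparse.ArgumentParser(prog=_program)
# Hypothetical flag: prints the program name and version, then exits.
parser.add_argument("--version", action="version",
                    version="%(prog)s {}".format(__version__))
```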

The contents of command.py are similar to run and basically control how the command line utility runs:

cli/command.py:

"""
Command line interface driver for snakemake workflows
"""
import argparse
import os.path
import snakemake
import sys
import pprint
import json

from . import _program


thisdir = os.path.abspath(os.path.dirname(__file__))
parentdir = os.path.join(thisdir,'..')
cwd = os.getcwd()

def main(sysargs = sys.argv[1:]):

    parser = argparse.ArgumentParser(prog = _program, description='bananas: run snakemake workflows', usage='''bananas <workflow> <parameters> [<target>]

bananas: run snakemake workflows, using the given workflow name & parameters file.

''')

    parser.add_argument('workflowfile')
    parser.add_argument('paramsfile')
    parser.add_argument('-n', '--dry-run', action='store_true')
    parser.add_argument('-f', '--force', action='store_true')
    args = parser.parse_args(sysargs)

    # ...find the Snakefile...
    # ...find the config file...
    # ...find the params file...

    target = workflow_info['workflow_target']
    config = dict()

    print('--------')
    print('details!')
    print('\tsnakefile: {}'.format(snakefile))
    print('\tconfig: {}'.format(workflowfile))
    print('\tparams: {}'.format(paramsfile))
    print('\ttarget: {}'.format(target))
    print('--------')

    # run bananas!!
    status = snakemake.snakemake(snakefile, configfile=paramsfile,
                                 targets=[target], printshellcmds=True,
                                 dryrun=args.dry_run, forceall=args.force,
                                 config=config)

    if status: # translate "success" into shell exit code of 0
       return 0
    return 1


if __name__ == '__main__':
    # pass the return value on as the shell exit code
    sys.exit(main())

The last component here is to make the function in cli/command.py the entrypoint of a command line utility called bananas, which can be done via setup.py. This will put the executable bananas in the Python binaries folder when the package is installed.

setup.py:

from setuptools import setup, find_packages
import glob
import os

with open('requirements.txt') as f:
    required = [x for x in f.read().splitlines() if not x.startswith("#")]

# Note: the _program variable is set in __init__.py.
# it determines the name of the package/final command line tool.
from cli import __version__, _program

setup(name='bananas',
      version=__version__,
      packages=['cli'],
      test_suite='pytest.collector',
      tests_require=['pytest'],
      description='bananas command line interface',
      url='https://charlesreid1.github.io/2019-snakemake-cli',
      author='@charlesreid1',
      author_email='cmreid@ucdavis.edu',
      license='MIT',
      entry_points="""
      [console_scripts]
      {program} = cli.command:main
      """.format(program = _program),
      install_requires=required,
      include_package_data=True,
      keywords=[],
      zip_safe=False)

First, we grab the variables from __init__.py:

from cli import __version__, _program

Next we specify where our package lives, the cli directory:

setup(name='bananas',
        ...
        packages=['cli'],

and finally, we specify that we want to build a command line interface, with the entrypoint being the main() function in cli/command.py, using entry_points:

setup(name='bananas',
        ...
        entry_points="""
[console_scripts]
{program} = cli.command:main
      """.format(program = _program),
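
A console_scripts entry of the form name = module:function makes the installed executable import that function, call it with no arguments, and pass its return value to sys.exit. That is why main() takes its arguments from sys.argv by default and returns 0 or 1. The sketch below imitates this with a stand-in main(); the stand-in body and the console_script function are assumptions for illustration, not the generated script itself.

```python
import sys


def main(sysargs=None):
    """Stand-in for cli.command.main: returns a shell-style exit code."""
    if sysargs is None:
        sysargs = sys.argv[1:]
    # Pretend the workflow succeeds only when two positional inputs are given.
    return 0 if len(sysargs) == 2 else 1


def console_script():
    """Roughly what the installed `bananas` executable does."""
    sys.exit(main())
```

Because the argument list is a parameter, the same function can also be called directly from Python, for example main(["workflow-hello", "params-amy"]), which is convenient in tests.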

End Result: Using bananas

The end result is a command line utility that bundles a Snakemake workflow. The repository contains some tests, so let's run through the quick start installation and run the tests.

Quick Start: Installing

Start by setting up a virtual environment:

virtualenv vp
source vp/bin/activate

Install required components, then install the package:

pip install -r requirements.txt
python setup.py build install

Now you should see bananas on your path:

which bananas

Quick Start: Running Tests

pytest

Quick Start: Running Examples

Change to the test/ directory and run the examples with the example config and param files.

cd test

Run the hello workflow with Amy params:

rm -f hello.txt
bananas workflow-hello params-amy

Run the hello workflow with Beth params:

rm -f hello.txt
bananas workflow-hello params-beth

Run the goodbye workflow with Beth params:

rm -f goodbye.txt
bananas workflow-goodbye params-beth

Adding Travis CI Tests

To test our workflow, we break down the necessary tasks:

  • Use a Python environment
  • Install our requirements (snakemake)
  • Install bananas with setup.py
  • Run pytest

This is an easy Travis file to write, following the Travis docs.

.travis.yml:

language: python
python:
  - "3.5"
  - "3.6"
  #- "3.7-dev" # fails due to datrie build failure (snakemake dependency)

# command to install dependencies
install:
  - pip install -r requirements.txt
  - python setup.py build install

# command to run tests
script:
  - pytest

Final Repository

All of the code for this repository is in charlesreid1/2019-snakemake-cli.

See the v2.0 tag in case there are changes to the code that are not reflected in this blog post.

Next Steps

This demo provides a starting point for creating executable Snakemake workflows that are installable.

A few open questions and directions:

  • Bundling the Snakefile vs. user-provided Snakefiles

    • There is obviously more utility and flexibility in letting the user provide Snakefiles.
    • User-provided Snakefiles provide more ways for workflows to go wrong.
    • Testing is either more difficult, or shifted to the workflow author.
    • Bundled Snakefiles take the burden of writing the workflow off of the user, so they can focus on param/config files.
  • Kubernetes

  • Applications

Tags:    python    bioinformatics    workflows    pipelines    snakemake    travis