charlesreid1.com blog

First Post of the Fall, Part 2: Flaskadillo

Posted in Python

Flask + ILLO = Flaskadillo

On October 15, 2018, I led an in-lab learning opportunity (ILLO) at the Lab for Data Intensive Biology. The ILLO focused on Flask, a Python library for creating and running web servers. Flask has a very low learning curve, but it can also handle complicated, real-world projects.

As part of this ILLO, I created a repository with five simple Flask examples, each highlighting a useful capability of Flask.

The repository is called flaskadillo and it is available on git.charlesreid1.com or on github.com.

The five capabilities covered by the examples in flaskadillo are listed below:

  1. hello - a Hello World Flask server

  2. api - a simple API server

  3. jinja - a simple Flask server that uses Jinja templates

  4. package - a simple demonstration of how to package Flask apps

  5. tests - a simple demonstration of how to write Flask tests

Example 1: Hello World

We'll just cover example 1 here, but similar materials are available for all five examples.

Example 1 consists of a simple Flask app, simple.py:

from flask import Flask

# Create the Flask application object
app = Flask(__name__)

# Map the root URL "/" to the hello() function
@app.route("/")
def hello():
    return "Hello World!"

The hello directory of the flaskadillo repo covers how to install the necessary packages and run the Flask application.

There is also a unit test, test_simple.py, which demonstrates how to write tests for Flask applications. To run the unit test, run:

pytest
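For a sense of what such a test can look like, here is a minimal sketch using Flask's built-in test client, which sends requests to the app without starting a server. It redefines the hello world app inline so the file is self-contained; the actual test_simple.py in flaskadillo may be organized differently (for example, by importing the app from simple.py), and the file and function names here are hypothetical.

```python
# test_simple_sketch.py: a hypothetical sketch, not the repo's test_simple.py
from flask import Flask

# Same app as simple.py, defined inline so this file runs on its own
app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"


def test_hello():
    # The test client issues a GET request directly against the app
    client = app.test_client()
    response = client.get("/")
    assert response.status_code == 200
    assert response.get_data(as_text=True) == "Hello World!"
```

By default, pytest collects files named test_*.py and runs the functions in them whose names start with test_, so running pytest in the same directory picks this test up.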

More Information

For instructions on each of the five examples, see the corresponding directory in the flaskadillo repository.

Why flaskadillo?

Because armadillo.

Why armadillo?

The word armadillo means "little armoured one" in Spanish.

Armadillos are related to anteaters and sloths (all three belong to the superorder Xenarthra).

The Aztecs called them turtle-rabbits.

Tags:    Github    Software    Python    Flask   

First Post of the Fall, Part 1: Data Commons

Posted in Centillion

Background: a bit about the Data Commons

It has been a productive but busy summer at the Lab for Data Intensive Biology.

As part of my job, I am supporting a lot of websites and infrastructure for the Data Commons Pilot Phase Consortium (DCPPC), which wrapped up Phase 1 this month.

The Data Commons is a large-scale effort to establish a community-driven set of interoperability standards for biological data and computation. It is a massive effort with a broad mandate, and it has the potential to enable breakthrough research that is currently impossible because the data, compute resources, and domain expertise provided by universities, hospitals, research institutes, companies, nonprofits, and citizen scientists cannot interoperate.

Informationally challenged: Data Commons growing pains

An important part of defining a community-driven set of standards is defining a community. Toward that end, the members of the Data Commons met at monthly face-to-face workshops to iterate rapidly on a set of technologies and standards that would allow each institution's compute platforms and data banks to work with those of other institutions. Doing this requires fostering a community and creating the right environment for people to work through the issues.

One of the biggest challenges we faced in fostering a community that could develop and implement a set of standards across such a large and diverse group of experts and institutes was coordinating information: making sure that decisions were communicated to the appropriate parties, that important documents reached the entire consortium, and that documents being created and edited were findable and shareable.

This problem began, back in April, as a very small trash fire. People were getting used to the Github workflow and did not know how to find the appropriate repository for the information they needed to contribute, and consortium members were universally annoyed that Google Drive's search functionality was so terrible.

In June we rolled out a trial document-tagging system to the consortium, to deafening silence - no one was impressed or satisfied with the tagging system. The real problem was with search.

To address this, I implemented a full-fledged search engine for the Data Commons that uses various third-party APIs (Github, Google Drive, Groups.io, etc.) to index content related to the project and make it full-text searchable.

The result was centillion, the Data Commons search engine. This search engine provides a portal to search for Data Commons-related Google Drive documents, Github issues, Github pull requests, Github files, Groups.io email threads, and more.

Our story picks up with centillion.

Presenting centillion, the Data Commons search engine

One of the tools I have made heavy use of in support of web infrastructure for the DCPPC project is Flask, a Python library for running a web server. Flask is a very powerful library, but it starts with a relatively simple premise: Flask lets you create a web application that binds to a particular port, and you can then add "routes" (endpoints a user can visit, like /hello/world) and link those routes to Python functions.
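As a minimal sketch of that premise, the app below links the route /hello/world to one Python function and adds a second route with a variable part. The function names and routes are illustrative assumptions, not code from centillion.

```python
from flask import Flask

app = Flask(__name__)


# Visiting /hello/world calls hello_world()
@app.route("/hello/world")
def hello_world():
    return "Hello, world!"


# <name> captures that segment of the URL and passes it in as an argument
@app.route("/hello/<name>")
def hello_name(name):
    return f"Hello, {name}!"
```

Saved as app.py, this can be served with FLASK_APP=app.py flask run --port 5000, which binds the development server to port 5000 on localhost; visiting /hello/Alice then returns "Hello, Alice!".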

On Monday 2018-10-28 the DIB Lab's weekly lab meeting featured yours truly covering the topic of centillion, the Data Commons search engine.

centillion uses the Python library Whoosh under the hood to provide search functionality, while the web front end uses Flask to connect Python functions to a website that users can interact with.

Screen shot of the centillion search engine (2018-10-27).

centillion architecture: the short version

As of version 1.7, centillion is packaged as a Python package. The centillion package consists of two submodules: webapp, which implements the Flask frontend, and search, which implements the Whoosh backend.

webapp submodule

centillion.webapp implements the Flask app and defines all routes. When the user runs a search, the webapp passes the query string on to a Search object from the search submodule. The webapp submodule does not know anything about the details of the search engine or search index.

This submodule is located in src/webapp/ in the centillion repo.
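The separation between the two submodules can be illustrated with a short sketch: a Flask route reads the query string and hands it to a Search object, calling a single method and never touching the index itself. The Search class, its search() method, the /search route, and the query parameter name are all hypothetical stand-ins rather than centillion's actual API, and the stub filters a hard-coded list in place of a Whoosh index.

```python
from flask import Flask, jsonify, request


class Search:
    """Hypothetical stand-in for a class from the search submodule.

    A real back end would query a Whoosh index; this stub filters a
    hard-coded list of placeholder titles so the example runs on its own.
    """

    def __init__(self):
        self._titles = ["Flask quickstart", "Whoosh notes", "Meeting agenda"]

    def search(self, query):
        q = query.lower()
        return [t for t in self._titles if q in t.lower()]


app = Flask(__name__)
search_engine = Search()


@app.route("/search")
def search_page():
    # The webapp only forwards the query string; it knows nothing
    # about how the Search object finds its results.
    query = request.args.get("query", "")
    results = search_engine.search(query) if query else []
    return jsonify(results=results)
```

Because the route depends only on the search() method, the back end can change (for example, swapping the stub for a real index) without touching the Flask code.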

search submodule

centillion.search implements a search engine using Whoosh, a programming library for building search engines. Whoosh does not implement any kind of front end, so its role is restricted entirely to the back end.

The search submodule also handles interfacing with the Github, Google, and Groups.io APIs and translating the results of API calls from these services into documents whose contents can be extracted and indexed by Whoosh.

This submodule is located in src/search/ in the centillion repo.
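Whoosh does the real indexing work, but the core idea behind full-text search, an inverted index, can be sketched in plain Python. The toy index below is neither Whoosh nor centillion's code; the field names and the two sample records (imitating a Github issue and a Google Drive file after being translated from API responses into documents) are hypothetical. The index maps each word to the IDs of the documents containing it, and a query returns the documents that contain every query word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index: maps each word to the set of doc IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add_document(self, doc_id, title, body):
        self.titles[doc_id] = title
        for word in tokenize(title + " " + body):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return sorted titles of documents containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(self.titles[d] for d in matches)


# Hypothetical records, shaped like API results translated into documents
records = [
    {"id": "issue-1", "title": "Fix login bug", "body": "The login page fails on Firefox"},
    {"id": "drive-7", "title": "Workshop agenda", "body": "Agenda for the October workshop"},
]

index = InvertedIndex()
for r in records:
    index.add_document(r["id"], r["title"], r["body"])
```

With these records, index.search("workshop agenda") returns ["Workshop agenda"], while index.search("login workshop") returns an empty list because no single document contains both words. A library like Whoosh adds what this sketch leaves out, such as relevance scoring, field schemas, and persistent on-disk indexes.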

Tags:    DCPPC    Data Commons    Github    Community    Science    Centillion   

Current Projects

Posted in General

A list of various ongoing projects:

The Git College of Surgery:

Python + APIs:

  • building an API that calls APIs so you can API while you API (a webhook that calls a hook - see captain hook)
  • testing APIs with Python + requests (currently top secret, coming soon.)

Python + Command line:

  • command line utilities with python
  • testing command line utilities with python

More stuff:

  • magic flying camel is a seed repository for getting started with a simple Jekyll page on Github Pages

  • magic flying pelican is a seed repository for getting started with a simple Pelican blog on Github Pages

The rise of the mind machines:

Each software package in the mind machine suite follows (or will follow) the prime number version system:

PyPi and Dockerhub:

  • Rainbow mind machine software packages require a more streamlined deployment process
  • Makefiles are in progress

how do i pandoc

how do i pelican - a crash course in building a pelican blog

mkdocs search demo - a quick pop-up site demonstrating how to use the built-in search functionality of mkdocs-material and lunr.js to index a pile of markdown files containing interesting links.

captain hook - we have already mentioned captain hook several times, but this is the magic that makes pages.charlesreid1.com possible.

Tags:    Git    Github    Software    Python   
