First Post of the Fall, Part 2: Flaskadillo

Posted Tuesday 10/30/2018 in Python

Flask + ILLO = Flaskadillo

On October 15, 2018, I had the chance to lead an in-lab learning opportunity (ILLO) at the Lab for Data Intensive Biology. The ILLO focused on Flask, a Python library for creating and running web servers. Flask is useful because it has a gentle learning curve, but it is also capable enough to handle complicated, real-world projects.

As part of this ILLO, I created a repository with five simple Flask examples, each highlighting a useful capability of Flask.

The repository is called flaskadillo and it is available on git.charlesreid1.com or on github.com.

The five capabilities covered by the examples in flaskadillo are listed below:

hello - a hello world Flask server
api - a simple API server
jinja - a simple Flask server that makes use of Jinja templates
package - a simple demonstration of how to package Flask apps
tests - a simple demonstration of how to write Flask tests

Example 1: Hello World

We'll just cover example 1 here, but similar materials are available for all five examples.

Example 1 consists of a simple Flask app, simple.py:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

The hello directory of the flaskadillo repo covers how to install the necessary packages and run the Flask application.

There is also a unit test, test_simple.py, which demonstrates how to write tests for Flask applications. To run the unit test, run:

pytest
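
The contents of test_simple.py are not reproduced in this post, so here is a minimal sketch of what a test for the hello world app might look like. The test function name and the assertions are assumptions, not the repo's actual test; the app is repeated so the block is self-contained.

```python
from flask import Flask

# Same hello world app as simple.py above, repeated so this sketch is self-contained
app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"


def test_hello():
    # test_client() sends requests to the app without starting a real server
    client = app.test_client()
    response = client.get("/")
    assert response.status_code == 200
    assert response.data == b"Hello World!"
```

Saved in a file whose name starts with test_, a function like this is picked up automatically when pytest runs in that directory.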

More Information

For instructions on each of the five examples, see the corresponding directory in the flaskadillo repository.

Why flaskadillo?

Because armadillo.

Why armadillo?

The word armadillo means "little armoured one" in Spanish.

Armadillos are related to anteaters and sloths (all are in the Xenarthra superorder).

The Aztecs called them turtle-rabbits.

Tags: Github Software Python Flask

First Post of the Fall, Part 1: Data Commons

Posted Saturday 10/27/2018 in Centillion

Background: a bit about the Data Commons

It has been a productive but busy summer at the Lab for Data Intensive Biology.

As part of my job, I am supporting a lot of websites and infrastructure for the Data Commons Pilot Phase Consortium (DCPPC), which wrapped up Phase 1 this month.

The Data Commons is a large-scale effort to establish a community-driven set of standards for interoperability of biological data and computation. It is a massive effort with a broad mandate, and it has the potential to enable breakthrough research that is currently impossible because the data, compute resources, and domain expertise provided by universities, hospitals, research institutes, companies, nonprofits, and citizen scientists cannot interoperate.

Informationally challenged: Data Commons growing pains

An important part of defining a community-driven set of standards is defining a community. Toward that end, the members of the Data Commons met at monthly face-to-face workshops to iterate tightly on a set of technologies and standards that will allow each institution's compute platforms or data banks to use other institutions' platforms or data banks. Doing this requires fostering community and creating the right environment for people to work through the issues.

One of the biggest challenges we faced in fostering a community that could develop and implement a set of standards across such a large and diverse group of experts and institutes was coordinating information. Specifically, we needed to make sure that decisions were properly communicated to the appropriate parties, that important documents made their way to the entire consortium, and that documents that were created and edited were also findable and shareable.

This problem began, back in April, as a very small trash fire. People were getting used to the Github workflow and did not know how to find the appropriate repository for the information they needed to contribute, and consortium members were universally annoyed that Google Drive's search functionality was so terrible.

In June we rolled out a trial document-tagging system to the consortium, to deafening silence - no one was impressed or satisfied with the tagging system. The real problem was with search.

Toward that end, I implemented a full-fledged search engine for the Data Commons that uses various third-party APIs (Github, Google Drive, Groups.io, etc.) to index content related to the project and make it full-text searchable.

The result was centillion, the Data Commons search engine. This search engine provides a portal to search for Data Commons-related Google Drive documents, Github issues, Github pull requests, Github files, Groups.io email threads, and more.

Our story picks up with centillion.

Presenting centillion, the Data Commons search engine

One of the tools I have made heavy use of in support of web infrastructure for the DCPPC project is Flask, a Python library for running a web server. Flask is a very powerful library, but it starts with a relatively simple premise: Flask lets you create a web application that binds to a particular port, and you can then add "routes", which are endpoints a user can visit, like /hello/world, and link those routes to Python functions.
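
To make the idea of routes concrete, here is a minimal sketch. It is not code from centillion: the /greet/<name> route, the function names, and port 5000 are all assumptions chosen for illustration.

```python
from flask import Flask
from markupsafe import escape

app = Flask(__name__)


# A fixed route: visiting /hello/world calls hello_world()
@app.route("/hello/world")
def hello_world():
    return "Hello World!"


# A route with a variable part: whatever follows /greet/ is passed in as `name`
@app.route("/greet/<name>")
def greet(name):
    # escape() keeps user-supplied text from being interpreted as HTML
    return f"Hello {escape(name)}!"


if __name__ == "__main__":
    # Bind the app to a port; 5000 is an arbitrary choice for this sketch
    app.run(port=5000)
```

Running the file starts the server, and visiting http://localhost:5000/hello/world in a browser calls hello_world().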

On Monday 2018-10-28 the DIB Lab's weekly lab meeting featured yours truly covering the topic of centillion, the Data Commons search engine.

centillion uses the Python library Whoosh under the hood to provide search functionality, while the web front end uses Flask to connect Python functions to a website that users can interact with.

Screen shot of the centillion search engine (2018-10-27).

centillion architecture: the short version

As of version 1.7, centillion is packaged as a Python package. The centillion package consists of two submodules, corresponding to the Flask frontend and Whoosh backend, respectively: webapp and search.

webapp submodule

centillion.webapp implements the Flask app and defines all routes. When the user runs a search, it passes the query string on to a Search object from the search submodule. The webapp submodule does not know anything about the details of the search engine or search index.
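
The separation described above can be sketched roughly as follows. This is not centillion's actual code: the Search class here is a hypothetical stand-in that does naive substring matching instead of using Whoosh, and the /search route, the query parameter name, and the create_app function are assumptions.

```python
from flask import Flask, jsonify, request


class Search:
    """Hypothetical stand-in for the Search object from the search submodule."""

    def __init__(self, documents):
        # Maps a document title to its text; a real backend would hold an index
        self.documents = documents

    def search(self, query):
        if not query:
            return []
        needle = query.lower()
        return [
            title
            for title, text in self.documents.items()
            if needle in text.lower()
        ]


def create_app(search):
    """Build a Flask app that only knows the search object has a search() method."""
    app = Flask(__name__)

    @app.route("/search")
    def run_search():
        # Hand the raw query string to the backend and return whatever it finds
        query = request.args.get("query", "")
        return jsonify(results=search.search(query))

    return app
```

Because the web app only calls search.search(query), the backend could be swapped out without touching any of the routes.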

This submodule is located in src/webapp/ in the centillion repo.

search submodule

centillion.search implements a search engine using Whoosh, a programming library for building search engines. Whoosh does not implement any kind of front end, so its role is restricted entirely to the back end.

The search submodule also handles interfacing with the Github, Google, and Groups.io APIs and translating the results of API calls from these services into documents whose contents can be extracted and indexed by Whoosh.
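
The translation step might look something like the sketch below, which flattens an issue payload into a plain document ready for indexing. The html_url, title, and body keys follow the shape of Github's issue JSON, but the output field names (id, kind, title, content) and the function name are assumptions, not centillion's actual schema.

```python
def issue_to_document(issue):
    """Flatten an issue payload (a dict) into a document for the indexer."""
    return {
        "id": issue["html_url"],  # the URL doubles as a unique identifier
        "kind": "issue",
        "title": issue["title"],
        # An issue body can be missing or null; index an empty string instead
        "content": issue.get("body") or "",
    }
```

Each service would get its own small translation function like this one, so that the indexer only ever sees documents with the same set of fields.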

This submodule is located in src/search/ in the centillion repo.

Tags: DCPPC Data Commons Github Community Science Centillion

Current Projects

Posted Saturday 05/12/2018 in General

A list of various ongoing projects:

pandocs and panflute - how do i pandoc
captain hook

The Git College of Surgery:

git college of surgery
The first successful git-commit-ectomy took place on Friday, June 1, 2018. See https://pages.charlesreid1.com/git-commit-ectomy
Finishing this requires a better way to visualize git commits
To do that, we have developed git-subway-maps

Python + APIs:

building an API that calls APIs so you can API while you API (a webhook that calls a hook - see captain hook)
testing APIs with Python + requests (currently top secret, coming soon.)

Python + Command line:

command line utilities with python
testing command line utilities with python

More stuff:

magic flying camel is a seed repository for getting started with a simple Jekyll page on Github Pages
magic flying pelican is a seed repository for getting started with a simple Pelican blog on Github Pages

The rise of the mind machines:

boring-mind-machine - contains base classes used by all the mind machines
rainbow-mind-machine - for running Twitter bot flocks
embarcadero-mind-machine - for running Github bot flocks
cheeseburger-mind-machine - for running Google Drive bot flocks
The rainbow mind machine organization - for containing all of this craziness

Each software package in the mind machine suite follows (or will follow) the prime number version system:

More info on the Prime Number Version System
This is another bit of documentation that was blocked by the inability to visualize git repositories
git-subway-maps should help with this.

PyPi and Dockerhub:

Rainbow mind machine software packages require a more streamlined deployment process
Makefiles are in progress

how do i pandoc

Currently working on implementing several Pandoc/Panflute filters
Also see https://github.com/charlesreid1/translate-yer-docs for a practical usage of pandoc filters: translating documentation

how do i pelican - a crash course in building a pelican blog

mkdocs search demo - a quick pop-up site demonstrating how to use the built-in search functionality of mkdocs-material and lunr.js to index a pile of markdown files containing interesting links.

captain hook - we have already mentioned captain hook several times, but this is the magic that makes pages.charlesreid1.com possible.

Tags: Git Github Software Python
