Category: Centillion

centillion: a document search engine

Posted in Centillion


We're excited to announce the public release of centillion, a document search engine.

centillion is a search tool that any individual or organization can use to index GitHub repositories (including the content of Markdown files), Google Drive folders (including the content of .docx files), and Disqus comment threads.

centillion is tested using Travis CI.

centillion was originally written for the NIH Data Commons effort (which recently concluded). It was built to make information easier to find in a project where hundreds of people at dozens of institutions were generating a sea of email threads, Google Drive folders, Markdown files, websites, and GitHub …

Tags: python, centillion, search, search engine, google drive, github, flask

First Post of the Fall, Part 1: Data Commons

Posted in Centillion


Background: a bit about the Data Commons

It has been a productive but busy summer at the Lab for Data Intensive Biology.

As part of my job, I am supporting a lot of websites and infrastructure for the Data Commons Pilot Phase Consortium (DCPPC), which wrapped up Phase 1 this month.

The Data Commons is a large-scale effort to establish a community-driven set of interoperability standards for biological data and computation. It is a massive effort with a broad mandate, and it has the potential to enable breakthrough research that is currently impossible because data and computations cannot interoperate between the data …

Tags: DCPPC, Data Commons, Github, Community, Science, Centillion