Words Words Words

Tagging word etymology with Python and HTML.

 

Color key: French, Old Fr., Greek, Latin, Sanskrit, Norse, Old Norse, German, Germanic, English, American Eng., Old Eng., Welsh, Irish, Dutch, Old Dutch, Old Saxon, Old Frisian, Russian, Arabic, Spanish, Italian, Slavonic, Polish, Turkish

What does it do?

Words Words Words is a project that uses Python to color-code the words in a text according to their etymological roots and render the result as HTML.

The output of the scripts is available in the list of tagged books. The repository contains the scripts used to mark up the text, look up each word in the Online Etymology Dictionary, parse the results to tag each word with its root language, and create the final HTML.
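One way to picture the final HTML step is to wrap each tagged word in a `<span>` whose CSS class names its root language, so that a stylesheet can assign one color per language. The sketch below is not the project's actual markup: it assumes the tags are already available as a plain `dict` from lowercase words to language labels, and the `css_class` naming scheme is a hypothetical choice made for illustration.

```python
from __future__ import annotations

import html
import re


def css_class(language: str) -> str:
    """Turn a language label such as 'Old Norse' into a class name such as 'old-norse'.

    This naming scheme is a hypothetical choice made for the sketch.
    """
    return re.sub(r"[^a-z]+", "-", language.lower()).strip("-")


def render_html(text: str, roots: dict[str, str]) -> str:
    """Wrap each word that has a known root language in a <span> with a CSS class.

    `roots` maps lowercase words to language labels. Unknown words, spaces and
    punctuation are HTML-escaped and passed through unchanged.
    """
    parts = []
    # Alternate between runs of letters/apostrophes and runs of everything else,
    # so joining the pieces back together reproduces the original text.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", text):
        escaped = html.escape(token, quote=False)
        language = roots.get(token.lower())
        if language is None:
            parts.append(escaped)
        else:
            parts.append(f'<span class="{css_class(language)}">{escaped}</span>')
    return "".join(parts)


# Illustrative tags only; real tags would come from the dictionary lookups.
print(render_html("The river and the sky.", {"river": "Old French", "sky": "Old Norse"}))
```

With this layout, coloring the text only requires one stylesheet rule per language class; the HTML itself never needs to know which colors were chosen.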

The tool is entirely implemented in Python.

What books are available?

All of the books that have been tagged are listed under Tagged Books. More books are on the way...

How does it work?

Words Words Words uses several Python libraries for its primary tasks: parsing the text, looking up words on a web page, extracting and processing the results, and converting the original text into HTML, color-coding each word by its etymological root language in the process.

  • To parse the text and extract unique words, I'm using the Natural Language Toolkit.
  • To scrape the web, I'm using Mechanize.
  • To obtain etymological root languages for words, I'm using the Online Etymology Dictionary.
  • To process the resulting HTML, I'm using Beautiful Soup.
  • To deal with all the data resulting from these tasks, I'm using Pandas.
  • To tag each word, I'm just using Python's built-in list and string types.
  • To pull all of the tagged HTML, CSS stylesheets, and JS together, I'm using Pelican (my preferred Python alternative to Ruby's Jekyll).
  • To generate color palettes for the various languages, I used ColorBrewer.
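The word-level steps in the list above (extract the unique words, then pull a root language out of each dictionary entry) can be approximated with the standard library alone. This is a minimal sketch under several assumptions: a regular expression stands in for NLTK's tokenizer, each entry is assumed to be plain text already extracted from the dictionary page (for example with Beautiful Soup), and the rule "the first known language named right after 'from'" is a hypothetical heuristic, not necessarily the one the project uses. The helper names `unique_words`, `root_language` and `tag_words` are likewise made up for illustration.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

# Full-name versions of the languages in the color key (the key abbreviates some,
# e.g. "Old Fr."). Longest names first so "Old French" is preferred over "French".
LANGUAGES = sorted(
    ["French", "Old French", "Greek", "Latin", "Sanskrit", "Norse", "Old Norse",
     "German", "Germanic", "English", "Old English", "Welsh", "Irish", "Dutch",
     "Old Dutch", "Old Saxon", "Old Frisian", "Russian", "Arabic", "Spanish",
     "Italian", "Slavonic", "Polish", "Turkish"],
    key=len,
    reverse=True,
)

# Hypothetical heuristic: the root is the first known language named right after "from".
FROM_LANGUAGE = re.compile(
    r"\b[Ff]rom\s+(" + "|".join(re.escape(lang) for lang in LANGUAGES) + r")\b"
)


def unique_words(text: str) -> set[str]:
    """Return the distinct lowercase words in `text` (a regex stand-in for NLTK)."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)}


def root_language(entry: str) -> str | None:
    """Return the language named after the first matching 'from ...' in an entry, if any."""
    match = FROM_LANGUAGE.search(entry)
    return match.group(1) if match else None


def tag_words(words: Iterable[str], lookup: Callable[[str], str | None]) -> dict[str, str]:
    """Map each word to its root language, skipping words with no entry or no match.

    `lookup` returns the plain-text dictionary entry for a word, or None.
    """
    tags = {}
    for word in sorted(words):
        entry = lookup(word)
        language = root_language(entry) if entry else None
        if language is not None:
            tags[word] = language
    return tags


# Made-up entry text for illustration; real entries would come from the dictionary site.
entries = {
    "river": "c. 1200, from Old French riviere",
    "sky": "c. 1200, from Old Norse sky 'cloud'",
}
words = unique_words("The sky over the river; the Sky!")
print(tag_words(words, entries.get))
```

Passing the lookup in as a callable keeps the tagging logic separate from the scraping: a plain `dict.get` works for testing, and a function that fetches and parses the real page can be swapped in later.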

Who wrote Words Words Words?

Charles Reid wrote Words Words Words.

Visit charlesreid1.com or check out @charlesreid1 on GitHub.

Check out the code

The code for Words Words Words is on GitHub:

Words Words Words on GitHub

Tagged Books

  • Dubliners, by James Joyce
  • Ulysses, by James Joyce
  • Frankenstein, by Mary Shelley
  • Second Treatise on Government, by John Locke
  • Crime and Punishment, by Fyodor Dostoyevsky (Constance Garnett translation)
  • Roughing It, by Mark Twain
  • The Variable Man, by Philip K. Dick