Counting words in HathiTrust with Python and MPI

In recent months we’ve been working on a couple of projects here in the Lab that make use of the Extracted Features data from HathiTrust. To help kick off the Lab’s new Techne series, I wanted to take a look at some of the programming patterns we’ve been using that make it easier to work with these kinds of large data sets, namely the “Message Passing Interface” (MPI), a standard for passing messages between processes that lets a single program spread its work across the nodes of a large computing grid.