Last week I looked at some of the clusters of words that fluctuate together across narrative time in the Lab’s corpus of ~27k American novels. A lot of these are pretty semantically “legible,” in the sense that it’s not hard
I wanted to pick back up quickly with that list of the 500 most “non-uniform” words at the end of the last post about word distributions across narrative time in the American novel corpus. Before, I just put these into
Over the course of the last few months here at the Literary Lab, I’ve been working on a little project that looks at the distributions of individual words inside of novels, when averaged out across lots and lots of texts.
In recent months we’ve been working on a couple of projects here in the Lab that make use of the Extracted Features data from HathiTrust. To help kick off the Lab’s new Techne series, I wanted to take a look at some of the programming patterns we’ve been using that make it easier to work with these kinds of large data sets, namely the “Message Passing Interface” (MPI), a set of semantics for spreading programs out across large computing grids.