Shortly after the Lab released my recent pamphlet on the structure of the literary canon, New York magazine ran an article about the 21st-century canon, in which a panel of judges picks an early version of the literary canon
One of the goals of the Techne blog as a whole is to highlight technical issues in Digital Humanities—the kinds of in-the-weeds ideas that are interesting to specialists but don’t necessarily make the cut of a final paper. It’s easy
Last week I looked at some of the clusters of words that fluctuate together across narrative time in the Lab’s corpus of ~27k American novels. A lot of these are pretty semantically “legible,” in the sense that it’s not hard
I wanted to pick back up quickly with that list of the 500 most “non-uniform” words at the end of the last post about word distributions across narrative time in the American novel corpus. Before, I just put these into
Over the course of the last few months here at the Literary Lab, I’ve been working on a little project that looks at the distributions of individual words inside of novels, when averaged out across lots and lots of texts.
Not for the first time, I find myself wanting to know how big the field of the novel is. Granted, finding the precise number of novels published in English is impossible. And even if we had an exact figure, the
This was my sophomore summer with the Literary Lab. I started the summer ready to capitalize on my veteran knowledge and pick up where I left off. I did just that when I spent the first weeks of summer
I first became familiar with the Literary Lab when I took a class on literary text mining in R with Mark Algee-Hewitt last winter. From discussing the philosophies behind the digital humanities to constructing cluster dendrograms (plus lots of other
On my first day of work, I looked up the term “operationalize” in the dictionary. A mixture of curiosity and sheer pragmatism led me to do this; after all, the project I was about to embark on aimed to “operationalize
In recent months we’ve been working on a couple of projects here in the Lab that are making use of the Extracted Features data from HathiTrust. To help kick off the Lab’s new Techne series, I wanted to take a look at some of the programming patterns we’ve been using that make it easier to work with these kinds of large data sets – namely the “Message Passing Interface” (MPI), a set of semantics for spreading out programs in large computing grids.