Last week I looked at some of the clusters of words that fluctuate together across narrative time in the Lab’s corpus of ~27k American novels. A lot of these are pretty semantically “legible,” in the sense that it’s not hard
I wanted to pick back up quickly with that list of the 500 most “non-uniform” words at the end of the last post about word distributions across narrative time in the American novel corpus. Before, I just put these into
Over the course of the last few months here at the Literary Lab, I’ve been working on a little project that looks at the distributions of individual words inside novels when averaged out across lots and lots of texts.
Not for the first time, I find myself wanting to know how big the field of the novel is. Granted, finding the precise number of novels published in English is impossible. And even if we had an exact figure, the