
Between Canon and Corpus: Six Perspectives on 20th-Century Novels

About the project

Status: archive

Project team: Mark Algee-Hewitt, Mark McGurl

End date: Jan 1, 2015

  • Between Canon and Corpus: Six Perspectives on 20th-Century Novels

This project emerged from a collection-building effort to create a digitized corpus of 20th-century fiction. The lack of such a corpus had impeded the kind of computational work already done on 18th- and 19th-century fiction. The scale of 20th-century publishing complicated matters further: over the last 40 years, the number of books published annually in English rose from around 8,000 to nearly 279,000. Statistical representativeness would therefore be difficult to achieve, especially because the cost of digitization limited the sample to around 350 texts, similar in size to the other "clean" corpora used by the Lab. Furthermore, there was no comprehensive list from which to sample randomly; modern publishers tend to have data available only from the late 1960s onward.

For this study, the team accepted a selection bias toward canonicity, although this choice also provoked further debate. The project began with the Modern Library's "100 Best Novels" of the 20th century, which consists of two lists shaped by two sets of questionable factors: the exclusivity of the Board that produced the first list, and the openness of the public forum that produced the "Readers' List." The latter mixed several literary classics with multiple books by Ayn Rand and L. Ron Hubbard and many more works of genre fiction. Together, the two lists contained only 169 unique works.

The project then expanded to other lists. The Radcliffe Publishing Course's "Rival 100 Best Novels" list extends the collection toward "children's classics" such as Charlotte's Web and Winnie-the-Pooh. Larry McCaffery's list of the 100 best novels of the 20th century and the yearly best-sellers of the 20th century compiled by Publishers Weekly rounded out the list of lists.

Next, the project examined the overlap among all the lists consulted and found that the Publishers Weekly best-seller lists diverged the most, sharing only 8 titles with the other lists.
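The overlap analysis can be sketched with plain set operations. This is a minimal illustration, not the project's actual method: it assumes each list is stored as a Python set of already-normalized title strings (matching variant titles and editions is a separate problem), and the list names and titles are hypothetical placeholders rather than the project's data. It counts, for each list, how many of its titles appear on at least one other list (one reading of "sharing only 8 titles with the other lists"), plus pairwise overlaps, titles common to every list, and the total number of unique works, the kind of figure behind the 169 unique works mentioned above.

```python
from itertools import combinations

# Hypothetical placeholder data: each list name maps to a set of normalized titles.
lists: dict[str, set[str]] = {
    "List A": {"Novel 1", "Novel 2", "Novel 3"},
    "List B": {"Novel 2", "Novel 3", "Novel 4"},
    "List C": {"Novel 3", "Novel 5"},
}


def shared_with_others(lists: dict[str, set[str]]) -> dict[str, int]:
    """For each list, count its titles that also appear on at least one other list."""
    counts = {}
    for name, titles in lists.items():
        # Union of every list except the current one.
        others = set().union(*(t for n, t in lists.items() if n != name))
        counts[name] = len(titles & others)
    return counts


def pairwise_overlap(lists: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of titles shared by each pair of lists."""
    return {(a, b): len(lists[a] & lists[b]) for a, b in combinations(lists, 2)}


def shared_by_all(lists: dict[str, set[str]]) -> set[str]:
    """Titles that appear on every list."""
    return set.intersection(*lists.values())


def unique_works(lists: dict[str, set[str]]) -> int:
    """Size of the union of all lists, i.e. the number of distinct works."""
    return len(set().union(*lists.values()))


print(shared_with_others(lists))  # {'List A': 2, 'List B': 2, 'List C': 1}
print(shared_by_all(lists))       # {'Novel 3'}
print(unique_works(lists))        # 5
```

Using sets keeps each comparison independent of list order and automatically ignores a title listed twice within the same source.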

The project also examined the gender and ethnic distribution of the books across these lists, finding that around 15% were by female authors and 5% by non-white authors. To address this imbalance, the project contacted the editorial board of the journal MELUS (Multi-Ethnic Literature of the United States), the members of the Postcolonial Studies Association, and the editorial board of the Feminist Press, asking them for 40 works to include in the corpus. In the resulting network of lists, The Grapes of Wrath was the only book shared by all six sources. The additions shifted the cumulative corpus to 17% works by female authors and 10% by non-white authors.
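The cumulative percentages depend on counting each work once across all contributing lists. The sketch below shows that arithmetic under stated assumptions: the `Work` record, its boolean `female_author` and `nonwhite_author` flags, and the deduplication by title are hypothetical choices for illustration, not the project's annotation scheme, and the toy data does not reproduce the project's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Hypothetical record: a title plus author-demographic flags."""
    title: str
    female_author: bool
    nonwhite_author: bool


def cumulative_shares(*source_lists: list[Work]) -> tuple[float, float]:
    """Percentages of works by female and by non-white authors in the combined corpus.

    Works are deduplicated by title, so a book on several lists counts once;
    if a title repeats, the later record replaces the earlier one.
    """
    unique = {w.title: w for works in source_lists for w in works}
    n = len(unique)
    if n == 0:
        raise ValueError("corpus is empty")
    female = sum(w.female_author for w in unique.values())
    nonwhite = sum(w.nonwhite_author for w in unique.values())
    return round(100 * female / n, 1), round(100 * nonwhite / n, 1)


# Hypothetical toy data, not the project's figures.
base = [Work(f"Novel {i}", i == 0, False) for i in range(10)]
additions = [
    Work("Novel 0", True, False),  # already present in the base list
    Work("Novel 10", True, True),
    Work("Novel 11", False, True),
]

print(cumulative_shares(base))             # (10.0, 0.0)
print(cumulative_shares(base, additions))  # (16.7, 16.7)
```

Because the shares are computed over the union rather than averaged per list, a suggested title that was already in the corpus (like "Novel 0" here) does not change the denominator, which is why a batch of additions can move the percentages less than its own composition might suggest.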