Plotting Your Favorite Novels Visually

Keywords that most frequently appear in Bram Stoker’s ‘Dracula’ (all images courtesy Matthew Jockers)

In the 1967 lecture “Cybernetics and Ghosts,” Italian author and critic Italo Calvino asks us to imagine a machine that composes literature. Would machine authorship cheapen the value of the resulting works? Calvino, who follows the structuralist tradition in maintaining that literature is just the combination and recombination of a set of stock elements, thinks not. Others beg to differ.

The literature machine may still be a few years away, but the literary criticism machine is already here, at least in one iteration: Matthew Jockers, a professor at the University of Nebraska, is at work compiling data from novels in an attempt to determine whether there’s a set of common plots. This month, he released his preliminary findings in a blog post on his website.

Jockers’s work suggests that there are six or seven recurring plots, although when he spoke to Hyperallergic he cautioned against drawing broader conclusions. “The six I find come from the application of a particular methodology, and scholars are going to be free to argue whether that methodology was a good one or not,” he said. Only according to his “particular yardstick,” he warned, are there six core novelistic structures.

A graph charting the plot of James Joyce’s ‘A Portrait of the Artist as a Young Man’

Still, it’s an interesting yardstick. Borrowing terms from the Russian formalist critic Vladimir Propp, Jockers focused on syuzhet, which his website describes as “the organization of the narrative,” rather than fabula, which he characterizes as the “raw elements of the story.” In other words, Jockers’s findings measure the various ways in which information is arranged and relayed to readers, rather than the chronology of the fictional events themselves. If a book has a flashback in the first chapter, Jockers’s study regards the flashback as coming first.

Jockers said that he chose to focus on syuzhet for several reasons. For one thing, he said, “you can’t extract fabula computationally: how do you algorithmically figure out that the author has now taken you 100 years into the past?” More importantly, however, he’s interested in “working chronologically through the pages, because that’s what the reader is presented.” His project is largely concerned with readers’ reactions and how our emotional responses to texts track their success and appeal: the endeavor was conceived in part because Jockers was initially interested in affect theory.

To analyze the syuzhet, Jockers looked at positive and negative words occurring throughout a large sampling of texts. He used the valence of these words to map the “shape” of the book, noting emotional fluctuations that occur between beginning and end.
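To make the idea concrete, here is a minimal sketch of valence-based plot mapping in Python. It is not Jockers's code or his methodology: the tiny word lists, the function names chunk_valence and smooth, and the choice of equal-sized chunks with a moving average are all assumptions made for illustration. The only things taken from the description above are the use of positive and negative words and the reading of the text in page order.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); a real study
# would rely on a far larger sentiment dictionary.
POSITIVE = {"love", "joy", "happy", "hope", "bright", "safe"}
NEGATIVE = {"death", "fear", "dark", "blood", "lost", "grief"}


def chunk_valence(text, n_chunks=4):
    """Score a text in narrative (page) order.

    The words are split into n_chunks runs of roughly equal length, and each
    run gets a net valence: positive-word count minus negative-word count.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if n_chunks < 1 or len(words) < n_chunks:
        raise ValueError("need at least one word per chunk")
    scores = []
    for i in range(n_chunks):
        start = i * len(words) // n_chunks
        end = (i + 1) * len(words) // n_chunks
        chunk = words[start:end]
        pos = sum(1 for w in chunk if w in POSITIVE)
        neg = sum(1 for w in chunk if w in NEGATIVE)
        scores.append(pos - neg)
    return scores


def smooth(scores, window=3):
    """Centered moving average, so the overall shape shows through the noise."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


if __name__ == "__main__":
    sample = ("love joy hope fear dark death "
              "blood grief happy safe bright love")
    raw = chunk_valence(sample, n_chunks=4)
    print(raw)          # net valence of each chunk, first page to last
    print(smooth(raw))  # smoothed "shape" of the sample
```

Because the chunks follow the order of the words on the page, a flashback in the first chapter is scored first, which matches the syuzhet-over-fabula choice described earlier. Plotting the smoothed scores against chunk position would give a curve of the kind shown in the graphs here, though the real charts presumably rest on a much richer lexicon and smoothing method.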

A graph plotting the narrative of Dan Brown’s ‘The Da Vinci Code’

Though the professor didn’t want to scoop himself, he promised that he’d release more in-depth information about the six fundamental plots soon. He’s working on a book about what differentiates bestsellers from less popular books, and he says his work will be instrumental in helping predict and characterize at least some versions of literary success.

While more traditional readers may balk at analyzing qualitative data so quantitatively, Jockers’s study is one tool among many that we can use to assess literary “merit.” Better-selling books are not necessarily better books, but taking stock of their commonalities can help us understand why they’re popular. As long as algorithmic approaches to literature acknowledge their own very serious limitations, there’s no harm in including them in our critical toolboxes. At the very least, they raise interesting questions about the relationship between qualitative and quantitative approaches to literature, between cybernetics and ghosts.
