Keywords that most frequently appear in Bram Stoker’s ‘Dracula’ (all images courtesy Matthew Jockers)

In the 1967 lecture “Cybernetics and Ghosts,” Italian author and critic Italo Calvino asks us to imagine a machine that composes literature. Would machine authorship cheapen the value of the resulting works? Calvino, who follows the structuralist tradition in maintaining that literature is just the combination and recombination of a set of stock elements, thinks not. Others beg to differ.

The literature machine may still be a few years away, but the literary criticism machine is already here, at least in one iteration: Matthew Jockers, a professor at the University of Nebraska, is at work compiling data from novels in an attempt to determine whether there’s a set of common plots. This month, he released his preliminary findings in a blog post on his website.

Jockers’s work suggests that there are six or seven recurring plots, although when he spoke to Hyperallergic he was careful to clarify that we should be cautious in drawing broader conclusions. “The six I find come from the application of a particular methodology, and scholars are going to be free to argue whether that methodology was a good one or not,” he said. It’s only according to his “particular yardstick” that there are six core novelistic structures, he warns.

A graph charting the plot of James Joyce’s ‘A Portrait of the Artist as a Young Man’

Still, it’s an interesting yardstick. Borrowing terms from the Russian formalist critic Vladimir Propp, Jockers focused on syuzhet, which his website describes as “the organization of the narrative,” rather than fabula, which he characterizes as the “raw elements of the story.” In other words, Jockers’s findings measure the various ways in which information is arranged and relayed to readers, rather than the chronology of the fictional events themselves. If a book has a flashback in the first chapter, Jockers’s study regards the flashback as coming first.

Jockers said that he chose to focus on syuzhet for several reasons. For one thing, he said, “you can’t extract fabula computationally: how do you algorithmically figure out that the author has now taken you 100 years into the past?” More importantly, however, he’s interested in “working chronologically through the pages, because that’s what the reader is presented.” His project is largely concerned with readers’ reactions and how our emotional responses to texts track their success and appeal: the endeavor was conceived in part because Jockers was initially interested in affect theory.

To analyze the syuzhet, Jockers looked at positive and negative words occurring throughout a large sampling of texts. He used the valence of these words to map the “shape” of the book, noting emotional fluctuations that occur between beginning and end.
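The general idea can be sketched in a few lines of Python. This is not Jockers's code: the tiny word lists, the function names, and the moving-average smoothing are all assumptions made for illustration, and a real study would rely on a much larger, validated sentiment lexicon and a more careful smoothing method. The sketch scores each sentence by counting positive words minus negative words, in the order the reader meets them, and then smooths the scores so that a rough "shape" emerges.

```python
import re

# Hypothetical, deliberately tiny word lists (an assumption for illustration only).
POSITIVE = {"love", "joy", "happy", "hope", "bright", "safe"}
NEGATIVE = {"fear", "death", "dark", "blood", "terror", "grief"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_valence(sentence):
    """Count of positive words minus count of negative words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def plot_shape(text, window=3):
    """Return (raw, smoothed) valence per sentence, in page order (syuzhet)."""
    raw = [sentence_valence(s) for s in split_sentences(text)]
    return raw, moving_average(raw, window)


sample = (
    "The morning was bright and full of hope. "
    "Then the dark came, and with it fear. "
    "Blood and terror filled the night. "
    "At dawn they were safe, and there was joy."
)
raw, smooth = plot_shape(sample)
print(raw)     # one score per sentence
print(smooth)  # smoothed trajectory from beginning to end
```

For the four-sentence sample, the raw scores start high, dip in the middle, and recover at the end, which is the kind of rise and fall the graphs in this article trace across an entire novel.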

A graph plotting the narrative of Dan Brown’s ‘The Da Vinci Code’

Though the professor didn’t want to scoop himself, he promised that he’d release more in-depth information about the six fundamental plots soon. He’s working on a book about what differentiates bestsellers from less popular books, and he says his work will be instrumental in helping predict and characterize at least some versions of literary success.

While more traditional readers may balk at analyzing qualitative data so quantitatively, Jockers’s study is one tool among many that we can use to assess literary “merit.” Better-selling books are not necessarily better books, but taking stock of their commonalities can help us understand why they’re popular. As long as algorithmic approaches to literature acknowledge their own very serious limitations, there’s no harm in including them in our critical toolboxes. At the very least, they raise interesting questions about the relationship between qualitative and quantitative approaches to literature, between cybernetics and ghosts.

Becca Rothfeld is assistant literary editor of The New Republic and a contributor to The Los Angeles Review of Books, The New York Daily News’ literary blog, The Baffler, and Slate, among other publications....

4 replies on “Plotting Your Favorite Novels Visually”

  1. The distinction between what is syuzhet and what is fabula isn’t always so clear-cut. It is often influenced by what you choose as your ‘unit.’ For example, when we were doing all of our work on ‘tv-as-found-database’ ( ) we found that certain narrative forms (syuzhet) were invisible to us when working from the basis of the single ‘shot.’ Working shot by shot, it was impossible to make a category of ‘every flashback’ because that structure required more information than was contained in a single shot. Similar challenges exist here. Reading through the code on GitHub, it looks like the basic unit of the sentiment analysis is the sentence. Are certain structures invisible when the sentence is the unit? From the linked blog posts, there are similar unit challenges in using Fourier transforms to solve the time domain variance issue.

    This comment is already too long, but the problems show up as boundary issues as mentioned here: ( )

    1. Basically, you are right, but Jockers has clearly defined what he means by syuzhet and fabula. So, when reading his studies, we should think of syuzhet and fabula in his own terms.

      1. My question still stands: Are certain structures invisible when your informational data point is the sentence? We’ll have to wait for his book and its larger conclusions to find out.

  2. So, what’s the point? I get that there are only so many types of story, or plots or, more accurately, conflict types. There are already books on the subject. It’s nothing new.

    The rest seems to me to be an over-intellectualization that serves neither the writer, nor the person trying to understand a novel.

    This is just meant to be something in the critic’s toolbox? Yikes.
