Why We Need Both Close and Distant Reading in the Age of Big Data
Ever since the rise of Digital Humanities a few decades ago, there have been fierce, ongoing debates between adopters of computational methods and defenders of ‘old-school’ humanist approaches. Inge van de Ven argues that we need both.
Earlier this year, Daniel Allington, Sarah Brouillette, and David Golumbia published a much-discussed piece in the LA Review of Books (1-5-2016), criticizing Digital Humanities (DH) advocates for their alignment with the ‘neoliberal takeover’ of universities and provoking Alan Liu to defend DH on Twitter. A recurring point of contention in this debate is methodological: the ‘traditional’ humanist strategy of close reading versus the new computational method of distant reading. Franco Moretti has provocatively stated that close reading is a ‘theological exercise’ and that we should ‘learn how not to read’ (2013: 48). Others, like Michael Manderino (2015) and Antoine Compagnon (2014), attempt to rehabilitate close reading and its devotion to detail, arguing that we need these skills more than ever in times of information overload.
In a nutshell: Close reading
Close reading is an umbrella term for an assortment of reading strategies characterized by devout and detailed attention to the meaning and composition of artworks. The approach was made famous by the New Critics, a group of Anglo-American literary scholars including Cleanth Brooks, William K. Wimsatt, and Monroe C. Beardsley, who were inspired by I.A. Richards (author of Practical Criticism, 1929), Matthew Arnold, and T.S. Eliot. These scholars enjoyed their heyday of academic fame in the 1940s and 1950s. Going against contemporary practices that overvalued historical context and biographical information, New Criticism suggested that the critic would do well to take a closer look at the text itself. Much of their work consisted of pointers on what not to do when analyzing literary texts: letting one’s own emotions factor into the interpretation (Wimsatt and Beardsley’s ‘affective fallacy’, 1949), or worrying about what the author intended (the ‘intentional fallacy’, Wimsatt and Beardsley, 1946). Another practice they railed against was paraphrasing the contents or ‘message’ of a work (the ‘Heresy of Paraphrase’, Brooks, 1947).
What this school did allow was the examination of evidence offered by the text itself: images, symbols, and metaphors as parts of a larger structure that gives the text its unity and meaning. Of particular interest to the close reader were devices that create ambiguity, paradox, irony, and other forms of tension within the text. Importantly, close reading was not considered a method by its practitioners. It could not be systematized, as it demanded tact, sensitivity, and intuition: performing a close reading called for Fingerspitzengefühl.
Perhaps none too surprisingly, the 1960s saw the downfall of this textual approach (at least in its purest form). Besides being considered elitist (focused solely on complex, ‘high-end’ texts) and intellectualist (favoring intricate, dense interpretations and treating the text as a puzzle to be solved), New Criticism was deemed too restrictive. In its most dogmatic form it leaves no room for considerations of race, class, gender, emotions, the author, reader response, socio-historical context, and ideology: all categories that moved to the center of literary studies in the sixties. With the rise of poststructuralism, close reading became something of an embarrassment, connoting evasiveness and acquiescence in the status quo. Yet in recent decades it seems ready for a revival: the term is brought up more frequently, and always in opposition to distant reading.
So… What Is Distant Reading?
Distant reading is the practice of aggregating and processing information about, or content in, large bodies of texts without actually having to read these texts (see Drucker, 2013). ‘Reading’ is outsourced to a computer: it is in fact a form of data mining that allows information in the text (e.g. subjects, places, actors) or about it (e.g. author, title, date, number of pages) to be processed and analyzed. The latter are called metadata: data about data. Natural language processing can summarize the contents of ‘unreadably’ large corpora, while data mining can expose patterns on a scale beyond human capacity.
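To make the distinction between information about a text and information in it more concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical two-text corpus (the authors, titles, years, and sentences are placeholders, not real data) and does nothing more than count words across it; actual distant-reading projects work with thousands of texts and far more sophisticated natural language processing.

```python
import re
from collections import Counter

# A hypothetical two-text corpus: each record pairs metadata (data about
# the text) with the text itself. Authors, titles, years and sentences
# are placeholders, not real bibliographic data.
corpus = [
    {"author": "Author A", "title": "Novel One", "year": 1850,
     "text": "The sea was grey. The sea was cold, and the ship was old."},
    {"author": "Author B", "title": "Novel Two", "year": 1872,
     "text": "The garden was quiet; the house was quieter still."},
]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def metadata(record: dict) -> dict:
    """Information *about* the text: every field except the text itself."""
    return {key: value for key, value in record.items() if key != "text"}


def word_counts(record: dict) -> Counter:
    """Information *in* the text: how often each word occurs."""
    return Counter(tokenize(record["text"]))


# Aggregate over the whole corpus without anyone reading a single page.
totals = Counter()
for record in corpus:
    totals.update(word_counts(record))

print(totals.most_common(3))  # the most frequent words across the corpus
print(sorted(metadata(r)["year"] for r in corpus))  # a metadata-only view
```

Nothing in this loop reads the texts in any hermeneutic sense: it only tallies tokens and sorts records, which is both the strength and the limitation of the method.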
The term was coined, somewhat provocatively, by Franco Moretti in 2000. He introduced distant reading in explicit opposition to the old close reading, which, to his mind, fails to uncover the true scope of literature because its sample sizes are simply too small. Moretti is the founder of the Stanford Literary Lab, which seeks to confront literary ‘problems’ by scientific means: computational modeling, hypothesis testing, automatic text processing, algorithmic criticism, and quantitative analysis. With this lab, he published a series of pamphlets answering questions such as: can computers recognize literary genres? (Yes, they can.) Can we employ network theory to map out plots? (Yes, we can.) Thanks to distant reading and big data analysis, we now know who the real author of The Cuckoo’s Calling is (J.K. Rowling) and that Hamlet would in fact be significantly different without... the character Hamlet (Moretti, 2013: 218-22).
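Authorship attribution of the kind used in the Rowling case is usually called stylometry. The sketch below is a deliberately simplified illustration, not a reconstruction of that analysis: it assumes a short, hypothetical list of function words, compares their relative frequencies in a disputed text with those in samples by candidate authors, and picks the nearest candidate by Euclidean distance. The word list, function names, and sample sentences are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical list of function words; real studies typically use far
# more features (often the most frequent words across the whole corpus).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "but"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def attribute(disputed: str, candidates: dict[str, str]) -> str:
    """Return the candidate whose sample profile is closest to the disputed text."""
    target = profile(disputed)
    return min(candidates, key=lambda name: distance(target, profile(candidates[name])))


# Invented samples, far too short for a meaningful result.
candidates = {
    "Author A": "It was the end of the day and the light was fading, "
                "but the town was still awake.",
    "Author B": "To a degree, of course, a man in that position ought to think of it.",
}
disputed = "It was late and the road was dark, but the inn was still open."

print(attribute(disputed, candidates))
```

With samples this short the outcome means little; stylometric studies depend on long texts and many features, which is precisely the scale at which distant reading operates.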
The considerable merits that the new possibilities of computational analysis bring to the Humanities can hardly be denied. Yet distant reading has not been received in an unambiguously positive way. As many students in my Hermeneutics course put it in their responses to this method: it’s not reading (Myrthe Giesen: ‘what does the amount of pages say about the book that is meaningful in relation to the story?’). To my relief, for a generation brought up with computers, interpretation as a meaning-making activity is still deemed vital in a Humanities education.
More information doesn’t necessarily bring us closer to meaning; in fact, the opposite could be the case. Several scholars, too, have expressed concern that the importance of interpretation will be underestimated if data-centered methods take center stage. As the title of Lisa Gitelman’s 2013 collection has it, Raw Data Is an Oxymoron. We should not be so naïve as to believe that through data (which in Latin means ‘given,’ in the sense of ‘fact’) the ‘real’ is transmitted, as if independent of representation and the subjective human perspective. But should a much-needed reminder of the importance of interpretation automatically lead us back to a defense of the old close reading?
Modes of Attention
Strikingly, in this debate, close and distant reading are almost always presented as mutually exclusive: if you adhere to the one, you have to be condescending about the other. Of course they do have undeniably antithetical features: ambiguity versus transparency, form versus information, devout attention versus not reading, small versus big (in fact metadata can be seen as the ultimate paraphrase). To give close attention to anything we need to make choices, and we live in an era that tends to undervalue selection (think of the word ‘discrimination’). We want the full picture: Big Data, Big Science, Big Humanities. Distant reading (‘[T]he more ambitious the project, the greater must the distance be,’ Moretti, 2013: 48) caters to current demands. Yet by reinscribing this binary, the debate remains stuck at the surface level. Are quantified, big-scale methodologies and meticulously attentive readings really mutually exclusive?
The ‘crisis’ the Humanities supposedly face is, to my mind, one of attention in two senses. First, in the sense of valuation: in this respect the debate can be traced back to the canon wars of the eighties. Moretti in fact proposed distant reading as a solution to questions of the canon of World Literature: ‘Reading “more” is always a good thing, but not the solution’ (2013: 46). Big data would be the ultimate dissolver of the canon, and not reading the most democratic gesture thinkable.
The debate also concerns attention in a second sense, that of concentration: a shift in the modes of attention we bring to acts of reading. Here I think of the supposed dichotomy between what N. Katherine Hayles in How We Think (2012) calls ‘deep attention’ and ‘hyperattention’. As contemporary media environments become more information-intensive, Hayles claims, a shift in cognitive modes is taking place, away from the deep attention needed for humanistic inquiry and towards the hyperattention typical of scanning Web pages.
The enormous amount of material online that awaits reading leads to skimming instead of prolonged attention to one source of input. Hyperlinks draw our focus away from the linear flow of the text, very short forms of writing like tweets promote reading in a state of distraction, and small habitual actions such as clicking and navigating increase the cognitive load of web reading (12). Then again, don’t we continuously switch gears between both modes? I side with Kirstin Veel, who argues that today distraction is often a prerequisite for concentration rather than its opposite (2011: 312).
Let’s do both!
The same logic holds for close and distant reading. Both have invaluable assets to offer Humanities research, education in general, and the world beyond it. Close reading digs for complexity, opacity, and ambiguity: values that deserve to be reappraised in a time when we encounter vast bodies of information through multiple platforms and tend to overemphasize transparency and immediacy. Distant reading uncovers textual elements at scales inaccessible to the human reader, offering new and unexpected vantage points for researchers. Moreover, the similarities between the two approaches have rarely been noted: both modes of (not-)reading are directed at pattern recognition. Close readers look for repetitions, contradictions, and similarities, and ask ‘how does this object work?’ instead of ‘what is its message?’ or ‘what does it stand for?’ To my mind, distant reading does something surprisingly similar, only for larger datasets and aided by non-human readers.
The real challenge in front of us is to reinvestigate the different ways in which we read today (online and offline, analog and digital, deep and hyper, computer and human), with attention to shifts and variations in scale. And we can only do so if we take both close and distant reading and their respective potentials seriously. How can we read creatively, making use of this whole range? Even at their most extreme (not reading thousands of nineteenth-century tomes, writing a 600-page analysis of a 6-line poem), both approaches point us to the question of what constitutes legibility now that writing is increasingly replaced by, and reworked into, other codes. How does what we consider ‘legible’ change under the influence of digitalization and datafication? How do we decide what to read and what to outsource? How do we combine reading with strategic not-reading, or hermeneutics with computation? Can we find ways to perform machine readings attuned to aesthetic properties? Let’s be creative and close read the maximalist or distant read the minuscule. Literature can become a testing ground for strategies of dealing with sign systems in a time of big data.
What is more, some texts and media actively resist the binary between close and distant reading and demand movement between scales. Qualitative, traditional humanities methods of textual analysis fall short when faced with works of literature like Richard Grossman’s ever-in-progress Breeze Avenue, a novel with a projected length of three million pages, or ‘endless’ computer-generated texts that nevertheless reward close analysis. What do we do with micronarratives or Twitterbot poetry whose single, minimal units of output are not terribly interesting in themselves, but whose underlying algorithms are? Such objects solicit new ways of reading that zoom in and out between part and whole, micro and macro, surface and depth, and that negotiate between attention and distraction, the legible and the illegible.
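To see why the interest of such works may lie in the algorithm rather than in any single output, consider a toy generator in the spirit of the Twitterbot poetry mentioned above. This is a hypothetical sketch: the line database and the function name assemble_poem are invented, and the code does not reproduce any actual bot. It simply picks one line per slot at random and joins them into a three-line poem.

```python
import random

# Hypothetical database of lines, grouped by their slot in a three-line
# poem. This is NOT the database or code of any real Twitterbot.
LINES = {
    "first": ["morning fog lifting", "the old pond at dusk", "a cold wind rises"],
    "middle": ["a heron stands in the reeds", "somebody's radio hums",
               "the last train pulls away"],
    "last": ["then silence", "rain on the roof", "one leaf falls"],
}


def assemble_poem(rng: random.Random) -> str:
    """Pick one line per slot at random and join them into a three-line poem."""
    return "\n".join(rng.choice(LINES[slot]) for slot in ("first", "middle", "last"))


rng = random.Random(2016)  # fixed seed so the output is reproducible
for _ in range(2):
    print(assemble_poem(rng))
    print()
```

Any single poem it prints is unremarkable; what invites analysis is the whole space of 27 possible combinations and the choices encoded in the database, which only come into view when we zoom out from the individual output.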
A more open-minded perspective on the close and the distant is warranted outside academia as well. After all, we collectively read more than ever before, both online and offline. Training and reflection on how we read and how we select what not to read, what demands close attention and what can be skimmed, what must be understood in a deeper sense and what can be consumed in a distracted fashion, would benefit education from an early age, and should not be confined to the classrooms of Hermeneutics courses.
Notes
From the second half of the twentieth century, humanities scholars cautiously started using computational methods for research and teaching; only now are these methods becoming central to curricula and research agendas.
For Liu’s response, see ‘On Digital Humanities and “Critique”’.
See ‘5 Ways Big Data Analytics Caught J.K. Rowling in the Act: Pseudonyms Can’t Hide’.
On the devaluation of interpretation in distant reading and computational approaches, see also Van Dijck, 2014; Drucker, 2011; Compagnon, 2014: 276.
On (the demise of) deep reading, see also Proust and the Squid: The Story and Science of the Reading Brain (2007) by developmental psychologist Maryanne Wolf.
See, for instance, <poem.exe>, a Twitterbot that ‘randomly’ assembles haikus from a database and spews them out on Twitter.
References
Allington, D., S. Brouillette and D. Golumbia (2016). Neoliberal Tools (and Archives): A Political History of Digital Humanities. LA Review of Books, 1-5-2016.
Brooks, C. (1947). The Well-wrought Urn: Studies in the Structure of Poetry. San Diego: Harcourt.
Burdick, A., J. Drucker and P. Lunenfeld et al. (2012). Digital_Humanities. Cambridge: The MIT Press.
Compagnon, A. (2014). The Resistance to Interpretation. New Literary History. Volume 45, issue 2, 2014. Pages 271-80.
Van Dijck, J. (2014). Datafication, Dataism and Dataveillance: Big Data between Scientific Paradigm and Ideology. Surveillance & Society. Volume 12, issue 2, 2014. Pages 197–208.
Drucker, J. (2011). Humanities Approaches to Graphical Display. Digital Humanities Quarterly 5.1, 2011.
Drucker, J. (2013). Intro to Digital Humanities. September 2013. Web.
Gitelman, L. (ed.) (2013). “Raw Data” Is an Oxymoron. Cambridge and London: The MIT Press.
Grossman, R. Breeze Avenue. Ongoing.
Hayles, N. K. (2012). How We Think: Digital Media and Contemporary Technogenesis. Chicago: The U of Chicago P.
Manderino, M. (2015). Reading and Understanding in the Digital Age. A look at the critical need for close reading of digital and multimodal texts. Reading Today. Jan/Feb. 22-3.
Moretti, F. (2013). Distant Reading. London: Verso.
Richards, I.A. (1929). Practical Criticism: A Study of Literary Judgment. London: Routledge, 2014.
Veel, K. (2011). Information Overload and Database Aesthetics. Comparative Critical Studies. Volume 8, issue 2–3, 2011. Pages 307–19.
Wimsatt, W.K. and M. Beardsley (1946). The intentional fallacy. Sewanee Review. Volume 54. Pages 468-488.
Wimsatt, W.K. and M. Beardsley (1949). The affective fallacy. Sewanee Review. Volume 57, issue 1. Pages 31-55.
Wolf, M. (2007). Proust and the Squid: The Story and Science of the Reading Brain. New York: HarperCollins.