Netflix is by far the biggest online streaming platform in the world. The platform provides suggestions on what to watch for over 167 million people in almost every country worldwide. Additionally, the platform has been producing its own content since 2012. More than 1,200 Netflix Originals have already been produced and the company is planning to spend another 8 billion dollars on producing new content in 2020. The trick to their success? Using data science. Algorithms are used to present the right titles to each of their subscribers at the right time, and the data they collect from their users is used to produce new shows that aim to connect emotionally with their viewers.
Netflix uses data science to present content to users in a highly personalized way by using the data from earlier viewing experiences, which in time reveal more and more accurately what a user wants to see. But, and this is a side effect of this process, a user’s personal "taste" is measured and predicted based on their previous clicks, and therefore they will only see films and shows that perfectly fit their calculated appreciation. As Eli Pariser has argued, when algorithmic information services are personalized to this degree, users' preferences lead them into filter bubbles where they will only find information that fits their own way of thinking (Pariser, 2011).
To understand what this means for Netflix users, I draw on Neta Alexander's (2016) insights. Alexander uses the definition of “great films” given by Susan Sontag in her 1996 essay “The decay of cinema”: “works based on the actual violations of the norms and practices that now govern movie making everywhere in the capitalist and would-be capitalist world — which is to say, everywhere.” Following this line of thought, Alexander argues that the greatest films tend to be the most difficult to classify or break down into categories. And because this is exactly what Netflix has to do in order to present their content to their users efficiently, she regards this as problematic since the more information you provide to Netflix, the less likely you are to encounter any great films outside your comfort zone.
That said, this article is not a place to evaluate the greatness of films and the potential risks of ending up in a Netflix filter bubble. Rather, it discusses how algorithms are used as a means of evaluating the relevance of information in Netflix's content suggestions and content production processes.
Interrogating the algorithm
First, we have to establish the patterns of inclusion at play in Netflix's algorithms, in order to find out which data makes it into the database and how this data is made algorithm-ready (Gillespie, 2014, p. 168). According to Alexander (2016), Netflix's Cinematch recommendation system roughly works in three steps.
- First, an individual user profile is created, which is built up by the choices that a user has made in the past, the ratings the user has given to shows and the user’s scrolling activity and viewing habits.
- Second, the shows that Netflix screens are provided with an exhaustive number of tags. In order to tag shows correctly, Netflix hired aspiring screenwriters to rate the shows on many aspects that range from plotline, shooting locations and actors' performances to abstract categories such as tone, morality and emotional effect. The resulting tags are combined with algorithms that group together films that share the same aspects.
- Third, for the recommendation process to start, the algorithm creates customer clusters. These clusters are an example of calculated publics (Gillespie, 2014, p. 188) and they are the foundation of the recommendation system that predicts what a user wants to see based on decisions that other users in a shared cluster have made. It works as follows: users are put into the same cluster when they give similar ratings to the same shows. If one of the members of a certain cluster has not seen a show that other cluster members have given a high rating to, this show will be recommended to this user (Alexander, 2016, p. 86).
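The third step can be illustrated with a minimal sketch. It is not Netflix's actual implementation: the user names, ratings, the `tolerance` and `min_rating` thresholds, and both function names are hypothetical, and a simple pairwise similarity check stands in for whatever clustering method Cinematch really uses.

```python
# Hypothetical ratings on a 1-5 scale: user -> {show: rating}.
ratings = {
    "ana":   {"Roma": 5, "House of Cards": 4, "Lilyhammer": 5},
    "ben":   {"Roma": 5, "House of Cards": 4},
    "chloe": {"Roma": 1, "House of Cards": 2, "Arrested Development": 5},
}

def same_cluster(a, b, ratings, tolerance=1):
    """Assumption: two users share a cluster if they have rated at least one
    show in common and all their common ratings differ by at most `tolerance`."""
    common = ratings[a].keys() & ratings[b].keys()
    return bool(common) and all(
        abs(ratings[a][show] - ratings[b][show]) <= tolerance for show in common
    )

def recommend(user, ratings, min_rating=4):
    """Recommend shows the user has not rated yet but that members of the
    same cluster rated at least `min_rating`."""
    seen = ratings[user]
    picks = set()
    for other in ratings:
        if other != user and same_cluster(user, other, ratings):
            for show, rating in ratings[other].items():
                if show not in seen and rating >= min_rating:
                    picks.add(show)
    return sorted(picks)

print(recommend("ben", ratings))    # ben rates like ana, so ana's unseen favourite is suggested
print(recommend("chloe", ratings))  # chloe matches nobody, so nothing is suggested
```

In this toy data, "ben" and "ana" agree on every show they have both rated, so the one show that "ana" rated highly and "ben" has not seen is recommended to "ben". "chloe" rates the shared shows very differently and therefore receives no recommendation from either of them.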
Finally, the calculated recommendations are presented to the users in several ways. When you open the Netflix homepage, you see thumbnails depicting shows that are presented in rows. These rows have names that range from “Action Movies” to very specific micro-genres like “Emotional Independent Sports Movies”. As of 2014, there were 76,897 such micro-genres (or "altgenres," in Netflix jargon) that were created by the tagging system described above. Every user gets to see a personalized selection of altgenres.
Additionally, the Artwork Personalization system shows personalized thumbnails for each individual user with the aim to “find the best artwork for each of [their] members to highlight the aspects of a title that are specifically relevant to them.” For each show on Netflix, multiple thumbnails are created that highlight different themes and aesthetics by showing specific imagery. Based on a user profile, the algorithm decides which artwork would be most compelling for a specific user. If a user happens to be interested in comedies, he or she will most likely see artwork showing the image of Robin Williams when Good Will Hunting is recommended. If a user has watched more romantic movies, it will be an image of the two protagonists making out. The artworks also change to make some shows stand out from others on the page.
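The selection logic described here can be sketched in a few lines. This is a deliberately naive illustration and not a description of how the Artwork Personalization system is built: the file names, the `theme` labels, the viewing counts and the function `pick_artwork` are all hypothetical, and the "user profile" is reduced to a count of titles watched per genre.

```python
def pick_artwork(genre_counts, artworks):
    """Return the candidate artwork whose theme the user has watched most often.
    Themes missing from the profile count as zero; ties go to the first candidate."""
    return max(artworks, key=lambda art: genre_counts.get(art["theme"], 0))

# Hypothetical candidate thumbnails for a single title, each tagged with one theme.
artworks = [
    {"image": "robin_williams.jpg", "theme": "comedy"},
    {"image": "two_protagonists.jpg", "theme": "romance"},
]

# Hypothetical user profiles: number of titles watched per genre.
comedy_fan = {"comedy": 12, "romance": 2}
romance_fan = {"comedy": 1, "romance": 9}

print(pick_artwork(comedy_fan, artworks)["image"])
print(pick_artwork(romance_fan, artworks)["image"])
```

The same title is thus advertised with a different image depending on the viewing history, which is the effect the Good Will Hunting example describes.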
The last way in which shows are presented to the user is in the form of advertisements for Netflix Originals, which I will discuss below.
Evaluation of relevance
Now that we have established what data makes it into the database, we should look at the evaluation of relevance: what are the criteria by which the algorithm decides what is relevant? According to Gillespie (2014), "relevance" is a fluid and loaded judgement open to interpretation; so, to find out what is considered relevant by the Netflix algorithms, we have to look at what is most relevant for the company itself. On the Netflix Tech Blog, Netflix engineers provide background information about their algorithms and operating systems for the public. In one of their entries, we find the following sentence, which can be read as the articulation of the algorithm (Gillespie, 2014, p. 177): “For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time.”
We can conclude from this statement that what is most relevant for Netflix is that as many users as possible spend as much time as possible on the platform. This is what drives the algorithms of the recommendation system, which decide what counts as relevant content to present to users. When we take this information as a starting point, we can identify the main question that Netflix has to deal with: how do we satisfy users and keep them on the platform as long as possible?
One of the ways to achieve this is to give users the illusion of endless choice. The Netflix interface is designed in a way that obscures the fact that the content constantly changes due to expired licensing agreements (Alexander, 2016, p. 86). In order to be able to stream content, Netflix has to compete with companies all over the world for streaming licenses. Those licenses can cover individual countries, or regions, or the entire globe, and can vary in duration. That is also the reason why the inventory of offered shows varies from country to country. Since Netflix is available in 190 countries (that is, worldwide except for China, Syria, North Korea and Crimea), it seems to be more profitable to escape the bidding wars and to start producing content of its own.
In 2012, Netflix produced their first original show, the series Lilyhammer, in co-production with a Norwegian broadcasting company. In 2013, Netflix produced House of Cards, the first show for which they exclusively held the distribution rights. Nowadays, Netflix either produces shows that are wholly created by Netflix itself, so-called "Netflix Originals," or picks up the production rights of existing cable TV shows, like the series Arrested Development, and produces additional seasons. The exact number of shows that are produced by Netflix is difficult to confirm, but their official website presents 1,200 shows as available “only on Netflix,” and according to the Netflix fan page whats-on-netflix.com, there are currently 1,452 Netflix Originals, of which 132 were released in a single week.
The fact that it is hard to find out the number of Netflix Originals produced is not a coincidence. Ever since it started producing its own content, Netflix has stirred up the traditional television and film business with their distribution methods. This can be best explained with the example of their Oscar-winning film Roma (2018). The film was released in theaters three weeks before it became available for streaming on Netflix. This distribution method conflicts with the standard release routine for films. Traditionally, films are first released in cinemas, and three months later, they become available for DVD services and the like; after approximately two years, they are also made available for television. This is done so that the film can take advantage of different markets at different times. For Netflix as a streaming service, partaking in this traditional system is not desirable; however, it is desirable to work together with great directors, with the opportunity of receiving great awards and additional publicity. And to satisfy the demands of directors and award-giving organizations, it seems that Netflix has decided on the compromise of offering only a three-week theatrical release. This, in combination with the fact that Netflix does not report box office figures, bothers many people and organizations in the traditional film industry. Still, whatever one may think of the influence of Netflix on the film industry, the fact remains that the company needs to be dealt with. Or, in the more positive words of the Netflix engineers: “It is not often that one gets to witness the transformation of an entire industry.”
Producing films with data science
The answer to the question of how Netflix transforms the industry can be found, perhaps unsurprisingly, in algorithms. Netflix uses data science in two ways for producing original content. First, data science is used to optimize the production process, which takes place all over the world. Mathematical optimization is adopted for business and technical decision-making such as planning budgets, finding locations, building sets and scheduling shooting days.
Second, through data science, Netflix decides what its calculated publics, or customer clusters, want to see based on the retrieved individual user data. In an interview with Wired, Netflix’s VP of product innovation Todd Yellin commented on the use of gathered user data: “That mountain is composed of two things. Garbage is 99 percent of that mountain. Gold is one percent… Geography, age and gender? We put that in the garbage heap. Where you live is not that important.” For Netflix, the only thing that matters is a user’s previous clicks, which determine the cluster to which a user gets assigned. Then, their recommendation system is put to work to make "predictions about a new show’s per-language consumption based on the per-language consumption of “similar” shows," to make sure that “the content connects emotionally with viewers” to finally “turn […] ideas into joy for viewers”. The result: highly personalized content for every user around the world.
“Computational research techniques are not barometers of the social. They produce hieroglyphs: shaped by the tool by which they are carved, requiring of priestly interpretation, they tell powerful but often mythological stories—usually in the service of the gods” (Gillespie, 2014, p. 190).
This metaphor used by Gillespie serves as quite an accurate summary of Netflix’s role in our culture. To understand this role it is necessary to know how they use computational research techniques in deciding what to suggest users should watch next. And it is for each user to decide whether they will embrace or reject these suggestions, to evaluate the greatness of the suggested shows and to decide if it is problematic to find oneself in a filter bubble or not.
In 2020, Netflix will spend 8 billion dollars on producing new content. When watching any of those shows, we should keep in mind that it is their company goal to connect emotionally with viewers, and that they will use any means necessary to achieve that. Don’t be surprised if you actually like what you see because, most probably, you will.
Alexander, N. (2016). Catered to your future self: Netflix’s ‘predictive personalization’ and the mathematization of taste. In K. McDonald & D. Smith-Rowsey (Eds.), The Netflix effect: Technology and entertainment in the 21st century (pp. 81–97). Bloomsbury.
Gillespie, T. (2014). The relevance of algorithms. In P. J. Boczkowski & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality and society (pp. 167–193). MIT Scholarship Online.
Pariser, E. (2011). The filter bubble: How the new personalized web is changing what we read and how we think. Penguin Press.
Sontag, S. (1996, February 25). The decay of cinema. The New York Times.