The (pseudo)science of profiling and surveillance

Piia Varis

Much has been made of the likes of Cambridge Analytica supposedly hacking not only our brains but our democracies, too, and of the disastrous effects of online profiling used to influence people for political causes. Profiling is also used at borders to technologically 'manage' migration, and to identify those who shall not pass. All of this relies on specific ideas about how people can supposedly be made transparent: their characteristics, motivations and intentions made readable, and consequently their behaviours influenced. As it turns out, these ideas may not be all that well-founded.

The science of online profiling

In Targeted. My inside story of Cambridge Analytica and how Trump, Brexit and Facebook broke democracy (2019) Brittany Kaiser gives an insider's account of what has been one of the most discussed political stories in the past couple of years. In the book Kaiser explains how, instead of the more traditional route of relying on pollsters, Cambridge Analytica (CA) employed in-house psychologists to design political surveys and 'segment' people based on them.

CA used 'psychographics' "to understand people's complex personalities and devise ways to trigger their behavior. Then, through 'data modeling', the team's data gurus created algorithms that could accurately predict those people's behavior when they received certain messages that had been carefully crafted precisely for them" (p. 25). CA, Kaiser explains, "weren't trying to fit people into categories based on what they looked like or any other preconceived assumptions we might have about them, but according to their underlying motivations and their 'levers of persuasion'." (p. 92)

To this end, they took Facebook data, obtained for instance through fun-sounding quizzes such as Sex Compass and Musical Walrus (p. 84) that people enthusiastically take to 'discover' something about their sexuality or musical tastes, and of course, importantly, to broadcast the results as identity statements to their networks. These they matched with other data from 'outside vendors'. They then gave millions of people OCEAN scores based on the thousands of data points they had on them (p. 84).

OCEAN, or the Big Five model, is about determining the degree to which someone is 'open' (O), 'conscientious' (C), 'extroverted' (E), 'agreeable' (A), or 'neurotic' (N). Having sorted people along these dimensions, "CA then added in the issues about which they had already shown an interest (say, from their Facebook 'likes') and segmented each group with even more refinement." (p. 85) As a result, CA claims to have "provided clients, political and commercial, with a benefit that set the company apart: the accuracy of its predictive algorithms" (p. 86). They then "took what they had learned from these algorithms (...) and used platforms such as Twitter, Facebook, Pandora (music streaming) and YouTube to find out where the people they wished to target spent the most interactive time. Where was the best place to reach each person?" (p. 86)

Just to give one example of the kind of pitches Kaiser and others at CA gave as a result: "The 'Extroverted and Disagreeable' voter needs a message that is all about her ability to assert her rights (...) This type of voter likes to be heard. On any topic (...). She knows what's best for her. She has a strong internal locus of control and hates to be told what to do, especially by the government" (p. 93).

At the risk of making an understatement: there's a lot going on here. For one, the validity of the OCEAN model has of course been the subject of serious criticism (see here for an example), as have personality tests in general. Not to mention that for anyone researching anything related to meaning and social life, just the idea that something very specific can be inferred from something like Facebook 'likes' seems somewhat dubious, to put it mildly. People hate-like, curious-like, irony-like, peer-pressure-like, and so on.

This problem with indexicals (what exactly does a data point mean?) leads us to the interesting reporting The Correspondent has done around the questions 'What do we really know about the effectiveness of digital advertising?' and 'Are advertising platforms any good at manipulating us?'. As one of their articles points out about the hype, "Newspapers are teeming with treatises about these tech giants' saturnine activities. An essay by best-selling author Yuval Noah Harari on 'the end of free will' exemplifies the genre: (...) it's only a matter of time before big data systems 'understand humans much better than we understand ourselves'. In a highly acclaimed new book, Harvard professor Shoshana Zuboff predicts a 'seventh extinction wave', where human beings lose 'the will to will'. Cunning marketers can predict and manipulate our behaviour. Facebook knows your soul. Google is hacking your brain."


The Correspondent reporting on 'the dot com bubble that is online advertising' discusses how "Realistically, advertising does something, but only a small something - and at any rate it does far less than most advertisers believe." Something to keep in mind "the next time you read one of those calamity stories about Google, Facebook or Cambridge Analytica. If people were easier to manipulate with images and videos they don't really want to see, economists would have a much easier task."

Another issue they point to is that advertisers have to rely on the platforms themselves for metrics: "It's Facebook and Google themselves who tell their advertisers how many views and clicks their ads have generated." And those numbers are not necessarily accurate: "Facebook has several lawsuits against it that revolve around metrics. Advertisers and publishers believe that the company has been fooling them for years. After all, you pay Facebook based on its own reporting. If that reporting is false, you will have paid too much." Facebook has also admitted to having overstated its numbers on, for example, how many people watch videos and how many click on ads. As The Correspondent reporting concludes: "Call it the paradox of the online advertising world: metrics are sacred, but proper measurement is impossible." Fittingly, in this exchange, Brittany Kaiser does not seem able to give a very convincing answer when challenged about the actual effects of Cambridge Analytica's targeting.

Apart from journalists, researchers such as Benkler et al. (2018) have expressed skepticism about Cambridge Analytica and recommend that "we should be similarly cautious to impute to Facebook magical powers of persuasion." (p. 279) While arguing that "it is plausible that microtargeting will improve as the algorithms for identifying personal characteristics improve", Benkler et al. (ibid.) still maintain that "Using tailored advertisements to change hearts and minds, and more importantly voter behavior, is still primarily an act of faith, much like most of the rest of online advertising."

The science of border control

At the digital border, meanwhile, acts of faith also appear to be taking place. The EU border, for instance, has been a playground for testing a system called IBORDERCTRL "which will analyse the facial expressions of those at the border, looking for 'micro-expressions'." This exercise, costing the EU 4.5 million euros, is meant to catch 'illegal' immigrants and prevent crime and terrorism. The European Commission website explains that "IBORDERCTRL's system will collect data that will move beyond biometrics and on to biomarkers of deceit", and that "The unique approach to 'deception detection' analyses the micro-gestures of travellers to figure out if the interviewee is lying."

According to Ray Bull, professor of criminal investigation, there's deception on the EU's side: "They are deceiving themselves into thinking it will ever be substantially effective and they are wasting a lot of money." And this is because "The technology is based on a fundamental misunderstanding of what humans do when being truthful and deceptive." Louise Amoore, professor of political geography, weighs in on the use of border technologies as follows: "There is no doubt in the research I and others have done that people have been wrongly detained, questioned, stopped, searched and even deported on the basis of a technology that has huge propensity for false positives." There are all kinds of challenges involved in the science concerned with reading people's faces for their intent and emotional states. In the meantime, tech companies such as Microsoft, IBM and Amazon all sell 'emotion recognition' algorithms that supposedly are able to tell how people feel based on facial analysis.


Plenty of mistakes have indeed been made by relying on 'smart' technologies that seem to deliver pretty stupid results. An AI Now Institute report includes examples such as a voice recognition system, designed to detect immigration fraud, leading to visas being mistakenly cancelled and people being deported in the UK. People have also been placed on terrorist watch lists and 'no-fly' lists; the use of big data and algorithmic prediction gives such decisions a veneer of 'scientific objectivity', while the outcomes can be disastrous for individuals. Some researchers worry that facial recognition heralds the return of the pseudoscience of physiognomy: drawing conclusions about a person's character, intent, traits and behaviour based on their appearance. Others point out that “No serious researcher would claim that you can analyze action units in the face and then you actually know what people are thinking”.

Border management clearly relies on science that doesn't stand up to scrutiny, and even researchers working with AI, emotion recognition and affective computing criticise the hype surrounding these technologies and their supposed efficacy: "Many agree that their work - which uses various methods (like analyzing micro-expressions or voice) to discern and interpret human expressions - is being co-opted and used in commercial applications that have a shaky basis in science."

Science and dataism

We live in a Pokemonesque "Gotta catch them all!" world of surveillance and data gathering. When it comes to understanding and interpreting that data, not to mention the trust that is placed in the accuracy and efficacy of the data being gathered, it seems we are often labouring under what José van Dijck has called 'dataism'. The ideology of dataism, she (2014: 198, emphasis original) explains, "shows characteristics of a widespread belief in the objective quantification and potential tracking of all kinds of human behavior and sociality through online media technologies. Besides, dataism also involves trust in the (institutional) agents that collect, interpret, and share (meta)data culled from social media, internet platforms, and other communication technologies."

One reason this trust seems misplaced is the fact that "we lack insight in the algorithmic criteria used to define what counts as job seeking, dysfunctional motherhood, or terrorism" (van Dijck 2014: 204). Data points are being interpreted, facial expressions analysed, and conclusions drawn about the individuals concerned. But whom do we trust with the interpretation, and what are the interpretations based upon? While it's surely a concern that with blanket data collection Personally Identifiable Information is becoming an ever-broader category, we should also be paying attention to the Personally Damaging Interpretations made as information is turned into supposedly reliable data.

Acts of faith take place all the time in profiling and surveillance, with often catastrophic consequences for individuals. But someone's laughing all the way to the bank. The border-industrial complex is huge in the US, and the European Commission states quite blatantly on its website about IBORDERCTRL and 'smart' lie-detection that "'The global maritime and border security market is growing fast in light of the alarming terror threats and increasing terror attacks taking place on European Union soil, and the migration crisis,' says [project coordinator George] Boultadakis. As a consequence, the partner organisations of IBORDERCTRL are likely to benefit from this growing European security market – a sector predicted to be worth USD 146 billion (EUR 128 bn) in Europe by 2020."

As for online targeting and advertising, estimates vary, but "The amount of money spent on internet ads goes up each year. In 2018, more than $273bn dollars was spent on digital ads globally, according to research firm eMarketer. Most of those ads were purchased from two companies: Google ($116bn in 2018) and Facebook ($54.5bn in 2018).” Cambridge Analytica may be history now, but new aspiring data influencers keep appearing. The Trump 2020 campaign is now allegedly working with a company called Data Propria, which features three former Cambridge Analytica employees. This time, however, they claim to be “focused on campaign operations and data analysis – not behavioral psychology. Data Propria is ‘not going down the psychometrics side of things’”. On its website, Data Propria promises that “We elevate the art of storytelling at the nexus of where science meets emotion. We gather, understand, and apply rational qualities of human interpretation to produce measurable results. It’s not magic, it’s a deeper level of understanding.” Sounds deep alright. (Though on the 'art of storytelling' we might better rely on something like this.) It remains to be seen what 'science meets emotion' might mean.

Online profiling and targeting, as well as border surveillance, seem to rely on dubious scientific claims and foundations. The consequences range from the mild inconvenience of seeing 'irrelevant' or annoying ads about anything from toothpaste to Trump, to life-and-death decisions about someone's mobility at the digital border. The science may be shaky or uncertain, but one thing's for sure: there's Big Money involved. And this means that acts of faith and dataism can become business as usual.