Another layer of opacity: how spies use AI and why we should talk about it

Discussion Prompt: Will AI solve the "information overload" challenge for intelligence agencies?


Instead of reducing complexity for intelligence agencies, AI creates new data problems and will add another layer of opacity to already secretive organisations. At the same time, the general hype around AI presents a unique opportunity to learn about and weigh in on intelligence practice in real time.

Whenever Western intelligence agencies developed new capabilities for surveillance in the past, a decade or two would pass until a public debate around their implementation took hold. The interception of telegraph communication, including the first use of computational power for pattern recognition by the allied codebreakers during World War II, wasn’t publicly recognised until the 1970s. The global satellite surveillance system known as Echelon had been established by US intelligence agencies and their partners in the 1970s. Yet, only in September 2000 did it receive political attention from the European Commission. Internet surveillance began in the late 1990s but wasn’t the subject of broader political and legal inquiries until the revelations by Edward Snowden in 2013.

With Artificial Intelligence (AI), the situation is different — at least it could be.

The good thing about the hype around AI is that even the most secretive of organisations have come out with a public strategy. In 2018, the US Office of the Director of National Intelligence published a declassified version of its AI initiative. In NSA’s tech magazine The Next Wave, researchers discuss some of the machine learning algorithms they are developing for their application in signals intelligence. On this side of the Atlantic, GCHQ’s Deputy Director for Strategic Policy has publicly justified the agency’s use of AI. The British agency, he claims, has an “obligation to innovate” and needs AI to cope with the “information overload” [read GCHQ’s contribution on this topic].

One reason for intelligence agencies being relatively open about the use of AI is that they rely on industry and academia. The agencies see themselves in fierce competition not only with foreign counterparts over AI innovation but also with tech companies over AI expertise. This sense of competition, combined with the high level of attention on AI in media, academia, and politics, could open a window of opportunity for civil society and policy makers to shine a public light on otherwise secret, routinised surveillance.

In this article, I will first offer a peek through that window by giving an overview of how signals intelligence (SIGINT) agencies use AI, as far as we know. I will then argue that instead of reducing complexity for intelligence agencies, AI creates new data problems and will add another layer of opacity to already secretive organisations. Finally, I will offer a few suggestions for how the opacity of intelligence as a whole can be moderated during the AI moment.

How intelligence agencies use AI

What do we know about how spies use AI, and why should we care? To narrow down this rather big question, I will focus on how SIGINT agencies, which operate in the most data-intensive area of intelligence, use machine learning (ML). ML, arguably the most prominent subfield of AI at the moment, is characterised by computational algorithms that ‘learn’ in the sense of finding out about rules by being exposed to data. This is in contrast to traditional statistics where computational algorithms follow strict, pre-defined rules. For the purpose of this article, we can think of ML as analytical bits and pieces that are becoming increasingly integrated into different SIGINT tasks — instead of imagining an ‘omnipotent AI’ that takes over the agencies’ work.
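To make the contrast concrete, here is a minimal Python sketch that is not drawn from any agency system. It sets a hand-written rule next to a rule whose cut-off is inferred from labelled examples. The feature (calls per day), the sample values, and the names `fixed_rule` and `learn_threshold` are all hypothetical and chosen only for illustration.

```python
# Hypothetical labelled examples: (calls_per_day, is_of_interest).
samples = [(5, False), (12, False), (18, False), (25, True), (31, True), (40, True)]


def fixed_rule(calls_per_day, threshold=50):
    """Pre-defined rule: the cut-off is chosen by a person in advance."""
    return calls_per_day > threshold


def learn_threshold(labelled):
    """'Learned' rule: pick the cut-off that misclassifies the fewest examples."""
    best_threshold, best_errors = None, len(labelled) + 1
    for candidate in sorted({value for value, _ in labelled}):
        errors = sum((value > candidate) != label for value, label in labelled)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


learned = learn_threshold(samples)


def learned_rule(calls_per_day):
    """Apply the cut-off that was inferred from the data."""
    return calls_per_day > learned


print("learned cut-off:", learned)
print("fixed rule on 25 calls:", fixed_rule(25))
print("learned rule on 25 calls:", learned_rule(25))
```

The learned cut-off depends entirely on which examples were labelled and how, which is why the quality of training data matters so much in what follows.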

Broadly, we can distinguish between two types of problems that ML is applied to in SIGINT: tasks that could in principle, although very differently, be performed by humans, like image or speech recognition, and tasks that would usually be beyond human capacity, like recognising patterns in large, complex data sets.

An example of ML in the former category is ‘human language technology’, which the Five Eyes agencies have used for quite some time on an operational level. This includes not only transcribing speech to text (speech recognition) and machine translation but also algorithmic models that recognise who is speaking (speaker identification). The Snowden documents showed how since the early 2000s, the NSA and GCHQ have worked together closely to improve language analytics. Up until now, ML has enabled large-scale surveillance of speech by rendering large quantities of intercepted voice data searchable and speakers identifiable. In 2012, the NSA spent $34 million on research improving language analytics, which included models “that automatically analyze the linguistic content of communications”.

Algorithms & privacy

Now, intelligence agencies would argue that what looks like machine-enabled, massive surveillance is a more targeted, even privacy-friendly technology. For instance, a semantic analysis of the content of our conversations could help decide if it is worth it for an analyst to listen in. This rationale, however, relies on the assumption that surveillance matters only when data are “seen by human eyes”. This idea is flawed for several reasons.

If the automatic collection and storage of information already constitutes a violation of privacy — a view that is supported by the European Court of Human Rights — the algorithmic analysis of data that goes beyond a simple keyword search must do so even more. Moreover, limiting surveillance is not only about protecting privacy but also about restricting the agencies’ actual and potential power over data and communicative relationships in a democratic society. We can’t just assume that the chilling effects of surveillance don’t matter when all our communication is, by default, subject to machine analysis. Lastly, it isn’t always possible to clearly distinguish what part of a surveillance program is conducted by a human or an algorithm respectively, because using ML as an analytical tool requires testing, experimenting, and interpretation by humans. This is particularly true for unsupervised algorithms — for instance in clustering and anomaly detection — that are used by the agencies for pattern recognition.

Pattern recognition: Who and what is ‘suspicious’?

For intelligence agencies, ML promises to be a solution to the ‘unknown unknown problem’: discovering previously unknown patterns of ‘suspicious’ behaviour, and ultimately finding new surveillance targets, which, in some cases, may become drone strike targets. One such application, an NSA program that already used ML for target discovery in 2012, is Skynet. Based on mobile communication data, Skynet was designed to find the couriers of terrorists. What the analysts did with Skynet, in essence, was to train the algorithms (‘random forests’) on the data of the few known couriers and then run the model on large, unknown groups or even whole populations — in this case 55 million Pakistanis.

There’s a range of grave concerns with programs like Skynet. First, they not only collect but also analyse, indiscriminately, data associated with people who may have nothing to do with a known suspect at all. Target discovery by ML goes beyond so-called contact-chaining, where, traditionally, an analyst starts from a known suspect and then analyses the wider communication network of this person. Second, Skynet systematically puts ‘abnormal’ behaviour under surveillance; it is subsequently labelled as suspicious. An internal NSA slide shows how Skynet, which was designed to find terrorist couriers, ended up finding a journalist: Al Jazeera’s Ahmed Zaidan. This was not the result of algorithmic inaccuracy; quite the contrary. One thing terrorist couriers and investigative journalists tend to have in common is that they behave differently from the ‘normal’ Pakistani. Both may travel more often and communicate securely, like Ahmed Zaidan, who was reporting on the Taliban and Al-Qaeda. This example shows that ML-enhanced surveillance programs like Skynet, in which human reasoning and algorithmic logic are closely entangled, have the potential to harm not only individual privacy but also collective goods such as press freedom.

Target discovery and language analytics are two examples of how agencies use ML on an operational level. A goal for the near future is developing ML models referred to as multimodal AI. Multimodal algorithms are supposed to simultaneously process information from texts, images, and sounds — “across all INTs [intelligence collection disciplines] and open source”. The development of an AI-enhanced “all-source sense-making” capability highlights the need to oversee and define rules for what intelligence agencies do with public and private data, including biometrics in the field of “identity intelligence” — an area that SIGINT agencies also engage in but that has been less scrutinised than communications surveillance.

Lots of data ≠ good training data

Recent successes in ML are driven by an explosion of available data and computing power. Considering this, it seems that SIGINT agencies must be, technically speaking, an ideal environment for AI development. But a closer look at the professional context of SIGINT shows that although the agencies might be ‘drowning in data’, good training data is wanting. What appeared to be a data solution has already produced a new data problem. Why?

For one, the type of knowledge intelligence agencies pursue is particular. They look for relatively rare events and very specific people, which makes it hard to generate ample training data. Put simply: there are many pictures labelled ‘this shows a cat’. Ones labelled ‘this shows a terrorist courier’? Not that many. Another problem is the scope, variety, and sometimes poor quality of secretly collected data. The Five Eyes and their European partner services have set up a global surveillance network that includes a myriad of different languages, dialects, and accents spoken around the world. Even for state-of-the-art algorithms, it is still more challenging to recognise regional Arabic, Russian, or Mandarin than English or French. On top of that, intelligence agencies must deal with noisy intercepts from people who might not want to be discovered, making the use of ML even more challenging. In contrast to commercial applications like Amazon’s Alexa, secretly intercepted voice data seldom comes in well-articulated, simple sentences like “How. Will. The. Weather. Be. On. Sunday?”. The organisational environment of intelligence, which is highly departmentalised and thoroughly classified and thereby limits the ability to label and share data, also contributes to the scarcity of good training data.
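The rarity of targets has an arithmetic consequence that a short Python sketch can illustrate. The population figure is the 55 million cited earlier for Skynet; the number of true targets and both error rates are invented assumptions, not reported values, and the function name `expected_flags` is hypothetical.

```python
def expected_flags(population, true_targets, true_positive_rate, false_positive_rate):
    """Expected hits, false alarms, and share of flagged people who are real targets."""
    hits = true_targets * true_positive_rate
    false_alarms = (population - true_targets) * false_positive_rate
    flagged = hits + false_alarms
    precision = hits / flagged if flagged else 0.0
    return hits, false_alarms, precision


population = 55_000_000       # figure cited in the article for Skynet
true_targets = 100            # assumption for illustration only
true_positive_rate = 0.5      # assumption: the model finds half of the real targets
false_positive_rate = 0.001   # assumption: 0.1% of everyone else is wrongly flagged

hits, false_alarms, precision = expected_flags(
    population, true_targets, true_positive_rate, false_positive_rate
)

# Under these assumptions: 50 expected hits against roughly 55,000 false alarms.
print(f"expected hits: {hits:.0f}")
print(f"expected false alarms: {false_alarms:.0f}")
print(f"share of flagged people who are real targets: {precision:.4%}")
```

Under these assumed numbers, a model that sounds accurate still flags far more uninvolved people than actual targets, simply because the targets are so rare.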

There are two conclusions to draw from the training data problem. First, both security professionals and surveillance critics should resist the temptation to fetishise the power of AI. Instead, bias and ‘erroneous accuracy’ should be a more common concern with using ML in high-risk environments such as intelligence. If Google Photos incorrectly classified African-Americans as gorillas, how many people will be put under surveillance by facial recognition models because they share features with NSA’s most wanted terrorists? All male hipsters wearing full beards and fisherman beanies, beware! But seriously: while the systematic surveillance of hipsters is a purely theoretical exercise, the systematic surveillance of investigative journalists through anomaly detection by programs like Skynet is not. In addition, biases inscribed into and ‘objectified’ through algorithms might make the usual suspects of Western agencies, Arab and Muslim populations, particularly vulnerable to automated surveillance.

Second, it is important to pay attention to the steps that agencies will take to deal with the lack of training data. AI creates an incentive to collect ever more and to share data or algorithmic models with domestic but also with transnational partner agencies. Overseers should try to make sure that transnational joint ventures for AI development won’t be buried under third party rules^[1].

Another consequence of the training data problem is that intelligence agencies have become particularly interested in unsupervised or semi-supervised learning, which is more promising for analysing very large amounts of unlabelled data. In a 2011 memo, a GCHQ researcher acknowledged that supervised algorithms with high levels of accuracy “had surprisingly little impact in GCHQ”. However, in contrast to already established, high-performance, supervised analytics (like some facial recognition systems), unsupervised ML is more difficult to operationalise. It is often applied by analysts to make “inferences in the absence of [known] outcomes”, as AI expert Scott Zoldi explains. For instance, when unsupervised ML is used for anomaly detection, like in Skynet, it is not straightforward what anomaly means, or how to set the thresholds and weights that define ‘anomaly-ness’ and hence ‘suspicious-ness’. ML does not automatically and magically extract some objective ‘truth’ out of ‘raw data’. Unsupervised learning in particular therefore creates new forms of uncertainty for a profession that claims to secure.
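How much depends on thresholds and weights can be shown with a toy example in Python. This is a deliberately simple stand-in (a weighted sum of z-scores), not a reconstruction of Skynet or of any agency method. The people, the two features, the weights, and the thresholds are all invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical behavioural records; every value here is made up.
records = {
    "resident_a": {"trips_per_month": 1, "encrypted_share": 0.1},
    "resident_b": {"trips_per_month": 2, "encrypted_share": 0.2},
    "resident_c": {"trips_per_month": 1, "encrypted_share": 0.1},
    "resident_d": {"trips_per_month": 2, "encrypted_share": 0.2},
    "courier": {"trips_per_month": 9, "encrypted_share": 0.9},
    "journalist": {"trips_per_month": 8, "encrypted_share": 0.8},
}


def anomaly_scores(data, weights):
    """Score each person by the weighted distance from the group average (z-scores)."""
    stats = {}
    for feature in weights:
        values = [row[feature] for row in data.values()]
        stats[feature] = (mean(values), pstdev(values))
    return {
        name: sum(
            weights[f] * abs(row[f] - stats[f][0]) / stats[f][1] for f in weights
        )
        for name, row in data.items()
    }


def flagged(scores, threshold):
    """Everyone whose score exceeds the analyst-chosen threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


scores = anomaly_scores(records, {"trips_per_month": 1.0, "encrypted_share": 1.0})

# The same scores yield different 'suspects' depending on the chosen threshold.
for threshold in (3.0, 2.0, 1.5):
    print(threshold, flagged(scores, threshold))
```

In this toy data the journalist sits close to the courier on both features, so any threshold loose enough to catch slightly less extreme couriers also catches the journalist, and a looser one still starts to sweep in ordinary residents. None of these cut-offs is given by the data; each is a human choice.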

A third layer of opacity

There is something undeniably opaque about the way powerful organisations, and especially intelligence agencies, involve algorithms in decision-making and knowledge production. In her paper “How the machine ‘thinks’”, Jenna Burrell conceptualises three general levels on which that opacity manifests itself. The first layer is organisational secrecy. SIGINT agencies have cultivated a particularly high level of secrecy, often even more so than other intelligence organisations, which renders internal rules and procedures incomprehensible — including computational algorithms. The second layer of opacity, according to Burrell, is unequal technical literacy. ML is part of a complex surveillance infrastructure that is only understood by a relatively small circle in SIGINT, which, more often than not, excludes those in charge of overseeing intelligence.

These two layers of opacity are nothing new to intelligence governance. State secrecy and technical complexity have made it difficult to establish effective forms of oversight and accountability for a long time. But machine learning adds a third layer of opacity to the intelligence process that has to do with the limited interpretability and inscrutability of how these algorithms work. Not only do useful machine learning applications usually operate at scale and often on heterogeneous data, it is in the very nature of machine learning that algorithms generate and alter their own logic when analysing (and learning from) data. That compounds the complexity of the process, rendering it difficult to understand even for those who designed it. Burrell writes: “While datasets may be extremely large but possible to comprehend and code may be written with clarity, the interplay between the two in the mechanism of the algorithm is what yields the complexity (and thus opacity).”

Limiting opacity

For all three layers, there are different strategies to address opacity and the problem of accountability that comes with it. Organisational secrecy is habitual and intentional but not natural. As stated at the beginning of this article, this moment might constitute a unique opportunity to talk about (AI in) intelligence beyond a narrow circle, especially in a European context. However, most of the examples used in this article are from the US, where agencies have been more upfront about their AI plans. In Europe, intelligence has been notably absent from much of the recent debate around AI. Here, if AI-enhanced surveillance is addressed, it is usually in the context of predictive policing. This needs to change. Intelligence needs to be included more systematically in the European discourse on AI by reflecting on its organisational and regulatory context.

Political debate, and thereby transparency, can remedy the first layer of opacity, but it won’t help with unequal technical knowledge. Here, a good starting point would be to give oversight bodies greater technical capabilities and expertise, especially on ML and its social consequences, and to establish internal procedures to avoid organisational silos composed of a few people privy to AI. At the third layer of opacity, the inscrutability of ML, even code audits by skilled experts may not do the trick. Labelling who or what is suspicious has implications for privacy, press freedom, and real people’s lives. Explaining these actions remains crucial but becomes more challenging when ML is used. How will GCHQ, for example, meaningfully explain why a certain communications link was chosen for surveillance by the ML algorithms it is experimenting with? Evolving research areas like explainable AI promise to reduce the opacity of ML in the future. However, we should keep in mind that solving data problems with data solutions might not always be the way to go. As the current debate on facial recognition shows, it is well within the bounds of reason simply not to use ML-based surveillance tools at all in certain areas; or at least to put safeguards in place before the technology is implemented.

A nuanced understanding of what exactly is so opaque about ML, and why, is more productive than simply assuming it to be a natural black box. Nonetheless, the black box metaphor holds value in another sense. According to the sociologist Bruno Latour, a technology is primarily a black box when it works so well that no one questions it anymore. Thus, paradoxically, the more successful and taken-for-granted technologies are, “the more opaque and obscure they become” (p. 304). From this perspective, rather than becoming increasingly explainable, AI might become more and more of a black box, even for people inside the field of intelligence. This is yet another reason why right now is the moment to find out more about how spies use AI and what we can do about that.

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation – Project Number 396819157).

[1] The European Union Agency for Fundamental Rights provides some background for this: “In international intelligence service cooperation, the third-party rule prevents a service from disclosing to a third party any data received from a partner without the source’s consent. FRA’s research underlines that the third-party rule protects sources and guarantees trust among intelligence services that cooperate. However, FRA’s data show that oversight bodies are often considered as ‘third parties’ and therefore cannot assess data coming from international cooperation. In some Member States, oversight bodies are no longer considered as ‘third parties’ and so have full access to such data.”
