Discussion Prompt: When, if ever, is predictive policing effective, fair, and legitimate? What is the role of data reliability in this?

Jurisdictions continue to roll out predictive policing methods that use AI-based analytics. So far, trials of these systems – especially those utilising facial recognition – have shown a striking lack of effectiveness. Even more worrying are the risk of fundamental rights violations and the lack of attention paid to the ‘human factor’, i.e. the rights and competencies of the people who use machine-learning systems built on insufficient data.

Predictive policing promises to reduce crime rates by using AI-based analytics and probabilities. However, weighing the results so far against the collateral impact on fundamental rights and public trust, it seems obvious that the heralded benefits of predictive policing are a myth. This article shows how the practice is instead a danger to society if not developed and implemented with care. It explains in detail the important roles that data collection, processing, and interpretation play in these AI systems.

There are currently 11 known projects in the EU that use automated, machine-learning-based analysis in policing. Machine learning can in fact support policing by calculating the probability of certain crimes, e.g. housebreaking. Trials show, however, that it is only possible to calculate probabilities – to say, for instance, that areas where break-ins have been common in the past will probably be affected by this kind of crime in the future. This is already known as the “near repeat” theory: here, predictive policing merely repeats trends police are already aware of. Research from the Max Planck Institute in Freiburg supports this, finding that predictive policing contributes little to policing efforts overall.
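The “near repeat” logic can be illustrated with a minimal sketch (the incident data and district names below are invented for illustration; no real system is this simple): ranking areas by their past burglary counts simply reproduces where burglaries were already concentrated, which is exactly the trend police already know.

```python
from collections import Counter

# Hypothetical past break-in reports, one district name per recorded incident.
past_incidents = ["North", "North", "North", "Harbour", "North", "Harbour", "Centre"]

def hotspot_forecast(incidents, top_n=2):
    """Naive 'near repeat' forecast: the districts with the most past
    incidents are predicted to be the most at risk in the future."""
    counts = Counter(incidents)
    return [district for district, _ in counts.most_common(top_n)]

print(hotspot_forecast(past_incidents))  # ['North', 'Harbour'] – restates past trends
```

The forecast is nothing more than a ranked count of what has already happened, which is why such systems add little beyond what officers already know.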

Predictive policing may have a strong impact on people’s trust in public authorities. Due to its extensive potential for damage, it is considered a high-risk application in the risk model of the German Data Ethics Commission. A closer look is necessary to discuss the risks of predictive policing and the steps that need to be taken before implementing a system authorities do not yet fully understand.

Legal, technical, and structural obstacles

On a legal level, predictive policing means shifting investigation to before an incident occurs, rather than conducting it afterwards. This in turn may lead to a reversal of the presumption of innocence, as AI systems calculate and classify potential danger from correlational data alone, not from direct evidence.

Collecting and combining personal data for that purpose could violate the EU’s General Data Protection Regulation (GDPR) and the fundamental right to informational self-determination. The number of, and access to, biometric databases has already increased and will continue to do so if databases are combined further.

Access itself may become a legal matter in cases where the implemented software is not secure. Since 2017, the police in Hessen have used the “Hessendata” system to combine and analyse existing data, including from social media platforms, and to detect connections between incidents and personal networks. Besides the possible violation of the GDPR, the involvement of the developing company, Palantir, is itself highly problematic. Not only does the company have a notorious reputation for poor data protection, it is also closely linked to its leading clients, the United States Department of Defense and the United States Intelligence Community.

But the details of machine learning also reveal a host of problems on the technical level. Machines learn with, and from, already-existing data. These data need to be operationalisable. What is a simple feat with data like age, time, or location becomes difficult or even impossible with data that do not deliver any unique information, e.g. emotions, motives, and affects, which play a crucial role in crimes committed in the context of personal relationships.

Existing data might also contain biases that can be replicated by AI systems. If, for instance, crime rates are higher in districts with low income, AI would learn and extrapolate from that data to make its own one-dimensional judgements. Another source of bias is, for example, the annotation of images used to train the AI. This annotation is often outsourced as low-skilled work, where there is little time to reflect on the labelling and the people doing the classifying (‘clickworkers’) are likely not trained in diversity. AlgorithmWatch recently revealed that Google Vision classified a thermometer held in a light-skinned hand as a “tool” and the same object in a dark-skinned hand as a “gun”. (Google has since corrected this.)
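How such bias perpetuates itself can be shown with a toy feedback-loop simulation (all numbers are invented): if historical records over-represent one district, a model that allocates patrols in proportion to recorded crime keeps recording more incidents there, so the initial skew never corrects itself, even when the true underlying crime rates are identical.

```python
# Toy simulation with invented numbers: both districts have the SAME true
# crime rate, but District A starts with more recorded incidents.
recorded = {"A": 60, "B": 40}   # biased historical records
true_rate = 0.1                  # identical underlying crime rate
patrol_budget = 100

for year in range(5):
    total = sum(recorded.values())
    # Patrols are allocated in proportion to recorded crime ...
    patrols = {d: patrol_budget * n / total for d, n in recorded.items()}
    # ... and more patrols mean more incidents get recorded.
    for d in recorded:
        recorded[d] += patrols[d] * true_rate

share_a = recorded["A"] / sum(recorded.values())
print(f"Share of records in district A after 5 years: {share_a:.0%}")  # still 60%
```

Despite identical true rates, district A retains its inflated 60% share of the records indefinitely: the system has no mechanism to discover that its starting data were skewed.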

When building such large data sets, it must be made sure that these data:

  1. are necessary to fulfil the purpose of the AI system;
  2. do not force discrimination against women or minorities;
  3. are sufficient in quality and quantity;
  4. and relate to the crime at all.

As AI systems are socio-technical systems, they yield different effects in different social contexts. For example, a system trained to detect fraud for taxation authorities would not work well for analysing poor families’ applications for financial assistance. Therefore, an AI system should never be implemented for any purpose other than the one it was trained for, as this may result in false correlations.

On the structural level, we do not know – and seriously question – whether police authorities have both the legal right and sufficient competencies to choose, test, and evaluate AI systems so as to mitigate the above-mentioned risks. There is hardly any research on how humans react to machine-based suggestions and decisions; it is therefore not yet determined which regulations are needed to enable and encourage officers to accept or reject an AI system’s suggestions, depending on the circumstances.

Additionally, people with technical AI competencies are so rare, and funding for authorities so low, that it is unlikely police departments have enough qualified staff to monitor all elements of an AI system — from modelling to evaluation — and to ensure that bias is absent and fundamental rights are respected. If this is not ensured, it becomes more likely that wrongly selected and poorly implemented systems will lead to discriminatory results, false measures, and the collection and processing of data that do not serve the system’s intended purpose. Furthermore, plans to combine databases at the European level do little to calm privacy critics.

The question of human decision-making, addressed, for instance, in various contributions to the “Artificial Intelligence” discussion on about:intel, is extremely important in AI policing. But we have to watch carefully and ask precisely what the ‘human element’ means in detail and what its relationship to the AI system is. After all, studies show that people tend to follow the results of machine learning, and it is still unclear which rights police officers actually have when deciding against an AI recommendation. It is important to understand at what level the system participates in the policing process. Does it show correlations and offer different recommendations for action? Does it offer one recommendation with the opportunity to accept or reject it? Or does it automatically start an action that can only be stopped by human interference? These three levels of automation constitute vastly different modes of operation, with significant ramifications for outcome and accountability. Saying that “the human is always in the loop” is not differentiated enough to really address these issues.

Expanding facial recognition is like riding a dead horse

Most predictive policing trial projects work with some form of facial recognition. Test runs of facial recognition in various European cities, e.g. London and Berlin, have shown low effectiveness and instead produced discrimination and high numbers of false positives. Due to this inefficacy, cities like San Francisco have already banned facial recognition or set up high barriers to its future implementation.

The best-known case in Germany is the trial conducted at the Berlin-Südkreuz train station in 2017. The outcome was so poor that the German Federal Ministry of the Interior had to ‘creatively interpret’ the results to spin them into a satisfactory outcome. The Chaos Computer Club revealed these ‘creative’ tactics and showed how high the false-positive numbers actually were: according to its analysis, more than 600 travellers per day were falsely classified. The Ministry had been planning to set up facial recognition at large train stations and airports, but extrapolating from the botched trial at Berlin-Südkreuz, it would only be a matter of time until everyone in Germany was mistaken for a criminal at least once.
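The scale of the problem follows from simple base-rate arithmetic. The numbers below are illustrative assumptions, not official trial figures: even a seemingly low false-positive rate produces hundreds of misidentifications per day at a busy station.

```python
# Illustrative base-rate arithmetic (assumed numbers, not official trial figures).
travellers_per_day = 90_000    # assumed daily footfall at a large station
false_positive_rate = 0.007    # assumed 0.7% false-positive rate

false_alarms = travellers_per_day * false_positive_rate
print(f"Expected false alarms per day: {false_alarms:.0f}")  # 630
```

Because almost every traveller passing the camera is innocent, even a small per-person error rate multiplies into hundreds of wrongly flagged people every single day, which is why deployment at many large stations would quickly touch most of the population.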

At the central station in Mannheim, Germany, a current pilot project analyses and classifies people’s movements. One possible movement to classify as risky is “running”. To train the system, it is necessary to define when running is risky in order to make it operationalisable. However, only the binary classifications “risky” or “not risky” can be entered into the system. It is questionable how such a crude classification scheme can be implemented reliably in our complex world. An evaluation of the project is not yet available, but it will be interesting to see its results.

Using AI profiling to detect potential criminals has been beset by chronic failure. In the US, some systems have classified retired persons and people over 100 years old as potential criminals simply because they were active in gangs many decades prior. Meanwhile, in Germany, the above-mentioned, controversial Palantir program ‘Hessendata’ could neither predict nor prevent the racist attack in Hanau in February 2020, in which ten people were killed.

Gaps to be filled

Overall, the main challenge in AI is that machines calculate probabilities and detect only correlations, not necessarily causality. Furthermore, they do not use scientific processes to obtain results. It is humans who play the most important role in AI, as they choose the purpose of the system, often choose the training database, and provide feedback to direct the system’s learning process. It is humans who bear high responsibility throughout the entire chain of development, application, and evaluation of such systems. AI systems in predictive policing will never solve social problems like crime as long as humans are unable to fully understand the complexity of crime, access all relevant data, and operationalise them. Therefore, predictive policing may support policing in some cases, but it will never predict crime or substitute for police officers.

As AI systems in policing will also never be completely reliable, the question is: In what kind of society do we wish to live? Is it better to falsely classify some innocent people as suspects, or rather to allow some suspects to fall through the cracks? This requires a broad debate before we implement predictive policing systems that make promises they cannot deliver on and violate fundamental rights. Today, there is no application proven to be more effective than the reasoning of human police officers. Furthermore, predictive policing systems require huge financial investments and big databases, with — to date — no evidence that they lead to increased public security.

To find out whether AI can effectively support the reduction of crime, more research is necessary, as well as internal trials for specific authority purposes, e.g. detecting misuse of database access or racist structures. Such trials can be used to build competencies and test the reliability of systems. Using AI systems to improve internal processes can also build public trust in the practice of predictive policing.

We also need transparency about how systems are procured, how people are trained on these systems, who makes decisions, and how those decision-makers are qualified. Public monitoring is needed to ensure that the implementation of the systems will be stopped if relevant requirements are not met, if evaluation shows that they discriminate against minorities or vulnerable populations, or if decisions cannot be reproduced or explained.