Discussion Prompt: When, if ever, is predictive policing effective, fair, and legitimate? What is the role of data reliability in this?


As budgets have been slashed, police departments in the UK and worldwide increasingly use algorithmic analysis tools to augment their intelligence capabilities. Whether the use of this technology proves timely or tragic, however, rests largely on the reliability of the data being analysed. If data-driven policing is to become a fair and ethical reality, we must determine what a code of practice for data analytics in criminal justice should look like.


Algorithms and artificial intelligence are changing the way we see advertising, how we learn in our universities, and even how we might one day conceive of the sentience required to afford legal rights to machines. In the UK at least, government is not only increasingly (and problematically) online but is becoming algorithmically augmented. Machine learning, or AI, often powers controversial technology such as facial recognition systems, the subject of much recent challenge and opposition. Facial recognition is just one example of how human rights issues around privacy and fairness arise when a state under-regulates emerging data-driven governance. Some jurisdictions will perhaps decide to ban the use of facial recognition software in street policing, because of the intrusion into privacy that powerful live facial recognition poses in communal public space. Meanwhile other jurisdictions, like the UK, may well see their own courts hesitantly give permission, although not carte blanche, for the use of facial recognition by criminal justice agencies if a sufficiently clear and rigorous set of rules for its use is put in place. Facial recognition systems are not the only controversial AI or machine learning application in town, however. The question of how to regulate broader police data analytics practices is also pressing.


A checklist of ethics

The UK police service is currently being urged by the National Police Chiefs’ Council (NPCC) to use a framework known as ALGO-CARE, a regulatory ‘checklist’ of crucial ethical and legal considerations for the development and deployment of algorithmic police intelligence analysis tools. ALGO-CARE was developed and published by a quartet of researchers from different backgrounds, making its approach inherently interdisciplinary: it covers key points from the perspectives of a senior police leader (Sheena Urwin, Durham Constabulary), a data scientist (Dr Geoffrey Barnes, University of Cambridge), and two academic lawyers (Marion Oswald, now based at Northumbria University, and myself).


The costs of cost savings

The standards that jurisdictions ultimately adopt for the use of data analytics will undoubtedly reflect the degree of respect that a state has for human rights in general. But concern is growing globally over the way law enforcement bodies seek to use AI-driven data analytics to augment their intelligence capabilities. In the UK, investment in this sort of technology is backed by the Chief Inspector of Constabulary in his most recent annual report to the Home Secretary. It is argued that predictive analytics based on ‘big data’ offers cost savings. This comes at a time when the police in the UK, for one, are under great pressure, and some would argue even overwhelmed, because of government funding reductions over the last several years.


The lab vs. the real world

Of particular concern in the here and now, however, is the use of predictive profiling tools that score individuals on their likely involvement in serious or organised crime, including violent gangs. Databases like the Gangs Matrix operated by the Metropolitan Police in London (and similar tools used in cities like Los Angeles) list a disproportionate number of young, non-white males, and suffer from data that is stale, patchy, and difficult to review and weed out. In the UK, the Information Commissioner’s Office took an important step in issuing an enforcement notice against the Met over the ‘function creep’, lack of transparency, and inaccuracy of its Gangs Matrix, and followed this up by providing police forces with guidelines on the lawful use of gangs databases. In California, the LAPD has faced powerful public criticism of its use of the data-driven patrolling software PredPol and the LASER suspect identification programme. A recurring theme is that even if machine learning technology works in a scientifically sound way in a ‘data lab’, once the tool is put into practice by a police force there can be problems with the consistency of its use, as well as fears that it will only re-entrench biases and discrimination. To paraphrase some influential American researchers, dirty data makes for bad predictions.


Fresh, fair, and smart

To me, three key factors that the regulation of predictive policing in any jurisdiction cannot fail to address are all intrinsically linked to the issue of data reliability. These are: the standards set for ‘data currency’ (i.e. how fresh is it?); the ability of the public to contest unfair information used about them, the concept of ‘data challenge’ (i.e. how fair is it?); and the principle that the predictive worth of a data point about a person must be statistically proven before it is included in a model (i.e. how smart is it?). These three concerns are interlinked. If, for example, a predictive tool relies on older data about offenders because doing so boosts predictive accuracy, police leaders must weigh that choice carefully, since data currency and data challenge are closely related. I would argue that members of the public will be less likely to challenge the legitimacy of a tool, in a societal sense, if it makes predictions using fresher data, as long as the tool’s overall accuracy does not dip too far when older data is removed from the process of (re)building it from the underlying datasets.
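To make these questions concrete, here is a minimal, purely illustrative sketch of how a data-science team might turn them into explicit, auditable checks before a tool is deployed. Everything in it is an assumption: the column name (recorded_at), the three-year currency standard, the logistic regression model and the accuracy threshold are all hypothetical, and it reflects no force’s actual pipeline. ‘Data challenge’ is ultimately a matter of rights and process rather than code, so it is not represented here.

```python
# Illustrative only: hypothetical columns, thresholds, and model choice.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MAX_RECORD_AGE_YEARS = 3  # hypothetical 'data currency' standard


def enforce_currency(records: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """How fresh is it? Drop intelligence records older than the agreed standard."""
    age_years = (as_of - records["recorded_at"]).dt.days / 365.25
    return records[age_years <= MAX_RECORD_AGE_YEARS]


def feature_is_predictive(X: pd.DataFrame, y: pd.Series, feature: str,
                          min_gain: float = 0.01) -> bool:
    """How smart is it? Keep a feature only if it demonstrably improves
    cross-validated accuracy over a model trained without it."""
    model = LogisticRegression(max_iter=1000)
    without = cross_val_score(model, X.drop(columns=[feature]), y, cv=5).mean()
    with_feature = cross_val_score(model, X, y, cv=5).mean()
    return (with_feature - without) >= min_gain


def currency_accuracy_cost(X_all, y_all, X_fresh, y_fresh) -> float:
    """Quantify the accuracy lost by excluding stale records -- the trade-off
    that police leaders would have to weigh when setting a currency standard."""
    model = LogisticRegression(max_iter=1000)
    acc_all = cross_val_score(model, X_all, y_all, cv=5).mean()
    acc_fresh = cross_val_score(model, X_fresh, y_fresh, cv=5).mean()
    return acc_all - acc_fresh
```

The point is not the specific code but that each question can, in principle, be expressed as a documented test whose results are open to scrutiny.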


Timely or tragic?

Speaking personally, it has been a pleasure to participate in a new form of accountability for the use of algorithmic profiling by a large police force in the UK: specifically as a member of the independent data analytics ethics committee established by West Midlands Police (WMP) and their locally elected police and crime commissioner. This committee was founded to guide the development of the force’s Data Lab, and more recently the development of a National Data Analytics Solution (NDAS) for the UK police service, work that is also overseen by WMP. In time, the NDAS project may come to mean profiling individuals according to the risk they present to society, in terms of a propensity for involvement in knife crime or in modern slavery practices, by way of an algorithmic assessment of police intelligence data. Such an approach would, as a result, become increasingly widespread and eventually entrenched in UK policing practice. This shift towards a predictive emphasis in police policy will without doubt be transformational, but public commentary is split as to whether such advances in data-driven policing are timely or tragic.

Because its early work on NDAS uses police intelligence data that forces already possess, we can be pleased that WMP are not conducting the kind of social profiling and citizen scoring that takes place in China, but there are still concerns about a risk of undue unfairness being ‘baked in’ through lower accuracy rates. You can see for yourself the degree of challenge our ethics committee is offering to the NDAS project in our December 2019 minutes; it is important to note that the progress of WMP analytics projects and the scrutiny of the committee are placed in the public domain. What is also encouraging is that West Midlands Police have continued to incorporate ALGO-CARE into their internal development processes for new algorithmic analysis tools, in the absence of a national code of practice for UK police uses of data analytics, machine learning and AI. Essex Police plan to follow suit in their work within the partnership-based Essex Centre for Data Analytics.

In the UK, the wider picture sees the government-backed Centre for Data Ethics and Innovation partnering with the Royal United Services Institute to research what a code of practice for data analytics in criminal justice should look like. This work is important because the range of data analytics techniques used in UK policing is growing, and those techniques will continue to be applied to vital issues, from resource pressures through to concerns around the process of digital forensics work. If police forces are going to use algorithms to, for instance, advise them on when crimes might be unsolvable — so that officers can be diverted away from certain investigations — or to rigorously search for digital evidence on a rape complainant’s mobile phone, we had better have a very good idea of how machine learning in the justice system is regulated and controlled.