Predictive Policing Isn’t an Exact Science, and That’s the Problem


If the idea of cops using data to predict who’s going to be involved in crime makes you wince, that’s understandable; after all, Netflix using algorithms to predict what dramedy you should watch after Stranger Things is a lot less high-stakes than arrests or murders. Though they have a robotic sheen, algorithms are not without bias: Earlier this year, ProPublica published a massive investigation into the risk-assessment software that courts use to predict whether defendants will reoffend, and found it to be heartbreakingly racist. In lab settings, machine-learning-based approaches to police work — “predictive policing,” as it’s called — have been shown to be effective, but the research on field applications is messy. One study of location-based predictive policing (co-authored by the developers of predictive-policing software) found it more effective than traditional approaches, while another found that analytics-oriented approaches didn’t outperform conventional strategies.

Since 2013, the Chicago Police Department has used one such data-centric approach, called the Strategic Subjects List (SSL), meant to track the individuals at highest risk of involvement in gun violence. The technology behind it, developed by the Illinois Institute of Technology, was inspired by the work of Yale sociologist Andrew Papachristos, who found that between 2006 and 2012, 70 percent of nonfatal gunshot victims in Chicago belonged to social networks comprising less than 6 percent of the city’s overall population. (Gunshot victimization is a good index of gun violence generally, since victims and shooters overlap: the victim of one shooting could just as easily be the shooter in the next.) Like sexually transmitted infections, he told me, gun violence travels along networks. With that theoretical underpinning, the first iteration of the SSL was a list of 426 people at the highest risk of becoming “party to violence.” The New York Times reported that the model’s other variables tracked questions like “Have you been shot before? Is your ‘trend line’ for crimes increasing or decreasing? Do you have an arrest for weapons?” Put together, the idea was to predict who in Chicago was going to shoot or get shot.
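To make that network logic concrete, here is a minimal sketch, in Python, of how risk might propagate through a co-arrest graph. It is emphatically not the Illinois Institute of Technology’s actual model: the graph, the seed victims, and the decay factor are all illustrative assumptions.

```python
from collections import deque

# A toy version of the network idea behind the SSL, not the real
# algorithm: risk spreads outward from known gunshot victims along
# co-arrest ties, decaying with each hop. Everything here is assumed.

# Hypothetical co-arrest graph: person -> people arrested with them.
co_arrests = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

def network_risk(graph, victims, decay=0.5):
    """Score each person by shortest co-arrest distance to any
    prior gunshot victim, with risk halving at every step."""
    risk = {v: 1.0 for v in victims}            # victims themselves score 1.0
    frontier = deque((v, 1.0) for v in victims)
    while frontier:
        person, score = frontier.popleft()
        for neighbor in graph.get(person, ()):
            propagated = score * decay          # risk decays per hop
            if propagated > risk.get(neighbor, 0.0):
                risk[neighbor] = propagated
                frontier.append((neighbor, propagated))
    return risk

scores = network_risk(co_arrests, victims={"A"})
# Rank everyone by score, mirroring how a "strategic subjects" list
# might be ordered; B and C (one hop from A) outrank D and E.
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Notice that in this toy version nobody’s individual attributes matter at all; proximity to a prior victim does all the work, which is the essence of Papachristos’s contagion framing.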

But according to a new study in the Journal of Experimental Criminology, “being on the SSL did not significantly reduce the likelihood of being a murder or shooting victim, or being arrested for murder.” According to lead author and RAND criminologist Jessica Saunders and her colleagues, people on the SSL were 2.88 times more likely to be arrested for a shooting than the control group, and 39 percent more likely to have an interaction — be it arrest, victimization, or contact — with the Chicago PD. According to the researchers’ analysis, the pilot version of the algorithm identified under 1 percent of that year’s homicide victims. In an interview with Science of Us, Saunders described the SSL as having a “null effect,” one that “had no impact on overall public safety.” Notably, the algorithm has reportedly improved since its first iteration: Referencing a paper presented at an International Association of Chiefs of Police conference, Saunders and her colleagues relay that at present, a much larger 29 percent of the top 400 subjects on the SSL “were accurately predicted to be involved in gun violence over an 18-month window.”

Why the SSL had a “null effect” in its first iteration is hard to say. It could be a problem with how those predictions are applied in the actual doing of police work: Saunders and her colleagues report that the algorithm didn’t draw a line between individuals who were “high-threat” (as in “being violent threats to their local community and persons”) and those who were merely “high-risk” (as in nonviolent, and likely to become victims based on whom they hung out with or “lifestyle attributes” like “substance use disorders and gambling”), which was problematic in the field. Additionally, while the SSL was distributed to district commanders across Chicago, the commanders were given “wide discretion” as to how to apply the data in on-the-ground police work. That ambiguity may have presented an obstacle: If you’re going to create a list of high-risk individuals, Saunders says, then you need to know what the intervention will be as well, and it should be just as thought out as the predictions themselves.

Saunders says it’s “tough to comment” on whether the SSL analytic model is “racist,” as Boing Boing blogger Cory Doctorow described it. The first iteration of the model, Saunders says, was built around the same network logic that Papachristos, the Yale sociologist, used to find that 70 percent of Chicago gunshot victims belonged to the same 6 percent of the city’s population: people linked by being arrested together. “Race does influence arrest,” she says. “Inputs in the model have gone through that selection process. Race is not explicit in the model, but other attributes related to race are a part of the model. That’s the most objective way that I can answer that question. It’s a very valid concern.”
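Her point about proxy variables can be made concrete with a toy simulation. The sketch below is not the SSL model; the groups, arrest rates, and scoring rule are made-up assumptions, meant only to show how a “race-blind” score can still sort people along racial lines when one of its inputs is shaped by skewed enforcement.

```python
import random

# A toy illustration of Saunders's point, not the SSL itself: even
# when "race" is excluded from the inputs, a feature produced by
# racially skewed enforcement can carry that signal into the scores.
# The populations, rates, and weights below are made-up assumptions.

random.seed(0)

def arrest_count(group):
    # Assumption for illustration only: group "X" is policed more
    # heavily, so the same underlying behavior yields more arrests.
    base = random.randint(0, 3)
    return base + (2 if group == "X" else 0)

people = ["X" if i % 2 == 0 else "Y" for i in range(10_000)]
scores = [arrest_count(g) for g in people]   # "race-blind" risk score

def avg(xs):
    return sum(xs) / len(xs)

for g in ("X", "Y"):
    group_scores = [s for s, p in zip(scores, people) if p == g]
    print(g, round(avg(group_scores), 2))
# Group X ends up with systematically higher scores even though race
# was never an input -- the arrest feature acted as a proxy for it.
```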

Another concern is accuracy itself. Saunders said that if the SSL’s intervention were social services, she’d be more comfortable with an inaccurate model. But when the intervention is arrest, the stakes are far higher.
