Judges rightly view sentencing as a weighty responsibility.1 They must consider not only the appropriate punishment for the offense but also the risk the offender poses, predicting the probability of the offender’s recidivism.2 A potential solution to this judicial anxiety has arrived thanks to the bipartisan interest in criminal justice reform and the rise of big data.3 Recidivism risk assessments are increasingly commonplace in presentence investigation reports (PSIs),4 the documents that typically provide background information on offenders to sentencing courts.5 These assessments calculate the likelihood of an individual with the offender’s background committing another crime based on an evaluation of actuarial data.6 Recently, in State v. Loomis,7 the Wisconsin Supreme Court held that a trial court’s use of an algorithmic risk assessment in sentencing did not violate the defendant’s due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant. While the Loomis court provided a procedural safeguard to alert judges to the dangers of these assessments — a “written advisement” to accompany PSIs — this prescription is an ineffective means of altering judges’ evaluations of risk assessments. The court’s “advisement” is unlikely to create meaningful judicial skepticism because it is silent on the strength of the criticisms of these assessments, it ignores judges’ inability to evaluate risk assessment tools, and it fails to consider the internal and external pressures on judges to use such assessments.
In early 2013, Wisconsin charged Eric Loomis with five criminal counts related to a drive-by shooting in La Crosse.8 Loomis denied participating in the shooting, but he admitted that he had driven the car involved in the shooting later that evening.9 Loomis pleaded guilty to two of the less severe charges — “attempting to flee a traffic officer and operating a motor vehicle without the owner’s consent.”10 In preparation for sentencing, a Wisconsin Department of Corrections officer produced a PSI that included a COMPAS risk assessment.11 COMPAS assessments estimate the risk of recidivism based on both an interview with the offender and information from the offender’s criminal history.12 As the methodology behind COMPAS is a trade secret, only the estimates of recidivism risk are reported to the court.13 At Loomis’s sentencing hearing, the trial court referred to the COMPAS assessment in its sentencing determination and, based in part on this assessment, sentenced Loomis to six years of imprisonment and five years of extended supervision.14
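Because COMPAS’s methodology is a trade secret, no outside observer can reproduce its actual calculation. But the general shape of an actuarial risk tool of this kind can be sketched. The following is purely illustrative: the features, weights, logistic form, and decile conversion are all assumptions for exposition, not COMPAS’s actual model.

```python
import math

# Hypothetical actuarial risk model -- NOT COMPAS's secret methodology.
# The feature names and weights below are invented for illustration only.
WEIGHTS = {
    "prior_arrests": 0.30,
    "age_at_first_arrest_under_21": 0.80,
    "unstable_employment": 0.45,
}
INTERCEPT = -2.0

def recidivism_probability(offender):
    """Logistic model: estimated probability of reoffending given offender traits."""
    score = INTERCEPT + sum(WEIGHTS[k] * offender.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-score))

def decile_score(prob):
    """Convert a probability into a 1-10 risk decile, the format such tools report."""
    return min(10, int(prob * 10) + 1)

# Illustrative offender profile (invented values).
example = {"prior_arrests": 3, "age_at_first_arrest_under_21": 1, "unstable_employment": 1}
p = recidivism_probability(example)
```

The point of the sketch is that the court sees only the output of `decile_score`; the weights and intercept, the quantities Loomis sought to examine, remain hidden when the model is proprietary.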
Loomis filed a motion for post-conviction relief in the trial court, arguing that the court’s reliance on COMPAS violated his due process rights.15 Because COMPAS reports provide data relevant only to particular groups and because the methodology used to make the reports is a trade secret, Loomis asserted that the court’s use of the COMPAS assessment infringed on both his right to an individualized sentence and his right to be sentenced on accurate information.16 Loomis additionally argued on due process grounds that the court unconstitutionally considered gender at sentencing by relying on a risk assessment that took gender into account.17 The trial court denied the post-conviction motion, and the Wisconsin Court of Appeals certified the appeal to the Wisconsin Supreme Court.18
The Wisconsin Supreme Court affirmed. Writing for the court, Justice Ann Walsh Bradley19 rejected Loomis’s due process arguments.20 Justice Bradley found that the use of gender as a factor in the risk assessment served the nondiscriminatory purpose of promoting accuracy and that Loomis had not provided sufficient evidence that the sentencing court had actually considered gender.21 Moreover, as COMPAS uses only publicly available data and data provided by the defendant, the court concluded that Loomis could have denied or explained any information that went into making the report and therefore could have verified the accuracy of the information used in sentencing.22 Regarding individualization, Justice Bradley stressed the importance of individualized sentencing and admitted that COMPAS provides only aggregate data on recidivism risk for groups similar to the offender.23 But she explained that as the report would not be the sole basis for a decision, sentencing that considers a COMPAS assessment would still be sufficiently individualized because courts have the discretion and information necessary to disagree with the assessment when appropriate.24
However, Justice Bradley added that judges must proceed with caution when using such risk assessments.25 To ensure that judges weigh risk assessments appropriately, the court prescribed both how these assessments must be presented to trial courts and the extent to which judges may use them.26 The court explained that risk scores may not be used “to determine whether an offender is incarcerated” or “to determine the severity of the sentence.”27 Therefore, judges using risk assessments must explain the factors other than the assessment that support the sentence imposed.28 Furthermore, PSIs that incorporate a COMPAS assessment must include five written warnings for judges: first, the “proprietary nature of COMPAS” prevents the disclosure of how risk scores are calculated; second, COMPAS scores are unable to identify specific high-risk individuals because these scores rely on group data; third, although COMPAS relies on a national data sample, there has been “no cross-validation study for a Wisconsin population”;29 fourth, studies “have raised questions about whether [COMPAS scores] disproportionately classify minority offenders as having a higher risk of recidivism”;30 and fifth, COMPAS was developed specifically to assist the Department of Corrections in making post-sentencing determinations.31 In issuing these warnings, the court made clear its desire to instill both general skepticism about the tool’s accuracy and a more targeted skepticism with regard to the tool’s assessment of risks posed by minority offenders.
Justice Abrahamson concurred.32 While she agreed with the judgment, she was concerned that the court had difficulty understanding algorithmic risk assessments.33 In particular, she criticized the court’s decision to deny Northpointe, the company that developed COMPAS, the opportunity to file an amicus brief.34 She would have required a more extensive record from sentencing courts on “the strengths, weaknesses, and relevance to the individualized sentence being rendered of the evidence-based tool.”35 This explanation, she argued, was necessary in light of the criticism that such assessments have drawn from both public officials and scholars.36
The Loomis court’s opinion suggests an attempt to temper the current enthusiasm for algorithmic risk assessments in sentencing.37 But the written disclaimer the court requires for PSIs including such assessments is unlikely to “enable courts to better assess the accuracy of the assessment and the appropriate weight to be given to the risk score.”38 Because it fails to specify the strength of the criticisms of COMPAS, disregards the lack of information available to judges, and overlooks the external and internal pressures to use such assessments, the court’s solution is unlikely to create the desired judicial skepticism.
First, encouraging judicial skepticism of the value of risk assessments alone does little to tell judges how much to discount these assessments. Although the advisement explains that studies “have raised questions about whether [these assessments] disproportionately classify minority offenders as having a higher risk of recidivism,” it neither conveys the force of these criticisms nor names the studies themselves.39 Indeed, the criticisms of algorithmic risk assessments are far greater than simply “questions,” and critics have voiced particular wariness of the technology’s use in the criminal law context as purported advancements may reinforce existing inequalities.40 Scholars warn that these assessments often disguise “overt discrimination based on demographics and socioeconomic status.”41 Independent testing of the assessment tool used in Loomis’s sentencing showed that offenders of color were more likely to receive higher risk ratings than were white offenders.42 Black defendants who did not go on to commit future crimes were falsely labeled as future criminals at nearly twice the rate of white defendants.43 Other studies have raised broader concerns about these assessments’ efficacy.44 Of course, one could worry that judges might discount these assessments too much — thereby not identifying low-risk offenders. But without information on the strength of these critiques, judges will not be able to calibrate their interpretations of COMPAS at all.
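The disparity described above is a claim about error rates conditioned on race: among defendants who did not reoffend, what share were nonetheless flagged high risk? A short sketch shows how such a false-positive-rate comparison is computed. The records below are invented to produce the roughly two-to-one ratio the studies reported; they are not the studies’ actual data.

```python
# Hypothetical records illustrating a false-positive-rate comparison.
# Each tuple: (group, labeled_high_risk, actually_reoffended). Values invented.
records = [
    ("black", True, False), ("black", True, False), ("black", False, False),
    ("black", True, True),  ("black", False, False),
    ("white", True, False), ("white", False, False), ("white", False, False),
    ("white", False, False), ("white", True, True),
]

def false_positive_rate(group):
    """Among non-reoffenders in the group, the share labeled high risk."""
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    flagged = [r for r in non_reoffenders if r[1]]
    return len(flagged) / len(non_reoffenders)

fpr_black = false_positive_rate("black")  # 2 of 4 non-reoffenders flagged
fpr_white = false_positive_rate("white")  # 1 of 4 non-reoffenders flagged
```

In this toy data the Black false positive rate is double the white rate, the kind of disparity that can exist even when the tool’s overall accuracy is the same for both groups, which is why the advisement’s vague reference to “questions” understates the problem.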
Similarly, even should these warnings increase skepticism, judges likely lack the information necessary to modulate their consideration of the tool. The methodology behind COMPAS, like many such assessments, is a trade secret.45 Accordingly, courts cannot evaluate “how the risk scores are determined or how the factors are weighed.”46 Further, there are currently few studies analyzing these assessments, in part because the lack of methodological transparency from data assessment companies has made study difficult.47 While the Loomis court asserted that only those jurisdictions that “have the capacity for maintaining [risk assessment] tools and monitoring their continued accuracy” should use these tools,48 the court largely disregarded the problem that the current accuracy of these assessments is subject to debate.49 The larger problem, however, is simply that most judges are unlikely to understand algorithmic risk assessments. As Justice Abrahamson’s concurrence noted, despite briefing and lengthy oral arguments, the “lack of understanding of COMPAS was a significant problem” and “[t]he court needed all the help it could get.”50 This admission shows the court was mistaken to think that as long as judges are informed about COMPAS’s potential inaccuracy, they can discount appropriately.
Additionally, the warning will likely be ineffectual in changing the way judges think about risk assessments given the pressure within the judicial system to use these assessments as well as the cognitive biases supporting data reliance. The widespread endorsement of these sentencing tools communicates to judges that the tools are, in fact, reliable. The Model Penal Code endorses evidence-based sentencing,51 and advocates in academia52 and the judiciary53 have encouraged use of algorithmic risk assessments. Many states are seriously considering the implementation of recidivism-risk data in sentencing.54 Indeed, some states already require the use of risk assessment tools in sentencing proceedings.55 Beyond this external pressure, there are psychological biases encouraging the use of risk assessment tools. Individuals tend to weigh purportedly expert empirical assessments more heavily than nonempirical evidence56 — which might create a bias in favor of COMPAS assessments over an offender’s own narrative. Research suggests that it is challenging and unusual for individuals to defy algorithmic recommendations.57 Behavioral economists use the term “anchoring” to describe the common phenomenon in which individuals draw upon an available piece of evidence — no matter its weakness — when making subsequent decisions.58 A judge presented with an assessment that shows a higher recidivism risk than the judge expected may increase the sentence without realizing that “anchoring” has played a role in the judgment. While warnings may alert judges to the shortcomings of these tools, the advisement may still fail to negate the considerable external and internal pressures of a system urging use of quantitative assessments.
Ultimately, the problem with the Loomis court’s advisement solution is that it favors the quantity of information provided to sentencing courts over the quality of that information. While judges have often considered recidivism risk,59 historically this assessment required reliance on a judge’s “intuition, instinct and sense of justice,” which could result in a “more severe sentence[]” based on an “unspoken clinical prediction.”60 The judicial system has frequently lamented the lack of objective measures available in making individualized sentences in criminal cases.61 Proponents of assessments argue that these evaluations make sentencing more transparent62 and rational.63 But the history of using new technological innovations in law has not always been a happy one,64 and the research into COMPAS and similar assessments suggests that the same could be true here. The Loomis opinion, then, failed to answer why, given the risks, courts should still use such assessments.
Therefore, as “words yield to numbers” in sentencing,65 the judiciary should use considerable caution in assessing the qualitative value of these new technologies. The Wisconsin Supreme Court’s mandated PSI advisement is a critical — though likely ineffectual — acknowledgement of the potential problems of algorithmic risk assessments. Since the overwhelming majority of defendants plead guilty, the PSI provides the bulk of a judge’s material in a criminal case; thus, the information provided in a PSI is vitally important.66 But the court’s required advisement suggests that judges should be a bias check on a tool itself designed to correct judges’ biases. In this troubling state of affairs, the advisement fails to provide the information required for judges to properly play this role. Stricter measures — such as excluding risk assessments that keep their methodology secret or reining in their use until more studies are available — would be a more appropriate way to counter the downsides of these assessments.