Judges rightly view sentencing as a weighty responsibility.[1] They must consider not only the appropriate punishment for the offense but also the risk the offender poses, predicting the probability of the offender’s recidivism.[2] A potential solution to this judicial anxiety has arrived thanks to the bipartisan interest in criminal justice reform and the rise of big data.[3] Recidivism risk assessments are increasingly commonplace in presentencing investigation reports (PSIs),[4] the documents that typically provide background information on offenders to sentencing courts.[5] These assessments calculate the likelihood of an individual with the offender’s background committing another crime based on an evaluation of actuarial data.[6] Recently, in State v. Loomis,[7] the Wisconsin Supreme Court held that a trial court’s use of an algorithmic risk assessment in sentencing did not violate the defendant’s due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant. While the Loomis court provided a procedural safeguard to alert judges to the dangers of these assessments — a “written advisement” to accompany PSIs — this prescription is an ineffective means of altering judges’ evaluations of risk assessments. The court’s “advisement” is unlikely to create meaningful judicial skepticism because it is silent on the strength of the criticisms of these assessments, it ignores judges’ inability to evaluate risk assessment tools, and it fails to consider the internal and external pressures on judges to use such assessments.
[1] See, e.g., Williams v. New York, 337 U.S. 241, 251 (1949) (describing the “grave responsibility” of sentencing).
[2] There are, of course, many considerations involved in sentencing. However, recidivism risk has become a focal point for sentencing courts and reformers. See, e.g., Conference of Chief Justices & Conference of State Court Adm’rs, Resolution 12: In Support of Sentencing Practices that Promote Public Safety and Reduce Recidivism, Nat’l Ctr. for St. Cts. (Aug. 1, 2007), http://www.ncsc.org/sitecore/content/microsites/csi/home/~/media/microsites/files/csi/education/handout%2031%20ccj%20resolution%2012.ashx [https://perma.cc/G3G5-S65M].
[3] See, e.g., Anna Maria Barry-Jester et al., The New Science of Sentencing, Marshall Project (Aug. 4, 2015, 7:15 AM), https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing [https://perma.cc/Z5SV-TRLG] (noting bipartisan support for data-driven reform); Marc Levin, The Conservative Case for Pretrial Justice Reform and Seven Solutions for Right-Sizing the Solution, Right on Crime (June 29, 2015), http://rightoncrime.com/2015/06/the-conservative-case-for-pretrial-justice-reform-and-seven-solutions-for-right-sizing-the-solution [https://perma.cc/4JC8-7SJ9].
[4] See Pamela M. Casey et al., Nat’l Ctr. for State Courts, Using Offender Risk and Needs Assessment Information at Sentencing: Guidance for Courts from a National Working Group 1, 33 (2011), http://www.ncsc.org/~/media/Microsites/Files/CSI/RNA%20Guide%20Final [https://perma.cc/WR5F-HP94].
[5] See, e.g., United States v. Cesaitis, 506 F. Supp. 518, 520–21 (E.D. Mich. 1981); Timothy Bakken, The Continued Failure of Modern Law to Create Fairness and Efficiency: The Presentence Investigation Report and Its Effect on Justice, 40 N.Y.L. Sch. L. Rev. 363, 364 (1996).
[6] See Mitch Smith, In Wisconsin, a Backlash Against Using Data to Foretell Defendants’ Futures, N.Y. Times (June 22, 2016), http://www.nytimes.com/2016/06/23/us/backlash-in-wisconsin-against-using-data-to-foretell-defendants-futures.html [https://perma.cc/KPF5-59AB].
[7] 881 N.W.2d 749 (Wis. 2016).
In early 2013, Wisconsin charged Eric Loomis with five criminal counts related to a drive-by shooting in La Crosse.[8] Loomis denied participating in the shooting, but he admitted that he had driven the same car involved later that evening.[9] Loomis pleaded guilty to two of the less severe charges — “attempting to flee a traffic officer and operating a motor vehicle without the owner’s consent.”[10] In preparation for sentencing, a Wisconsin Department of Corrections officer produced a PSI that included a COMPAS risk assessment.[11] COMPAS assessments estimate the risk of recidivism based on both an interview with the offender and information from the offender’s criminal history.[12] As the methodology behind COMPAS is a trade secret, only the estimates of recidivism risk are reported to the court.[13] At Loomis’s sentencing hearing, the trial court referred to the COMPAS assessment in its sentencing determination and, based in part on this assessment, sentenced Loomis to six years of imprisonment and five years of extended supervision.[14]
[8] Brief of Defendant-Appellant at 1–3, State v. Loomis, No. 2015AP157-CR (Wis. Ct. App. Sept. 17, 2015), 2015 WL 1724741, at *iii–2.
[9] Id. at 3.
[10] Loomis, 881 N.W.2d at 754. The remaining counts were “dismissed but read in,” or made available, for the court’s consideration at sentencing. Id.
[11] See id. COMPAS stands for “Correctional Offender Management Profiling for Alternative Sanctions.” Tim Brennan et al., Northpointe Inst. for Pub. Mgmt. Inc., Evaluating the Predictive Validity of the COMPAS Risk and Needs Assessment System, 36 Crim. Just. & Behav. 21, 21 (2009). It has been described as a “risk–need assessment system . . . that incorporates a range of theoretically relevant criminogenic factors and key factors emerging from meta-analytic studies of recidivism.” Id.
[12] Loomis, 881 N.W.2d at 754. COMPAS reports also provide a needs assessment focused on the possible rehabilitation of the offender. See id.
[13] Id. at 761.
[14] Brief of Defendant-Appellant, supra note 8, at 9–11; see also Loomis, 881 N.W.2d at 755–56.
Loomis filed a motion for post-conviction relief in the trial court, arguing that the court’s reliance on COMPAS violated his due process rights.[15] Because COMPAS reports provide data relevant only to particular groups and because the methodology used to make the reports is a trade secret, Loomis asserted that the court’s use of the COMPAS assessment infringed on both his right to an individualized sentence and his right to be sentenced on accurate information.[16] Loomis additionally argued on due process grounds that the court unconstitutionally considered gender at sentencing by relying on a risk assessment that took gender into account.[17] The trial court denied the post-conviction motion, and the Wisconsin Court of Appeals certified the appeal to the Wisconsin Supreme Court.[18]
[15] Loomis, 881 N.W.2d at 756.
[16] Id. at 757.
[17] Id. Loomis did not make an equal protection argument based on gender. Id. at 766. Loomis also argued that the trial court erroneously exercised its discretion in assuming the truth of the factual bases for the read-in charges. Id. at 757.
[18] Id. at 757.
The Wisconsin Supreme Court affirmed. Writing for the court, Justice Ann Walsh Bradley[19] rejected Loomis’s due process arguments.[20] Justice Bradley found that the use of gender as a factor in the risk assessment served the nondiscriminatory purpose of promoting accuracy and that Loomis had not provided sufficient evidence that the sentencing court had actually considered gender.[21] Moreover, as COMPAS uses only publicly available data and data provided by the defendant, the court concluded that Loomis could have denied or explained any information that went into making the report and therefore could have verified the accuracy of the information used in sentencing.[22] Regarding individualization, Justice Bradley stressed the importance of individualized sentencing and admitted that COMPAS provides only aggregate data on recidivism risk for groups similar to the offender.[23] But she explained that as the report would not be the sole basis for a decision, sentencing that considers a COMPAS assessment would still be sufficiently individualized because courts have the discretion and information necessary to disagree with the assessment when appropriate.[24]
[19] Justice Bradley was joined by Justices Abrahamson, Prosser, Ziegler, Gableman, and Rebecca Bradley.
[20] Loomis, 881 N.W.2d at 757. The court also rejected the claim related to the read-in charges. Id. at 772.
[21] Id. at 766–67.
[22] Id. at 761–62.
[23] Id. at 764.
[24] Id. at 764–65.
However, Justice Bradley added that judges must proceed with caution when using such risk assessments.[25] To ensure that judges weigh risk assessments appropriately, the court prescribed both how these assessments must be presented to trial courts and the extent to which judges may use them.[26] The court explained that risk scores may not be used “to determine whether an offender is incarcerated” or “to determine the severity of the sentence.”[27] Therefore, judges using risk assessments must explain the factors other than the assessment that support the sentence imposed.[28] Furthermore, PSIs that incorporate a COMPAS assessment must include five written warnings for judges: first, the “proprietary nature of COMPAS” prevents the disclosure of how risk scores are calculated; second, COMPAS scores are unable to identify specific high-risk individuals because these scores rely on group data; third, although COMPAS relies on a national data sample, there has been “no cross-validation study for a Wisconsin population”;[29] fourth, studies “have raised questions about whether [COMPAS scores] disproportionately classify minority offenders as having a higher risk of recidivism”;[30] and fifth, COMPAS was developed specifically to assist the Department of Corrections in making post-sentencing determinations.[31] In issuing these warnings, the court made clear its desire to instill both general skepticism about the tool’s accuracy and a more targeted skepticism with regard to the tool’s assessment of risks posed by minority offenders.
[25] Id. at 765.
[26] See id. at 763–65.
[27] Id. at 769.
[28] Id.
[29] Id. Studies used to validate COMPAS typically assess the predictive model’s accuracy and usually entail testing the model on a data set not used in the original estimation, such as a local population. See, e.g., David Farabee et al., Cal. Dep’t of Corr. & Rehab., COMPAS Validation Study: Final Report 3–4 (2010), http://www.cdcr.ca.gov/adult_research_branch/Research_Documents/COMPAS_Final_report_08-11-10.pdf [https://perma.cc/G4PK-TPDR]. California’s validation study found that in COMPAS assessments for the State, “the general recidivism risk scale achieved . . . the conventional standard [for acceptability], though the violence risk scale did not.” Id. at 29.
[30] Loomis, 881 N.W.2d at 769.
[31] Id. at 769–70. Justice Bradley stated that the advisement should be updated as new information becomes available. Id. at 770.
Justice Abrahamson concurred.[32] While she agreed with the judgment, she was concerned that the court had difficulty understanding algorithmic risk assessments.[33] In particular, she criticized the court’s decision to deny Northpointe, the company that developed COMPAS, the opportunity to file an amicus brief.[34] She would have required a more extensive record from sentencing courts on “the strengths, weaknesses, and relevance to the individualized sentence being rendered of the evidence-based tool.”[35] This explanation, she argued, was necessary in light of the criticism that such assessments have drawn from both public officials and scholars.[36]
[32] Id. at 774 (Abrahamson, J., concurring). Chief Justice Roggensack also filed a concurring opinion. Id. at 772 (Roggensack, C.J., concurring). She emphasized that though the certified question before the court was whether sentencing courts could rely on COMPAS assessments, the question that the court answered in the affirmative was whether sentencing courts could consider the assessments. Id. at 774.
[33] Id. at 774 (Abrahamson, J., concurring).
[34] Id.
[35] Id.
[36] Id. at 774–75.
The Loomis court’s opinion suggests an attempt to temper the current enthusiasm for algorithmic risk assessments in sentencing.[37] But the written disclaimer the court requires for PSIs including such assessments is unlikely to “enable courts to better assess the accuracy of the assessment and the appropriate weight to be given to the risk score.”[38] In failing to specify the vigor of the criticisms of COMPAS, disregarding the lack of information available to judges, and overlooking the external and internal pressures to use such assessments, the court’s solution is unlikely to create the desired judicial skepticism.
[37] This contrasts, for example, with the Indiana Supreme Court’s enthusiasm for risk assessment tools. See Malenchik v. State, 928 N.E.2d 564, 574–75 (Ind. 2010).
[38] Loomis, 881 N.W.2d at 764.
First, encouraging judicial skepticism of the value of risk assessments alone does little to tell judges how much to discount these assessments. Although the advisement explains that studies “have raised questions about whether [these assessments] disproportionately classify minority offenders as having a higher risk of recidivism,” the force of these criticisms is not mentioned nor the actual studies named.[39] Indeed, the criticisms of algorithmic risk assessments are far greater than simply “questions,” and critics have voiced particular wariness of the technology’s use in the criminal law context as purported advancements may reinforce existing inequalities.[40] Scholars warn that these assessments often disguise “overt discrimination based on demographics and socioeconomic status.”[41] Independent testing of the assessment tool used in Loomis’s sentencing showed that offenders of color were more likely to receive higher risk ratings than were white offenders.[42] Black defendants who did not go on to commit future crimes were falsely labeled as future criminals at nearly twice the rate of white defendants.[43] Other studies have raised broader concerns about these assessments’ efficacy.[44] Of course, one could worry that judges might discount these assessments too much — thereby not identifying low-risk offenders. But without information on the strength of these critiques, judges will not be able to calibrate their interpretations of COMPAS at all.
[39] Id. at 769.
[40] See, e.g., Eric Holder, Att’y Gen., U.S. Dep’t of Justice, Address at the National Association of Criminal Defense Lawyers 57th Annual Meeting and 13th State Criminal Justice Network Conference (Aug. 1, 2014), https://www.justice.gov/opa/speech/attorney-general-eric-holder-speaks-national-association-criminal-defense-lawyers-57th [https://perma.cc/6772-W8VD] (cautioning that recidivism risk assessments “may exacerbate unwarranted and unjust disparities”).
[41] Sonja B. Starr, Evidence-Based Sentencing and the Scientific Rationalization of Discrimination, 66 Stan. L. Rev. 803, 806 (2014).
[42] See Julia Angwin et al., Machine Bias, ProPublica (May 23, 2016), https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [https://perma.cc/ZWX5-6BZP]; see also Cecelia Klingele, The Promises and Perils of Evidence-Based Corrections, 91 Notre Dame L. Rev. 537, 577 (2015). But see Sam Corbett-Davies et al., A Computer Program Used for Bail and Sentencing Decisions Was Labeled Biased Against Blacks. It’s Actually Not That Clear, Wash. Post (Oct. 17, 2016), https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas [https://perma.cc/2WAM-7QPM].
[43] Angwin et al., supra note 42.
[44] See, e.g., Jennifer L. Skeem & Jennifer Eno Louden, Assessment of Evidence on the Quality of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) 28 (2007), http://www.cdcr.ca.gov/adult_research_branch/Research_Documents/COMPAS_Skeem_EnoLouden_Dec_2007.pdf [https://perma.cc/PYQ6-WD4V] (explaining that “there is little evidence that the COMPAS predicts recidivism” and “there is no evidence that the COMPAS assesses risk state, or change over time in criminogenic needs”).
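The disparity described above, in which defendants who did not reoffend were nonetheless labeled high risk at different rates across groups, is a comparison of false positive rates. The following is a minimal sketch of that arithmetic, not a reconstruction of any study's method. The group labels "A" and "B", the function name, and every count in the toy data are hypothetical and chosen only so the numbers are easy to follow.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: false positive rate}.

    Each record is (group, labeled_high_risk, reoffended). The false
    positive rate for a group is the share of its members who did NOT
    reoffend but were still labeled high risk.
    """
    non_reoffenders = defaultdict(int)  # people who did not reoffend
    flagged = defaultdict(int)          # of those, how many were labeled high risk
    for group, labeled_high_risk, reoffended in records:
        if not reoffended:
            non_reoffenders[group] += 1
            if labeled_high_risk:
                flagged[group] += 1
    return {g: flagged[g] / n for g, n in non_reoffenders.items()}


# Hypothetical toy data, NOT real figures from any study.
records = (
    [("A", True, False)] * 4      # group A: labeled high risk, did not reoffend
    + [("A", False, False)] * 6   # group A: labeled low risk, did not reoffend
    + [("A", True, True)] * 5     # group A: labeled high risk, reoffended
    + [("B", True, False)] * 2    # group B: labeled high risk, did not reoffend
    + [("B", False, False)] * 8   # group B: labeled low risk, did not reoffend
    + [("B", True, True)] * 5     # group B: labeled high risk, reoffended
)

rates = false_positive_rates(records)
ratio = rates["A"] / rates["B"]
print(rates, ratio)
```

In this toy data, both groups have the same number of people who reoffended after being labeled high risk, yet group A's non-reoffenders are flagged twice as often as group B's. A single headline accuracy figure would not reveal that gap, which is one reason an advisement that mentions only "questions" gives a judge little to calibrate against.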
Similarly, even should these warnings increase skepticism, judges likely lack the requisite information necessary to modulate their consideration of the tool. The methodology behind COMPAS, like many such assessments, is a trade secret.[45] Accordingly, courts cannot evaluate “how the risk scores are determined or how the factors are weighed.”[46] Further, there are currently few studies analyzing these assessments, in part because the lack of methodological transparency from data assessment companies has made study difficult.[47] While the Loomis court asserted that only those jurisdictions that “have the capacity for maintaining [risk assessment] tools and monitoring their continued accuracy” should use these tools,[48] the court largely disregarded the problem that the current accuracy of these assessments is subject to debate.[49] The larger problem, however, is simply that most judges are unlikely to understand algorithmic risk assessments. As Justice Abrahamson’s concurrence noted, despite briefing and lengthy oral arguments, the “lack of understanding of COMPAS was a significant problem” and “[t]he court needed all the help it could get.”[50] This admission shows the court was mistaken to think that as long as judges are informed about COMPAS’s potential inaccuracy, they can discount appropriately.
[45] See Loomis, 881 N.W.2d at 761.
[46] Id.
[47] See Angwin et al., supra note 42; see also Sarah L. Desmarais & Jay P. Singh, Risk Assessment Instruments Validated and Implemented in Correctional Settings in the United States 2 (2013), https://csgjusticecenter.org/wp-content/uploads/2014/07/Risk-Assessment-Instruments-Validated-and-Implemented-in-Correctional-Settings-in-the-United-States.pdf [https://perma.cc/YLT9-4X9P].
[48] Loomis, 881 N.W.2d at 763.
[49] Few validation studies of COMPAS exist, and those that do often report mixed results. For example, while a 2010 study determined that “COMPAS is a reliable instrument,” it also cautioned that COMPAS is not perfect. Farabee et al., supra note 29, at 29.
[50] Loomis, 881 N.W.2d at 774 (Abrahamson, J., concurring).
Additionally, the warning will likely be ineffectual in changing the way judges think about risk assessments given the pressure within the judicial system to use these assessments as well as the cognitive biases supporting data reliance. The widespread endorsement of these sentencing tools communicates to judges that the tools are, in fact, reliable. The Model Penal Code endorses evidence-based sentencing,[51] and advocates in academia[52] and the judiciary[53] have encouraged use of algorithmic risk assessments. Many states are seriously considering the implementation of recidivism-risk data in sentencing.[54] Indeed, some states already require the use of risk assessment tools in sentencing proceedings.[55] Beyond this external pressure, there are psychological biases encouraging the use of risk assessment tools. Individuals tend to weigh purportedly expert empirical assessments more heavily than nonempirical evidence[56] — which might create a bias in favor of COMPAS assessments over an offender’s own narrative. Research suggests that it is challenging and unusual for individuals to defy algorithmic recommendations.[57] Behavioral economists use the term “anchoring” to describe the common phenomenon in which individuals draw upon an available piece of evidence — no matter its weakness — when making subsequent decisions.[58] A judge presented with an assessment that shows a higher recidivism risk than predicted may increase the sentence without realizing that “anchoring” has played a role in the judgment. While warnings may alert judges to the shortcomings of these tools, the advisement may still fail to negate the considerable external and internal pressures of a system urging use of quantitative assessments.
[51] Model Penal Code: Sentencing § 6B.09 (Am. Law Inst., Tentative Draft No. 2, 2011).
[52] See, e.g., Lynn S. Branham, Follow the Leader: The Advisability and Propriety of Considering Cost and Recidivism Data at Sentencing, 24 Fed. Sent’g Rep. 169, 169 (2012).
[53] See, e.g., William Ray Price, Jr., Chief Justice, Supreme Court of Mo., State of the Judiciary Address (Feb. 3, 2010), http://www.courts.mo.gov/page.jsp?id=36875 [https://perma.cc/AJX4-WK8E].
[54] Cf. Douglas A. Berman, Are Costs a Unique (and Uniquely Problematic) Kind of Sentencing Data?, 24 Fed. Sent’g Rep. 159, 160 (2012) (“In some form, nearly every state in the nation has adopted, or at least been seriously considering how to incorporate, evidence-based research . . . into their sentencing policies and practices.”).
[55] See, e.g., Ky. Rev. Stat. Ann. § 533.010(2) (LexisNexis 2014); Tenn. Code Ann. § 41-1-412(b) (2014); Vt. Stat. Ann. tit. 28, § 204a(b)(1) (2009); Wash. Rev. Code § 9.94A.500(1) (2016); cf. Ala. Code § 12-25-33(6) (2012); 42 Pa. Stat. and Cons. Stat. Ann. § 2154.5(a)(6) (West Supp. 2016).
[56] See Stephen A. Fennell & William N. Hall, Due Process at Sentencing: An Empirical and Legal Analysis of the Disclosure of Presentence Reports in Federal Courts, 93 Harv. L. Rev. 1613, 1668–70 (1980).
[57] See Angèle Christin et al., Courts and Predictive Algorithms 8 (Oct. 27, 2015), http://www.datacivilrights.org/pubs/2015-1027/Courts_and_Predictive_Algorithms.pdf [https://perma.cc/5MBA-QPMQ].
[58] See Thomas Mussweiler & Fritz Strack, Numeric Judgments Under Uncertainty: The Role of Knowledge in Anchoring, 36 J. Experimental Soc. Psychol. 495, 495 (2000); Amos Tversky & Daniel Kahneman, Judgment Under Uncertainty: Heuristics and Biases, 185 Science 1124, 1128–30 (1974).
Ultimately, the problem with the Loomis court’s advisement solution is that it favors the quantity of information provided to sentencing courts over the quality of that information. While judges have often considered recidivism risk,[59] historically this assessment required reliance on a judge’s “intuition, instinct and sense of justice,” which could result in a “more severe sentence” based on an “unspoken clinical prediction.”[60] The judicial system has frequently lamented the lack of objective measures available in making individualized sentences in criminal cases.[61] Proponents of assessments argue that these evaluations make sentencing more transparent[62] and rational.[63] But the history of using new technological innovations in law has not always been a happy one,[64] and the research into COMPAS and similar assessments suggests that the same could be true here. The Loomis opinion, then, failed to answer why, given the risks, courts should still use such assessments.
[59] See Christin et al., supra note 57, at 8.
[60] Jordan M. Hyatt et al., Reform in Motion: The Promise and Perils of Incorporating Risk Assessments and Cost-Benefit Analysis into Pennsylvania Sentencing, 49 Duq. L. Rev. 707, 725 (2011); see id. at 724–25.
[61] Cf. Melissa Hamilton, Adventures in Risk: Predicting Violent and Sexual Recidivism in Sentencing Law, 47 Ariz. St. L.J. 1, 6 (2015) (noting that sentencing based on nonempirical factors may sometimes be viewed as “idiosyncratic, biased, and unreliable”).
[62] See Hyatt et al., supra note 60, at 725; Jennifer Skeem, Risk Technology in Sentencing: Testing the Promises and Perils (Commentary on Hannah-Moffat, 2011), 30 Just. Q. 297, 300 (2013).
[63] Cf. Kirk Heilbrun, Risk Assessment in Evidence-Based Sentencing: Context and Promising Uses, 1 Chap. J. Crim. Just. 127, 127 (2009) (describing risk assessment tools as “an exciting development in the application of empirical scientific evidence to legal decision-making”).
[64] See, e.g., Buck v. Bell, 274 U.S. 200, 207 (1927) (upholding the use of compulsory sterilization of the “unfit,” including the intellectually disabled).
Therefore, as “words yield to numbers” in sentencing,[65] the judiciary should use considerable caution in assessing the qualitative value of these new technologies. The Wisconsin Supreme Court’s mandated PSI advisement is a critical — though likely ineffectual — acknowledgement of the potential problems of algorithmic risk assessments. Since the overwhelming majority of defendants plead guilty, the PSI provides the bulk of a judge’s material in a criminal case; thus, the information provided in a PSI is vitally important.[66] But the court’s required advisement suggests that judges should be a bias check on a tool itself designed to correct judges’ biases. In this troubling state of affairs, the advisement fails to provide the information required for judges to properly play this role. Stricter measures — such as excluding risk assessments that keep their methodology secret or reining in their use until more studies are available — would be a more appropriate way to counter the downsides of these assessments.
[65] Hamilton, supra note 61, at 13.
[66] Fennell & Hall, supra note 56, at 1627.