Courts have long sought to find the ordinary meaning of words and phrases in statutes,1 enlisting a variety of tools, such as dictionaries,2 canons of interpretation,3 and the common sense of an English-language speaker.4 Recently, in State v. Rasabout,5 the Supreme Court of Utah considered a novel tool for statutory interpretation: corpus linguistics, “the study of language based on examples of ‘real life’ language use.”6 In recent decades, linguistic programs at universities and institutes have assembled corpora (or bodies) of language — vast computer databases cataloguing written and spoken language.7 These databases can easily be searched to retrieve examples of how words or phrases have been used in different contexts at different times.8 In Rasabout, the majority and concurrence debated the legitimacy of using this linguistic tool. The court unanimously held that the phrase “unlawful discharge of a firearm” in a criminal statute referred to each individual shot fired.9 To arrive at this decision, the majority relied upon traditional tools of statutory interpretation.10 But a concurring justice found these tools wanting and informed his judgment by searching for the word discharge in contemporary news articles and a linguistics database.11 The majority argued that this research was inappropriate largely because corpus linguistics is an unfamiliar, scientific tool and its proper use requires an expertise judges lack.12 Corpus linguistics is indeed novel. But in service of the traditional task of considering how a word is commonly used, jurists are capable of searching an online database for examples of how a word has been used. By providing externally generated examples, corpus linguistics can be a helpful double check against a judge’s intuitive understanding of a word or phrase.
Eight years ago, Andy Rasabout fired twelve shots from a car into the home of a rival gang member.13 A jury convicted him of twelve separate counts of unlawful discharge of a firearm under a Utah statute that makes it illegal to “discharge any kind of dangerous weapon or firearm . . . from an automobile . . . ; from, upon, or across any highway; . . . or . . . within 600 feet of . . . a house.”14 Because the shots were part of a single criminal episode, before sentencing the trial court merged the twelve separate counts into one conviction.15
The Utah Court of Appeals reversed, holding that Rasabout must be convicted and sentenced for each discrete shot fired.16 The court of appeals examined the text of the criminal statute to determine if discharge meant that the legislature intended a separate conviction for each shot fired or one conviction for the whole episode.17 Looking to dictionary definitions, the court determined that, in this context, discharge meant to “fire a weapon”18 or to “shoot.”19
The Utah Supreme Court granted certiorari and unanimously affirmed the court of appeals’s decision, finding that discharge, in the context of a “dangerous weapon or firearm,”20 referred to “each discrete shot” and, as such, each of Rasabout’s twelve discrete shots constituted a criminal violation.21 To arrive at this conclusion, Justice Parrish, writing for the majority,22 looked to the structure of the word discharge, the dictionary definition of discharge, the accompanying language in the statute, and common sense.23 After observing that the root of the word discharge — charge — has noun and verb meanings related to the amount of gunpowder used in a single shot and that the dictionary definition of discharge included the meaning “to shoot,” Justice Parrish concluded that “the clearest reading of the statute” is that discharge refers to each shot.24
Next, the majority admonished Associate Chief Justice Lee’s concurrence for its reliance on corpus linguistics as a tool for statutory interpretation,25 contending that his research was unfair to the parties because the rationale did not appear in the parties’ arguments26 and because judges should not decide cases by conducting their own “independent scientific research.”27 Justice Parrish contended that judges lack the expertise to conduct this research because “[l]inguistics is a scientific field of study that uses empirical research to draw findings,” and judges are generalists, not scientists.28 To illustrate this problem, Justice Parrish pointed out that professional linguistic studies published in reliable journals are subject to the rigors of peer review to ensure that the findings are reliable; in comparison, court judgments lack this systemic oversight.29 Lastly, Justice Parrish criticized the concurrence’s methodology by questioning the statistical significance of the findings and claiming that the more appropriate data set of language to analyze would have been the text of the Utah Code.30
Associate Chief Justice Lee concurred in part and concurred in the judgment.31 He diverged from the majority because he did not find that dictionaries fully resolved the meaning of discharge.32 While one definition, “shoot,” would confine the meaning of discharge to each shot fired,33 another definition, “empty of a cargo: UNLOAD,” could include “unloading or emptying of the contents of a weapon.”34 Under Utah precedent, when neither of two meanings can be eliminated, the court opts for the more ordinary meaning.35 Associate Chief Justice Lee felt that discharge ordinarily refers to firing a single shot and not to unloading a firearm,36 but rather than rely on intuition alone to choose one meaning over another, he turned to corpus linguistics.37
Associate Chief Justice Lee began by defending judicial use of corpus linguistics.38 He argued that judges already use an introspective version of corpus linguistics to interpret statutes.39 By searching their memories for how they have heard words or phrases used, judges are comparing the statute’s language to a corpus of language in their minds.40 The justice further argued that dictionaries themselves are “compiled from broader linguistic corpora.”41 And he noted cases where judges — including Justice Breyer42 and Judge Posner43 — have informed their interpretation of a statute by using search engines to acquire examples of the statutory language in question.44
To clarify the meaning of “discharge,” Associate Chief Justice Lee performed a Google News search and a search of the Corpus of Contemporary American English (COCA).45 He considered Google News a reliable source because published newspaper articles contain a wealth of natural language and the search engine allows the judge to search for phrases — a task that cannot be performed with a dictionary.46 Associate Chief Justice Lee’s Google News search of “discharge of a firearm” yielded favorable results for the majority’s preferred definition.47 While some articles were unclear, most articles used discharge to indicate the firing of a single shot and none used discharge to indicate the unloading of an entire gun.48 But Associate Chief Justice Lee also acknowledged the deficiencies of a Google News search.49 The algorithm is not transparent, and the results may be particularized for an individual user.50 To avoid these defects, Associate Chief Justice Lee turned to COCA, which is free, is accessible on the Internet, “contains more than 520 million words of text[,] and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.”51 Associate Chief Justice Lee searched for the word discharge within five words of firearm, firearms, gun, or weapon.52 The examples he found overwhelmingly used discharge in connection with a single shot.53 Given these results, he agreed with the majority that discharge referred to each shot fired.54
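The proximity search described above, a target word within five words of any of several nouns, can be illustrated with a minimal sketch. It is not COCA's interface or the concurrence's actual procedure. The function name `proximity_hits`, the prefix match on "discharg" (used here to catch inflected forms), and the four sample sentences are all assumptions invented for illustration.

```python
import re

def proximity_hits(texts, node_prefix, collocates, window=5):
    """Find uses of a word (matched by prefix) within `window` words of any collocate.

    Returns a list of (text_index, snippet) pairs, one per matching use.
    """
    collocates = {c.lower() for c in collocates}
    hits = []
    for i, text in enumerate(texts):
        # Crude tokenizer: lowercase alphabetic words only.
        tokens = re.findall(r"[a-z']+", text.lower())
        for pos, tok in enumerate(tokens):
            if not tok.startswith(node_prefix):
                continue
            lo = max(0, pos - window)
            # Words up to `window` positions before and after the target word.
            nearby = tokens[lo:pos] + tokens[pos + 1:pos + 1 + window]
            if collocates.intersection(nearby):
                hits.append((i, " ".join(tokens[lo:pos + 1 + window])))
    return hits

# Hypothetical sample sentences; not drawn from COCA or from the opinion.
SAMPLE = [
    "The officer said the gun discharged once when it struck the floor.",
    "He was cited for discharge of a firearm within city limits.",
    "The patient received a discharge from the hospital on Monday.",
    "Witnesses heard the weapon discharge a single round into the ceiling.",
]

hits = proximity_hits(SAMPLE, "discharg", ["firearm", "firearms", "gun", "weapon"])
hit_indices = [i for i, _ in hits]
for i, snippet in hits:
    print(i, snippet)
```

The search only retrieves examples in context. Deciding whether each retrieved use refers to a single shot or to emptying a weapon remains a matter of reading the examples, which is the interpretive step the concurrence performed by hand.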
Associate Chief Justice Lee closed by responding to many of Justice Parrish’s criticisms. He argued that his research was an appropriate legal investigation into the meaning of a law — not an inappropriate factual investigation of evidence in the case.55 While conceding that judges are not expert linguists, he argued that judges are experts in legal interpretation.56 Because the judicial mandate is to say what the law is, judges should employ the best tools possible to judge with certainty.57 Although judges may misuse scientific tools, he argued that the response to this risk should not be to abandon those tools, especially because traditional tools — intuition and the dictionary — “involve bad linguistics” and thus similarly risk incorrect determinations of how a word is commonly used.58 Finally, Associate Chief Justice Lee claimed that corpus analysis “is not rocket science.”59 Analogously, judges are not historians but still attempt to unearth the historical meaning of words because the judicial role demands that they try.60
State v. Rasabout is unlikely to attract attention for its decision on the merits, but its lively debate about corpus linguistics may foretell future skirmishes over the legitimacy of interpreting statutes with the help of data-driven tools.61 In Rasabout, the majority accused the concurrence of sua sponte scientific research beyond the judiciary’s expertise.62 The majority’s charge reveals two concerns: first, that corpus linguistics is an unfamiliar tool for statutory interpretation and, second, that corpus linguistics demands a skill set that judges lack. Corpus linguistics is new. But as long as judges use it to provide a ready source of examples of a word or phrase in context — and not to provide conclusive empirical proof of statutory meaning — corpus linguistics can be a helpful double check against a judge’s intuitive understanding of statutory language.
Associate Chief Justice Lee’s research was not a foreign, scientific approach to statutory interpretation; he employed a new tool for a traditional task. Just as dictionaries once were,63 corpus linguistics is a novel device for statutory interpretation. Clearly, linguistic databases are useful for many scientific endeavors. But the tool does not define the task. Both linguists and jurists work with language, and they can each use corpus linguistics differently to accomplish their different goals. Associate Chief Justice Lee’s efforts were worlds removed from the peer-reviewed research that linguists conduct.64 He adopted a rudimentary linguistic approach of identifying two possible meanings of a word and observing each meaning’s frequency within a sample.65 Required to decide the ordinary meaning of discharge, Associate Chief Justice Lee used a linguistics tool for the classic judicial undertaking of checking his initial understanding against real-world examples.66
Even if corpus linguistics can be used for a traditional judicial task, it can be an appropriate aid to statutory interpretation only if the judiciary can use it effectively. Judge Easterbrook reminds us: “Judges are overburdened generalists . . . . Methods of interpretation that would be good for experts are not suitable for generalists. Generalists should be modest and simple.”67 As Associate Chief Justice Lee argues, with digitized corpora on the Internet, corpus linguistics now makes modest and simple demands of a jurist, requiring an effort and expertise similar to that required by other search engines.68 After filling out a few text boxes and clicking “search,” a judge is provided with real-world examples of statutory language more quickly than she could imagine such examples on her own.69 Corpus linguistics need not “require expertise in fields in which [judges] have no training,”70 but can be used as a generalist’s assistant.
In addressing the judiciary’s lack of expertise, the majority raised a valid concern that the judiciary could inexpertly use corpus linguistics to produce misleading results about a word’s meaning. Setting aside the majority and concurrence’s back-and-forth over the specter of statistical significance,71 common sense alone tells us that quick Internet searches can lead to distorted results. For example, if Associate Chief Justice Lee had instead searched for discharge in proximity to Glock, pistol, and magazine, the resulting examples would have been different and Associate Chief Justice Lee could have drawn a different inference.72 Associate Chief Justice Lee took a quick look at a sample of real-world examples. These examples do not exhaustively or definitively represent how discharge is commonly used. The majority’s critique rightly shows that Associate Chief Justice Lee’s research was a glimpse at the terrain rather than a full geological survey.73 In light of the risk of an unrepresentative sample, Associate Chief Justice Lee’s counting also invites the criticism that a precise analysis of a rough list is falsely precise.74 Because discharge, in the COCA examples, was “almost always used in the sense of a single shot,”75 we can conclude that discharge has been used in this sense. But because a quick database search may have missed examples of discharge used in a different sense, we cannot infer that the word discharge is confined to this meaning.
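The point that the choice of search terms shapes the sample can be shown with a small sketch, again under stated assumptions. The helper `texts_with_hit` and the four sentences are hypothetical and invented for illustration. They say nothing about what an actual COCA search with the alternate terms would return.

```python
import re

def texts_with_hit(texts, node_prefix, collocates, window=5):
    """Return the indices of texts where the target word sits within `window` words of a collocate."""
    collocates = {c.lower() for c in collocates}
    found = set()
    for i, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        for pos, tok in enumerate(tokens):
            if tok.startswith(node_prefix):
                lo = max(0, pos - window)
                nearby = tokens[lo:pos] + tokens[pos + 1:pos + 1 + window]
                if collocates.intersection(nearby):
                    found.add(i)
    return found

# Hypothetical sentences, invented for illustration only.
SAMPLE = [
    "The gun discharged once when it was dropped.",
    "He discharged the entire magazine into the air.",
    "The pistol discharged a single round.",
    "Police said the weapon discharged during the struggle.",
]

# The collocates the concurrence used versus the alternate set the text imagines.
original_terms = texts_with_hit(SAMPLE, "discharg", ["firearm", "firearms", "gun", "weapon"])
alternate_terms = texts_with_hit(SAMPLE, "discharg", ["glock", "pistol", "magazine"])

print("original terms retrieve:", sorted(original_terms))
print("alternate terms retrieve:", sorted(alternate_terms))
print("overlap:", sorted(original_terms & alternate_terms))
```

In this toy sample the two searches retrieve entirely different sentences, and only the alternate search surfaces a use suggesting the emptying of a weapon. A count taken from either list alone would therefore describe the query as much as the word.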
Given this risk, if judges use corpus linguistics for statutory interpretation, they should employ it not as a conclusive method for determining meaning but rather as a safety net to catch what intuition and the dictionary might miss. Perhaps the best example of the benefits of testing intuition with a quick corpus search comes from a dialogue with an opponent of corpus linguistics, Professor Noam Chomsky:
Chomsky: The verb perform cannot be used with mass word objects: one can perform a task but one cannot perform labour.
Hatcher: How do you know, if you don’t use a corpus and have not studied the verb perform?
Chomsky: How do I know? Because I am a native speaker of the English language.76
But Chomsky was wrong. A quick corpus search reveals that perform can be used with mass word objects — “[o]ne can perform magic, for example.”77 Using corpus linguistics as a complement to traditional tools of interpretation can temper this risk of deficient interpretations of a word or phrase. The majority correctly warned that a sample from a corpus may not accurately represent the entire English language,78 and the concurrence rightly warned that our intuitive understandings of language may likewise be prejudiced or incomplete.79 But each tool bears its flaws independent of the other. The judicial use of corpus linguistics, while not dispositive of meaning,80 can reveal what lurks in the blind spots of traditional tools of interpretation.
State v. Rasabout could be a bellwether case. The majority admonished the concurrence for relying “on scientific research that is not subject to scientific review,”81 but as an accessible, non-technocratic check on traditional methods, corpus linguistics may well belong in judges’ statutory interpretation toolkits.