Constitutional Law Blog Essay

Corpus Linguistics and the Second Amendment

Big Data rekindles the debate over the original meaning of the Second Amendment

The Second Amendment provides: “A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.” In District of Columbia v. Heller, the dissenting Justices contended that this provision “is most naturally read to secure to the people a right to use and possess arms in conjunction with service in a well-regulated militia.” In contrast, the majority opinion concluded that the Second Amendment “protects an individual right to possess a firearm unconnected with service in a militia, and to use that arm for traditionally lawful purposes, such as self-defense within the home.”  Both the majority and dissent relied on originalist arguments, so the case turned, in large part, on how the phrases “keep and bear Arms”—or more precisely “keep . . . Arms” and “bear Arms”—were understood in 1791.

At the time Heller was decided, both of us were persuaded by Justice Scalia’s majority opinion, which concluded that the Second Amendment protects an individual right to possess a firearm, without regard to militia service. Yet the Supreme Court considered only a fairly narrow range of sources to interpret the critical phrase, “the right of the people to keep and bear arms.” Indeed, Justice Scalia admitted that his analysis was limited to the “written documents of the founding period that we have found” (emphasis added). Likewise, Justice Stevens’s dissent cited “dozens of contemporary texts” (emphasis added). Today, big data allows us to do much better.

Brigham Young University recently released a new database known as the Corpus of Founding Era American English (COFEA). It organized nearly 100,000 texts with over 140 million words from the start of the reign of King George III (1760) to the death of George Washington (1799). (Justice Thomas recently cited COFEA in his separate opinion in Carpenter v. United States.) These documents are not all legal sources. Rather, the corpus—or “body” of works—also includes letters, newspapers, sermons, books, and other materials to show how people from all walks of life used certain words in various contexts during the 18th century. Through the approach known as corpus linguistics, scholars can now analyze how specific words and phrases were understood during this critical period.

Applying corpus linguistics to the Second Amendment leads to potentially uncomfortable criticisms for both the majority and dissenting opinions in Heller. Both Justices Scalia and Stevens should have expressed more caution when reaching their textualist conclusions based on the narrow subset of founding-era sources that they reviewed. Going forward, judges and scholars should consider how corpus linguistics can be integrated into the broader field of constitutional interpretation to better understand the entire Constitution. This essay explores the principles of corpus linguistics in relation to three elements of the Second Amendment: “bear arms,” “keep . . . arms,” and “the right of the People.” It does not purport to provide a definitive resolution of the Second Amendment’s original understanding. Rather, it seeks to shed some light, through the lens of corpus linguistics, on the discrete textualist inquiries about which the majority and dissent in Heller disagreed.

Constitutional Interpretation and Corpus Linguistics

Corpus linguistics has the potential, in the words of Georgetown Law Professor Larry Solum, to “revolutionize” constitutional interpretation. This technology provides an unprecedented level of insight into what a constitutional word or phrase meant at the time it was ratified. Corpus linguistics improves on more traditional tools to determine original meaning, such as dictionaries. While never a bad starting place, founding-era dictionaries have a host of flaws. For instance, while dictionaries may define the word “bear” and the word “arms,” they tend not to define phrases, such as “bear arms.” This limitation is problematic because a phrase is often more or less than the sum of its linguistic parts. And even when dictionaries provide examples of a phrase, those examples are few and sometimes unhelpful.

Dictionaries have another important limitation: Though they often can provide a range of possible senses, they do not generally say which is the most appropriate or most common in a particular context. For example, the Constitution provides that “no Person holding any Office of Profit or Trust under [the United States], shall, without the Consent of the Congress, accept of any present, Emolument, Office, or Title, of any kind whatever, from any King, Prince, or foreign State.” In the late 18th century, the word “emolument” had two senses. Under the broader sense, “emoluments” covered any profit, benefit, advantage, or gain one obtains, whether tangible or not, from any source. Under the narrower sense, “emoluments” was limited to legally authorized compensation or monetizable benefits from public office, employment, or service. One study concluded that the overwhelming majority of founding-era dictionaries listed only the broader sense of “emoluments.” However, those definitions were not limited to the sense in which “emoluments” is used in the Constitution: a “person holding any Office of Profit or Trust under” the United States. As one of us has written, at the time of the framing, the narrow sense predominated when the word “emolument” was used if the recipient was an officer or the provider was the government. Founding-era dictionaries cannot yield such insights; corpus linguistics can.

There are other reasons to question founding-era dictionaries. First, such dictionaries often drew on much earlier instances of English, including usages that may have drifted in meaning over time. Second, editors of such dictionaries tended to lump (that is, adopt the broader sense) rather than to split (that is, list separate, narrower senses). Third, these dictionaries tended to be idiosyncratic, because they represented the work of just one or two minds. As a result, definitions were often affected by personal bias and human error. Fourth, founding-era dictionaries were more likely to be normative than descriptive in the definitions they published. That is, they focused on what language ought to mean, rather than how language was actually used. Fifth, editors of dictionaries often plagiarized each other, potentially creating a false sense of uniformity. It is not surprising, then, that many dictionaries provided virtually identical definitions. A corpus, however, which includes a wide range of documents, does not suffer from these same problems.

To be sure, relying on a corpus cannot answer every constitutional question. Sometimes the data may be inconclusive. Other times, the inquiry at hand may not be answerable by a corpus at all. Yet corpus linguistics is an appropriate interpretive tool for the deeply important question of what the Constitution originally meant. And it is a far more useful tool than other techniques used in the past, such as founding-era dictionaries. In that sense, corpus linguistics is akin to a paradigm-shifting technology like the Hubble Telescope. Certainly, astronomers could glimpse the heavens from earth before the Hubble was launched. But the increased clarity and scope the Hubble brought to astronomic inquiries was revolutionary. In the same fashion, corpus linguistics can help mitigate methodological deficiencies that plagued earlier tools by allowing searches for phrases in a given context across thousands of texts. Critically, the corpus was not created or assembled with any particular constitutional inquiry in mind. Finally, the increased objectivity, accessibility, and transparency of a public corpus allow anyone with a bit of training to engage in the research and debate, rather than just those with access to rare historical archives. You do not need a PhD in linguistics or history to perform this research. And anyone else can easily retrace and verify a corpus research trail. In our view, the Second Amendment represents an important test case for the value of corpus linguistics in constitutional interpretation.

“Bear Arms”

The Second Amendment begins, “A well regulated Militia, being necessary to the security of a free State. . . .” This prefatory clause announces that its purpose has something to do with military service. Justice Stevens’s dissent in Heller read the preface, along with the phrase “bear arms,” to limit the right to the militia context.  Justice Scalia’s majority opinion took a different approach. He contended that in 1791, the phrase “bear arms” was best understood to refer to carrying a weapon “for a particular purpose—confrontation.” But not necessarily a militia-related confrontation. He drew this conclusion based on his “review of founding-era sources,” including “[n]ine state constitutional provisions” that were drafted shortly before and after the Second Amendment’s ratification. These provisions, Justice Scalia explained, protected a right to bear arms in defense of oneself or the state. They “did not refer only to carrying a weapon in an organized military unit.”

Yet Justice Scalia acknowledged that in other sources, the phrase “bear arms” had a distinctly military meaning. In any event, the majority opinion found it “especially unremarkable that the phrase [‘bear arms’] was often used in a military context in the federal legal sources (such as records of congressional debate) that have been the focus of [the District of Columbia’s] inquiry.” Had the government looked at “[o]ther legal sources,” Justice Scalia argued, it would have found the phrase “bear arms” used in “nonmilitary contexts.” The same result, he contended, would prevail “if one looks beyond legal sources.” Justice Scalia’s opinion implicitly recognized the deficiency of looking at a limited range of materials.

In Heller, three Linguistics Professors submitted an amicus brief, which attempted to answer this charge. They surveyed “115 texts,” including “books, pamphlets, broadsides and newspapers from the period between” 1776 and 1791 that used the phrase “bear arms.” Of those sources, 110 usages were “in clearly military context.” Of the five sources they located that used the phrase “bear arms” in a non-military context, only one was not “qualified by further language indicating a different meaning.” However, the Professors recognized the limits of their own research: “We otherwise have been unable to find” any usages of “bear arms” that did not have a military-related meaning.  In response to this evidence, Justice Scalia wrote “the fact that the phrase was commonly used in a particular context does not show that it is limited to that context, and, in any event, we have given many sources where the phrase was used in nonmilitary contexts.”

This sort of research may have been the state of the art in 2008. However, modern technology allows us to dive deeper. Professor Dennis Baron, who joined the Linguistics brief in Heller a decade ago, searched COFEA for the term “bear arms.” He also performed the same search on the Corpus of Early Modern English, which includes nearly 1.3 billion words from over 40,000 texts from 1475 to 1800. He found “about 1,500 separate occurrences of ‘bear arms’ in the 17th and 18th centuries, and only a handful don’t refer to war, soldiering or organized, armed action.” From this evidence, Professor Baron concluded that “[t]hese databases confirm that the natural meaning of ‘bear arms’ in the framers’ day was military.” Likewise, Professors Alison LaCroix and Jason Merchant used Google Books to search for the phrase “bear arms” in sources published between 1760 and 1795. They found that in 67.4% of the sample, “bear arms” was used in its collective sense, whereas in 18.2% of the sample, the phrase was used in an individual sense.

In Heller, Justice Scalia acknowledged that “at the time of the founding,” the phrase “bear arms” had a military connotation at times, but “it unequivocally bore that idiomatic meaning only when followed by the preposition ‘against.’” For example, the Declaration of Independence charged that the King of England had “constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country.” Justice Scalia added, “[e]very example” provided by the Linguistics Professors of “bear arms” with a military context “includes the preposition ‘against.’” That pattern may have been true of the limited number of examples provided by the Linguistics Professors in 2008, but what about when “bear arms against” is searched in the BYU corpora (a collection of corpuses)?

We performed a similar but broader search in COFEA. Our research assembled every instance of “arm(s)” appearing within 4 words of some form of the verb “bear.” We then reviewed 50 instances. A sample size of 50, though small, is adequate when there are only about 600 instances in the total corpus. (We encourage other scholars to perform the same search and review as large a sample as they deem appropriate.) According to our research, even when we exclude the phrase “bear arms against,” the overwhelming majority of instances of “bear arms” were in the military context. Justice Scalia certainly identified instances when the phrase “bear arms,” absent the preposition “against,” had a non-military context, but those usages were less common. Does this fact disprove Justice Scalia’s analysis of the Second Amendment?

That’s a difficult question. It is true that often when a particular sense (or meaning) of a word is used more frequently in a given context, then that sense is more likely to be the appropriate one in that context. Yet frequency is not always just about linguistic meaning, but also about facts concerning the state of the world. Consider the verb “to read.” If we search a corpus for “to read,” we will find more instances of people reading a newspaper than reading a street sign, even though both instances draw on the same meaning of “to read.” The relevant question then becomes whether a phrase (here, “bear arms”) has a distinct meaning, rather than just a frequent real-world situation where the verb “to bear” is used. The fact that most references to bearing arms in 18th-century writings were in the military context could very well mean that “bear arms” has a military-only sense in addition to a more general sense. Or the data may show only that when people used the phrase “bear arms” they did so most frequently in a military setting.

Finally, the fact that “bear arms” was sometimes used in a non-military context may rebut the argument raised in Heller that “bear arms” only had an idiomatic military context. For example, a minority report from the Pennsylvania constitutional ratifying convention in 1788 expressly distinguished between the “right to bear arms for the defence of themselves and their own state or the United States” and the right to bear arms “for the purpose of killing game.” The report also noted that apart from “the right to bear arms,” the “military shall be kept under strict subordination.” These concepts were distinct. A 1770 book on the reign of Charles the Fifth referred to those who “could not bear arms in their own defence.” That usage does not relate to any military context. More research is needed to conduct a thorough assessment of how the phrase was used. But that analysis alone would not and cannot fully explain the meaning of the Second Amendment. There is another important term: “keep . . . arms.”

“Keep . . . Arms”

Does the Second Amendment refer only to the right to “bear arms”? Or does it also refer to the right to “keep . . . arms”? In Heller, Justice Scalia adopted the latter view: “[t]he most natural reading of ‘keep Arms’ in the Second Amendment is to ‘have weapons.’” In contrast, Justice Stevens adopted the former view:  The Second Amendment “protects only one right, rather than two.” That is, “the single right that it does describe is both a duty and a right to have arms available and ready for military service, and to use them for military purposes when necessary.”

Professor Baron’s recent analysis, as well as that of Professors LaCroix and Merchant, focused only on “bear arms,” and not “keep arms.” Justice Scalia observed in Heller that “[t]he phrase ‘keep arms’ was not prevalent in the written documents of the founding period that we have found” (emphasis added). Again, Justice Scalia’s opinion implicitly recognized the deficiency of studying a limited range of materials. Today, given a larger corpus, we were able to find a significant number of such usages. We performed a search for the word “keep” (and its variants, “keeping,” “kept,” etc.) within four words of “arm” or “arms.” This sort of complicated query of such a wide range of founding-era sources was technologically impossible in 2008 when Heller was decided.
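For readers curious about the mechanics, a proximity query of this kind (a verb form within a set number of words of “arm” or “arms”) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the COFEA search interface: the word-form lists, the function names, and the simple tokenizer are all hypothetical, and a real corpus tool would rely on proper lemmatization. Note that a purely mechanical match also catches the animal “bear” and the body-part sense of “arms,” which is one reason hits still have to be read by a person.

```python
import re

# Hypothetical word-form lists (assumptions for illustration only);
# a real corpus tool would identify verb forms by lemma instead.
KEEP_FORMS = {"keep", "keeps", "keeping", "kept"}
BEAR_FORMS = {"bear", "bears", "bearing", "bore", "borne"}
ARM_FORMS = {"arm", "arms"}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def proximity_hits(text, verb_forms, noun_forms=ARM_FORMS, window=4):
    """Return (verb_index, noun_index) pairs where a noun form appears
    within `window` tokens of a verb form, on either side of it."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in verb_forms:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in noun_forms:
                hits.append((i, j))
    return hits


def concordance(text, verb_forms, window=4, context=6):
    """Return a keyword-in-context snippet for each hit, so that a
    human reader can judge the sense in which the phrase is used."""
    tokens = tokenize(text)
    lines = []
    for i, j in proximity_hits(text, verb_forms, window=window):
        start = max(0, min(i, j) - context)
        end = min(len(tokens), max(i, j) + context + 1)
        lines.append(" ".join(tokens[start:end]))
    return lines
```

Calling `concordance(document_text, KEEP_FORMS)` on a document would return one short snippet per match. The sketch deliberately makes no attempt to classify a snippet as military or private; that judgment is left to the reviewer.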

Our search yielded roughly 200 results, which included every instance of “arms” near some form of the verb “to keep.” Again, we reviewed 50 of these documents as a sample. From this lot, we discarded irrelevant results (such as “she kept her arms above her head”), quotations from the Constitution, and duplicates. Of the remaining 18 documents in the sample, about half referred to keeping arms in the military context, roughly a quarter referred to a private sense of keeping arms, and another quarter or so were ambiguous references.
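Scholars who wish to repeat this kind of review may find it useful to make the sampling and tallying steps reproducible. The Python sketch below shows one way to do so. It assumes a simple random sample, which is an assumption on our part rather than a description of how the 50 documents above were chosen, and the function names and labels (“military,” “private,” “ambiguous”) are hypothetical. The labels themselves must still be assigned by a human reader.

```python
import random
from collections import Counter


def draw_sample(hits, k=50, seed=0):
    """Draw a reproducible simple random sample of up to k hits.
    A fixed seed lets another researcher retrace the same sample."""
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))


def drop_duplicates(snippets):
    """Keep the first occurrence of each snippet, ignoring differences
    in capitalization and spacing."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = " ".join(snippet.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


def tally(labels):
    """Return each hand-assigned label's share of the reviewed documents."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

Recording the seed alongside the results is what allows someone else to draw the same sample and check the coding, in keeping with the transparency that a public corpus makes possible.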

Justice Stevens’s dissenting opinion concluded that the Second “Amendment’s use of the term ‘keep’ in no way contradicts the military meaning conveyed by the phrase ‘bear arms’ and the Amendment’s preamble.” Yet the corpora suggest that the phrase “keep arms” was sometimes used to refer to private, non-militia ownership of firearms. This research raises a question similar to the one raised by our research concerning “bear arms.” Was “keep arms” more often used in a military sense, but also sometimes used in a personal sense? That is, would people at the time have understood the phrase “keep arms” to refer only to military keeping of arms? Or as Justice Scalia framed the question in Heller, was “‘[k]eep arms’ . . . simply a common way of referring to possessing arms, for militiamen and everyone else,” with the military just being a more common real-world context? Was the more common usage necessarily the “expected” usage? Conversely, was the less common usage necessarily the “outlier” usage? Additionally, was Justice Stevens correct that the Second Amendment protects only a “single right” and not separate rights to “bear arms” and “keep arms”? More research is needed to resolve these questions as well.

With respect to both “bear arms” and “keep arms,” the analyses and conclusions of both Justice Scalia’s majority and Justice Stevens’s dissent suffered from a lack of access to a large enough corpus to answer the linguistic questions presented.

“The Right of the People”

For purposes of a textualist analysis of the Second Amendment, the final relevant phrase is “the right of the people.” Neither the majority nor the dissent in Heller attempted to use linguistic sources to interpret the “right of the people,” standing by itself. Instead, both opinions used what Professor Akhil Reed Amar dubbed intratextualism in order to compare the Second Amendment’s reference to “the people” with how that phrase is used in the First and Fourth Amendments. To Justice Stevens, the First Amendment’s protections of the “people[’s]” “right ‘peaceably to assemble, and to petition the Government for a redress of grievances . . . contemplate collective action.’” In other words, not individual rights. Likewise, he explains, “the words ‘the people’ in the Second Amendment refer back to the . . . collective action of individuals having a duty to serve in the militia.” Similarly, the Fourth Amendment “describes a right [of ‘the people’] against governmental interference rather than an affirmative right to engage in protected conduct.” Again, a collective right. Justice Scalia disagrees: These amendments “refer to individual rights, not ‘collective’ rights, or rights that may be exercised only through participation in some corporate body.”

Corpus linguistics may provide further insights into the usage of “the right of the people” in the Second Amendment. For instance, how often was that phrase used outside of the Constitution to refer to individual versus collective rights? Likewise, we can study how often the word “arms” was used in the vicinity of the word “rights.” This query would inform whether a collective or individual right is referenced most often.

If research on all of these terms (“bear arms,” “keep arms,” and “the right of the people”) points in the same direction, perhaps we can have some confidence in knowing how the words of the Second Amendment were more commonly used.


Corpus linguistics offers a very promising new avenue for originalist research, but it may not always fully resolve the original meaning of the Second Amendment. Linguistic inquiries often fail to account for other evidence that informs constitutional meaning, including the structure of the Constitution and historical practice. And even within the linguistic evidence, we still need to determine whether all sources should be treated equally or whether some sources should be seen as more probative of the interpretive question at hand. Specifically, we have to recognize that for legal terms of art, certain legal materials will be the most relevant. For terms that are not legal terms of art, non-legal materials will be the most relevant. For example, state constitutional provisions that protect the right to bear arms, and that were drafted contemporaneously with the Second Amendment, perhaps should be weighted more heavily than other sources. Should substantially all available evidence (linguistics, the Constitution’s structure, and historical practice) point in the same direction, then we can be rather confident about the original meaning of “the right of the People to keep and bear arms.”

With respect to the Second Amendment, far more research is needed before either side can declare victory in this important and contentious constitutional debate. Corpus linguistics may facilitate that research. Corpus linguistics is a helpful new tool—neither liberal nor conservative—that can provide further insight into what the Constitution means.