Note from the Editor: The Federalist Society takes no positions on particular legal and public policy matters. Any expressions of opinion are those of the author. We welcome responses to the views presented here. To join the debate, please email us at [email protected].

In a case decided on February 11, 2025, the makers of generative AI (GenAI), such as ChatGPT, lost the first legal battle in the war over whether they commit copyright infringement by using the material of others as training data without permission. The case is called Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

If other courts follow this ruling, the cost of building and selling GenAI services will dramatically increase. Such businesses are already losing money.

The ruling could also empower content creators, such as writers, to deny the use of their material to train GenAIs or to demand license fees. Some creators might be unwilling to license use of their material for training AIs due to fear that GenAI will destroy demand for their work.

How GenAI Works

To understand this development, we need some technological background. To learn how to generate output, a GenAI is trained on a large volume of training data. During training, the GenAI analyzes the relationships between features in the training data. Features are fragments, such as individual words or numbers. The GenAI encodes what it learns about those relationships by calibrating the “weights” in its neural network. Once trained, a GenAI model references these weights to produce an output in response to a prompt.
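The training-then-generation process can be sketched with a toy example. The code below is a drastically simplified illustration, not how any real GenAI works: it treats adjacent words as related “features,” derives “weights” from how often they co-occur, and then generates output from those weights alone. The corpus text and function names are invented for illustration.

```python
from collections import defaultdict

# Toy "training data": a tiny corpus of words (the features).
corpus = "the court found the use was fair the court ruled".split()

# Training: count how often each word follows another, then
# normalize those counts into "weights".
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

weights = {
    prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for prev, nxts in counts.items()
}

def generate(prompt_word):
    """Produce the most heavily weighted next word for a prompt.

    Note that generation consults only the weights; the original
    corpus is not stored as a retrievable document.
    """
    options = weights.get(prompt_word, {})
    return max(options, key=options.get) if options else None

print(generate("the"))  # prints "court" ("court" follows "the" most often)
```

The point the sketch makes is the one in the text: after training, only the weights remain, which is why a trained model does not hold literal copies of its training data even though copies were made during training.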

It is widely understood that much of the data used to train GenAIs is material for which the GenAI maker did not purchase licenses, such as material scraped from the internet. Most expressive material, such as writing and pictures, is someone’s copyright property, even stuff that is free to view on the internet.

In GenAI training, a copy is made of each piece of training data. But a GenAI’s neural network does not store a literal copy. All it retains is information encoded in its weights about the relationship between the features in its training data.

Thus, in theory, the public can’t use a GenAI to retrieve copies of the training data. Yet, in some cases, people have been able to generate outputs highly similar to single pieces of training data, such as news articles, or things that appear commonly in training data, such as images of SpongeBob SquarePants. We don’t know how commonly that occurs, or whether it tends to happen primarily when the GenAI user engineers the prompt to produce a literal copy of something that likely appeared in training data.

The big legal issue is whether using someone else’s copyright property without permission to train a GenAI is copyright infringement. Powerful content-creator plaintiffs, including the New York Times, Getty Images, and author John Grisham, are suing GenAI companies arguing that it is.

Facts of the Case

A federal trial court in Delaware just issued the first court decision on the issue. It held that this use without permission constitutes copyright infringement and is not protected as fair use.

Yet this case has unusual facts. For one thing, it concerned training an AI that is not a GenAI.

This case concerns a legal research product. The plaintiff is Thomson Reuters, which makes the popular Westlaw legal research tool. As part of this tool, Thomson Reuters creates “headnotes,” which summarize key points of law in opinions and case holdings. These headnotes break down the law by category and subcategory. Once you find a headnote that addresses the legal issue you are researching, you can use Westlaw to find other cases containing the same headnote.

The defendant is Ross Intelligence. It wanted to produce a case-finding tool to compete with Thomson Reuters’ headnotes. Ross approached Thomson Reuters about purchasing a license to use the headnotes in building its new product, but Thomson Reuters declined to license its product to a would-be competitor.

Spurned, Ross made a deal with a company named LegalEase to effectively get access to the headnotes. The facts are complicated, but, in essence, LegalEase is a Thomson Reuters licensee, and it made a product called “bulk memos,” which contain the Westlaw headnotes. Ross used these bulk memos (which means it used the headnotes) to train its AI legal research tool, which competes with Westlaw and its headnotes.

Importantly, the Westlaw headnotes themselves are not in Ross’s product’s output.

Technically, Ross’s technology is not GenAI because it outputs something like headnotes—short statements of law from opinions—rather than text that is custom-created in response to a user prompt, which is what GenAI outputs.

Thomson Reuters Sues Ross

Thomson Reuters sued Ross for copyright infringement. Ross shut down operations because of the litigation, but the case continued.

Ross argued that using headnotes in training its AI is fair use.

Generally speaking, copying someone else’s copyright property without permission is copyright infringement unless it constitutes a fair use. Determining whether there is fair use mainly requires asking whether the unauthorized copying and use of copyright property harmed the copyright owner. Under federal copyright law, assessing whether something constitutes fair use requires considering four factors: (1) the use’s purpose and character, including whether it is commercial or nonprofit; (2) the copyrighted work’s nature; (3) how much of the work was used and how substantial a part it was relative to the copyrighted work’s whole; and (4) how the use affects the copyrighted work’s market value.

The first and fourth factors are the most important ones.

Under the first factor, if the defendant’s copying is for a commercial purpose and if what it creates might compete with the copyright owner, that factor weighs in favor of the copyright owner.

Under the last factor, if the defendant’s copying reduces the market value of the plaintiff’s copyright property, that also weighs in favor of the copyright owner.

The Court Finds No Fair Use

Here, the court found that Ross makes a commercial product that competes with Westlaw’s headnotes and consequently hurts the headnotes’ market value. That’s pretty straightforward. For that reason, it granted summary judgment to Thomson Reuters on Ross’s fair use defense and found Ross liable for copyright infringement.

The court found that the second and third fair use factors favored Ross, but that wasn’t enough to keep Ross from losing.

Under the second factor, the court found that these headnotes have only thin copyright protection because they summarize court opinions. Court opinions are in the public domain; they are not copyright property.

The court also found that the third factor favored Ross. This part of the court’s opinion is crucial to the fate of GenAI companies.

The court noted that the only copying of the headnotes here is an “intermediate copying.” It noted that Ross was using de facto copies of the headnotes only in training its AI (an intermediate stage) and that the headnotes don’t appear in the output of Ross’s final product. That’s similar to what happens with GenAI—the unauthorized copies of others’ material are used only to train the AI, but the copies don’t (or, at least, are not supposed to) appear in the output.

The court found that because Thomson Reuters’s copyright property appears only in this intermediate training copy that the public never sees, this factor favored Ross. But despite that, Ross still lost.

The Court Distinguishes Ross’s AI Tool from GenAI

The court was careful in multiple places in its opinion to say that it was not considering a GenAI situation because Ross’s product doesn’t generate a custom output—it generates stock summaries of legal points akin to headnotes. Surely the court was careful to make that distinction because it knows of the many high-stakes copyright infringement cases pending against major GenAI operators.

Still, the logic of this opinion addresses the existential legal question facing GenAI makers. In both this case and with GenAI, the question is whether copying someone else’s copyright property as an intermediate step to use in training an AI constitutes fair use when the AI’s ultimate output doesn’t contain (or rarely contains) any of the copyright property of the plaintiff.

In support of their fair use defense, GenAI makers will likely argue that the output of their systems doesn’t compete with the material used in training data in the same way or to the same extent that Ross’s legal research product would compete with Westlaw’s headnotes.

It’s a messy factual picture. In some cases, the GenAI output does compete with training data, such as when a user asks the GenAI to write something in the style of a famous author or generate a picture in the style of a famous artist. In other cases, it doesn’t compete, such as when a user asks it to compose a poem about some current event. Many GenAI uses fall in between, where it is debatable whether the training data and the GenAI output compete.

How Influential Will This Opinion Be?

This is a district court ruling, and so it doesn’t bind other courts. Other federal courts could reach different conclusions on the fair use issue.

Yet, this case will be influential because of the judge who wrote the opinion. While this case is in a federal trial court, Judge Stephanos Bibas—a Third Circuit judge sitting by designation—wrote the opinion. He is a Yale Law graduate, former clerk for retired Supreme Court Justice Anthony Kennedy, former law professor at the University of Pennsylvania Law School, and a respected jurist.

Other Legal Hurdles Thomson Reuters and Other Copyright Plaintiffs Must Surmount

The case isn’t over yet. Thomson Reuters still has work to do to prove some of its copyright ownership and registration. (You must register a copyright before you can sue for infringement of it.) For example, how Thomson Reuters registered its copyrights to the headnotes may impact the amount of damages it can recover. And we don’t yet know how big a financial hit Ross will suffer here.

The plaintiffs in the GenAI copyright infringement cases will face similar hurdles—proving copyright ownership, showing proper copyright registration, and, in some cases, proving enough damages to make litigation worthwhile.

Still, this ruling is blood in the water for content owners looking to sue GenAI makers. It will encourage more lawsuits.

The Biggest Fair Use Factor: Effect on Marketplace Value

Perhaps the biggest factor in causing Ross to lose its fair use defense, and the biggest factor in all fair use cases, is the marketplace impact of the defendant’s unauthorized copying.

It is reported that some GenAI makers have purchased licenses to use some material in training their GenAIs. Those GenAI makers may have done so out of fear of copyright infringement liability, even though no case had found liability until this one. Or they may have bought licenses to obtain training material that is not publicly available, or not publicly available in usable form.

This leads to circular reasoning by content owners. In arguing against fair use, content owners argue that when their content is used for AI training without their permission, they are deprived of the opportunity to earn revenue from licensing their content for AI training. But such owners have a right to demand license fees only if the unauthorized use for training isn’t a fair use. In the Ross case, the court weighed the effect of Ross’s unlicensed conduct on a potential market—Thomson Reuters licensing its headnotes for AI training—but it didn’t address this circularity problem. Other courts eventually must address it.

Will Courts Hearing GenAI Training Data Copyright Infringement Cases Follow this Case?

It’s possible that this embodies the old legal axiom that “bad facts make bad law.” Here, when Ross couldn’t persuade Thomson Reuters to sell a license to train its AI off the headnotes, it found a way to license them from a third party and did the training anyway. Setting aside the technicalities of the law, that smells bad.

Setting this opinion aside, GenAI makers have a strong fair use argument. An AI learns like a human being learns: from birth onward, everything you sense—everything you read, see, hear, touch, taste, and smell—sends information that educates your brain, enabling you to generate outputs in the form of writing, speech, and actions. After all, that’s why we call the technology “artificial intelligence.” Unless you as a human memorize and repeat a specific item of what you have read or heard, you are always synthesizing the various ideas and information you took in.

This human brain training might enable you to compete with whoever wrote the material. Perhaps if you read all of John Grisham’s novels, you could write like him and compete with him for book sales. But doing that would not make you a copyright infringer unless you stole blocks of his prose or details of his stories. Except in situations where some confidentiality obligation binds you, you are free to learn from others and to use that learning to compete with them.

Some plaintiffs suing GenAI makers claim that a GenAI will sometimes create an output that’s practically a copy of their copyright property. That’s a different situation. If that happens, copyright law is well-situated to address it. We have an established body of copyright law that doesn’t require significant development to resolve those cases. The big, valuable copyright battle is over GenAI training data, not its outputs.

What’s at Stake in this Training Data Copyright Litigation War?

The legal and societal question is whether you must purchase a license from the copyright owner to train a GenAI when that training could be described as merely a voyage of discovery and making connections about information in the world, similar to how a human learns.

There is a lot at stake here. Many content creators feel their livelihoods are threatened by what GenAI can create. Many technologists argue that GenAI creates huge opportunities for efficiency and advancement but fear that GenAI might not be sufficiently economically viable if practically all GenAI training data must be licensed.

Congress and the President could resolve this issue with legislation, but they tend to avoid getting involved in these legal fights. We will soon have rulings on fair use in GenAI cases. This issue will likely make it to the Supreme Court in a few years unless changes in AI technology make the issue moot.