AI Meets Copyright: Understanding New York Times v. OpenAI


Artificial intelligence is the most important technological tool being developed today, but the use of preexisting copyrighted works to train these AI systems is deeply controversial. At the end of 2023, The New York Times sued OpenAI and Microsoft, alleging that OpenAI's use of articles from The New York Times to train its ChatGPT large language model constitutes copyright infringement. An answer is due at the end of February, and it's expected the case will revolve around the question of whether the use of the copyrighted content of the Times was a fair use. The fair use analysis will likely turn on whether the use of copyrighted content to train an AI system "transforms" the work in a way that makes the use fair. The Supreme Court has spoken on this question twice recently, holding that Google's use of parts of Oracle's Java programming language to build the Android operating system was transformative, but that the licensing of an Andy Warhol work based on a photograph by Lynn Goldsmith was not transformative of Goldsmith's work. Also important, and perhaps most on point, is a decision of the Ninth Circuit Court of Appeals that Google's Image Search system is transformative of the photographs it indexes and displays as thumbnails.


To help understand this case, Professors Charles Duan of the American University Washington College of Law and Zvi Rosen of the Simmons School of Law at Southern Illinois University will be joined by Steven Tepp of Sentinel Worldwide, who is also a Lecturer at the George Washington University School of Law and formerly of the U.S. Copyright Office. John Moran of Holland & Knight will moderate the panel and provide additional perspective.

Featuring:

Charles Duan, Assistant Professor of Law, American University Washington College of Law

Zvi Rosen, Assistant Professor, Southern Illinois University School of Law

Steven M. Tepp, Co-Founder, RightsClick

Moderator: John P. Moran, Of Counsel, Holland & Knight

---


As always, the Federalist Society takes no position on particular legal or public policy issues; all expressions of opinion are those of the speaker.

Event Transcript

[Music]

 

Emily Manning:  Hello, everyone, and welcome to this Federalist Society virtual event. My name is Emily Manning, and I’m an Associate Director of Practice Groups with The Federalist Society.

 

Today, we’re excited to host a discussion titled, “AI Meets Copyright: Understanding New York Times v. OpenAI.” We’re joined today by Charles Duan, Zvi Rosen, Steven M. Tepp, and our moderator today is John P. Moran, Of Counsel at Holland & Knight. John has experience litigating many patent, trademark, and trade secret cases in federal district court and arguing appeals at the U.S. Court of Appeals for the Federal Circuit and the U.S. Court of Appeals for the Fourth Circuit. He has prosecuted or directly supervised the prosecution of hundreds of patent applications in many different technologies, including telecommunication systems and equipment, robotics, artificial intelligence, imaging technology, nuclear reactor instrumentation, semiconductor devices and manufacturing processes, and medical devices.

 

If you’d like to learn more about today’s speakers, their full bios can be viewed on our website, fedsoc.org. After our speakers give their opening remarks, we will turn to you, the audience, for questions. If you have a question, please enter it into the Q&A function at the bottom of your Zoom window, and we will do our best to answer as many as we can. Finally, I’ll note that, as always, all expressions of opinion today are those of our guest speakers, not The Federalist Society. With that, thank you for joining us today. And John, the floor is yours.

 

John P. Moran:  Thank you, Emily, for that introduction. Well, this afternoon’s panel, as Emily indicated, will discuss the copyright issues raised in the complaint brought by The New York Times against Microsoft and several OpenAI entities. The complaint, which was filed last December, has seven counts. Four of the counts are for copyright infringement. There’s a Digital Millennium Copyright Act count and a common law unfair competition count based on misappropriation of The New York Times’s intellectual property. There’s also a trademark dilution count, which will not be addressed today.

 

In support of its copyright infringement allegations, The New York Times included in the complaint a hundred examples of outputs that are nearly identical to copyrighted New York Times content. Before we address the copyright issues, a bit of technology vocabulary, at a very high level, may help the discussion. So to begin with, the complaint focuses on OpenAI’s GPT models. As it relates to today’s discussion, the acronym GPT stands for Generative Pre-trained Transformer.

 

The G, for Generative, which we will probably discuss today, indicates that the model takes inputs from users and generates an output, such as the hundred examples of New York Times content in the complaint. The P, for Pre-trained, indicates that the model is pre-trained on a volume of data, such as The New York Times content. And lastly, the T, for Transformer, very, very generally relates to a way of processing data to provide context for the data, such as a sequence of words.
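To make the generative loop concrete, here is a minimal sketch in Python. It is a toy stand-in, not OpenAI’s system: a hand-counted bigram table plays the role of the trained model, and the loop repeatedly predicts a likeliest next word and appends it, the same autoregressive pattern a transformer follows at vastly larger scale. The tiny corpus and all names are invented for illustration.

# Toy illustration of the "generative" step: given a prompt, repeatedly
# predict a next word and append it. Real GPT models do this with billions
# of learned parameters; this hand-counted bigram table is a stand-in.
from collections import Counter, defaultdict

corpus = "the court held that the use was fair and the court affirmed".split()

# "Pre-training": count which word tends to follow which in the data.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt: str, max_words: int = 8) -> str:
    words = prompt.split()
    for _ in range(max_words):
        followers = bigrams.get(words[-1])
        if not followers:
            break
        # Always pick the single likeliest next word (real models sample).
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # "the court held that the court held that the"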

 

One note: the complaint uses the term “embedded.” It appears to use that term in the non-artificial-intelligence sense, that is, the sense in which a stone is embedded in concrete, rather than in the machine-learning sense of embedding, the process of converting words into corresponding numerical values. Those concepts, we’ll probably discuss during the panel’s discussion today.
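That distinction can also be pictured in code. Below is a minimal sketch of the machine-learning sense of embedding, under loud assumptions: the three-dimensional vectors are invented for illustration, while real models learn vectors with hundreds or thousands of dimensions. The point is only that words become lists of numbers, so that a system can measure how similar two words are.

# Hypothetical word embeddings: each word maps to a vector of numbers.
# The values here are made up; trained models learn them from data.
import math

embedding = {
    "court":  [0.9, 0.1, 0.3],
    "judge":  [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.7],
}

def cosine_similarity(a, b):
    # Standard measure of how closely two vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embedding["court"], embedding["judge"]))   # ~0.98, similar
print(cosine_similarity(embedding["court"], embedding["banana"]))  # ~0.36, dissimilar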

 

So as Emily indicated, to discuss The New York Times’s allegations of copyright infringement, we have three experts. Zvi Rosen is an assistant professor at the Southern Illinois University School of Law. He was the Abraham Kaminstein Scholar in Residence at the United States Copyright Office and is particularly renowned for his expertise on copyright law, especially its historical development. Charles Duan is an assistant professor at the American University Washington College of Law. He was previously a postdoctoral fellow at Cornell Tech, working with Professor James Grimmelmann. Charles focuses his research on the public effects of technology policy and intellectual property law, primarily patents and copyrights.

 

Steve Tepp is the President and CEO of Sentinel Worldwide, an intellectual property consultancy. He’s also a lecturer in law at the George Washington University School of Law. More recently, Steve co-founded RightsClick, a suite of software tools for independent creators to register, manage, and enforce their copyrights. As Emily indicated, we’ll start with opening statements. We’ll begin with Zvi, then Charles, followed by Steve. So with that, I’ll turn it over to Zvi.

 

Zvi Rosen:  All right, everyone. Good afternoon or morning, depending on where you are. This case is really fascinating because we’re finally getting at a host of important issues regarding AI, the internet, and what we call traditional producers of cultural works. We have The New York Times, and many others that I think will probably follow on its coattails. There are a couple of legal issues I want to highlight, and I’d love some questions to follow once my colleagues give their opening statements.

 

The initial issue here is really direct infringement, which is to say that by taking all of these New York Times stories, they are effectively copying them or creating derivative works of them. A derivative work is a work that recasts, adapts, or transforms an existing work. The scope of the derivative works right is, frankly, not terribly well understood because it usually dovetails with copying, and it’s typically going to be one or the other. But that is a big part of what is being claimed here.

 

The resolution of that is really, I think, going to be a question of, A, is one of those activities happening. I tend to think it is, and honestly, I’m not quite sure how much even OpenAI will dispute it -- I’m sure they’ll dispute it, but I think that’s probably the easier case, that, at least, some copying is occurring, although there are details of that I’m sure some of my colleagues will discuss. But then, fair use is going to be a major issue there, particularly whether or not it’s transformative. What transformative means is really a hard question.

 

The formative case is Campbell v. Acuff-Rose, about 2 Live Crew’s “Pretty Woman” -- but I really think this case is going to turn on a fair use conflict between two more recent cases. First, Google v. Oracle, which found that the reimplementation -- or copying, depending on your perspective, I suppose -- of the Java programming language’s APIs into the Android operating system was fair use and transformative. On the flip side, Warhol v. Goldsmith, holding that the Andy Warhol transformation of Lynn Goldsmith’s photo of Prince into an Andy Warhol silkscreen of Prince, and then its licensing to Condé Nast, was not fair use and not transformative. So that’s going to be a bit of a way off, I suspect.

 

There was a motion to dismiss, which is pending, and that’s all we have right now. So fair use, I don’t think, is ripe yet. I think you have to wait for at least summary judgment for that, but if the case doesn’t get dismissed and we do get to fair use, we’ll see. You have to show direct infringement to get to any of the other counts, except for the last one.

 

For contributory infringement, which is also alleged -- a lot of these counts have to do specifically with Microsoft, which is not OpenAI but, as the complaint notes, is kind of an alter ego; they have all sorts of rights in OpenAI, and they also own a lot of the IP. Contributory infringement requires that the defendant had knowledge of direct infringement and that the defendant materially contributed to that infringement. Some loaded terms there. Knowledge -- there’s a question, which comes up a lot in the context of the internet and takedowns, of whether knowledge requires a sort of red flag knowledge or just general knowledge.

 

There is a case involving Cox Communications and whether it was adequately flagging its users who were repeat infringers. The details of that case law are somehow — 26 years after the law was passed — still in flux, to a degree. And material contribution -- I think that one is easier to show here. I think it’s going to hinge on knowledge, but I’ve been wrong before. That’s the other key part.

 

Vicarious liability -- the case that I always teach when I talk about vicarious liability is a case called Fonovisa v. Cherry Auction. It’s a contributory infringement case as well, but it held that a flea market that was basically a hotbed of pirated media was vicariously liable because it had the right to control the infringing activity -- the infringing activity being the resale of pirated discs in stalls at the flea market that were rented out -- and derived a direct financial benefit from it, which was, once again, the fees it was getting paid for the stalls. Both of those elements, I think, are going to come up quite substantially.

 

The other sort of pseudo-IP issue I want to flag is this really interesting claim involving Wirecutter, the product-recommendation site that The New York Times owns. The Times claims misappropriation because OpenAI is taking Wirecutter reviews and The Times is not getting affiliate payments. The lawyers for OpenAI are claiming this is preempted by the Copyright Act, under which all legal or equitable rights that are within the general scope of copyright are preempted. And on the other hand, you have this case, INS — International News Service — v. Associated Press, Supreme Court, 1918, holding that misappropriation of hot news -- in other words, news fresh from the battle lines of World War I -- was a misappropriation and a violation of common law rights. That claim was not preempted. The scope of INS v. AP nowadays has been questioned, and some cases have limited it, but it’s not dead.

 

And so, I think it’s going to be interesting to see how that plays out. I’m looking forward to more of your questions. I’ll turn it over to Charles first for his thoughts.

 

Charles Duan:  All right. Thanks, Zvi, for that really excellent introduction to the major issues in this case and the key doctrines of copyright and other law that are at play. As you can see, this case has a lot going on. So what I’ll try to do is, number one, go through what’s been going on in the case specifically -- I’ll talk particularly about the motions to dismiss that have been filed -- and then take a couple of guesses. As Zvi mentioned, we don’t know what the fair use defense is going to look like, but I’ll try to talk a little bit about what I’d expect, particularly trying to guess what I would imagine OpenAI would argue, and then maybe leave with a couple of broader thoughts about where this fits into the larger debate over copyright and AI.

 

      So as far as the procedure of the lawsuit goes, we started with the complaint, filed in December 2023. And now, we have on the table two motions to dismiss, one from OpenAI that was filed in late February and one from Microsoft that was filed about a week ago. These motions to dismiss don’t go to the entirety of the case, which is somewhat interesting. They only go to what Microsoft and OpenAI describe as sort of ancillary issues. And so, on the ones that they talk about — in particular, the contributory liability issue — they say that there’s a lack of knowledge, as Zvi mentioned, so they say that one should be dismissed.

 

      The common law misappropriation issue that Zvi mentioned, this hot news, INS v. AP kind of theory -- they say that that’s preempted by the Copyright Act, that copyright protection, being federal law, overrides what state protection is given there. And they point to a number of cases showing that the INS v. AP doctrine is only begrudgingly accepted at this point, and that’s their argument for that. John mentioned also that there was this Digital Millennium Copyright Act count in the complaint, and that one is actually somewhat interesting, so I’ll go into it just a little bit. Some of you may be familiar with the Digital Millennium Copyright Act in terms of its anti-circumvention provisions, the rules that say that you’re not allowed to break digital rights management.

 

      This case actually deals with a different part of the DMCA, Section 1202, which relates to copyright management information -- basically, the inclusion of metadata, like authors or titles or copyright notices, inside files, typically digital files. And so, the Times argues that, in training these generative AI systems, OpenAI and Microsoft removed that information and, as a result, violated Section 1202. The difficulty that they’re going to face in proving that, as Microsoft and OpenAI point out, is that, in order to show a violation of the section, you have to satisfy what’s called a double scienter requirement: number one, that OpenAI knew that it was removing the information and, number two, that it knew, or at least should have known, that the result of removal would be further infringement. And so, OpenAI and Microsoft argue, number one, that there’s no evidence that this information was actually removed during the training process and, number two, that The Times won’t be able to satisfy the knowledge requirements.

 

      They also raise a time bar question. They say that there’s a three-year look-back period that limits how far back the copyright allegations can go. That’s actually a question that’s being considered by the Supreme Court right now. It was just argued a couple of weeks ago. And so, that’s just another argument that they bring up. But this, again, doesn’t get to the substantive questions that Zvi mentioned, the questions of whether or not there actually is copyright infringement in the training of the systems using The New York Times and other articles.

 

      There are three points that the Times identifies as where the infringement could occur. Number one, they say that the collection of the articles to make the training data used to train these AI systems was an infringement because you are making a lot of copies in order to collect them. Second, they allege that the model itself — all of the parameters, the, I think, 1.7 trillion numbers that make up the GPT systems — somewhere embeds all of the information necessary to replicate a lot of the articles, and therefore, the model itself is a potential infringement. And third, they say that when you use the model in such a way that it generates infringing content, that use is sort of a public performance. It allows you to get the information out.

 

      In order to show infringement, what The Times would have to show is, number one, that this is copyrightable subject matter. There are some interesting questions there because a lot of the information that’s being drawn is factual, and so, maybe that issue will come out. Generally, factual information is not considered copyrightable, but like I said, that’s probably not going to be the lead argument. There also are questions of what exactly counts as an infringing act. Something that’s inside the model that nobody could actually see or understand -- is that an infringement? There’s actually sort of an interesting question about that.

 

      But again, the large issue that OpenAI and Microsoft intend to raise — in fact, they say in their motions to dismiss that they’re actually very excited to litigate this issue — is the fair use question. Assuming that it is an act of copying or it is a derivative work to do all of those things I just mentioned, does the fair use doctrine permit it? And so, courts have used the fair use doctrine in a variety of situations: Google v. Oracle in the software context; the Warhol case — that was an artistic use; the Campbell case, that was parody. Fair use is sort of this jack-of-all-trades doctrine. It ends up being used in all sorts of places.

 

      In that line, there are a number of cases that will help OpenAI quite a bit, although possibly to a limited extent. There was a case over Google Images where Google had collected a bunch of images and was displaying them using an image-search engine. The court said that because Google had really downsampled the images and they didn’t really serve as a replacement, that database of images was fair use. There’s a case called iParadigms in which a company made a plagiarism detection program, and there was a question of whether or not the inputs to the plagiarism detection program, which were basically the essays being checked, were an infringement. And again, the court said this is sort of a new tool. The actual essays aren’t retrievable.

 

      Similarly, with the Google Books case, it was alleged that Google’s scanning of a bunch of books to create the Google Books search engine was copyright infringement. And again, a court — I think the Second Circuit — said that this was fair use on the grounds that the mass scanning of books to provide a service that didn’t really replicate the value of the books themselves was allowable and, as a result, not a copyright infringement. Courts usually apply this four-factor test in which they look at the purpose and character of the use, the nature of the copyrighted work, the amount that was used, and finally — and probably this is the most important factor by many measures — the economic impact on the market for the original copyrighted work.

 

      And I think that that’s going to be the most interesting one to follow in this case because, on the one hand, these are incredibly valuable systems. Right? Artificial intelligence has huge potential in terms of business uses, commercial uses, uses for individual consumers. It can be used as a platform for many other technologies. Is that part of the market that inheres in the copyright that The New York Times has in all of its articles or that a novelist has in all of the novels that are used in training? Right? The novelist, or The New York Times, they would say, yes, the point of our articles is to provide information, and that information is being used to provide a valuable service through things like ChatGPT, and as a result, there should be some cut of that.

 

      OpenAI, of course, would argue the other way around. They would say, look, this is a completely different service. It doesn’t serve as a replacement for the original articles. It’s transformative in the way that Zvi mentioned, and transformative is something that the courts really look at.

 

      So that, I think, is going to be sort of the parameter of the debate. We do have these cases about mass text and data mining, which are different from this case but provide some basis for understanding where fair use goes. And that transformative element, and what effect the availability of these systems has on those copyrighted works, I think, is going to be really important, and something really to watch as this case progresses.

 

Steven M. Tepp:  All right. I think that makes it my turn. So let me begin by saying thank you to The Federalist Society for inviting me today and to Emily for organizing this panel and, of course, to John for his kind introduction and my fellow panelists for their opening remarks. Let me note that my remarks are my own and do not necessarily reflect the views of any client or employer.

 

      I want to begin by putting this case and others like it into a broader perspective. Those of us who have been working in copyright law and policy over the past 30 or so years — and I’m afraid my grey hair gives that away — have seen history repeat itself over and over. First, a new technology comes along and makes it easier than ever to obtain copyrighted works. Now, in the ideal scenario, that development is mutually beneficial. Creators can reach new audiences and expand audiences more easily, and the widespread availability of creative works drives demand for the technology. Everybody wins.

 

      But in practice, the operators of the technology in the past have often made choices that allocate to themselves the lion’s share of the income. In some cases, those choices have included willfully tolerating infringement on platforms, knowing full well that creators, especially independent creators, lack both the means and tools to achieve meaningful vindication of their rights. So here we are with generative AI systems built on large language models. It’s déjà vu all over again. Such systems require massive volumes of works in order to be capable of producing the commercially valuable outputs the designers seek to market. That fact is not in dispute.

 

      But how those works are obtained is a commercial choice. There is nothing in the nature of the technology that requires those works to be scraped without notice, without authorization, or without compensation, yet that’s precisely what’s happened. In the public policy sphere, people willing to defend those decisions often try to create a false dichotomy between the massive, unauthorized scraping and the existence of generative AI. The reality is that there are companies that have built generative AI systems on licensed materials. Those that choose to do otherwise are not engaged in a crusade for the betterment of humanity. They are commercial enterprises trying to avoid paying for critical inputs. As the chairwoman of the Federal Trade Commission recently said very plainly, firms cannot use claims of innovation as an excuse for lawbreaking.

 

      So let me turn to the particular legal issues in this case, and I’m going to focus on the direct copyright infringement issues. One would think that in a circumstance when computers were and are programmed to crawl the internet and copy literally billions of works — the largest copying effort in history — it would be beyond serious contention that the reproduction right in those works has been implicated. And yet, OpenAI and others are trying to put that exact matter in dispute. Anyone who has even a passing understanding of how computers operate knows that computers must make copies in order to process what has been input into them. And The New York Times evidence shows that by inputting a certain set of prompts, substantially, if not strikingly, similar copies of its original works will be output by OpenAI. So it seems self-evident that copies of the original works must be in the computer memory in order for that to happen.

 

      Still, OpenAI argues to the contrary. If the internal operation of the system were transparent to the public, we would have real insight into the facts of how it operates. But despite the name it was given, OpenAI, like other generative AI systems, is, in fact, locked up tight. Perhaps some of this will come out in discovery, but in any event, I am deeply skeptical that there’s any serious argument other than that the unauthorized scraping of copyrighted works does implicate the reproduction right, which means, as my fellow panelists have already said, the real action in this case will be in the fair use argument.

 

      As has already been noted, the fair use assessment in the context of the first factor is likely to involve consideration of whether OpenAI’s copying constitutes a transformative use. While the term transformative has been part of copyright jurisprudence for a very long time, it was given special significance by the Supreme Court decision in the 2 Live Crew case, Campbell v. Acuff-Rose — to give a proper caption — in 1994. Since that time, lower courts have struggled to apply this term, sometimes reaching extreme results, such as when Google’s verbatim copying of tens of millions of books was held to be highly transformative by the Second Circuit. Fortunately, the Supreme Court had occasion to revisit this doctrine in 2023 in Warhol Foundation v. Goldsmith and articulated a much more reasonable and workable framework. So I think the lower courts’ fair use decisions that predate Goldsmith are now of questionable applicability.

 

      The first factor begins with a contrast between nonprofit use, which is favored, and commercial use, which is disfavored. Prior to Goldsmith, some courts were finding that a transformative use not only negated the commerciality but essentially overtook all the other factors as well. But in Goldsmith, the Supreme Court was much more measured, holding that the weight of commerciality against fair use can be lessened by the degree to which the use is transformative, that is, has a further purpose or different character. It’s a sliding scale, not a Boolean analysis.

 

      So what constitutes transformative use? Some courts had gone so far in finding any new element of the use to be transformative that many commentators wondered what, if anything, was left of the statutory right to authorize the creation of derivative works, as Zvi mentioned. The Goldsmith court wrote, “To make transformative use of an original must go beyond that required to qualify it as a derivative. . . A use that has a distinct purpose is justified because it furthers the goal of copyright, namely, to promote the progress of science and the arts without diminishing the incentive to create.” Quoting Authors Guild v. Google, the Court continued, “The more the appropriator is using the copied material for new transformative purposes . . . the less likely it is that the appropriation will serve as a substitute for the original or its plausible derivatives.” So we see that substitution is a key part of the analysis, one that has the power to disqualify a use from transformative status.

 

      And The Times has shown that OpenAI is capable of reproducing substantially similar copies of its works. That’s powerful evidence on the substitution effect that OpenAI will have to contend with when this case moves forward. The Supreme Court, in Goldsmith, summarized its first factor analysis with the passage, “the first factor considers whether the use of a copyrighted work has a further purpose or different character, which is a matter of degree, and the degree of difference must be balanced against the commercial nature of the use.”

 

      Those who are sympathetic to the AI side will point to the creation of new works by the use of the resulting system. But this misses a key step. Generative AI systems do not create images or text or anything on their own initiative. Human interaction is required in the form of prompts, and without that human interaction, the resulting product would have no protectable or creative expression. So it isn’t the AI system that generates new works; it is the human users.

 

      And it’s been good law for a very long time that the copier of works may not stand in the shoes of the users of those copies to justify the initial copying. That, of course, is Princeton University Press v. Michigan Document Services. We can talk about that if it comes up in the Q&A.

 

      As for the remaining fair use factors, it seems, to me, they all favor the copyright owner in varying degrees. The works were largely creative, although perhaps some news articles marginally less so. Still, I think it’s clear the second factor favors The Times. The entirety of every work was copied, so the third factor is strongly against fair use. And the fourth factor, harm to the current or potential market, very strongly favors The Times. Again, substitution is probably the strongest evidence of harm, followed closely by the existence of a licensing market for the same use, both of which we have here.

 

      So if OpenAI is going to prevail on fair use, it seems, to me, they will need a very strong finding on the first factor in order to overcome not only the commercial nature of the use but all the other factors as well. On the law, this seems unlikely. I can only imagine a finding of fair use, frankly, if the Court is taken in by the cool factor arguments. But it would be a travesty for creators’ works to lose essentially all protection in the AI age, and it would certainly harm the incentive to create, which is what the Goldsmith Court kept firmly in mind. Thanks.

 

John P. Moran:  All right. We have a couple of questions. But before I go there -- I think you addressed this, Steve -- there’s the question of substitution and competitiveness in the market, and in particular, the fact that the complaint, I believe, goes to some lengths to establish a competitive market. How does that get The Times out of, or diminish, the Google Books copying case?

 

Steven M. Tepp:  Well, I’m not sure of the degree to which Google Books is still good law after Goldsmith -- not that Goldsmith overturned that decision in any way, but the analysis was so different that it’s hard to make the case any longer that Google Books really is useful precedent. When you look at these generative AI systems in the most general way, departing for a moment from the particulars of the law to a matter of social policy, we have systems where people’s work product was copied without their knowledge, permission, or compensation and is being used to design and implement technology that will then put many of those people out of business entirely or substantially reduce their income by competing directly with them.

 

      There are also questions about, well, why did you copy this work versus that work. Does any individual work have particular value? Works were chosen because they are valuable for what they are. And so, there is value there. And I think, at the highest level of public policy analysis, value was taken; value has been created in companies that are already valued at over a trillion dollars, and there’s got to be a way to compensate the people whose works were taken to build those systems.

 

      It’s beyond this panel to talk about exactly how that works. Going back to the particulars of the copyright law, this gets back to the reproduction right. It seems self-evident to me that copies were made, and I’ve just made the fair use analysis that I think is appropriate, so I wouldn’t be surprised to see The Times prevail in this case -- but time will tell.

 

John P. Moran:  So we have a question that goes to ask that --

 

Zvi Rosen:  So can I jump in quickly on that?

 

John P. Moran:  Oh. I’m sorry.

 

Zvi Rosen:  I’m sorry.

 

John P. Moran:  Go ahead. Yes.

 

Zvi Rosen:  I just wanted to say I take a slightly different tack on Google Books. I think it’s very much of a piece with Google Image Search, and I think it has to be read as fact specific. Whether or not it’s good law after Warhol, I don’t know, but I really think that the focus on Snippet View and directing people to booksellers in Google Books, and the focus on directing people to websites in Google Image Search, was key to those decisions in terms of market effect and transformation -- that the use was not a substitute in any way, at least in the courts’ view, but rather a channeling use of sorts. And so, I do think there has to be a difference there, where there’s no channeling whatsoever here. But like Steve said, we’ll see.

 

Charles Duan:  Well, so if I can just kind of jump in very quickly, so one of the arguments that is sort of previewed in the motions to dismiss is this question of to what extent the actual substitutional use is actually relevant. And so, OpenAI and Microsoft both point to Exhibit J of the complaint, which is where you get sort of the original duplication or memorization, as they call it in the industry, of articles. And as those examples show, in order to get a -- in order to prompt ChatGPT to produce a New York Times article, you first have to give it the first half of the article, essentially. And so, their argument then is that, at that point, if you already have the first half of the article, it’s not much of a substitution to say that you get the second half, especially when the second half is available in other places on the internet.
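To see the shape of the probe being described, here is a rough sketch under stated assumptions: query_model is a hypothetical stand-in for a real completion API, and the word-overlap score is a crude proxy for the side-by-side verbatim comparisons that Exhibit J actually presents.

# Hypothetical memorization probe: prompt a model with the first half of an
# article and measure how much of the true continuation comes back.
def query_model(prompt: str) -> str:
    # Stand-in only: a real test would call a completion endpoint here.
    return ""

def memorization_score(article_text: str) -> float:
    half = len(article_text) // 2
    prompt, continuation = article_text[:half], article_text[half:]
    completion = query_model(prompt)
    truth_words = continuation.split()
    if not truth_words:
        return 0.0
    # Crude proxy: fraction of the continuation's words that the model
    # reproduced anywhere in its output.
    output_words = set(completion.split())
    return sum(w in output_words for w in truth_words) / len(truth_words)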

 

      And so, that’s an interesting argument. I haven’t heard that before, and it’s sort of unique to this sort of situation. But I think that what it does mean is that it’s going to be a little more complicated, I think, to prove the substitution element compared to some of the other cases. That doesn’t mean that I think that’s a slam-dunk win for them. It certainly is the case that you can get a large chunk of copyrighted material. It’s just that it’s not the normal way that one would expect to retrieve it.

 

Zvi Rosen:  There’s a lot of focus on the idea that this is a bug rather than something they want, which legally shouldn’t be significant, but I don’t think that’s come up before. So who knows?

 

Charles Duan:  Yeah. My understanding is that there also is an amount of research on just how to stop, at a computer level, this sort of thing from happening. Is there a way that they can just put filters at the end, or can they use some sort of special training processes? That is sort of an interesting research question that I know is open. I think that the difficulty is that a lot of times, articles are available online in slightly different variations, like somebody might omit a paragraph, and that’s why it’s hard for -- it’s hard to simply detect identical copies. But it’s sort of interesting that there may be a technical solution in addition to the legal approach, and there’s a question of which one you think should lead. Right?
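One plausible shape for the output filter described here, sketched with invented numbers: the five-word shingle size and the 50 percent threshold are illustrative guesses, not any vendor’s actual safeguard. Overlapping word n-grams are a standard way to catch near-verbatim text even when a paragraph is dropped or lightly reworded, which is the detection difficulty just mentioned.

# Sketch of a near-duplicate output filter: flag generated text that shares
# too many five-word sequences ("shingles") with a protected source.
def shingles(text: str, n: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(candidate: str, source: str) -> float:
    cand = shingles(candidate)
    if not cand:
        return 0.0
    return len(cand & shingles(source)) / len(cand)

source = "the quick brown fox jumps over the lazy dog near the quiet river bank"
output = "the quick brown fox jumps over the lazy dog beside the river"

if overlap_ratio(output, source) > 0.5:  # illustrative threshold
    print("blocked: output too close to a protected source")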

 

      On the one hand, maybe we want to say that the law encourages people to develop these sorts of technologies that avoid duplication. On the other hand, maybe we say -- maybe we’re worried that by imposing the law -- the hammer of the law too early, we prevent people from coming up with these sorts of technologies that can satisfy better middle grounds, and we end up -- one of the remedies that’s asked for is destruction of ChatGPT. And so, there’s -- there is a question of whether or not it would be premature to be applying this before we really know what the sort of opportunities of the technology ultimately look like.

 

Steven M. Tepp:  So that’s what I was referencing earlier when I said that some people want to look at this as copyright versus AI, and you can’t have both. Had OpenAI chosen to license the material it used, then no one would be asking to destroy ChatGPT nor is anyone asking to destroy AI systems that were built on licensed material. So putting that aside for a moment, the substitute effect has several layers to it.

 

      First and foremost, the fact that you can get that material out of ChatGPT is strong evidence that the copyrightable material is somewhere in the ChatGPT system, the OpenAI system, and therefore, the reproduction right is implicated. Then, on the question of whether the output is infringing and the substitution effect there, we kind of have two or three different layers. In traditional copyright analysis, obviously, substantial similarity and access are the two prongs of proving copying and infringement of the reproduction right. So the fact that it may be difficult or unusual to get ChatGPT to put that out doesn’t mean it’s less infringing. It just means it may take a little more work, but it’s still infringing.

 

      More generally, there’s a question about the degree to which these generative AI systems will overall diminish demand for the copyrighted works produced by the people whose works were used to build the system. And that, admittedly, departs somewhat from a direct infringement analysis to the more public policy question of whether it’s right that the people whose works were taken without notice, without permission, and without compensation should be told, sorry, you’re out of luck, and these new systems that we built, with all the things we took from you, are now going to put you out of business.

 

John P. Moran:  We have a question from the audience, addressed generally to the panelists together, and the specific question is the relevance of Sega v. Accolade and the concept of intermediate copying. Can any of you speak to that?

 

Charles Duan:  Yeah. So I was mentioning that there is a sort of question of whether or not purely internal use would count as an infringement. And there are a couple of cases that say that, at least, that’s an interesting question. The Sega v. Accolade case and the reverse engineering cases raise that question. You probably remember there’s also the Cartoon Network case, right, which talks about transitory copying. So you have a number of cases that support the idea that maybe internal copying, or copying that is antecedent to some other public use, might not receive the same analysis.

 

      And so, that’s obviously going to be relevant here to the extent that the allegation is that the copying is inside the model. The model is not something that anybody can see, and so, that might fall within those doctrines. Similarly, the training process -- again, that’s not something anybody actually sees, and as a result, those potentially aren’t the sorts of things that are of concern to copyright law in view of those cases. It’s really what’s going on at the output end that has the commercial impact.

 

      That’s where the fair use question is going to really come up. That’s where the market effect is going to come up. That’s where transformativeness is going to come up. Yeah. That’s at least part of where those cases fit in.

 

John P. Moran:  Well, we’ll follow up --

 

Zvi Rosen:  [CROSSTALK 00:41:33] --

 

John P. Moran:  -- go ahead, Zvi. I’m sorry.

 

Zvi Rosen:  I was going to say my view of both the Accolade case and Cartoon Network is that they boil down to this: you’re doing some internal copying for what’s otherwise a legal and noninfringing use. Now, I will say, of course, Cartoon Network involved plenty of copying, but it was all licensed, and in Sega v. Accolade, it’s interoperability copying, just so you could reverse engineer the interface for the game console. I think those cases basically boil down to, well, yeah, if it’s not actually infringing in any way, and the copying is just a way of getting to an otherwise totally legal result, intermediate copying is okay. But I don’t think that’s the case here. The whole problem is ChatGPT is spitting out lots of copyrighted content. I think that’s the problem.

 

Charles Duan:  Well, that’s sort of the question of the case. If it turns out that that is fair use, right -- if the Court accepts the argument that this is a bug and a de minimis spitting out of exact copies -- I don’t know exactly how the Court would rule, but if they say that’s okay, then the antecedent copying should be okay as well. We don’t want, basically, copyright law to be focused on trivialities, in a sense. So the question is -- it’s sort of a stand or fall together situation where if the outputs end up being problematic, not protected under fair use, then the whole thing probably wouldn’t be fair use either. But if it is, then chances are the earlier copying is okay as well in view of those precedents.

 

Steven M. Tepp:  I think that’s a little bit too narrow in one respect. The ruling in Sega v. Accolade was predicated on the fact that the resulting products that Accolade wanted to create were not competing with what they copied in any way -- not direct substitutes or even competing alternatives, but rather games that would simply work on the operating system, which is what was copied. Whether generative AI systems are outputting substantially similar material to what was copied or even just generally competing with creators, there’s much more of a market effect than we had in any way in Sega v. Accolade.

 

      And on the Cartoon Network case, I don’t really see much of an analogy there to any part of this case. All three aspects of the Cartoon Network case were, and remain, outlier decisions, on the temporary copy issue and the public performance issue, as well as on the question of who authorized the copy -- I can’t think of the word. Anyway. None of those apply -- this is not about public performance. This is not about temporary copies. Those copies are in there for a long period of time.

 

Charles Duan:  I think they actually do allege that it’s a public performance, interestingly. I’m not exactly sure why, but I think that was actually in the complaint.

 

Steven M. Tepp:  Well, yes. But it’s not the same sort of issue where, oh, there’s only one copy versus ten thousand copies, and ironically, the ten thousand copies are not a public performance, so said Cartoon Network. In any event, of all the holdings of Cartoon Network, that element is surely the weakest after the Supreme Court decision in Aereo, which, while not explicitly overruling Cartoon Network, articulated very clearly an analysis at odds with the Second Circuit’s analysis of public performance in Cartoon Network. I don’t see any way that that’s good law.

 

John P. Moran:  Do any of you have a view on the likelihood of a licensing solution to this, or do you think it’s going to be resolved by the Court -- the Supreme Court, probably?

 

Zvi Rosen:  I think it will inevitably be licensed -- well, I don’t want to say inevitably. You never know. This could be litigated for a dozen years. But you’ll note in the complaint that The New York Times is one of the most heavily used sources for OpenAI, while other sources — notably the Associated Press, and there are others mentioned in the complaint — have been paid for licensing. So I think it’s there. The old line about litigation is that it’s negotiation by other means. I’m not sure it’s always true, but I think it is true here.

 

Charles Duan:  Yeah, I think, as Steve mentioned, there are a number of companies out there developing generative AI systems that are based on licensing. The difficulty, of course -- which I’m sure OpenAI would be happy to point to, and I believe they point to this in some of their papers -- is that a lot of the reason why these systems work particularly well is just the breadth of content, the fact that they have tremendously large quantities of data to work off of, which would be substantially harder to obtain by licensing.

 

      And so, while I think it’s correct that this isn’t a question of just AI versus licensing -- you can have both at the same time -- you are changing the parameters of what the technological development looks like. Right? And that could be a good thing. I’m not necessarily saying that it’s a bad thing if computer scientists are forced to work with a different set of legal parameters. There could be value in specifically working with licensed copies. But one thing to keep in mind is that there is also a pretty good chunk of public domain information available that had been the basis for training of artificial intelligence for many years.

 

      And one of the reasons that companies have moved away from that is that it produced very strange systems. A lot of that material is hundreds of years old — material that’s officially out of copyright but very strange to use as training data. The Enron email database was a common one that was used, and as people have pointed out, it ended up producing systems that incorporated all sorts of very strange biases. And so, on the one hand, yeah, you can have a world in which licensing is prominent. But you do end up with a different technological environment, and I think it’s an interesting question what that environment looks like, how you really feel about it, and what it does to the trajectory of the technology.

 

Steven M. Tepp:  Well, Charles, I think you just made a great case for the value of the copyrighted works that were copied in order to build generative AI systems. Does it take a little longer? Might it cost more to compensate creators for using their works to build your system? Well, yes, of course, the same way that it would take a lot longer if the AI companies didn’t have Nvidia chips because they didn’t want to pay for them. Their processing capacity would be greatly diminished, but they need them, and they want them, so they pay for them.

 

      And I would point to the social aspect of this, of telling creators, you’re the one input into generative AI that no one has to pay for. The coders, the developers, they get paid. The chip makers, they get paid. The power utilities that provide all that electricity to the data farms, they get paid. But the creators -- sorry, you’re out of luck. Looked at in that context, it’s a pretty difficult argument to sustain.

 

Charles Duan:  But I think that’s generally correct, this idea that if there’s some value out there, then possibly we want to make sure that there’s compensation for that value. The difficulty here, of course, is that copyright doesn’t cover every possible value of a work. Right? You have factual databases, which simply have no copyright protection because facts are not copyrightable. And so, there is sort of this larger question of how do you deal with the problem of value.

 

      Similarly, a lot of these systems use private data. Right? They use individual, personal data. That’s data that’s not copyright protectable at all. Right? And that is value, so one way you could solve that is by trying to create some sort of property right in private information, but that actually causes a lot of other problems, as a fair amount of scholarship points out. So I think that you want to separate the question of how you deal with the value proposition of information from the copyright questions, and if you’re talking about just the value proposition, maybe what you’re really asking for is some sort of regulatory agency, which we can debate.

 

      But then, you also have the second question of whether copyright is the correct vehicle for doing that, given that copyright is historically limited to certain things, which are a little bit different from what exactly is going on with artificial intelligence systems -- the idea versus expression and fact versus expression dichotomies, and, of course, the fair use doctrine. These all play into what the boundaries of copyright law are, boundaries that don’t make value exactly align with the rights provided under federal law.

 

Steven M. Tepp:  I do agree there are elements of these issues that go beyond copyright law, and you’re seeing all sorts of proposals in those veins, from privacy protections to state and federal legislation on issues of publicity rights. Narrowing it to the copyright issues, because that is the focus of this panel, I do think that, of course, collections of data can be protectable as compilations by their selection, arrangement, and coordination, as I’m sure you know. And then, many of the works that were taken were taken precisely because they are valuable expressions of contemporary American society. People aren’t going to pay for AI systems that speak in Old English, with forsooths and what have you. So when I speak of the value, I’m speaking both generally, beyond copyright, but also in the context of the fourth factor, where there’s harm to the current and potential market for the works that were copied.

 

John P. Moran:  I have a question from the audience. This is a question, possibly, for the panel in its entirety. “Is there a place in this discussion for Sony, safe harbor?”

 

Zvi Rosen:  Yeah. For Sony, you said. Right?

 

John P. Moran:  Yes.

 

Zvi Rosen:  Yeah. So I actually pulled the lines from Sony. Everyone reads Sony [inaudible 00:53:09] -- the first half, about being capable of commercially significant non-infringing uses. Of course, in Sony, the Court held that time shifting was okay, but librarying was probably not. And everyone forgets the second half. There’s room for it, but I don’t think it gets you there in this case.

 

      I think that, clearly, OpenAI -- well, that actually is an interesting question. Is it capable of non-infringing uses? It sort of depends how you define it. At what point is it non-infringing? And actually, a lot of the motion to dismiss was on the statute of limitations, but I have to think the discovery rule is potentially going to bring in the ingestion of a lot of that as well. Of course, then, we might be waiting to find out what Warner Chappell v. Nealy holds, or if anything is held in that case.

 

Charles Duan:  This is a three-year statute-of-limitations case.

 

Zvi Rosen:  Yes.

 

John P. Moran:  Yeah.

 

Steven M. Tepp:  So on the Sony case — this is for those of you who aren’t familiar with it — this is the Sony Betamax case from the early 1980s, when the movie studios sued, claiming infringement by virtue of early video cassette recording of broadcast television. And as Zvi said, there were two findings. One was that recording a show that was broadcast over free broadcast television strictly for the purpose of watching it later at a more convenient time is a fair use, but keeping libraries of those recordings is not. And then, second, in terms of the contributory liability of the manufacturer of the Betamax machine, Sony in that case, the existence of commercially significant substantial non-infringing uses meant the Court would not impute knowledge of the infringement — knowledge being one of the two prongs of contributory liability.

 

      The Supreme Court, later in Grokster, made very clear that there is no safe harbor. This question was framed as a safe harbor in Sony; that is not the correct framing of the law. There is no safe harbor. If there are commercially significant non-infringing uses, then Sony is still good law that the Court will not impute knowledge for that reason. That doesn’t mean the Court might not impute knowledge for another reason, which it did in Grokster, the reason being, in that case, that the defendant had induced the infringements by the users of the Grokster system.

 

      So all of that, of course, is in the context of contributory liability, which is a doctrine of secondary or third-party liability in U.S. law. But there are direct infringement issues here that, I think, need to be resolved first. And that’s where the fair use comes in -- the fair use argument comes in, which Sony has nothing to say about in this context.

 

Charles Duan:  Yeah, so I know we’re pretty close to out of time, but if I can just say a couple of things about Sony. Sony is simultaneously irrelevant and also highly relevant to this case. The reason it’s not terribly relevant is that Sony dealt with this contributory infringement issue, and here the allegation is direct infringement. Right? And so, as a result, by doctrine, it doesn’t help us, other than for the time shifting and the question of what counts as fair use. But the way it is relevant is that Sony is within a line of cases that deal with the intersection between copyright law and technology. Right?

 

      So you have a new technology, the VCR, which doesn’t, itself, say much about any particular copyrighted work but is a vehicle by which infringements can occur. And similarly, you had cases about Xerox machines and photography. You can go all the way back to the printing press, if you really want, and you recognize you have these general-purpose technologies that can enable copyright infringement, that can enable easier copying. How do you deal with that sort of balance? And really, the answer comes down to what exactly are the boundaries of the copyright right. If we say that copyright reaches any sort of copying at all, then, of course, all of these are infringements, and you wouldn’t be able to have printing presses, photography, Xerox machines, any of these sorts of things.

 

      But of course, there is a fair use doctrine, and that fair use doctrine exists for a reason. It’s to set a boundary around what the copyright right protects and to allow for technological innovation beyond that space. And so, in a sense, what this case really is all about is what exactly that line is. Where does the line fall, such that an act that we would colloquially call copying falls outside of infringement and within the space of permissible innovation? That’s a line that’s been pretty important throughout history across a lot of different technologies, and so, in that sense, this case is not terribly new. It’s simply an iteration of the same problem that has come up many, many times for copyright law and for a lot of other areas of law: how it intersects with new technologies.

 

Steven M. Tepp:  So that’s the false dichotomy: that we wouldn’t have the printing press or Xerox machines if we didn’t have fair use, as if there’s no such thing as licensing. That’s, of course, not the case, nor do I believe that there’s some sort of penumbra around Sony or other copyright and technological innovation cases. What I do agree with is that copyright has always been a law that has reacted to a variety of factors — developments in the marketplace, consumer preferences, and technological evolution — absolutely. And the Court looks at some of the particulars, and fair use has a key role in that, of course.

 

      So we have decisions like Grokster, as I mentioned, and Aereo, another one I mentioned, and many others where the Court said, no, this is a model based on infringement, and we’re not going to permit it. We have others where the Court, like in Sony, said, this is inherently an innocent business whose product could be used for something nefarious, but we’re not going to hold the manufacturer of the device secondarily liable just because it could be used in a bad way. In the case before us, with generative AI systems that made a conscious decision to copy copyrighted works by the billions, I think it’s a lot more like Aereo and Grokster than it is like the innocent business models.

 

John P. Moran:  Emily, I know we’re up against the hour, and I have just one question for the panel as a whole. The question is, what if we remove the technology from this discussion and just put it into a business sense? Say, for some reason, a company starts a business and happens to go into The New York Times archives and carts it all off into its warehouse. And then, they hire a whole bunch of employees; people come to the front counter and ask questions, and the employees go back and scurry around in The New York Times’s information and give them an answer or an essay. And that’s directly competing with what The New York Times uses its archives for.

 

      Is that a valid analogy? Is it technologically misplaced, or does that make a difference in the discussion, if you remove the technology from what’s actually happening with the copyrighted works?

 

Zvi Rosen:  John, I think your little description is kind of close to a library, which is okay. But you do get into some trouble if you tweak it in two ways. One, what if they give out copies of a Times article instead of just summaries, or even cut and paste sentences into a sort of ransom-note-style answer? I think you’re getting closer to the effect apparent here. And then, the flip side: what if they grab The New York Times hot off the presses on the East Coast, wire summaries of a story to the West Coast, and print there? Of course, that’s INS v. AP, which was held to be unlawful. So I don’t think you can make it technology independent, because the technology dictates some of the legal questions.

 

Charles Duan:  To make things even worse, they do talk about hallucinations, the problem in which sometimes you will ask ChatGPT for a New York Times article, and it’ll give you something completely made up that has nothing to do with it. So the analogy really is a library reference desk that every once in a while gives you an exact copy of an article that it found, and every once in a while gives you totally made-up information. It’s a little hard to draw analogies at that point, I think. But I think it kind of shows why it’s a little difficult to work with analogies, and sometimes, it really is worth just figuring out exactly what’s going on with the technology at its proper level.

 

Steven M. Tepp:  I’ll just add that what you described was a lot like a library, if it’s a nonprofit organization. You didn’t stipulate which way that organization was operating, nonprofit or commercial. But the point I want to make is that Congress has taken pains to enact specific statutory exemptions for libraries to provide library patrons with a certain degree of service without going so far that it implicates the incentive to create in the first place. And libraries are not entitled to just take copies of everything they want to have in their collections without paying for them, without permission, and so on.

 

      And in fact, in the online context, some of these issues are being litigated right now. There’s an entity that calls itself the Internet Archive that has created a website where it just lets people take copies of copyrighted works, and they want to call themselves a library, but they don’t qualify under any of the library exceptions. And thus far, in the pending litigation, they’ve been quite unsuccessful. It’s on appeal, so we’ll see where it goes. But I don’t -- I wouldn’t put a lot of money on them.

 

John P. Moran:  Over to you, Emily.

 

Emily Manning:  All right. On behalf of The Federalist Society, thank you all for joining us for this great discussion today. Thank you also to our audience for joining us. We greatly appreciate your participation. Check out our website, fedsoc.org, or follow us on all major social media platforms, @fedsoc, to stay up to date with announcements and upcoming webinars. Thank you once more for tuning in, and we are adjourned.

 

[Music]