Generative Artificial Intelligence And Copyright Law – Analysis

October 1, 2023, CRS

By Christopher T. Zirpoli

Innovations in artificial intelligence (AI) are raising new questions about how copyright law principles such as authorship, infringement, and fair use will apply to content created or used by AI. So-called “generative AI” computer programs, such as OpenAI’s DALL-E and ChatGPT programs, Stability AI’s Stable Diffusion program, and Midjourney’s self-titled program, are able to generate new images, texts, and other content (or “outputs”) in response to a user’s textual prompts (or “inputs”).

These generative AI programs are trained to generate such outputs partly by exposing them to large quantities of existing works such as writings, photos, paintings, and other artworks. This Legal Sidebar explores questions that courts and the U.S. Copyright Office have begun to confront regarding whether generative AI outputs may be copyrighted and how generative AI might infringe copyrights in other works.

Copyright in Works Created with Generative AI

The widespread use of generative AI programs raises the question of who, if anyone, may hold the copyright to content created using these programs.

Do AI Outputs Enjoy Copyright Protection?

Whether copyright protection may be afforded to AI outputs, such as images created by DALL-E or texts created by ChatGPT, likely hinges at least partly on the concept of “authorship.” The U.S. Constitution authorizes Congress to “secur[e] for limited Times to Authors . . . the exclusive Right to their . . . Writings.” Based on this authority, the Copyright Act affords copyright protection to “original works of authorship.” Although the Constitution and Copyright Act do not explicitly define who (or what) may be an “author,” the U.S. Copyright Office recognizes copyright only in works “created by a human being.” Courts have likewise declined to extend copyright protection to nonhuman authors, holding that a monkey who took a series of photos lacked standing to sue under the Copyright Act; that some human creativity was required to copyright a book purportedly inspired by celestial beings; and that a living garden could not be copyrighted as it lacked a human author.

A recent lawsuit challenged the human-authorship requirement in the context of works purportedly “authored” by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying his application to register a visual artwork that he claims was authored “autonomously” by an AI program called the Creativity Machine. Dr. Thaler argued that human authorship is not required by the Copyright Act. On August 18, 2023, a federal district court granted summary judgment in favor of the Copyright Office. The court held that “human authorship is an essential part of a valid copyright claim,” reasoning that only human authors need copyright as an incentive to create works. Dr. Thaler has stated that he plans to appeal the decision.

Assuming that a copyrightable work requires a human author, works created by humans using generative AI could still be entitled to copyright protection, depending on the nature of human involvement in the creative process. However, a recent copyright proceeding and subsequent Copyright Registration Guidance indicate that the Copyright Office is unlikely to find the requisite human authorship where an AI program generates works in response to text prompts. In September 2022, Kris Kashtanova registered a copyright for a graphic novel illustrated with images that Midjourney generated in response to text inputs. In October 2022, the Copyright Office initiated cancellation proceedings, noting that Kashtanova had not disclosed the use of AI. Kashtanova responded by arguing that the images were made via “a creative, iterative process.” On February 21, 2023, the Copyright Office determined that the images were not copyrightable, deciding that Midjourney, rather than Kashtanova, authored the “visual material.” In March 2023, the Copyright Office released guidance stating that, when AI “determines the expressive elements of its output, the generated material is not the product of human authorship.”

Some commentators assert that some AI-generated works should receive copyright protection, arguing that AI programs are like other tools that human beings have used to create copyrighted works. For example, the Supreme Court has held since the 1884 case Burrow-Giles Lithographic Co. v. Sarony that photographs can be entitled to copyright protection where the photographer makes decisions regarding creative elements such as composition, arrangement, and lighting. Generative AI programs might be seen as a new tool analogous to the camera, as Kashtanova argued.

Other commentators and the Copyright Office dispute the photography analogy and question whether AI users exercise sufficient creative control for AI to be considered merely a tool. In Kashtanova’s case, the Copyright Office reasoned that Midjourney was not “a tool that [] Kashtanova controlled and guided to reach [their] desired image” because it “generates images in an unpredictable way.” The Copyright Office instead compared the AI user to “a client who hires an artist” and gives that artist only “general directions.” The office’s March 2023 guidance similarly claims that “users do not exercise ultimate creative control over how [generative AI] systems interpret prompts and generate materials.” One of Kashtanova’s lawyers, on the other hand, argues that the Copyright Act does not require such exacting creative control, noting that certain photographs and modern art incorporate a degree of happenstance.

Some commentators argue that the Copyright Act’s distinction between copyrightable “works” and noncopyrightable “ideas” supplies another reason that copyright should not protect AI-generated works. One law professor has suggested that the human user who enters a text prompt into an AI program—for instance, asking DALL-E “to produce a painting of hedgehogs having a tea party on the beach”—has “contributed nothing more than an idea” to the finished work. According to this argument, the output image lacks a human author and cannot be copyrighted.

While the Copyright Office’s actions indicate that it may be challenging to obtain copyright protection for AI-generated works, the issue remains unsettled. Applicants may file suit in U.S. district court to challenge the Copyright Office’s final decisions to refuse to register a copyright (as Dr. Thaler did), and it remains to be seen whether federal courts will agree with all of the office’s decisions. While the Copyright Office notes that courts sometimes give weight to the office’s experience and expertise in this field, courts will not necessarily adopt the office’s interpretations of the Copyright Act.

In addition, the Copyright Office’s guidance accepts that works “containing” AI-generated material may be copyrighted under some circumstances, such as “sufficiently creative” human arrangements or modifications of AI-generated material or works that combine AI-generated and human-authored material. The office states that the author may only claim copyright protection “for their own contributions” to such works, and they must identify and disclaim AI-generated parts of the work if they apply to register their copyright. In September 2023, for instance, the Copyright Office Review Board affirmed the office’s refusal to register a copyright for an artwork that was generated by Midjourney and then modified in various ways by the applicant, since the applicant did not disclaim the AI-generated material.

Who Owns the Copyright to Generative AI Outputs?

Assuming some AI-created works may be eligible for copyright protection, who owns that copyright? In general, the Copyright Act vests ownership “initially in the author or authors of the work.” Given the lack of judicial or Copyright Office decisions recognizing copyright in AI-created works to date, however, no clear rule has emerged identifying who the “author or authors” of these works could be. Returning to the photography analogy, the AI’s creator might be compared to the camera maker, while the AI user who prompts the creation of a specific work might be compared to the photographer who uses that camera to capture a specific image. On this view, the AI user would be considered the author and, therefore, the initial copyright owner. The creative choices involved in coding and training the AI, on the other hand, might give an AI’s creator a stronger claim to some form of authorship than the manufacturer of a camera.

Companies that provide AI software may attempt to allocate the respective ownership rights of the company and its users via contract, such as the company’s terms of service. OpenAI’s Terms of Use, for example, appear to assign any copyright to the user: “OpenAI hereby assigns to you all its right, title and interest in and to Output.” A previous version, by contrast, purported to give OpenAI such rights. As one scholar commented, OpenAI appears to “bypass most copyright questions through contract.”

Copyright Infringement by Generative AI

Generative AI also raises questions about copyright infringement. Commentators and courts have begun to address whether generative AI programs may infringe copyright in existing works, either by making copies of existing works to train the AI or by generating outputs that resemble those existing works.

Does the AI Training Process Infringe Copyright in Other Works?

AI systems are “trained” to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may include text, images, and other works downloaded from the internet. This training process involves making digital copies of existing works. As the U.S. Patent and Trademark Office has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “involves first making copies of the data to be analyzed” (although it now offers an option to remove images from training future image generation models). Creating such copies without permission may infringe the copyright holders’ exclusive right to make reproductions of their work.

AI companies may argue that their training processes constitute fair use and are therefore noninfringing. Whether or not copying constitutes fair use depends on four statutory factors under 17 U.S.C. § 107:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

Some stakeholders argue that the use of copyrighted works to train AI programs should be considered a fair use under these factors. Regarding the first factor, OpenAI argues its purpose is “transformative” as opposed to “expressive” because the training process creates “a useful generative AI system.” OpenAI also contends that the third factor supports fair use because the copies are not made available to the public but are used only to train the program. For support, OpenAI cites The Authors Guild, Inc. v. Google, Inc., in which the U.S. Court of Appeals for the Second Circuit held that Google’s copying of entire books to create a searchable database that displayed excerpts of those books constituted fair use.

Regarding the fourth fair use factor, some generative AI applications have raised concern that training AI programs on copyrighted works allows them to generate similar works that compete with the originals. For example, an AI-generated song called “Heart on My Sleeve,” made to sound like the artists Drake and The Weeknd, was heard millions of times on streaming services. Universal Music Group, which has deals with both artists, argues that AI companies violate copyright by using these artists’ songs in training data. OpenAI states that its visual art program DALL-E 3 “is designed to decline requests that ask for an image in the style of a living artist.”

Plaintiffs have filed multiple lawsuits claiming the training process for AI programs infringed their copyrights in written and visual works. These include lawsuits by the Authors Guild and authors Paul Tremblay, Michael Chabon, Sarah Silverman, and others against OpenAI; separate lawsuits by Michael Chabon, Sarah Silverman, and others against Meta Platforms; proposed class action lawsuits against Alphabet Inc. and against Stability AI and Midjourney; and a lawsuit by Getty Images against Stability AI. The Getty Images lawsuit, for instance, alleges that “Stability AI has copied at least 12 million copyrighted images from Getty Images’ websites . . . in order to train its Stable Diffusion model.” This lawsuit appears to dispute any characterization of fair use, arguing that Stable Diffusion is a commercial product, which weighs against fair use under the first statutory factor, and that the program undermines the market for the original works, which weighs against fair use under the fourth factor.

In September 2023, a U.S. district court ruled that a jury trial would be needed to determine whether it was fair use for an AI company to copy case summaries from Westlaw, a legal research platform, to train an AI program to quote pertinent passages from legal opinions in response to questions from a user. The court found that, while the defendant’s use was “undoubtedly commercial,” a jury would need to resolve factual disputes concerning whether the use was “transformative” (factor 1), to what extent the nature of the plaintiff’s work favored fair use (factor 2), whether the defendant copied more than needed to train the AI program (factor 3), and whether the AI program would constitute a “market substitute” for Westlaw (factor 4). While the AI program at issue might not be considered “generative” AI, the same kinds of facts might be relevant to a court’s fair-use analysis of making copies to train generative AI models.

Do AI Outputs Infringe Copyrights in Other Works?

AI programs might also infringe copyright by generating outputs that resemble existing works. Under U.S. case law, copyright owners may be able to show that such outputs infringe their copyrights if the AI program both (1) had access to their works and (2) created “substantially similar” outputs.

First, to establish copyright infringement, a plaintiff must prove the infringer “actually copied” the underlying work. This is sometimes proven circumstantially by evidence that the infringer “had access to the work.” For AI outputs, access might be shown by evidence that the AI program was trained using the underlying work. For instance, the underlying work might be part of a publicly accessible internet site that was downloaded or “scraped” to train the AI program.

Second, a plaintiff must prove the new work is “substantially similar” to the underlying work to establish infringement. The substantial similarity test is difficult to define and varies across U.S. courts. Courts have variously described the test as requiring, for example, that the works have “a substantially similar total concept and feel” or “overall look and feel” or that “the ordinary reasonable person would fail to differentiate between the two works.” Leading cases have also stated that this determination considers both “the qualitative and quantitative significance of the copied portion in relation to the plaintiff’s work as a whole.” For AI-generated outputs, no less than traditional works, the “substantial similarity” analysis may require courts to make these kinds of comparisons between the AI output and the underlying work.

There is significant disagreement as to how likely it is that generative AI programs will copy existing works in their outputs. OpenAI argues that “[w]ell-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus.” Thus, OpenAI states, infringement “is an unlikely accidental outcome.” By contrast, the Getty Images lawsuit alleges that “Stable Diffusion at times produces images that are highly similar to and derivative of the Getty Images.” One study has found “a significant amount of copying” in less than 2% of the images created by Stable Diffusion, but the authors claimed that their methodology “likely underestimates the true rate” of copying.

Two kinds of AI outputs may raise special concerns. First, some AI programs may be used to create works involving existing fictional characters. These works may run a heightened risk of copyright infringement insofar as characters sometimes enjoy copyright protection in and of themselves. Second, some AI programs may be prompted to create artistic or literary works “in the style of” a particular artist or author, although, as noted above, some AI programs may now be designed to “decline” such prompts. These outputs are not necessarily infringing, as copyright law generally prohibits the copying of specific works rather than an artist’s overall style. Regarding the AI-generated song “Heart on My Sleeve,” for instance, one commentator notes that the imitation of Drake’s voice appears not to violate copyright law, although it may raise concerns under state right-of-publicity laws. Nevertheless, some artists are concerned that AI programs are uniquely capable of mass-producing works that copy their style, potentially undercutting the value of their work. Plaintiffs in one lawsuit concerning Stable Diffusion, for example, claim that few human artists can successfully mimic another artist’s style, whereas “AI Image Products do so with ease.”

A final question is who is (or should be) liable if generative AI outputs do infringe copyrights in existing works. Under current doctrines, both the AI user and the AI company could potentially be liable. For instance, even if a user were directly liable for infringement, the AI company could potentially face liability under the doctrine of “vicarious infringement,” which applies to defendants who have “the right and ability to supervise the infringing activity” and “a direct financial interest in such activities.” The lawsuit concerning Stable Diffusion claims that the defendant AI companies are vicariously liable for copyright infringement. One complication of AI programs is that the user might not be aware of, or have access to, a work that was copied in response to the user’s prompt. Under current law, this may make it challenging to analyze whether the user is liable for copyright infringement.

Considerations for Congress

Congress may consider whether any of the copyright law questions raised by generative AI programs require amendments to the Copyright Act or other legislation. Congress may, for example, consider legislation clarifying whether AI-generated works are copyrightable, who should be considered the author of such works, or when the process of training generative AI programs constitutes fair use. Given how little opportunity the courts and Copyright Office have had to address these issues, Congress may adopt a wait-and-see approach. As the courts gain experience handling cases involving generative AI, they may be able to provide greater guidance and predictability in this area through judicial opinions. Based on the outcomes of these cases, Congress may reassess whether legislative action is needed.

About the author: Christopher T. Zirpoli, Legislative Attorney

Source: This article was published by the Congressional Research Service (CRS)