
The Internet Archive Lawsuit Marks an Ending

Internet Archive headquarters, San Francisco, California. Courtesy IA

At every turn, access to information is being cut off or paywalled. At what cost?

It’s the new maxim of this anti-information age: if libraries didn’t already exist, if these institutions where the public has free access to books and a wider range of media hadn’t been established millennia ago, they would probably be sued into the ground by today’s publishers and internet service providers alike.

Last week, the Internet Archive lost its appeal in its quest to lend out digitised books without explicit approval from publishers. The Second Circuit Court of Appeals in the US sided with publishers in their claim that free digital lending cut into their bottom lines.

The Internet Archive’s Open Library lets patrons check out books for a limited amount of time. The library lends only as many digital versions as it holds physical copies, a system called Controlled Digital Lending (CDL). Whether they’re housed at the Internet Archive’s own facilities or on the shelves of its 130-plus partner libraries across the US, each of these books was legitimately purchased from publishers. As the Electronic Frontier Foundation, part of IA’s defence team, wrote in a statement: ‘Libraries have paid publishers billions of dollars for the books in their print collections, and are investing enormous resources in digitizing in order to preserve those texts.’
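In outline, CDL enforces a simple owned-to-loaned invariant: a digital checkout succeeds only while the number of outstanding loans is below the number of print copies the library owns, and a return frees a slot for the next patron. The sketch below is purely illustrative (the class and method names, such as CdlTitle and check_out, are hypothetical), not the Internet Archive’s actual system:

```python
# Minimal, hypothetical model of the owned-to-loaned ratio behind
# Controlled Digital Lending (CDL). Illustrative only: this is not
# the Internet Archive's implementation.

from dataclasses import dataclass


@dataclass
class CdlTitle:
    """One title in a CDL collection: digital loans never exceed owned print copies."""
    title: str
    owned_print_copies: int        # legitimately purchased physical copies
    active_digital_loans: int = 0  # checkouts currently outstanding

    def check_out(self) -> bool:
        """Lend one digital copy if a purchased copy is free; otherwise refuse."""
        if self.active_digital_loans < self.owned_print_copies:
            self.active_digital_loans += 1
            return True
        return False  # every owned copy is already on loan

    def check_in(self) -> None:
        """Return a digital copy, freeing a slot for the next patron."""
        if self.active_digital_loans > 0:
            self.active_digital_loans -= 1


# A library owning two print copies sustains at most two simultaneous loans.
book = CdlTitle(title="Fahrenheit 451", owned_print_copies=2)
assert book.check_out()      # loan 1 of 2
assert book.check_out()      # loan 2 of 2
assert not book.check_out()  # refused: the owned-to-loaned ratio holds
book.check_in()
assert book.check_out()      # a return frees a slot for the next patron
```

The National Emergency Library described below, in effect, temporarily suspended this one-loan-per-owned-copy cap.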

In March 2020, as COVID-19 spread, the Internet Archive moved to temporarily allow an unlimited number of people to access the same copies, a project called the National Emergency Library that aimed to provide lockdown-bound students and professionals unable to access physical libraries with at least some of the resources they needed to do their work. Though based in the US, the project would be accessible globally online. Then, in June 2020, the publishers Hachette, HarperCollins, Penguin Random House and Wiley filed a lawsuit alleging infringement of their copyrights in 127 works.

Publishers claimed that as long as readers could be patrons of a free digital library, their physical and e-book sales would suffer. The Internet Archive’s defence brought in experts whose studies found that the closure of the National Emergency Library did not affect publishers’ sales of physical copies and e-books; in some cases, sales even decreased after the closure. While publishers did not ‘provide empirical data to support [their] observation’, the court nevertheless decided it was ‘self-evident’ that publishers would suffer ‘market harm’ in the future if the Internet Archive’s practices were to become widespread.

The ruling triggered the immediate removal of over 500,000 books. More than 1,300 of these are banned or challenged in the US; if they haven’t already been removed from school and public libraries, they might soon be. The Open Library was the last shred of hope the average person had for free access to such books. Now that they’ve been removed, the book bans feel totalising. On Twitter/X, the Internet Archive has noted the oh-so-ironic removal of George Orwell’s 1984 and Ray Bradbury’s Fahrenheit 451.

The main hall at the Internet Archive headquarters, 2013. Photo: Jason ‘Textfiles’ Scott/Flickr

The president of the Association of American Publishers told Wired that the ruling ‘upholds the rights of authors and publishers to license and be compensated for their books and other creative works’, and that this kind of free digital lending was ‘appropriating the value of derivative works that are a key part of the author’s copyright bundle’. In other words: it’s the free research resources and the prospect of a wider readership through digital lending that are truly making it hard to be a writer these days. But, as the Authors Alliance writes, this is a ‘gross generalization and mischaracterization’ of authors’ experience. ‘Authors are researchers. Authors are readers. IA’s digital library helps authors create new work and supports their interests in having their works read.’

These digital-first interventions are rippling through a continuum of information networks as the public’s access to books and quality information is attacked from all sides, with library closures and book bans becoming policy. More than half a million books have been knocked out of this major digital lending library overnight, while a recent lawsuit from Universal Music Group and Sony Music Entertainment could result in damages high enough to put an end to the Internet Archive, taking the Wayback Machine (which since 1996 has archived billions of now-defunct webpages) out with it.

At every turn, access to information is being cut off or paywalled: more rare, banned and out-of-print books will be forgotten; more research resources will be further concentrated in expensive-to-live-in metropolitan cities as local libraries close down; whatever is left will be locked behind institutional login credentials. A publishing sector with a penchant for monopolies and white writers will become even more powerful. And for the regular internet user, an even thicker fog of AI-generated muck will block off any meaningful effort to find answers to even the simplest questions.

In theory, major publishers’ ability to keep lawyers on retainer should mean that writers can count on publishers to protect their work from being cannibalised by generative AI businesses. Or publishers could sell writers out for some easy cash by formalising the very dynamics that undermine and devalue their work. In fact, earlier this year OpenAI announced ‘pacts’ with Vox and The Atlantic to license their content for the ChatGPT artificial-intelligence chatbot, after finalising similar deals with Reddit, the Financial Times, Dotdash Meredith and News Corp.

Ironically, the very generative AI products that threaten to siphon off even more value from writers’ works rely on them for their continued survival: the best generative AI models depend on massive sets of high-quality language data – also known as news articles, scientific papers and books.

This is the raw material needed to make smart devices talk and chat; it’s where AI-powered search results get their answers from, where the ‘intelligence’ of ‘Artificial Intelligence’ comes from. A 2022 research paper estimates that large language model developers could run out of high-quality language stock by 2027. After that, Google’s AI search assistant might be forced to retrieve its answers from a fast-growing vat of other AI-generated factoids and search-query responses – synthetic data, this is called. While the Internet Archive faces extinction-level lawsuits for the success of its lending library and readers everywhere are deprived of another basic resource, generative AI businesses get to use centuries’ worth of copyrighted books, for free.

Michelle Santiago Cortés is a writer and critic based in Puerto Rico



