The Atlantic has good reporting on the use of pirated books to train A.I. at Meta (Mark Zuckerberg’s company, which owns Facebook and Instagram). Discouraged by the potential cost of going the legal route—paying to license books in order to train their A.I. program—Meta instead worked around the law and turned to Library Genesis, or LibGen, a pirate library. Meta appears to have obtained Zuckerberg’s permission to download and use LibGen’s data set, pirating millions of books. It’s alleged that OpenAI has done the same.
Critical reaction to the news is likely to home in on tech barons like Zuckerberg, understandably. But leave some room for anger at LibGen. A darling of certain online punk-ish sorts, LibGen is depicted by supporters as a resource for people “who are not committing any crimes” but simply “want to read.” (Readers who can’t or don’t want to pay for books have, of course, real libraries at their disposal, but it’s edgier just to erase legitimate institutions that pay authors and publishers.) I’ve written before about the ways book piracy attaches itself to liberatory rhetoric while actually advancing reactionary goals like austerity. If you need further evidence that pirate libraries like LibGen are not, in fact, stick-it-to-The-Man punks, perhaps the fact that enormous corporations like Meta are using them to build A.I. will be persuasive.
None of this is an accident. The push for “open” has always shared a lot of DNA with Silicon Valley’s dreams of A.I. and, ultimately, the singularity. If you want to train machines by ingesting books, your work is made easier (and cheaper) when there’s a widespread belief that “paywalling” books is somehow evil.
In other words: Big Tech isn’t misusing pirate libraries, but using them in ways that strike me as baked into the ideological position that information wants to be free. It’s up to authors, publishers, librarians, and readers to decide whether that’s the side they want to support, or the future they want to see.
If only Meta would trade: open access to its algorithms for access to authors’ content. In a dream world.
Derek - Curious if you have an opinion about web-dot-archive-dot-org a.k.a. Internet Archive a.k.a. Wayback Machine. On the one hand, I've definitely used it to get around paywalls.* On the other, the Wayback Machine gets stuff up that would be lost to history if it weren't collecting it. E.g., in 2010 there was a long, brilliant Singles Jukebox thread about Ke$ha; it spilled over into 51 comments, which caused the Singles Jukebox site to screw up and show only the 51st comment (it's by me!), whereas the Wayback Machine has all 51 comments. And when the music magazine Stylus went dead its site did too, so the only way to find my friend Dave Moore's 2007 Stylus piece on The Bluffer's Guide To Teenpop or his interview with Brie Larson is at the Wayback Machine. Also, the Wayback Machine's got my Disco Tex essay, from when I typed it up for my friend Mark Sinker's website (now not operable) sometime in the early '00s. Here's Dave's Brie Larson interview:
https://web.archive.org/web/20110523104133/http://www.stylusmagazine.com/articles/pop_playground/sugar-shock-013-bunnies-traps-and-slip-n-slides-an-interview-with-brie-larson.htm
*But this is something of a PITA: I have to get a URL by hovering over a link with my cursor, copy that URL from the bottom left corner of my screen by hand, and type it in character by character.