I completely understand why some authors are extremely upset about finding out that their works were used to train AI. It feels wrong. It feels exploitive. (I do not understand their lawsuits, because I think they’re very much confused about how copyright law works. )
But, to me, many of the complaints about this amount to a similar discussion to ones we’ve had in the past, regarding concerns about if works were released without copyright, what would happen if someone “bad” reused them. This sort of thought experiment is silly, because once a work is released and enters the messy real world, it’s entirely possible for things to happen that the original creator disagrees with or hates. Someone can interpret the work in ridiculous ways. Or it can inspire bad people to do bad things. Or any of a long list of other possibilities.
The original author has the right to speak up about the bad things, or to denounce the bad people, but the simple fact is that once you’ve released a work into the world, the original author no longer has control over how that work is used and interpreted by the world. Releasing a work into the world is an act of losing control over that work and what others can do in response to it. Or how or why others are inspired by it.
But, when it comes to the AI fights, many are insisting that they want to do exactly that around AI, and much of this came to a head recently when The Atlantic released a tool that allowed anyone to search to see which authors were included in the Books3 dataset (one of multiple collections of books that have been used to train AI). This lead to a lot of people (both authors and non-authors) screaming about the evils of AI, and about how wrong it was that such books were included.
But, again, that’s the nature of releasing a work to the public. People read it. Machines might also read it. And they might use what they learn in that work to do something else. And you might like that and you might not, but it’s not really your call.
That’s why I was happy to see Ian Bogost publish an article explaining why he’s happy that his books were found in Books3, saying what those two other authors I spoke to wouldn’t say publicly. Ian is getting screamed at all over social media for this article, with most of it apparently based on the title and not on the substance. But it’s worth reading.
Whether or not Meta’s behavior amounts to infringement is a matter for the courts to decide. Permission is a different matter. One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways. The philosopher Jacques Derrida liked to talk about “dissemination,” which I take to mean that, like a plant releasing its seed, an author separates from their published work. Their readers (or viewers, or listeners) not only can but must make sense of that work in different contexts. A retiree cracks a Haruki Murakami novel recommended by a grandchild. A high-school kid skims Shakespeare for a class. My mother’s tree trimmer reads my book on play at her suggestion. A lack of permission underlies all of these uses, as it underlies influence in general: When successful, art exceeds its creator’s plans.
But internet culture recasts permission as a moral right. Many authors are online, and they can tell you if and when you’re wrong about their work. Also online are swarms of fans who will evangelize their received ideas of what a book, a movie, or an album really means and snuff out the “wrong” accounts. The Books3 imbroglio reflects the same impulse to believe that some interpretations of a work are out of bounds.
Perhaps Meta is an unappealing reader. Perhaps chopping prose into tokens is not how I would like to be read. But then, who am I to say what my work is good for, how it might benefit someone—even a near-trillion-dollar company? To bemoan this one unexpected use for my writing is to undermine all of the other unexpected uses for it. Speaking as a writer, that makes me feel bad.
More importantly, Bogost notes that the entire point of Books3 originally was to make sure that AI wasn’t just controlled by corporate juggernauts:
The Books3 database was itself uploaded in resistance to the corporate juggernauts. The person who first posted the repository has described it as the only way for open-source, grassroots AI projects to compete with huge commercial enterprises. He was trying to return some control of the future to ordinary people, including book authors. In the meantime, Meta contends that the next generation of its AI model—which may or may not still include Books3 in its training data—is “free for research and commercial use,” a statement that demands scrutiny but also complicates this saga. So does the fact that hours after The Atlantic published a search tool for Books3, one writer distributed a link that allows you to access the feature without subscribing to this magazine. In other words: a free way for people to be outraged about people getting writers’ work for free.
I’m not sure what I make of all this, as a citizen of the future no less than as a book author. Theft is an original sin of the internet. Sometimes we call it piracy (when software is uploaded to USENET, or books to Books3); other times it’s seen as innovation (when Google processed and indexed the entire internet without permission) or even liberation. AI merely iterates this ambiguity. I’m having trouble drawing any novel or definitive conclusions about the Books3 story based on the day-old knowledge that some of my writing, along with trillions more chunks of words from, perhaps, Amazon reviews and Reddit grouses, have made their way into an AI training set.
I get that it feels bad that your works are being used in ways you disapprove of, but that is the nature of releasing something into the world. And the underlying point of the Books3 database is to spread access to information to everyone. And that’s a good thing that should be supported, in the nature of folks like Aaron Swartz.
It’s the same reason why, even as lots of news sites are proactively blocking AI scanning bots, I’m actually hoping that more of them will scan and use Techdirt’s words to do more and to be better. The more information shared, the more we can do with it, and that’s a good thing.
I understand the underlying concerns, but that’s just part of what happens when you release a work to the world. Part of releasing something into the world is coming to terms with the fact that you no longer own how people will read it or be inspired by it, or what lessons they will take from it.
Organisational Structures | Technology and Science | Military, IT and Lifestyle consultancy | Social, Broadcast & Cross Media | Flying aircraft