Why Italy’s Piracy Shield harms huge internet companies and small businesses with no recourse (unless you are rich) and could take down the entire internet in Italy to… protect against football streaming?!

Walled Culture has been following the sorry saga of Italy’s automated blocking system Piracy Shield for a year now. Blocklists are drawn up by copyright companies without any review or any possibility of objection, and the blocks must be enforced within 30 minutes. Needless to say, such a ham-fisted and biased approach to copyright infringement is already producing some horrendous blunders.

For example, back in March Walled Culture reported that one of Cloudflare’s Internet addresses had been blocked by Piracy Shield. There were over 40 million domains associated with the blocked address – which shows how this crude approach can cause significant collateral damage to millions of sites not involved in any alleged copyright infringement.
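The scale of that collateral damage follows directly from how IP-level blocking works: on a CDN like Cloudflare, many unrelated domains sit behind one shared address, so blocking the address cuts off all of them. A minimal sketch of that failure mode (all domain names and addresses here are hypothetical placeholders):

```python
# Why IP-level blocking over-blocks: many domains, one shared address.
# All names and addresses below are hypothetical (RFC 5737 example ranges).

# A CDN-style mapping: unrelated domains resolving to the same shared IP.
dns = {
    "allegedly-pirate-stream.example": "203.0.113.7",
    "small-business.example":          "203.0.113.7",
    "local-charity.example":           "203.0.113.7",
    "unrelated-blog.example":          "198.51.100.9",
}

blocked_ips = {"203.0.113.7"}  # one blocking order targeting a single address

def is_reachable(domain: str) -> bool:
    """A domain is cut off if its resolved address is on the blocklist."""
    return dns[domain] not in blocked_ips

# Every innocent domain that shares the blocked address goes down too.
collateral = [d for d in dns
              if not is_reachable(d) and d != "allegedly-pirate-stream.example"]
print(collateral)
```

One order aimed at a single alleged pirate takes two bystander domains offline in this toy example; Cloudflare's real addresses can front tens of millions.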

Every new system has teething troubles, although not normally on this scale. But any hope that Italy’s national telecoms regulator, Autorità per le Garanzie nelle Comunicazioni (Authority for Communications Guarantees, AGCOM), the body running Piracy Shield, would have learned from the Cloudflare fiasco in order to stop it happening again was dispelled by what took place in October. TorrentFreak explains:

After blocking Cloudflare to prevent IPTV piracy just a few months ago, on Saturday the rightsholders behind Piracy Shield ordered Italy’s ISPs to block Google Drive. The subsequent nationwide blackout, affecting millions of Italians, wasn’t just a hapless IP address blunder. This was the reckless blocking of a Google.com subdomain that many 10-year-olds could identify as being important. Reckless people and internet infrastructure, what could possibly go wrong next?

The following day, there was a public discussion online involving the current and former AGCOM Commissioners, as well as various experts in relevant areas. The current AGCOM Commissioner Capitanio showed no sense of remorse for what happened. According to TorrentFreak’s report on the discussion:

Capitanio’s own focus on blocking to protect football was absolute. There was no concern expressed towards Google or the millions of users affected by the extended blackout, only defense of the Piracy Shield system.

Moreover:

AGCOM’s chief then went on to complain about Google’s refusal to delete Android apps already installed on users’ devices and other measures AGCOM regularly demands, none of which are required by law.

It seems that Capitanio regards even the current, one-sided and extreme Piracy Shield as too weak, and was trying to persuade Google to go even further than the law required – a typical copyright maximalist attitude. But worse was to come. Another participant in the discussion, former member of the Italian parliament, IT expert, and founder of Rialto Venture Capital, Stefano Quintarelli, pointed out a deeply worrying possibility:

the inherent insecurity of the Piracy Shield platform introduces a “huge systemic vulnerability” that eclipses the fight against piracy. Italy now has a system in place designed to dramatically disrupt internet communications and since no system is entirely secure, what happens if a bad actor somehow gains control?

Quintarelli says that if the Piracy Shield platform were to be infiltrated and maliciously exploited, essential services like hospitals, transportation systems, government functions, and critical infrastructure would be exposed to catastrophic blocking.

In other words, by placing the sanctity of copyright above all else, the Piracy Shield system could be turned against any aspect of Italian society with just a few keyboard commands. A malicious actor that managed to gain access to a system that has twice demonstrated a complete lack of even the most basic controls and checks could wreak havoc on computers and networks throughout Italy in a few seconds. Moreover, the damage could easily go well beyond the inconvenience of millions of people being blocked from accessing their files on Google Drive. A skilled intruder could carry out widespread sabotage of vital services and infrastructure that would cost billions of euros to rectify, and could even lead to the loss of lives.

No wonder, then, that an AGCOM board member, Elisa Giomi, has gone public with her concerns about the system. Giomi’s detailed rundown of Piracy Shield’s long-standing problems was posted in Italian on LinkedIn; TorrentFreak has a translation, and summarises the current situation as follows:

Despite a series of failures concerning Italy’s IPTV blocking platform Piracy Shield and the revelation that the ‘free’ platform will cost €2m per year, telecoms regulator AGCOM insists that all is going to plan. After breaking ranks, AGCOM board member Elisa Giomi called for the suspension of Piracy Shield while decrying its toll on public resources. When she was warned for her criticism, coupled with a threat of financial implications, Giomi came out fighting.

It’s clear that the Piracy Shield tragedy is far from over. It’s good to see courageous figures like Giomi joining the chorus of disapproval.

Source: Why Italy’s Piracy Shield risks moving from tiresome digital farce to serious national tragedy – Walled Culture

Italy, copyright – absurd doesn’t even begin to describe it.

BBC Gives Away Huge Sound Effects Library, with readable and sensible terms of use

[Screenshot: the top of the BBC Sound Effects website]

Terms for using our content

A few rules to stop you (and us) getting in trouble.

a) Don’t mess with our content
What do we mean by that? This sort of thing:

  • Removing or altering BBC logos and copyright notices from the content (if there are any)
  • Not removing content from your device or systems when we ask you to. This might happen when we take down content either temporarily or permanently, which we can do at any time, without notice.
b) Don’t use our content for harmful or offensive purposes
Here’s a list of things that may harm or offend:

  • Insulting, misleading, discriminating or defaming (damaging people’s reputations)
  • Promoting pornography, tobacco or weapons
  • Putting children at risk
  • Anything illegal. Like using hate speech, inciting terrorism or breaking privacy law
  • Anything that would harm the BBC’s reputation
  • Using our content for political or social campaigning purposes or for fundraising.
c) Don’t make it look like our content costs money

If you put our content on a site that charges for content, you have to say it is free-to-view.

d) Don’t make our content more prominent than non-BBC content

Otherwise it might look like we’re endorsing you. Which we’re not allowed to do.

Also, use our content alongside other stuff (e.g. your own editorial text). You can’t make a service of your own that contains only our content.

Speaking of which…

e) Don’t exaggerate your relationship with the BBC

You can’t say we endorse, promote, supply or approve of you.

And you can’t say you have exclusive access to our content.

f) Don’t associate our content with advertising or sponsorship
That means you can’t:

  • Put any other content between the link to our content and the content itself. So no ads or short videos people have to sit through
  • Put ads next to or over it
  • Put any ads in a web page or app that contain mostly our content
  • Put ads related to their subject alongside our content. So no trainer ads with an image of shoes
  • Add extra content that means you’d earn money from our content.
g) Don’t be misleading about where our content came from

You can’t remove or alter the copyright notice, or imply that someone else made it.

h) Don’t pretend to be the BBC
That includes:

  • Using our brands, trade marks or logos without our permission
  • Using or mentioning our content in press releases and other marketing materials
  • Making money from our content. You can’t charge people to view our images, for example
  • Sharing our content. For example, no uploading to social media sites. Sharing links is OK.

Source: Licensing | BBC Sound Effects

This is how licenses should be written. Well done, BBC.

Epic Allows Internet Archive To Distribute For Free ‘Unreal’ & ‘Unreal Tournament’ Forever

One of the most frustrating aspects of the ongoing conversation around the preservation of older video games (cultural output, in other words) is the collision of IP rights with some publishers’ behavior: they are unwilling to continue supporting and making these older games available, yet they also refuse to release those same games into the public domain so that others can do so. This creates a crazy situation in which a company insists on retaining its copyrights over a video game that it has effectively disappeared, leaving the public no good or legitimate way to preserve it. As I’ve argued for some time now, this breaks the copyright contract with the public and should come with repercussions. The whole bargain that is copyright law is that creators are granted a limited monopoly on the production of a work, with that work eventually arriving in the public domain. If that arrival is not allowed to occur, the bargain is broken – and not by anyone who would supposedly “infringe” on the copyright of that work.

[…]

But it just doesn’t have to be like this. Companies could be willing to give up their iron-fisted control over their IP for these older games they aren’t willing to support or preserve themselves, and let others do it for them. And if you need a real-world example of that, you need only look at how Epic is working with The Internet Archive to do exactly that.

Epic, now primarily known for Fortnite and the Unreal Engine, has given permission for two of the most significant video games ever made, Unreal and Unreal Tournament, to be freely accessed via the Internet Archive. As spotted by RPS, via ResetEra, the OldUnreal group announced the move on their Discord, along with instructions for how to easily download and play them on modern machines.

Huge kudos to Epic for being cool with this, because while it shouldn’t be unusual to happily let people freely share a three-decade-old game you don’t sell any more, it’s vanishingly rare. And if you remain in any doubt, we just got word back from Epic confirming they’re on board.

“We can confirm that Unreal 1 and Unreal Tournament are available on archive.org,” a spokesperson told us by email, “and people are free to independently link to and play these versions.”

Importantly, OldUnreal and The Internet Archive very much know what they’re doing here. Grabbing the ZIP file for the game sleekly pulls the ISO directly from The Internet Archive, installs it, and there are instructions for how to get the game up and running on modern hardware. This is obviously a labor of love from fans dedicated toward keeping these two excellent games alive.

[…]

But this is just two games. What would be really nice to see is this become a trend, or, better yet, a program run by The Internet Archive. Don’t want to bother to preserve your old game? No problem, let the IA do it for you!

Source: Epic Allows Internet Archive To Distribute For Free ‘Unreal’ & ‘Unreal Tournament’ Forever | Techdirt

Now the copyright industry wants to apply deep, automated blocking to the Internet’s core routers

A central theme of Walled Culture the book (free digital versions available) and this blog is that the copyright industry is never satisfied. No matter how long the term of copyright, publishers and recording companies want more. No matter how harsh the punishments for infringement, the copyright intermediaries want them to be even more severe.

Another manifestation of this insatiability is seen in the ever-widening use of Internet site blocking. What began as a highly targeted one-off in the UK, when a court ordered the Newzbin2 site to be blocked, has become a favoured method of the copyright industry for cutting off access to thousands of sites around the world, including many blocked by mistake. Even more worryingly, the approach has led to blocks being implemented in some key parts of the Internet’s infrastructure that have no involvement with the material that flows through them: they are just a pipe. For example, last year we wrote about courts ordering the content delivery network Cloudflare to block sites. But even that isn’t enough, it seems. A post on TorrentFreak reports on a move to embed site blocking at the very heart of the Internet. This emerges from an interview about the Brazilian telecoms regulator Anatel:

In an interview with Tele.Sintese, outgoing Anatel board member Artur Coimbra recalls the lack of internet infrastructure in Brazil as recently as 2010. As head of the National Broadband Plan under the Ministry of Communications, that’s something he personally addressed. For Anatel today, blocking access to pirate websites and preventing unauthorized devices from communicating online is all in a day’s work.

Here’s the key revelation spotted by TorrentFreak:

“The second step, which we still need to evaluate because some companies want it, and others are more hesitant, is to allow Anatel to have access to the core routers to place a direct order on the router,” Coimbra reveals, referencing IPTV [Internet Protocol television] blocking.

“In these cases, these companies do not need to have someone on call to receive the [blocking] order and then implement it.”

Later on, Coimbra clarifies how far along this plan is:

“Participation is voluntary. We are still testing with some companies. So, it will take some time until it actually happens,” Coimbra says. “I can’t say [how long]. Our inspection team is carrying out tests with some operators, I can’t say which ones.”

Even if this is still in the testing phase, and only with “some” companies, it’s a terrible precedent. It means that blocking – and thus censorship – can be applied automatically, possibly without judicial oversight, to some of the most fundamental parts of the Internet’s plumbing. Once that happens, it will spread, just as the original single site block in the UK has spread worldwide. There’s even a hint that this might already be happening. Asked if such blocking is being applied anywhere else, Coimbra replies:

“I don’t know. Maybe in Spain and Portugal, which are more advanced countries in this fight. But I don’t have that information,” Coimbra responds, randomly naming two countries with which Brazil has consulted extensively on blocking matters.

Although it’s not clear from that whether Spain and Portugal are indeed taking this route, the fact that Coimbra suggests that they might be is deeply troubling. And even if they aren’t, we can be sure that the copyright industry will keep demanding Internet blocks and censorship at the deepest level until they get them.
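The “direct order on the router” model Coimbra describes amounts to automated null-routing: an external party’s blocking order lands in a carrier’s routing table with nobody on call to review it. A minimal sketch of that flow, entirely hypothetical (real deployments would speak BGP or a vendor’s router API, and match prefixes properly rather than with string comparison):

```python
# Hypothetical sketch of regulator-driven null-routing on a core router:
# an order arrives and is applied immediately, with no human review step.

class CoreRouter:
    def __init__(self):
        self.null_routes = set()  # destination prefixes whose traffic is discarded

    def apply_order(self, prefix: str):
        """An external blocking order adds a null route instantly, unreviewed."""
        self.null_routes.add(prefix)

    def forwards(self, dst_ip: str) -> bool:
        # Simplified prefix match; real routers do longest-prefix matching.
        return not any(dst_ip.startswith(p) for p in self.null_routes)

router = CoreRouter()
router.apply_order("203.0.113.")        # blocking order arrives automatically
print(router.forwards("203.0.113.7"))   # traffic to the ordered prefix is dropped
print(router.forwards("198.51.100.9"))  # other destinations still forwarded
```

The point of the sketch is the absence of any check between `apply_order` and traffic being dropped: whoever can issue orders (legitimately or after a compromise) controls reachability directly.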

Source: Now the copyright industry wants to apply deep, automated blocking to the Internet’s core routers – Walled Culture

Judge: Just Because AI Trains On Your Publication, Doesn’t Mean It Infringes On Your Copyright. Another case thrown out.

I get that a lot of people don’t like the big AI companies and how they scrape the web. But these copyright lawsuits being filed against them are absolute garbage. And you want that to be the case, because if it goes the other way, it will do real damage to the open web by further entrenching the largest companies. If you don’t like the AI companies, find another path, because copyright is not the answer.

So far, we’ve seen that these cases aren’t doing all that well, though many are still ongoing.

Last week, a judge tossed out one of the early ones against OpenAI, brought by Raw Story and Alternet.

Part of the problem is that these lawsuits assume, incorrectly, that these AI services really are, as some people falsely call them, “plagiarism machines.” The assumption is that they’re just copying everything and then handing out snippets of it.

But that’s not how it works. It is much more akin to reading all these works and then being able to make suggestions based on an understanding of how similar things kinda look, though from memory, not from having access to the originals.

Some of this case focused on whether or not OpenAI removed copyright management information (CMI) from the works that they were being trained on. This always felt like an extreme long shot, and the court finds Raw Story’s arguments wholly unconvincing in part because they don’t show any work that OpenAI distributed without their copyright management info.

For one thing, Plaintiffs are wrong that Section 1202 “grant[s] the copyright owner the sole prerogative to decide how future iterations of the work may differ from the version the owner published.” Other provisions of the Copyright Act afford such protections, see 17 U.S.C. § 106, but not Section 1202. Section 1202 protects copyright owners from specified interferences with the integrity of a work’s CMI. In other words, Defendants may, absent permission, reproduce or even create derivatives of Plaintiffs’ works – without incurring liability under Section 1202 – as long as Defendants keep Plaintiffs’ CMI intact. Indeed, the legislative history of the DMCA indicates that the Act’s purpose was not to guard against property-based injury. Rather, it was to “ensure the integrity of the electronic marketplace by preventing fraud and misinformation,” and to bring the United States into compliance with its obligations to do so under the World Intellectual Property Organization (WIPO) Copyright Treaty, art. 12(1) (“Obligations concerning Rights Management Information”) and WIPO Performances and Phonograms Treaty….

Moreover, I am not convinced that the mere removal of identifying information from a copyrighted work – absent dissemination – has any historical or common-law analogue.

Then there’s the bigger point, which is that the judge, Colleen McMahon, has a better understanding of how ChatGPT works than the plaintiffs and notes that just because ChatGPT was trained on pretty much the entire internet, that doesn’t mean it’s going to infringe on Raw Story’s copyright:

Plaintiffs allege that ChatGPT has been trained on “a scrape of most of the internet,” Compl. ¶ 29, which includes massive amounts of information from innumerable sources on almost any given subject. Plaintiffs have nowhere alleged that the information in their articles is copyrighted, nor could they do so. When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer. Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.

Finally, the judge basically says, “Look, I get it, you’re upset that ChatGPT read your stuff, but you don’t have an actual legal claim here.”

Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation to Plaintiffs. See Compl. ¶ 57 (“The OpenAI Defendants have acknowledged that use of copyright-protected works to train ChatGPT requires a license to that content, and in some instances, have entered licensing agreements with large copyright owners … They are also in licensing talks with other copyright owners in the news industry, but have offered no compensation to Plaintiffs.”). Whether or not that type of injury satisfies the injury-in-fact requirement, it is not the type of harm that has been “elevated” by Section 1202(b)(i) of the DMCA. See Spokeo, 578 U.S. at 341 (Congress may “elevate to the status of legally cognizable injuries, de facto injuries that were previously inadequate in law.”). Whether there is another statute or legal theory that does elevate this type of harm remains to be seen. But that question is not before the Court today.

While the judge dismisses the case, she says the plaintiffs can try again, though it would appear that she is skeptical they could do so with any reasonable chance of success:

In the event of dismissal Plaintiffs seek leave to file an amended complaint. I cannot ascertain whether amendment would be futile without seeing a proposed amended pleading. I am skeptical about Plaintiffs’ ability to allege a cognizable injury but, at least as to injunctive relief, I am prepared to consider an amended pleading.

I totally get why publishers are annoyed and why they keep suing. But copyright is the wrong tool for the job. Hopefully, more courts will make this clear and we can get past all of these lawsuits.

Source: Judge: Just Because AI Trains On Your Publication, Doesn’t Mean It Infringes On Your Copyright | Techdirt

Feds Say You Don’t Have a Right to Check Out Retro Video Games Like Library Books. They want you to pirate them, apparently.

Most of the world’s video games from close to 50 years of history are effectively, legally dead. A Video Games History Foundation study found you can’t buy nearly 90% of games released before 2010. Preservationists have been looking for ways to let people legally access gaming history, but the U.S. Copyright Office dealt them a heavy blow Friday, declaring that neither you nor any researcher has a right to access old games under the Digital Millennium Copyright Act (DMCA).

Groups like the VGHF and the Software Preservation Network have been putting their weight behind an exemption to the DMCA surrounding video game access. The law says that you can’t remotely access old, defunct games that are still under copyright without a license, even though they’re not available for purchase. Current rules in the DMCA restrict libraries and repositories of old games to one person at a time, in person.

The foundation’s proposed exemption would have allowed more than one person at a time to access the content stored in museums, archives, and libraries. This would allow players to access a piece of video game history like they would if they checked out an ebook from a library. The VGHF and SPN argued that if the museum has several copies of a game in its possession, then it should be able to allow as many people to access the game as there are copies available.
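The lending model proposed here mirrors how libraries already handle ebooks: concurrent access capped at the number of copies owned. A minimal sketch of that copy-counting logic (hypothetical illustration, not any real library system):

```python
# Sketch of copy-limited digital lending: at most N concurrent checkouts
# for N owned copies. Hypothetical illustration, not a real library API.

class LendingPool:
    def __init__(self, title: str, copies_owned: int):
        self.title = title
        self.copies_owned = copies_owned
        self.checked_out = set()  # patrons currently holding a copy

    def checkout(self, patron: str) -> bool:
        """Grant access only while a copy is free; otherwise deny."""
        if len(self.checked_out) >= self.copies_owned:
            return False  # all copies in use: wait, as with a physical book
        self.checked_out.add(patron)
        return True

    def checkin(self, patron: str):
        self.checked_out.discard(patron)  # frees a copy for the next patron

pool = LendingPool("Example Retro Game (1994)", copies_owned=2)
print(pool.checkout("alice"))  # True: first copy goes out
print(pool.checkout("bob"))    # True: second copy goes out
print(pool.checkout("carol"))  # False: no copies left
pool.checkin("alice")
print(pool.checkout("carol"))  # True: a returned copy frees a slot
```

The invariant – never more simultaneous readers than copies held – is exactly what makes the ebook analogy work, and what the proposed exemption would have extended to games.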

In the Copyright Office’s decision dated Oct. 18 (found on Page 30), Director Shira Perlmutter agreed with multiple industry groups, including the Entertainment Software Association, and recommended the Library of Congress keep the same restrictions. Section 1201 of the DMCA restricts “unauthorized” access to copyrighted works, including games; however, it allows the Library of Congress to exempt some classes of people from those restrictions.

In a statement, the VGHF said lobbying efforts from rightsholders “continue to hold back progress.” The group pointed to comments from a representative from the ESA. An attorney for the ESA told Ars Technica, “I don’t think there is at the moment any combinations of limitations that ESA members would support to provide remote access.”

Video game preservationists said these game repositories could show full-screen popups of copyright notices to anybody who checked out a game. They would also restrict access with time limits or require users to access the games via “technological controls,” like purpose-built distribution or streaming platforms.

Industry groups argued that those museums didn’t have “appropriate safeguards” to prevent users from distributing the games once they had them in hand. They also argued that there’s a “substantial market” for older or classic games, and a new, free library to access games would “jeopardize” this market. Perlmutter agreed with the industry groups.

“While the Register appreciates that proponents have suggested broad safeguards that could deter recreational uses of video games in some cases, she believes that such requirements are not specific enough to conclude that they would prevent market harms,” she wrote.

Do libraries that lend books hurt the literary industry? In many cases, publishers see libraries as free advertising for their products. Libraries create word of mouth, and since they hold only a limited number of copies, those who want a book for longer are incentivized to purchase one. The video game industry is so effective at shooting itself in the foot that it doesn’t even recognize when third-party preservationists are offering to help it at no cost to the publishers.

If there is such a substantial market for classic games, why are so many still unavailable for purchase? Players will inevitably turn to piracy or emulation if there’s no easy-to-access way of playing older games.

“The game industry’s absolutist position… forces researchers to explore extra-legal methods to access the vast majority of out-of-print video games that are otherwise unavailable,” the VGHF wrote.

Source: Feds Say You Don’t Have a Right to Check Out Retro Video Games Like Library Books

Juicy Licensing Deals With AI Companies Show That Publishers Don’t Actually Care About Creators

One of the many interesting aspects of the current enthusiasm for generative AI is the way that it has electrified the formerly rather sleepy world of copyright. Where before publishers thought they had successfully locked down more or less everything digital with copyright, they now find themselves confronted with deep-pocketed companies – both established ones like Google and Microsoft, and newer ones like OpenAI – that want to overturn the previous norms of using copyright material. In particular, the latter group want to train their AI systems on huge quantities of text, images, videos and sounds.

As Walled Culture has reported, this has led to a spate of lawsuits from the copyright world, desperate to retain its control over digital material. The industry has framed this as an act of solidarity with the poor exploited creators. It’s a shrewd move, and one that seems to be gaining traction. Lots of writers and artists think they are being robbed of something by Big AI, even though that view is based on a misunderstanding of how generative AI works. However, in the light of stories like this one in The Bookseller, they might want to reconsider their views about who exactly is being evil here:

Academic publisher Wiley has revealed it is set to make $44 million (£33 million) from Artificial Intelligence (AI) partnerships that it is not giving authors the opportunity to opt out of.

As to whether authors would share in that bounty:

A spokesperson confirmed that Wiley authors are set to receive remuneration for the licensing of their work based on their “contractual terms”.

That might mean they get nothing, if there is no explicit clause in their contract about sharing AI licensing income. For example, here’s what is happening with the publisher Taylor & Francis:

In July, authors hit out at another academic publisher, Taylor & Francis, the parent company of Routledge, over an AI deal with Microsoft worth $10 million, claiming they were not given the opportunity to opt out and are receiving no extra payment for the use of their research by the tech company. T&F later confirmed it was set to make $75 million from two AI partnership deals.

It’s not just in the world of academic publishing that deals are being struck. Back in July, Forbes reported on a “flurry of AI licensing activity”:

The most active area for individual deals right now by far—judging from publicly known deals—is news and journalism. Over the past year, organizations including Vox Media (parent of New York magazine, The Verge, and Eater), News Corp (Wall Street Journal, New York Post, The Times (London)), Dotdash Meredith (People, Entertainment Weekly, InStyle), Time, The Atlantic, Financial Times, and European giants such as Le Monde of France, Axel Springer of Germany, and Prisa Media of Spain have each made licensing deals with OpenAI.

In the absence of any public promises to pass on some of the money these licensing deals will bring, it is not unreasonable to assume that journalists won’t be seeing much if any of it, just as they aren’t seeing much from the link tax.

The increasing number of such licensing deals between publishers and AI companies shows that the former aren’t really too worried about the latter ingesting huge quantities of material for training their AI systems, provided they get paid. And the fact that there is no sign of this money being passed on in its entirety to the people who actually created that material also confirms that publishers don’t really care about creators. In other words, it’s pretty much the status quo from before generative AI came along. For doing nothing, the intermediaries are extracting money from the digital giants by invoking the creators and their copyrights. Those creators do all the work, but once again see little to no benefit from the deals being signed behind closed doors.

Source: Juicy Licensing Deals With AI Companies Show That Publishers Don’t Actually Care About Creators | Techdirt

German court: LAION’s generative AI training dataset is legal thanks to EU copyright exceptions

The copyright world is currently trying to assert its control over the new world of generative AI through a number of lawsuits, several of which have been discussed previously on Walled Culture. We now have our first decision in this area, from the regional court in Hamburg. Andres Guadamuz has provided an excellent detailed analysis of a ruling that is important for the German judges’ discussion of how EU copyright law applies to various aspects of generative AI. The case concerns the freely-available dataset from LAION (Large-scale Artificial Intelligence Open Network), a German non-profit. As the LAION FAQ says: “LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images.” Guadamuz explains:

The case was brought by German photographer Robert Kneschke, who found that some of his photographs had been included in the LAION dataset. He requested the images to be removed, but LAION argued that they had no images, only links to where the images could be found online. Kneschke argued that the process of collecting the dataset had included making copies of the images to extract information, and that this amounted to copyright infringement.
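LAION’s description of its dataset as “simply indexes to the internet” is worth making concrete: each record is a URL plus the ALT text found next to it, with no image bytes stored. A minimal sketch of that record shape (the URLs are hypothetical placeholders, and this is an illustration of the structure described above, not LAION’s actual schema):

```python
# Sketch of a LAION-style index record: URL + ALT text, no image data.
# URLs are hypothetical placeholders; this is not LAION's real schema.
from dataclasses import dataclass

@dataclass
class IndexRecord:
    url: str       # where the original image lives online
    alt_text: str  # caption text found alongside it on the page

dataset = [
    IndexRecord("https://example.com/photos/alps.jpg", "Snowy mountain at sunrise"),
    IndexRecord("https://example.org/img/cat42.png", "A tabby cat on a windowsill"),
]

# The index itself contains no pixels: "removing" an image just means
# dropping its row, while any actual copy lives (or not) at the source URL.
dataset = [r for r in dataset if "example.org" not in r.url]
print(len(dataset))
```

This is the crux of LAION’s first argument in the case: what it distributes is pointers and captions, while the copies at issue were made transiently during the dataset’s construction.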

LAION admitted making copies, but said that it was in compliance with the exception for text and data mining (TDM) present in German law, which is a transposition of Article 3 of the 2019 EU Copyright Directive. The German judges agreed:

The court argued that while LAION had been used by commercial organisations, the dataset itself had been released to the public free of charge, and no evidence was presented that any commercial body had control over its operations. Therefore, the dataset is non-commercial and for scientific research. So LAION’s actions are covered by section 60d of the German Copyright Act.

That’s good news for LAION and its dataset, but perhaps more interesting for the general field of generative AI is the court’s discussion of how the EU Copyright Directive and its exceptions apply to AI training. It’s a key question because copyright companies claim that they don’t, and that when such training involves copyright material, permission is needed to use it. Guadamuz summarises that point of view as follows:

the argument is that the legislators didn’t intend to cover generative AI when they passed the [EU Copyright Directive], so text and data mining does not cover the training of a model, just the making of a copy to extract information from it. The argument is that making a copy to extract information to create a dataset is fine, as the court agreed here, but the making of a copy in order to extract information to make a model is not. I somehow think that this completely misses the way in which a model is trained; a dataset can have copies of a work, or in the case of LAION, links to the copies of the work. A trained model doesn’t contain copies of the works with which it was trained, and regurgitation of works in the training data in an output is another legal issue entirely.

The judgment from the Hamburg court says that while legislators may not have been aware of generative AI model training in 2019, when they drew up the EU Copyright Directive, they certainly are now. The judges use the EU’s 2024 AI Act as evidence of this, citing a paragraph that makes explicit reference to AI models complying with the text and data mining regulation in the earlier Copyright Directive.

As Guadamuz writes in his post, this is an important point, but the legal impact may be limited. The judgment is only the view of a local German court, so other jurisdictions may produce different results. Moreover, the original plaintiff, Robert Kneschke, may appeal, and the decision could be overturned. Furthermore, the ruling only concerns the use of text and data mining to create a training dataset, not the actual training itself, although the judges’ thoughts on the latter indicate that it would be legal too. In other words, this local outbreak of good sense in Germany is welcome, but we are still a long way from complete legal clarity on the training of generative AI systems on copyright material.

Source: German court: LAION’s generative AI training dataset is legal thanks to EU copyright exceptions – Walled Culture

Penguin Random House is adding an AI warning to its books’ copyright pages fwiw

Penguin Random House, the trade publisher, is adding language to the copyright pages of its books to prohibit the use of those books to train AI.

The Bookseller reports that new books and reprints of older titles from the publisher will now include the statement, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.”

While the use of copyrighted material to train AI models is currently being fought over in multiple lawsuits, Penguin Random House appears to be the first major publisher to update its copyright pages to reflect these new concerns.

The update doesn’t mean Penguin Random House is completely opposed to the use of AI in book publishing. In August, it outlined an initial approach to generative AI, saying it will “vigorously defend the intellectual property that belongs to our authors and artists” while also promising to “use generative AI tools selectively and responsibly, where we see a clear case that they can advance our goals.”

Source: Penguin Random House is adding an AI warning to its books’ copyright pages | TechCrunch

Penguin spins it in support of authors, but the whole copyright thing only really fills the pockets of the publishers (e.g. juicy licensing deals with AI companies show that publishers don’t really care about creators). This will probably not hold up in court.

Italy is losing its mind because of copyright: it just made its awful Piracy Shield even worse

Walled Culture has been writing about Italy’s Piracy Shield system for a year now. It was clear from early on that its approach of blocking Internet addresses (IP addresses) to fight alleged copyright infringement – particularly the streaming of football matches – was flawed, and risked turning into another fiasco like France’s failed Hadopi law. The central issue with Piracy Shield is summed up in a recent post on the Disruptive Competition Blog:

The problem is that Italy’s Piracy Shield enables the blocking of content at the IP address and DNS level, which is particularly problematic in this time of shared IP addresses. It would be similar to arguing that if in a big shopping mall, in which dozens of shops share the same address, one shop owner is found to sell bootleg vinyl records with pirated music, the entire mall needs to be closed and all shops are forced to go out of business.

As that post points out, Italy’s IP blocking suffers from several underlying problems. One is overblocking, which has already happened, as Walled Culture noted back in March. Another issue is lack of transparency:

The Piracy Shield that has been implemented in Italy is fully automated, which prevents any transparency on the notified IP addresses and lacks checks and balances performed by third parties, who could verify whether the notified IP addresses are exclusively dedicated to piracy (and should be blocked) or not.
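The missing check is easy to state in code. Below is a minimal sketch of the verification a third party could perform before an IP block goes out — the domain names are invented, the addresses come from the RFC 5737 documentation range, and `safe_to_block` is a hypothetical helper, not anything Piracy Shield actually runs:

```python
# Many unrelated domains can sit behind one IP address (CDNs, shared
# hosting), so blocking the IP takes all of them down. All names and
# addresses here are invented for illustration.

# What a reviewer might see when resolving hosting behind each address:
HOSTED = {
    "203.0.113.7": ["pirate-stream.example", "bakery.example", "school.example"],
    "203.0.113.8": ["pirate-mirror.example"],
}

def safe_to_block(ip: str, pirate_domains: set) -> bool:
    """True only if every domain behind this IP is a notified pirate site.

    This is the check that, per the criticism quoted above, a fully
    automated 30-minute blocking pipeline skips.
    """
    return all(domain in pirate_domains for domain in HOSTED.get(ip, []))

pirates = {"pirate-stream.example", "pirate-mirror.example"}

assert not safe_to_block("203.0.113.7", pirates)  # shared IP: two innocent sites
assert safe_to_block("203.0.113.8", pirates)      # dedicated IP: block is targeted
```

On Cloudflare-scale infrastructure the shared case is the norm, which is how a single blocked address in March could affect tens of millions of domains.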

Piracy Shield isn’t working, and causes serious collateral damage, but instead of recognising this, its supporters have doubled down, and have just convinced the Italian parliament to pass amendments making it even worse, reported here by TorrentFreak:

VPN and DNS services anywhere on planet earth will be required to join Piracy Shield and start blocking pirate sites, most likely at their own expense, just like Italian ISPs are required to do already.

Moving forward, if pirate sites share an IP address with entirely innocent sites, and the innocent sites are outnumbered, ISPs, VPNs and DNS services will be legally required to block them all.

A new offence has been created that is aimed at service providers, including network access providers, who fail to promptly report illegal conduct by their users to the judicial authorities or police in Italy. The maximum punishment is not just a fine, but imprisonment for up to one year. Just why this is absurd is made clear by this LinkedIn comment by Diego Ciulli, Head of Government Affairs and Public Policy, Google Italy (translation by DeepL):

Under the label of ‘combating piracy’, the Senate yesterday approved a regulation obliging digital platforms to notify the judicial authorities of all copyright infringements – present, past and future – of which they become aware. Do you know how many there are in Google’s case? Currently, 9,756,931,770.

In short, the Senate is asking us to flood the judiciary with almost 10 billion URLs – and foresees jail time if we miss a single notification.

If the rule is not corrected, the risk is to do the opposite of the spirit of the law: flooding the judiciary, and taking resources away from the fight against piracy.

The new law will make running an Internet access service so risky that many will probably just give up, reducing consumer choice. Freedom of speech will be curtailed, online security weakened, and Italy’s digital infrastructure will be degraded. The end result of this law will be an overall impoverishment of Italian Internet users, Italian business, and the Italian economy. And all because of one industry’s obsession with policing copyright at all costs.

Source: Italy is losing its mind because of copyright: it just made its awful Piracy Shield even worse – Walled Culture

Second Circuit Says Libraries Disincentivize Authors To Write Books By Lending Them For Free

What would you think if an author told you they would have written a book, but they wouldn’t bother because it would be available to be borrowed for free from a library? You’d probably think they were delusional. Yet that argument has now carried the day in putting a knife into the back of the extremely useful Open Library from the Internet Archive.

The Second Circuit has upheld the lower court ruling and found that the Internet Archive’s Open Library is not fair use and therefore infringes on the copyright of publishers (we had filed an amicus brief in support of the Archive asking the court to remember the fundamental purpose of copyright law and the First Amendment, which it ignored).

Even though this outcome was always a strong possibility, the final ruling is just incredibly damaging, especially in that it suggests that all libraries are bad for authors and cause them to no longer want to write. I only wish I were joking. Towards the end of the ruling (as we’ll get to below) it says that while having freely lent out books may help the public in the “short-term”, the “long-term” consequences would be that “there would be little motivation to produce new works.”

[…]

As you’ll recall, the Open Library is no different than a regular library. It obtains books legally (either through purchase or donation) and then lends out one-to-one copies of those books. It’s just that it lends out digital copies of them. To keep it identical to a regular library, it makes sure that only one digital copy can be lent out for every physical copy it holds. Courts have already determined that digitizing physical books is fair use, and the Open Library has been tremendously helpful to all sorts of people.
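The controlled lending model described here reduces to a single invariant: digital loans outstanding may never exceed the physical copies owned. A hypothetical Python sketch (class and method names are invented; this is not the Open Library’s actual code):

```python
class ControlledLending:
    """One title under controlled digital lending (CDL): every digital loan
    must be backed by a physical copy the library actually owns."""

    def __init__(self, owned_copies: int):
        self.owned = owned_copies   # physical copies held (bought or donated)
        self.on_loan = 0            # digital copies currently checked out

    def lend(self) -> bool:
        """Lend one digital copy if a physical copy is free to back it."""
        if self.on_loan < self.owned:
            self.on_loan += 1
            return True
        return False                # all copies out; the borrower must wait

    def return_copy(self) -> None:
        if self.on_loan > 0:
            self.on_loan -= 1

title = ControlledLending(owned_copies=2)
assert title.lend() and title.lend()  # both copies go out
assert not title.lend()               # a third borrower is refused
title.return_copy()
assert title.lend()                   # a returned copy circulates again
```

Dropping the `on_loan < self.owned` check is effectively what the pandemic-era National Emergency Library did, and that relaxation is what prompted the publishers to sue.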

The only ones truly annoyed by this are the publishers, who have always hated libraries and have long seen the shift to digital as an open excuse to effectively harm libraries. With licensed ebooks, the publishers have jacked up the prices so that (unlike with regular books), the library can’t just buy a single copy from any supplier and lend it out. Rather, publishers have made it prohibitively expensive to get ebook licenses, which come with ridiculous restrictions on how frequently books can be lent and more.

[…]

The key part of the case is whether or not the Internet Archive’s scanning and lending of books is fair use. The Second Circuit says that it fails the fair use four factors test. On the question of transformative use, the Internet Archive argued that because it was using technology to make lending of books more convenient and efficient, it was clearly transformative. Unfortunately, the court disagrees:

We conclude that IA’s use of the Works is not transformative. IA creates digital copies of the Works and distributes those copies to its users in full, for free. Its digital copies do not provide criticism, commentary, or information about the originals. Nor do they “add[] something new, with a further purpose or different character, altering the [originals] with new expression, meaning or message.” Campbell, 510 U.S. at 579. Instead, IA’s digital books serve the same exact purpose as the originals: making authors’ works available to read. IA’s Free Digital Library is meant to―and does―substitute for the original Works

The panel is not convinced by the massive change in making physical books digitally lendable:

True, there is some “change” involved in the conversion of print books to digital copies. See Infinity Broadcast Corp. v. Kirkwood, 150 F.3d 104, 108 n.2 (2d Cir. 1998) (“[A] change in format . . . is not technically a transformation.”). But the degree of change does not “go beyond that required to qualify as derivative.” Warhol II, 598 U.S. at 529. Unlike transformative works, derivative works “ordinarily are those that re-present the protected aspects of the original work, i.e., its expressive content, converted into an altered form.” Google Books, 804 F.3d at 225. To be transformative, a use must do “something more than repackage or republish the original copyrighted work.” Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 96 (2d Cir. 2014); see also TVEyes, 883 F.3d at 177 (“[A] use of copyrighted material that merely repackages or republishes the original is unlikely to be deemed a fair use.” (internal quotation marks omitted)). Changing the medium of a work is a derivative use rather than a transformative one.

But, that’s not what a derivative work is? A derivative work is not scanning a book. Scanning a book is making a copy. A derivative work is something like making a movie out of a book. So, this analysis is just fundamentally wrong in saying that this is a derivative work, and thus the rest of the analysis is kinda wonky based on that error.

Tragically, the Court then undermines the important ruling in the Betamax/VCR case that found “time shifting” (recording stuff off your TV) to be fair use, even as it absolutely was repackaging the same content for the same purpose. The Court says that doesn’t matter because it “predated our use of the word ‘transformative’ as a term of art.” But that doesn’t wipe out the case as a binding precedent, even though the Court here acts as though it does.

Sony was decided long before modern technology made it possible for one to view virtually any content at any time. Put in context, the “time-shifting” permitted by the defendant’s tape recorders in Sony was a unique efficiency not widely available at the time, and certainly not offered by the plaintiff-television producer.

So because content is more widely available, this kind of shifting is no longer fair use? How does that make any sense at all?

Then the Court says (incorrectly — as we’ll explain shortly) that there’s really nothing new or different about what the Open Library does:

Here, by contrast, IA’s Free Digital Library offers few efficiencies beyond those already offered by Publishers’ own eBooks.

The problem, though, is that this isn’t quite true. Getting licensed ebooks out from libraries is a difficult and cumbersome practice and requires each library to have a vast ebook collection that none can possibly afford. As this lawsuit went down, more and more authors came out of the woodwork, explaining how research they had done for their books was only possible because of the Open Library and would have been impossible via a traditional library given the lending restrictions and availability restrictions.

[…]

From there, the Court explores whether or not the Internet Archive’s use here was commercial. The lower court said it was because, ridiculously, the Internet Archive had donation links on library pages. Thankfully, the panel here sees how problematic that would be for every non-profit:

We likewise reject the proposition that IA’s solicitation of donations renders its use of the Works commercial. IA does not solicit donations specifically in connection with its digital book lending services―nearly every page on IA’s website contains a link to “Donate” to IA. App’x 6091. Thus, as with its partnership with BWB, any link between the funds IA receives from donations and its use of the Works is too attenuated to render the use commercial. Swatch, 756 F.3d at 83. To hold otherwise would greatly restrain the ability of nonprofits to seek donations while making fair use of copyrighted works. See ASTM I, 896 F.3d at 449 (rejecting the argument that because free distribution of copyrighted industry standards enhanced a nonprofit organization’s fundraising appeal, the use was commercial).

It also disagrees that this use is commercial because there’s a referral link for people to go and buy a copy of the book, saying that’s “too attenuated”:

Any link between the funds IA receives from its partnership with BWB and its use of the Works is too attenuated for us to characterize the use as commercial on that basis

Even so, the lack of commerciality isn’t enough to protect the project on the first factor analysis, and it goes to the publishers.

[…]

Source: Second Circuit Says Libraries Disincentivize Authors To Write Books By Lending Them For Free | Techdirt

There is a lot more, but it’s safe to say that the courts in the US and copyright laws have run amok and are only feeding the rich to the detriment of the poor. Denying people libraries is a step beyond.

Internet Archive loses appeal – 4 greedy publishers shut down major library in insane luddite US law system

The Internet Archive’s appeal loss could spell further trouble for the non-profit, as it is in the middle of another copyright lawsuit with music publishers that could cost more than $400m if it loses.

The Internet Archive has been dealt a serious blow in court, as it lost an appeal case to share scanned books without the approval of publishers.

The loss could lead to serious repercussions for the non-profit, as hundreds of thousands of digital books have been removed from its library. The Internet Archive is also in the middle of another copyright lawsuit from multiple music labels for digitising vintage records.

What is the Internet Archive?

Based in San Francisco, the Internet Archive is one of the world’s most well-known libraries for scanned copies of millions of physical books that it lends to people all over the globe for free.

The non-profit organisation claims its mission is to provide “universal access to all knowledge” and has for years been archiving digital content such as books, movies, music, software and more.

The archive claims to have more than 20m freely downloadable books and texts, along with a collection of 2.3m modern e-books that can be borrowed – similar to a library. But while supporters say the Internet Archive is a valuable source of easily accessible information, its critics claim it breaches copyright laws.

What caused the major publisher lawsuit?

The Internet Archive let users access its vast digital library for years before the lawsuit began, but a decision during the Covid-19 pandemic prompted the legal response.

Previously, only a limited number of individuals were allowed to borrow a digital book from the non-profit’s Open Library service, a principle that the archive referred to as controlled digital lending.

But this rule was relaxed during the pandemic and led to the creation of the archive’s National Emergency Library, which meant an unlimited number of people could access the same e-books. After this decision, the major publishers launched their lawsuit and the archive went back to its controlled lending practices.

The four publishers – Hachette, Penguin Random House, Wiley, and HarperCollins – said the Internet Archive was conducting copyright infringement through its practices. But the lawsuit went after both library services and had a major impact – in June 2024, the Internet Archive said more than 500,000 books had been removed from its library as a result of the lawsuit.

The non-profit’s founder Brewster Kahle previously said libraries are “under attack at an unprecedented scale”, with a mix of book bans, defunding and “overzealous lawsuits like the one brought against our library”.

From a loss to an appeal

Unfortunately for the digital library, a judge ruled in favour of the publishers on 24 March 2023, agreeing with their claims that the Internet Archive’s practices constitute “wilful digital piracy on an industrial scale” that hurts both writers and publishers.

The archive appealed this decision later that year, but the appeals court determined that it is not “fair use” for a non-profit to scan copyright-protected print books in their entirety and distribute those digital copies online. The appeals court also said there is not enough of a change from a printed copy to a digital one to constitute fair use.

“We conclude that IA’s use of the works is not transformative,” the appeals court said. “IA creates digital copies of the works and distributes those copies to its users in full, for free. Its digital copies do not provide criticism, commentary, or information about the originals.”

The appeals court did disagree with the previous court’s verdict that the Internet Archive’s use of these copyrighted materials is “commercial in nature” and said it is “undisputed that IA is a nonprofit entity and that it distributes its digital books for free”.

What does this mean for the Internet Archive?

The archive’s director of library services Chris Freeland said the non-profit is “disappointed” in the decision by the appeals court and that it is “reviewing the court’s opinion and will continue to defend the rights of libraries to own, lend and preserve books”.

Freeland also shared a link to readers where they can sign an open letter asking publishers to restore access to the 500,000 books removed from the archive’s library.

The loss also presents a bad precedent for the archive’s Great 78 Project, which is focused on the discovery and preservation of 78rpm records. The Internet Archive has been working to digitise millions of these recordings to preserve them, adding that the disks they were recorded onto are made of brittle material and can be easily broken.

“We aim to bring to light the decisions by music collectors over the decades and a digital reference collection of underrepresented artists and genres,” the Internet Archive says on the project page.

“The digitisation will make this less commonly available music accessible to researchers in a format where it can be manipulated and studied without harming the physical artefacts.”

But multiple music labels are suing the Internet Archive over this project, claiming it has “wilfully reproduced” thousands of protected sound recordings without copyright authorisation. The music labels are seeking damages of up to $150,000 for each protected sound recording infringed in the lawsuit, which could lead to payments of more than $412m if the court rules against the Internet Archive.

Source: What you need to know about the Internet Archive’s appeal loss

UK Once Again Denies A Passport Over Applicant’s Name Due To Intellectual Property Concerns – again

I can’t believe this, but it happened again. Almost exactly a decade ago, Tim Cushing wrote about a bonkers story out of the UK in which a passport applicant whose middle name was “Skywalker” was denied the passport due to purported trademark or copyright concerns. The question that ought to immediately leap to mind: wait, nothing about a name or its appearance on a passport amounts to either creative expression being copied or use in commerce, meaning that neither copyright nor trademark law ought to apply in the slightest.

And you would have thought that, coming out of that whole episode, proper guidance would have been given to the UK’s passport office so that this kind of stupidity doesn’t happen again. Unfortunately, it did happen again. A UK woman attempted to get a passport for her daughter, whom she named Khaleesi, only to have it refused over the trademark for the Game of Thrones character that held the same fictional title.

Lucy, 39, from Swindon in Wiltshire, said the Passport Office initially refused the application for Khaleesi, six.

Officials said they were unable to issue a passport unless Warner Brothers gave permission because it owned the name’s trademark. But the authority has since apologised for the error.

“I was absolutely devastated, we were so looking forward to our first holiday together,” Lucy said.

While any intellectual property concerns over a passport are absolutely silly, I would argue that trademark law makes even less sense here than copyright would. Again, trademark law is designed specifically to protect the public from being confused as to the source of a good or service in commerce. There is no good or service nor commerce here. Lucy would simply like to take her own child across national borders. That’s it. Lucy had to consult with an attorney due to this insanity, which didn’t initially yield the proper result.

After seeking legal advice, her solicitors discovered that while there is a trademark for Game of Thrones, it is for goods and services – but not for a person’s name.

“That information was sent to the Passport Office who said I would need a letter from Warner Brothers to confirm my daughter is able to use that name,” she said.

This amounts to a restriction on the rights and freedoms of a child in a free country as a result of the choice their parents made about their name. Whatever your thoughts on IP laws in general, that simply cannot be the aim of literally any of them.

Now, once the media got a hold of all of this, the Passport Office eventually relented, said it made an error in denying the passport, and has put the application through. But even the government’s explanation doesn’t fully make sense.

Officials explained there had been a misunderstanding and the guidance staff had originally given applies only to people changing their names.

“He advised me that they should be able to process my daughter’s passport now, ” she said.

Why would the changing of a name be any different? My name is my name, not a creative expression, nor a use in commerce. If I elect to change my name from “Timothy Geigner” to “Timothy Mickey Mouse Geigner”, none of that equates to an infringement of Disney’s rights, copyright nor trademark. It’s just my name. It would only be if I attempted to use my new name in commerce or as part of an expression that I might run afoul of either trademark or copyright law.

What this really is is the pervasive cancer that is ownership culture. It’s only with ownership culture that you get a passport official somehow thinking that Warner Bros.’ production of a fantasy show means a six-year-old can’t get a passport.

Source: UK Once Again Denies A Passport Over Applicant’s Name Due To Intellectual Property Concerns | Techdirt

Suno & Udio To RIAA: Your Music Is Copyrighted, You Can’t Copyright Styles

AI music generators Suno and Udio responded to the lawsuits filed by the major recording labels, arguing that their platforms are tools for making new, original music that “didn’t and often couldn’t previously exist.”

“Those genres and styles — the recognizable sounds of opera, or jazz, or rap music — are not something that anyone owns,” the companies said. “Our intellectual property laws have always been carefully calibrated to avoid allowing anyone to monopolize a form of artistic expression, whether a sonnet or a pop song. IP rights can attach to a particular recorded rendition of a song in one of those genres or styles. But not to the genre or style itself.” TorrentFreak reports: “[The labels] frame their concern as one about ‘copies’ of their recordings made in the process of developing the technology — that is, copies never heard or seen by anyone, made solely to analyze the sonic and stylistic patterns of the universe of pre-existing musical expression. But what the major record labels really don’t want is competition.” The labels’ position is that any competition must be legal, and the AI companies state quite clearly that the law permits the use of copyrighted works in these circumstances. Suno and Udio also make it clear that snippets of copyrighted music aren’t stored as a library of pre-existing content in the neural networks of their AI models, “outputting a collage of ‘samples’ stitched together from existing recordings” when prompted by users.

“[The neural networks were] constructed by showing the program tens of millions of instances of different kinds of recordings,” Suno explains. “From analyzing their constitutive elements, the model derived a staggeringly complex collection of statistical insights about the auditory characteristics of those recordings — what types of sounds tend to appear in which kinds of music; what the shape of a pop song tends to look like; how the drum beat typically varies from country to rock to hip-hop; what the guitar tone tends to sound like in those different genres; and so on.” These models are vast stores, the defendants say, not of copyrighted music but of information about what musical styles consist of, and it is from that information that new music is made.
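Suno’s description of “statistical insights” rather than stored recordings can be illustrated with a deliberately tiny toy (the chord sequences are invented, and real generative models are vastly more complex): tally how often one chord follows another across “training” songs, and note that the resulting table holds only counts, not any song.

```python
from collections import Counter

# Toy "training data": chord sequences standing in for recordings.
training_songs = [
    ["C", "G", "Am", "F", "C"],
    ["C", "F", "G", "C"],
    ["Am", "F", "C", "G"],
]

# "Training": count how often each chord follows another across the corpus.
transitions = Counter(
    (prev, nxt)
    for song in training_songs
    for prev, nxt in zip(song, song[1:])
)

# The "model" is just frequencies; no individual song can be read back out
# of it verbatim, although patterns common in the training data dominate.
assert transitions[("C", "G")] == 2   # appears in the first and third songs
assert transitions[("F", "C")] == 2
```

Whether this distinction holds up for actual generative models — and how it interacts with the separate question of outputs regurgitating training data — is precisely what these lawsuits will test.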

Most copyright lawsuits in the music industry are about reproduction and public distribution of identified copyright works, but that’s certainly not the case here. “The Complaint explicitly disavows any contention that any output ever generated by Udio has infringed their rights. While it includes a variety of examples of outputs that allegedly resemble certain pre-existing songs, the Complaint goes out of its way to say that it is not alleging that those outputs constitute actionable copyright infringement.” With Udio declaring that, as a matter of law, “that key point makes all the difference,” Suno’s conclusion is served raw. “That concession will ultimately prove fatal to Plaintiffs’ claims. It is fair use under copyright law to make a copy of a protected work as part of a back-end technological process, invisible to the public, in the service of creating an ultimately non-infringing new product.” Noting that Congress enacted the first copyright law in 1791, Suno says that in the 233 years since, not a single case has ever reached a contrary conclusion.

In addition to addressing allegations unique to their individual cases, the AI companies accuse the labels of various types of anti-competitive behavior: imposing conditions to prevent streaming services from obtaining licensed music from smaller labels at lower rates, seeking to impose a “no AI” policy on licensees, and allegedly responding “to outreach from potential commercial counterparties by engaging in one or more concerted refusals to deal.” The defendants say this type of behavior is fueled by the labels’ dominant control of copyrighted works and, by extension, the overall market. Here, however, ownership of copyrighted music is trumped by the existence and knowledge of musical styles, which nobody can claim to own or seek to control. “No one owns musical styles. Developing a tool to empower many more people to create music, by scrupulously analyzing what the building blocks of different styles consist of, is a quintessential fair use under longstanding and unbroken copyright doctrine. Plaintiffs’ contrary vision is fundamentally inconsistent with the law and its underlying values.”

Source: Suno & Udio To RIAA: Your Music Is Copyrighted, You Can’t Copyright Styles

US Congress Wants To Let Private Companies Own The Law – set standards you must comply with but can’t actually find or see easily

It sounds absolutely batty that there is a strong, bipartisan push to lock up aspects of our law behind copyright. But it’s happening. Even worse, the push is on to include this effort in the “must pass” National Defense Authorization Act (NDAA). This is the bill that, every year, Congress lights up like a Christmas tree with the various bills it knows can’t pass normally.

And this year, they’re pushing the Pro Codes Act, a dangerous bill to lock up the law that has bipartisan support.

[…]

There are lots of standards out there, often developed by industry groups. These standards can be on all sorts of subjects, such as building codes or consumer safety or indicators for hazardous materials. The list goes on and on and on. Indeed, the National Institute of Standards and Technology has a database of over 27,000 such standards that are “included by reference” into law.

This is where things get wonky. Since many of these standards are put together by private organizations (companies, standards bodies, whatever), some of them could qualify for copyright. But, then, lawmakers will often require certain products and services to meet those standards. That is, the laws will “reference” those standards (for example, how to have a building be built in a safe or non-polluting manner).

Many people, myself included, believe that the law must be public. How can the rule of law make any sense at all if the public cannot freely access and read the law? Thus, we believe that when a standard gets “incorporated by reference” into the law, it should become public domain, for the simple fact that the law itself must be public domain.

[…]

Two years ago, there was a pretty big victory, with a court ruling that publishing standards that are “incorporated by reference” into law is fair use.

But industry standards bodies hate this, because often a large part of their own revenue stream comes from selling access to the standards they create, including those referenced by laws.

So they lobbied Congress to push this Pro Codes Act, which explicitly says that technical standards incorporated by reference retain copyright. To try to stave off criticism (and to mischaracterize the bill publicly), the law says that standards bodies retain the copyright if the standards body makes the standard available on a free publicly accessible online source.

[…]

They added this last part to head off criticism that the law is “locked up.” They say things like “see, under this law, the law has to be freely available online.”

But that’s missing the point. It still means that the law itself is only available from one source, in one format. And while it has to be “publicly accessible online at no monetary cost,” that does not mean that it has to be publicly accessible in an easy or useful manner. It does not mean that there won’t be limitations on access or usage.

It is locking up the law.

But, because the law says that those standards must be released online free of cost, it allows the supporters of this law, like Issa, to falsely portray the law as “enhancing public access” to the laws.

That’s a lie.

[…]

It flies in the face of the very fundamental concept that “no one can own the law,” as the Supreme Court itself recently said. And to try and shove it into a must-pass bill about funding the military is just ridiculously cynical, while demonstrating that its backers know it can’t pass through regular process.

Instead, this is an attempt by Congress to say, yes, some companies do get to own the law, so long as they put up a limited, difficult to use website by which you can see parts of the law.

Library groups and civil society groups are pushing back on this (disclaimer: we signed onto this letter). Please add your voice and tell Congress not to lock up the law.

Source: Congress Wants To Let Private Companies Own The Law | Techdirt

Inputs, Outputs, and Fair Uses: Unpacking Responses to Journalists’ Copyright Lawsuits

The complaints against OpenAI and Microsoft in New York Times Company v. Microsoft Corporation and Daily News, LP v. Microsoft Corporation include multiple theories––for instance, vicarious copyright infringement, contributory copyright infringement, and improper removal of copyright information. Those theories, however, are ancillary to both complaints’ primary cause of action: direct copyright infringement. While the defendants’ motions to dismiss focus primarily on jettisoning the ancillary claims and acknowledge that “development of record evidence” is necessary for resolving the direct infringement claims, they nonetheless offer insight on how the direct infringement fight might unfold.

Direct Infringement Via Inputs and Outputs: The Daily News plaintiffs claim that by “building training datasets containing” their copyrighted works without permission, the defendants directly infringe the plaintiffs’ copyrights. Inputting copyrighted material to train Gen AI tools, they aver, constitutes direct infringement. Regarding outputs, the Daily News plaintiffs assert that “by disseminating generative output containing copies and derivatives of the” plaintiffs’ content, the defendants’ tools also infringe the plaintiffs’ copyrights. The Daily News’s input (illicit training) and output (disseminating copies) allegations track earlier contentions of The New York Times Company.

Fair Use Inputs and “Fringe” Outputs: OpenAI’s June arguments in Daily News frame “the core issue”––one OpenAI says “is for a later stage of the litigation” because discovery must first generate a factual record––facing New York City-based federal judge Sidney Stein as “whether using copyrighted content to train a generative AI model is fair use under copyright law.” Fair use, a defense to copyright infringement, involves analyzing four statutory factors: 1) the purpose and character of the allegedly infringing use; 2) the nature of copyrighted work allegedly infringed upon; 3) the amount of the copyrighted work infringed upon and whether the amount, even if small, nonetheless goes to the heart of the work; and 4) whether the infringing use will harm the market value of (or serve as a market substitute for) the original copyrighted work.

So, how might ingesting copyrighted journalistic content––the training or input aspect of the alleged infringement––be a protected fair use? Microsoft argues in Daily News that its “and OpenAI’s tools [don’t] exploit the protected expression in the Plaintiffs’ digital content.” (emphasis added). That’s a key point because copyright law does not protect things like facts, “titles, names, short phrases, and slogans.” OpenAI asserts, in response to The New York Times Company’s lawsuit, that “no one . . . gets to monopolize facts or the rules of language.” Learning semantic rules and patterns of “language, grammar, and syntax”––predicting which words are statistically most likely to follow others––is, at bottom, the purpose of the fair use to which OpenAI and Microsoft say they’re putting newspaper articles. They’re ostensibly just leveraging copyrighted articles “internally” (emphasis in original) to identify and learn language patterns, not to reproduce the articles in which those words appear.

More fundamentally, OpenAI and Microsoft aren’t attempting to disseminate copies of what copyright law is intended to incentivize and protect––“original works of authorship” and “writings.” They aren’t, the defendants claim, trying to unfairly produce market substitutes for actual newspaper articles.

How, then, do they counter the newspapers’ output infringement allegations that the defendants’ tools sometimes produce verbatim versions of the newspapers’ copyrighted articles? OpenAI contends such regurgitative outcomes “depend on an elaborate effort [by the defendants] to coax such outputs from OpenAI’s products, in a way that violates the operative OpenAI terms of service and that no normal user would ever even attempt.” Regurgitations otherwise are “rare” and “unintended,” the company adds. Barring settlements, courts will examine the input and output infringement battles in the coming months and years.

Source: Inputs, Outputs, and Fair Uses: Unpacking Responses to Journalists’ Copyright Lawsuits | American Enterprise Institute – AEI

Sharing material used to be the norm for newspapers, and should be for LLMs

Even though parents insist that it is good and right to share things, the copyright world has succeeded in establishing the contrary as the norm. Now, sharing is deemed a bad, possibly illegal thing. But it was not always thus, as a fascinating speech by Ryan Cordell, Associate Professor in the School of Information Sciences and Department of English at the University of Illinois Urbana-Champaign, underlines. In the US in the nineteenth century, newspaper material was explicitly not protected by copyright, and was routinely exchanged between titles:

Nineteenth-century editors’ attitude toward text reuse is exemplified in a selection that circulated in the last decade of the century, though often abbreviated from the version I cite here, which insists that “an editor’s selections from his contemporaries” are “quite often the best test of his editorial ability, and that the function of his scissors are not merely to fill up vacant spaces, but to reproduce the brightest and best thoughts…from all sources at the editor’s command.” While noting that sloppy or lazy selection will produce “a stupid issue,” this piece claims that just as often “the editor opens his exchanges, and finds a feast for eyes, heart and soul…that his space is inadequate to contain.” This piece ends by insisting “a newspaper’s real value is not the amount of original matter it contains, but the average quality of all the matter appearing in its columns whether original or selected.”

Material was not only copied verbatim, but modified and built upon in the process. As a result of this constant exchange, alteration and enhancement, newspaper readers in the US enjoyed a rich ecosystem of information, and a large number of titles flourished, since the cost of producing suitable material for each of them was shared and thus reduced.

That historical fact in itself is interesting. It’s also important at a time when newspaper publishers are some of the most aggressive in demanding ever stronger – and ever more disproportionate – copyright protection for their products, for example through “link taxes”. But Cordell’s speech is not simply backward looking. It goes on to make another fascinating observation, this time about large language models (LLMs):

We can see in the nineteenth-century newspaper exchanges a massive system for recycling and remediating culture. I do not wish to slip into hyperbole or anachronism, and will not claim historical newspapers as a precise analogue for twenty-first century AI or large language models. But it is striking how often metaphors drawn from earlier media appear in our attempts to understand and explain these new technologies.

The whole speech is well worth reading as a useful reminder that the current copyright panic over LLMs is in part because we have forgotten that sharing material and helping others to build on it was once the norm. And despite blinkered and selfish views to the contrary, it is still the right thing to do, just as parents continue to tell their children.

Source: Sharing material used to be the norm for newspapers, and should be for LLMs – Walled Culture

Paramount Axes Decades Of Comedy Central History In Latest Round Of Brunchlord Dysfunction

Last month we noted how the brunchlords in charge of Paramount (CBS) decided to eliminate decades of MTV News journalism history as part of their ongoing “cost saving” efforts. It was just the latest casualty in an ever-consolidating and very broken U.S. media business routinely run by some of the least competent people imaginable.

We’ve noted how with streaming growth slowing, there’s no longer money to be made goosing stock valuations via subscriber growth. So media giants (and the incompetent brunchlords that usually fail upward into positions of unearned power within them) have turned their attention to all the usual tricks: layoffs, pointless megamergers, price hikes, and more and more weird and costly consumer restrictions.

Part of that equation also involves being too cheap to preserve history, as we’ve seen countless times when a journalism or media company implodes and then immediately disappears not just staffers but decades of their hard work. Usually (and this is from my experience as a freelancer) without any warning or consideration of the impact whatsoever.

Paramount has been struggling after its ingenious strategy of making worse and worse streaming content while charging more and more money somehow hasn’t panned out. While the company looks around for merger and acquisition partners, they’ve effectively taken a hatchet to company staff and history.

First with the recent destruction of the MTV News archives and a major round of layoffs, and now with the elimination of years of Comedy Central history. Last week, as part of additional cost cutting moves, the company basically gutted the Comedy Central website, eliminating years of archived video history of numerous programs ranging from old South Park clips to episodes of The Colbert Report.

A website message and press statement by the company informs users that they can simply head over to the Paramount+ streaming app to watch older content:

As part of broader website changes across Paramount, we have introduced more streamlined versions of our sites, driving fans to Paramount+ to watch their favorite shows.

Except older episodes of The Daily Show and The Colbert Report can no longer be found on Paramount+, also due to layoffs and cost cutting efforts at the company. Paramount is roughly $14 billion in debt due to mismanagement, and a recent plan to merge with Skydance was scuttled at the last second.

Eventually Paramount will find somebody else to merge with in order to bump stock valuations, nab a fat tax cut, and justify excessive executive compensation (look at me, I’m a savvy dealmaker!). At which point, as we saw with the disastrous AT&T–>Time Warner–>Discovery series of mergers, an entirely new wave of layoffs, quality erosion, and chaos will begin as they struggle to pay off deal debt.

It’s all so profoundly pointless, and at no point does anything like product quality, customer satisfaction, employee welfare, or the preservation of history enter into it. The executives spearheading this repeated trajectory from ill-conceived business models to mindless mergers will simply be promoted to bigger and better ventures because there’s simply no financial incentive to learn from historical missteps.

The executives at the top of the heap usually make out like bandits utterly regardless of competency or outcomes, so why change anything?

Source: Paramount Axes Decades Of Comedy Central History In Latest Round Of Brunchlord Dysfunction | Techdirt

Record labels sue AI music generators for ‘massive infringement of recorded music’

Major music labels are taking on AI startups that they believe trained on their songs without paying. Universal Music Group, Warner Music Group and Sony Music Group sued the music generators Suno and Udio for allegedly infringing on copyrighted works on a “massive scale.”

The Recording Industry Association of America (RIAA) initiated the lawsuits and wants to establish that there is “nothing that exempts AI technology from copyright law or that excuses AI companies from playing by the rules.”

The music labels’ lawsuits in US federal court accuse Suno and Udio of scraping their copyrighted tracks from the internet. The filings against the AI companies reportedly demand injunctions against future use and damages of up to $150,000 per infringed work. (That sounds like it could add up to a monumental sum if the court finds them liable.) The suits appear aimed at establishing licensed training as the only acceptable industry framework for AI moving forward — while instilling fear in companies that train their models without consent.

Screenshot of the Udio AI music generator homescreen.
Udio

Suno AI and Udio AI (the latter run by Uncharted Labs) are startups whose software generates music from text inputs. Suno is Microsoft’s partner for its Copilot music generation tool. The RIAA claims the services’ reproduced tracks are so uncannily similar to existing works that they must have been trained on copyrighted songs. It also claims the companies didn’t deny that they trained on copyrighted works, instead shielding themselves behind their training being “confidential business information” and standard industry practice.

According to The Wall Street Journal, the lawsuits accuse the AI generators of creating songs that sounded remarkably similar to The Temptations’ “My Girl,” Green Day’s “American Idiot,” and Mariah Carey’s “All I Want for Christmas Is You,” among others. They also claim the AI services produced vocals indistinguishable from those of artists like Lin-Manuel Miranda, Bruce Springsteen, Michael Jackson and ABBA.

Wired reports that one example cited in the lawsuit details how one of the AI tools reproduced a song that sounded nearly identical to Chuck Berry’s pioneering classic “Johnny B. Goode,” using the prompt, “1950s rock and roll, rhythm & blues, 12 bar blues, rockabilly, energetic male vocalist, singer guitarist,” along with some of Berry’s lyrics. The suit claims the generator almost perfectly generated the original track’s “Go, Johnny, go, go” chorus.

Screenshot for the Suno AI webpage.
Suno

To be clear, the RIAA isn’t advocating based on the principle that all AI training on copyrighted works is wrong. Instead, it’s saying it’s illegal to do so without licensing and consent, i.e., when the labels (and, likely to a lesser degree, the artists) don’t make any money off of it.

[…]

Source: Record labels sue AI music generators for ‘massive infringement of recorded music’

So they are not only claiming that stuff inspired by stuff a computer listened to is different from stuff inspired by stuff a person listened to, but they are also claiming copyright on something from the 1950s?!

Forbes accuses Perplexity AI of bypassing robots.txt web standard to scrape content, Tollbit startup gains publicity by baselessly accusing everyone of doing this too in an open letter. Why do we listen to this shit?

[…]

A letter to publishers seen by Reuters on Friday, which does not name the AI companies or the publishers affected, comes amid a public dispute between AI search startup Perplexity and media outlet Forbes involving the same web standard and a broader debate between tech and media firms over the value of content in the age of generative AI.

The business media publisher publicly accused Perplexity of plagiarizing its investigative stories in AI-generated summaries without citing Forbes or asking for its permission.

A Wired investigation published this week found Perplexity likely bypassing efforts to block its web crawler via the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard meant to determine which parts of a site are allowed to be crawled.
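The Robots Exclusion Protocol is purely advisory: a compliant crawler fetches robots.txt and checks each URL against it before crawling, but nothing technically stops a crawler that simply ignores the file. A minimal sketch of that check, using Python’s standard library (the robots.txt contents and bot names here are hypothetical, for illustration only):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is disallowed everywhere,
# everyone else is allowed.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler performs this check before every fetch.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

The dispute described above is precisely about crawlers that skip this check, or that fetch pages under a user-agent string the site never had a chance to disallow.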

Perplexity declined a Reuters request for comment on the dispute.

The News Media Alliance, a trade group representing more than 2,200 U.S.-based publishers, expressed concern about the impact that ignoring “do not crawl” signals could have on its members.

“Without the ability to opt out of massive scraping, we cannot monetize our valuable content and pay journalists. This could seriously harm our industry,” said Danielle Coffey, president of the group.

Source: Exclusive-Multiple AI companies bypassing web standard to scrape publisher sites, licensing firm says

So the original clickbait headline comes from a content licensing startup scaring up content providers, but with no details whatsoever. Why is this even news?!

500,000 Books Have Been Deleted From The Internet Archive’s Lending Library by Greedy Publishers

If you found out that 500,000 books had been removed from your local public library, at the demands of big publishers who refused to let them buy and lend new copies, and were further suing the library for damages, wouldn’t you think that would be a major news story? Wouldn’t you think many people would be up in arms about it?

It’s happening right now with the Internet Archive, and it’s getting almost no attention.

As we’ve discussed at great length, the Internet Archive’s Open Library system is indistinguishable from the economics of how a regular library works. The Archive either purchases physical books or has them donated (just like a physical library). It then lends them out on a one-to-one basis (leaving aside a brief moment where it took down that barrier when basically all libraries were shut down due to pandemic lockdowns), such that when someone “borrows” a digital copy of a book, no one else can borrow that same copy.
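That one-to-one constraint can be pictured with a toy model, a sketch of the lending economics described above, not the Internet Archive’s actual software:

```python
class LendingLibrary:
    """Toy model of controlled digital lending: each owned copy can be
    on loan to at most one patron at a time, mirroring how a physical
    library's copies work."""

    def __init__(self):
        self.copies_owned = {}  # title -> copies purchased or donated
        self.on_loan = {}       # title -> copies currently checked out

    def acquire(self, title, copies=1):
        self.copies_owned[title] = self.copies_owned.get(title, 0) + copies

    def borrow(self, title):
        owned = self.copies_owned.get(title, 0)
        out = self.on_loan.get(title, 0)
        if out >= owned:
            return False  # every copy is on loan; the patron must wait
        self.on_loan[title] = out + 1
        return True

    def return_book(self, title):
        if self.on_loan.get(title, 0) > 0:
            self.on_loan[title] -= 1


# With one copy owned, a second simultaneous borrow is refused until
# the first patron returns the book.
library = LendingLibrary()
library.acquire("Example Novel", copies=1)
print(library.borrow("Example Novel"))  # True
print(library.borrow("Example Novel"))  # False
library.return_book("Example Novel")
print(library.borrow("Example Novel"))  # True
```

The point of the model is that digital lending under this scheme leaves the economics unchanged: the library still buys each copy it lends.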

And yet, for all of the benefits of such a system in enabling more people to be able to access information, without changing the basic economics of how libraries have always worked, the big publishers all sued the Internet Archive. The publishers won the first round of that lawsuit. And while the court (somewhat surprisingly!) did not order the immediate closure of the Open Library, it did require the Internet Archive to remove any books upon request from publishers (though only if the publishers made those books available as eBooks elsewhere).

As the case has moved into the appeals stage (where we have filed an amicus brief), the Archive has revealed that around 500,000 books have been removed from the Open Library.

The Archive has put together an open letter to publishers, requesting that they restore access to this knowledge and information — a request that will almost certainly fall on extremely deaf ears.

We purchase and acquire books—yes, physical, paper books—and make them available for one person at a time to check out and read online. This work is important for readers and authors alike, as many younger and low-income readers can only read if books are free to borrow, and many authors’ books will only be discovered or preserved through the work of librarians. We use industry-standard technology to prevent our books from being downloaded and redistributed—the same technology used by corporate publishers.

But the publishers suing our library say we shouldn’t be allowed to lend the books we own. They have forced us to remove more than half a million books from our library, and that’s why we are appealing. 

The Archive also has a huge collection of quotes from people who have been impacted negatively by all of this. Losing access to knowledge is a terrible, terrible thing, driven by publishers who have always hated the fundamental concept of libraries and are very much using this case as an attack on the fundamental principle of lending books.

[…]

And, why? Because copyright and DRM systems allow publishers to massively overcharge for eBooks. This is what’s really the underlying factor here. Libraries in the past could pay the regular price for a book and then lend it out. But with eBook licensing, they are able to charge exorbitant monopoly rents, while artificially limiting how many books libraries can even buy.

I don’t think many people realize the extreme nature of the pricing situation here. As we’ve noted, a book that might cost $29.99 retail can cost $1,300 for an eBook license, and that license may include restrictions, such as having to relicense after a certain number of lends, or saying a library may only be allowed to purchase a single eBook license at a time.

The ones who changed the way libraries work is not the Internet Archive. It’s the publishers. They’re abusing copyright and DRM to fundamentally kill the very concept of a library, and this lawsuit is a part of that strategy.

Source: 500,000 Books Have Been Deleted From The Internet Archive’s Lending Library | Techdirt

We are losing vast swathes of our digital past, and copyright stops us saving it

It is hard to imagine the world without the Web. Collectively, we routinely access billions of Web pages without thinking about it. But we often take it for granted that the material we want to access will be there, both now and in the future. We all hit the dreaded “404 not found” error from time to time, but merely pass on to other pages. What we tend to ignore is how these online error messages are a flashing warning signal that something bad is happening to the World Wide Web. Just how bad is revealed in a new report from the Pew Research Center, based on an examination of half a million Web pages, which found:

A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible, as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website.

For older content, this trend is even starker. Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.

This digital decay occurs at slightly different rates for different online material:

23% of news webpages contain at least one broken link, as do 21% of webpages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government webpages (those belonging to city governments) are especially likely to have broken links.

54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.

These figures show that the problem we discussed a few weeks ago – that access to academic knowledge is at risk – is in fact far wider, and applies to just about everything that is online. Although the reasons for material disappearing vary greatly, the key obstacle to addressing that loss is the same across all fields. The copyright industry’s obsessive control of material, and the punitive laws that can be deployed against even the most trivial copyright infringement, mean that routine and multiple backup copies of key or historic online material are rarely made.

The main exception to that rule is the sterling work carried out by the Internet Archive, which was founded by Brewster Kahle, whose Kahle/Austin Foundation supports this blog. At the time of writing the Internet Archive holds copies of an astonishing 866 billion Web pages, many in multiple versions that chart their changes over time. It is a unique and invaluable resource.

It is also being sued by publishers for daring to share in a controlled way some of its holdings. That is, the one bulwark against losing vast swathes of our digital culture is being attacked by an industry that is largely to blame for the problem the Internet Archive is trying to solve. It’s another important reason why we must move away from the copyright system, and nullify the power it has to destroy, rather than create, our culture.

Source: We are losing vast swathes of our digital past, and copyright stops us saving it – Walled Culture

First-mover advantage found in the arts shows copyright isn’t necessary to protect innovative creativity

One of the arguments sometimes made in defence of copyright is that without it, creators would be unable to compete with the hordes of copycats that would spring up as soon as their works became popular. Copyright is needed, supporters say, to prevent less innovative creators from producing works that are closely based on new, successful ideas. However, this approach has led to constant arguments and court cases over how close a “closely based” work can be before it infringes on the copyright of others. A good example of this is the 2022 lawsuit involving Ed Sheeran, where it was argued that using just four notes of a scale constituted copyright infringement of someone else’s song employing the same tiny motif. A fascinating new paper looks at things from a different angle. It draws on the idea of “first-mover advantage”, the fact that:

individuals that move to a new market niche early on (“first movers”) obtain advantages that may lead to larger success, compared to those who move to this niche later. First movers enjoy a temporary near-monopoly: since they enter a niche early, they have little to no competition, and so they can charge larger prices and spend more time building a loyal customer base.

The paper explores the idea in detail for the world of music. Here, first-mover advantage means:

The artists and music producers who recognize the hidden potential of a new artistic technique, genre, or style, have bigger chances of reaching success. Having an artistic innovation that your competitors do not have or cannot quickly acquire may become advantageous on the winner-take-all artistic market.

Analysing nearly 700,000 songs across 110 different musical genres, the researchers found evidence that first-mover advantage was present in 91 of the genres. The authors point out that there is also anecdotal evidence of first-mover advantage in other arts:

For example, Agatha Christie—one of the recognized founders of “classical” detective novel—is also one of the best-selling authors ever. Similarly, William Gibson’s novel Neuromancer—a canonical work in the genre of cyberpunk—is also one of the earliest books in this strand of science fiction. In films, the cult classic The Blair Witch Project is the first recognized member of the highly successful genre of found-footage horror fiction.

Although copyright may be present, first-mover advantage does not require it to operate – it is simply a function of being early with a new idea, which means that competition is scarce or non-existent. If further research confirms the wider presence of first-mover advantage in the creative world – for example, even where sharing-friendly CC licences are used – it will knock down yet another flimsy defence of copyright’s flawed and outdated intellectual monopoly.

Source: First-mover advantage in the arts means copyright isn’t necessary to protect innovative creativity – Walled Culture

Japan’s Push To Make All Research Open Access is Taking Shape

The Japanese government is pushing ahead with a plan to make Japan’s publicly funded research output free to read. From a report: In June, the science ministry will assign funding to universities to build the infrastructure needed to make research papers free to read on a national scale. The move follows the ministry’s announcement in February that researchers who receive government funding will be required to make their papers freely available to read on institutional repositories from January 2025. The Japanese plan “is expected to enhance the long-term traceability of research information, facilitate secondary research and promote collaboration,” says Kazuki Ide, a health-sciences and public-policy scholar at Osaka University in Suita, Japan, who has written about open access in Japan.

The nation is one of the first Asian countries to make notable advances towards making more research open access (OA) and among the first countries in the world to forge a nationwide plan for OA. The plan follows in the footsteps of the influential Plan S, introduced six years ago by a group of research funders in the United States and Europe known as cOAlition S, to accelerate the move to OA publishing. The United States also implemented an OA mandate in 2022 that requires all research funded by US taxpayers to be freely available from 2026. When the Ministry of Education, Culture, Sports, Science and Technology (MEXT) announced Japan’s pivot to OA in February, it also said that it would invest around $63 million to standardize institutional repositories — websites dedicated to hosting scientific papers, their underlying data and other materials — ensuring that there will be a mechanism for making research in Japan open.

Source: https://science.slashdot.org/story/24/05/31/1748243/japans-push-to-make-all-research-open-access-is-taking-shape?utm_source=rss1.0mainlinkanon&utm_medium=feed

Quite ironic that the original article is behind a paywall at Nature.com 🙂

Anyway, if the public paid for it, then the public should get it. Hugely late, but well done.

Adobe threatens to sue Nintendo emulator Delta for its look-alike logo

Delta, an emulator that can play Nintendo games, had to change its logo after Adobe threatened legal action. You’d think it would face trouble from Nintendo, seeing as it has been going after emulators these days, but no. It’s Adobe who’s going after the developer, which told TechCrunch that it first received an email from the company’s lawyer on May 7. Adobe warned Delta that their logos are too similar, with its app icon infringing on the well-known Adobe “A,” and asked it to change its logo so it wouldn’t violate the company’s rights. Delta reportedly received an email from Apple, as well, telling the developer that Adobe asked it to take down the emulator app.

A purple icon.
Delta

If you’ll recall, Apple started allowing retro game emulators on the App Store, as long as they don’t offer pirated games for download. Delta was one of the first to be approved for listing and was at the top of Apple’s charts for a while, which is probably why it caught Adobe’s attention. At the time of writing, it sits at number six in the ranking for apps in Entertainment with 17,100 ratings.

The developer told both Adobe and Apple that its logo was a stylized version of the Greek letter “delta,” and not the uppercase letter A. Regardless, it debuted a new logo, which looks like someone took a sword to its old one and cut it in half. It’s a temporary solution, though — the developer said it’s releasing the “final” version of its new logo when Delta 1.6 comes out.

Source: Adobe threatens to sue Nintendo emulator Delta for its look-alike logo