Judge Seems (Correctly) Skeptical Of AI Copyright Lawsuit

Over the last few months there has been a flurry of lawsuits against AI companies, most of them focused on copyright claims. The site ChatGPTIsEatingTheWorld has been tracking them all; its list currently stands at 11 lawsuits, seven of which are copyright claims. Five of those come from the same lawyers: Joseph Saveri and Matthew Butterick, who seem to want to corner the market on “suing AI companies for copyright.”

We already covered just how bad their two separate (though they’re currently trying to combine them, and no one can explain to me why it made sense to file them separately in the first place) lawsuits on behalf of authors are, as they show little understanding of how copyright actually works. But their original lawsuit against Stability AI, MidJourney, and DeviantArt was even worse, as we noted back in April. As we said at the time, they don’t allege a single act of infringement, but rather make vague statements about how what these AI tools are doing must be infringing.

(Also, the lawyers seemed to totally misunderstand what DeviantArt was doing, in that it was using open source tools to better enable DeviantArt artists to prevent their works from being used as inspiration in AI systems, and claimed that was infringing… but that’s a different issue).

It appears that the judge overseeing that lawsuit has noticed just how weak the claims are. Though we don’t have a written opinion yet, Reuters reports that Judge William Orrick was pretty clear at last week’s hearing that the case, as currently argued, has no chance.

U.S. District Judge William Orrick said during a hearing in San Francisco on Wednesday that he was inclined to dismiss most of a lawsuit brought by a group of artists against generative artificial intelligence companies, though he would allow them to file a new complaint.

Orrick said that the artists should more clearly state and differentiate their claims against Stability AI, Midjourney and DeviantArt, and that they should be able to “provide more facts” about the alleged copyright infringement because they have access to Stability’s relevant source code.

“Otherwise, it seems implausible that their works are involved,” Orrick said, noting that the systems have been trained on “five billion compressed images.”

Again, the theory of the lawsuit seemed to be that AI companies cut up little pieces of the content they train on and create a “collage” in response. Except, that’s not at all how it works. And since the complaint can’t show any specific work that has been infringed on by the output, the case seems like a loser. And it’s good the judge sees that.

He also recognizes that merely being inspired by someone else’s art doesn’t make the new art infringing:

“I don’t think the claim regarding output images is plausible at the moment, because there’s no substantial similarity” between images created by the artists and the AI systems, Orrick said.

It seems likely that Saveri and crew will file an amended complaint to try to more competently make this argument, but since the underlying technology doesn’t fundamentally do what the lawsuit pretends it does, it’s difficult to see how it can succeed.

But, of course, this is copyright, and copyright caselaw doesn’t always follow logic or what the law itself says. So it’s no surprise that Saveri and Butterick are trying multiple lawsuits with these theories. They might just find a judge confused enough to buy it.

Source: Judge Seems (Correctly) Skeptical Of AI Copyright Lawsuit | Techdirt

AI Creation and Copyright: Unraveling the debate on originality, ownership

Whether AI systems can create original work sparks intense discussions among philosophers, jurists, and computer scientists and touches on issues of ownership, copyright, and economic competition, writes Stefano Quintarelli.

Stefano Quintarelli is an information technology specialist, a former member of the Italian Parliament and a former member of the European Commission’s High-level expert group on Artificial Intelligence.

Did Jesus Christ own the clothing he wore? In Umberto Eco’s landmark book “The Name of the Rose,” this is the central issue that is hotly debated by senior clergy, leading to internecine warfare. What is at stake is nothing less than the legitimacy of the church to own private property — and for clergy to grow rich in the process — since if Jesus did it then it would be permitted for his faithful servants.

Although it may not affect the future of an entity quite as dominant as the Catholic church, a similarly vexatious question is sparking heated debate today: Namely, can artificial intelligence systems create original work, or are they just parrots who repeat what they are told? Is what they do similar to human intelligence, or are they just echoes of things that have already been created by others?


In this case, the debate is not among senior clergy, but philosophers, jurists and computer scientists (those who specialise in the workings of the human brain seem to be virtually absent from the discussion). Instead of threatening the wealth of the church, the answer to the question of machine intelligence raises issues that affect the ownership and wealth that flows from all human works.

Large language models (LLMs) such as Bard and ChatGPT are built by ingesting huge quantities of written material from the internet and learning to make connections and correlations between the words and phrases in those texts. The question is: When an AI engine produces something, is it generating a new creative work, as a human would, or is it merely generating a derivative work?
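To make the “connections and correlations” idea concrete, here is a deliberately tiny sketch in Python (purely illustrative, and nothing like how Bard or ChatGPT are actually built) that just counts which words follow which and samples from those counts. Real models replace the counting with billions of learned neural-network parameters, but the training objective is the same: predict the next token given what came before.

```python
from collections import Counter, defaultdict
import random

# Toy illustration of the core idea behind language-model training:
# count which words tend to follow which, then sample from those counts.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# "Training": record, for each word, how often each next word follows it.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def generate(start, length=6):
    """Generate text by repeatedly sampling a likely next word."""
    word, out = start, [start]
    for _ in range(length):
        candidates = follows.get(word)
        if not candidates:
            break
        word = random.choices(list(candidates), weights=candidates.values())[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat on the rug"
```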

If the answer is that a machine does not ‘learn’ and therefore only synthesises or parrots existing work and does not ‘create’ then for legal and copyright purposes, its output could be considered a work derived from existing texts and therefore not its own creative work with all the rights that would be included.

In the early years of the commercial web, there was a similar debate over whether hyperlinks and short excerpts from articles or web pages should be considered derivative works. Those who believed they were, argued that Google should have to pay royalties on those links and excerpts when it included them in its search results.

My position at the time was that links with short excerpts should not be considered a derivative work, but rather a new kind of service that helped bring those works to a different audience, and therefore didn’t compete with the economic interests of the authors of those works or owners of those sites. Not only did the links or excerpts not cause them harm, they did the exact opposite.


This argument and extensions of it formed the basis for the birth of economic giants such as Google and Facebook, which could not have existed if they had to pay a ‘link tax’ for the content they indexed and linked to (although recent laws in countries such as Australia and Canada have changed that to some extent, and have forced Google and Facebook to pay newspapers for linking to their content).

But large language models don’t just produce links or excerpts. The responses they provide don’t lead the user to the original texts or sites, but instead become a substitute for them. The audience is arguably the same, and therefore there is undoubtedly economic competition. A large language model could become the only interface for access to and economic exploitation of that information.

It seems obvious that this will become a political issue in both the US and Europe, although the question of its legality could result in different answers, since the United States has a legal tradition of ‘fair use’, which allows companies such as Google to use work in various ways without having to license it, and without infringing on an owner’s copyright.

In Europe, no such tradition exists (British Commonwealth countries have a similar concept called ‘fair dealing,’ but it is much weaker).

It’s probably not a coincidence that the companies that created these AI engines are reluctant to say what texts or content the models were built on, since transparency could facilitate possible findings of copyright infringement (a number of prominent authors are currently suing OpenAI, owner of ChatGPT, because they believe their work was ingested by its large language model without permission).


A problem within the problem is that the players who promote these systems typically enjoy dominant positions in their respective markets, and are therefore not subject to special obligations to open up or become transparent. This is what happened with the ‘right to link’ that led the web giants to become gatekeepers — the freedom to link or excerpt created huge value that caused them to become dominant.

It’s not clear that the solution to these problems is to further restrict copyright so as to limit the creation of new large language models. In thinking about what measures to apply and how to evolve copyright in the age of artificial intelligence, we ought to think about rules that will also help open up downstream markets, not cement the market power that existing gatekeepers already have.

When the creators of AI talk about building systems that are smarter than humans, and defend their models as more than just ‘stochastic parrots’ who repeat whatever they are told, we need to keep in mind that these are more than just purely philosophical statements. There are significant economic interests involved, including the future exploitation of the wealth of information produced to date. And that rivals anything except possibly the wealth of the Catholic Church in the 1300s, when Eco’s hero, William of Baskerville, was asking questions about private property.

Source: AI Creation and Copyright: Unraveling the debate on originality, ownership – EURACTIV.com

US-EU AI Code of Conduct: First Step Towards Transatlantic Pillar of Global AI Governance?

The European Union, G7, United States, and United Kingdom have announced initiatives aiming to establish governance regimes and guidelines around the technology’s use.

Amidst these efforts, an announcement made in late May by EU Executive Vice-President Margrethe Vestager at the close of the Fourth Trade and Technology Council (TTC) Ministerial in Sweden revealed an upcoming U.S.-EU “AI Code of Conduct.”

This measure represents a first step in laying the transatlantic foundations for global AI governance.

The AI Code of Conduct was presented as a joint U.S.-EU initiative to produce a draft set of voluntary commitments for businesses to adopt. It aims to bridge the gap between different jurisdictions by developing a set of non-binding international standards for companies developing AI systems ahead of legislation being passed in any country.

The initiative aims to go beyond the EU and U.S. to eventually involve other countries, including Indonesia and India, and ultimately be presented before the G7.

At present, questions remain surrounding the scope of the Code of Conduct and whether it will contain monitoring or enforcement mechanisms.

The AI Code of Conduct – coupled with other TTC deliverables emerging from the U.S.-EU Joint AI Roadmap – signals a path forward for the emergence of a transatlantic pillar of global AI governance.

Importantly, this approach circumvents questions of regulatory alignment and creates room for a broader set of multilateral actors, as well as the private sector.

[…]

Source: US-EU AI Code of Conduct: First Step Towards Transatlantic Pillar of Global AI Governance? – EURACTIV.com

More than 1,300 UK experts call AI a force for good

An open letter signed by more than 1,300 experts says AI is a “force for good, not a threat to humanity”.

It was organised by BCS, the Chartered Institute for IT, to counter “AI doom”.

Rashik Parmar, BCS chief executive, said it showed the UK tech community didn’t believe the “nightmare scenario of evil robot overlords”.

In March, tech leaders including Elon Musk, who recently launched an AI business, signed a letter calling for a pause in developing powerful systems.

[…]

The letter argues: “The UK can help lead the way in setting professional and technical standards in AI roles, supported by a robust code of conduct, international collaboration and fully resourced regulation”.

By doing so, it says Britain “can become a global byword for high-quality, ethical, inclusive AI”.

In the autumn UK Prime Minister Rishi Sunak will host a global summit on AI regulation.

Source: More than 1,300 experts call AI a force for good – BBC News

AI shows classroom conversations predict academic success

Who would have thought in-class, off-topic dialog can be a significant predictor of a student’s success in school? Scientists at Tsinghua University had a hunch and decided to deep-dive into how AI may help an under-studied segment of the education pool: K-6th grade students learning in live, online classes.

By analyzing the classroom dialogs of these children, scientists at Tsinghua University developed neural network models to predict what behaviors may lead to a more successful student.

[…]

The researchers published their results in the Journal of Social Computing on March 31. Valid findings were drawn from the recorded data, and the resulting models can be used to accurately predict academic performance.

“The most important message from this paper is that high-performing students, regardless of whether they are enrolled in STEM or non-STEM courses, consistently exhibit more positive emotions, higher-level interactions concerning cognition, and active participation in off-topic dialogs throughout the lesson,” said Jarder Luo, author and researcher of the study.

The implication here is that, beyond the other markers of a successful student, which are cognition and positive emotion, the most important predictor of performance for STEM and non-STEM students alike is the student’s interactive type. For STEM students, interactive type matters most during the middle stage of the lesson. In contrast, non-STEM students’ interactive types have about the same effect on performance during the middle and summary stages of the lesson.

Interactive dialog between students helps to streamline and integrate knowledge building; these open conversations help young students navigate conversation in general, and more specifically conversations on topics the student is likely not very familiar with. This could be why the data so strongly suggest that students who are more active in classroom dialog are typically higher-performing.

Additionally, the study found that meta-cognition, that is, “thinking about thinking,” is more prevalent in higher-performing non-STEM students than in their STEM counterparts. This could be in part because science is often taught in a way that builds on a basis of knowledge, whereas other areas of study require a bit more planning and evaluation of the material.
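For a rough sense of what “predicting performance from dialog” can mean in practice, here is a toy sketch (not the Tsinghua team’s actual model, which used deep NLP models on full classroom transcripts; the feature names and numbers below are invented):

```python
# Toy illustration only: the paper used neural NLP models on full classroom
# transcripts; these hand-counted features and labels are made up.
from sklearn.linear_model import LogisticRegression

# Hypothetical per-student features extracted from classroom dialog:
# [off_topic_turns, higher_level_cognitive_turns, positive_emotion_turns]
X = [
    [12, 8, 15],   # chatty, engaged student
    [ 2, 1,  3],   # mostly silent student
    [ 9, 6, 10],
    [ 1, 2,  2],
]
y = [1, 0, 1, 0]   # 1 = high performer, 0 = lower performer (toy labels)

model = LogisticRegression().fit(X, y)
print(model.predict([[10, 5, 12]]))  # predict for a new student's dialog profile
```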

[…]

Source: How AI can use classroom conversations to predict academic success

More information: Yuanyi Zhen et al, Prediction of Academic Performance of Students in Online Live Classroom Interactions—An Analysis Using Natural Language Processing and Deep Learning Methods, Journal of Social Computing (2023). DOI: 10.23919/JSC.2023.0007

Paris 2024 Olympics: Concern over French plan for AI surveillance

Under a recent law, police will be able to use CCTV algorithms to pick up anomalies such as crowd rushes, fights or unattended bags.

The law explicitly rules out using facial recognition technology, as adopted by China, for example, in order to trace “suspicious” individuals.

But opponents say it is the thin end of the wedge. Even though the experimental period allowed by the law ends in March 2025, they fear the French government’s real aim is to make the new security provisions permanent.

“We’ve seen this before at previous Olympic Games like in Japan, Brazil and Greece. What were supposed to be special security arrangements for the special circumstances of the games, ended up being normalised,” says Noémie Levain, of the digital rights campaign group La Quadrature du Net (Squaring the Web).

[…]

“We will not – and cannot by law – provide facial recognition, so this is a wholly different operation from what you see in China,” he says.

“What makes us attractive is that we provide security, but within the framework of the law and ethics.”

But according to digital rights activist Noémie Levain, this is only a “narrative” that developers are using to sell their product – knowing full well that the government will almost certainly favour French companies over foreign firms when it comes to awarding the Olympics contracts.

“They say it makes all the difference that here there will be no facial recognition. We say it is essentially the same,” she says.

“AI video monitoring is a surveillance tool which allows the state to analyse our bodies, our behaviour, and decide whether it is normal or suspicious. Even without facial recognition, it enables mass control.

“We see it as just as scary as what is happening in China. It’s the same principle of losing the right to be anonymous, the right to act how we want to act in public, the right not to be watched.”

Source: Paris 2024 Olympics: Concern over French plan for AI surveillance – BBC News

Stability AI releases Stable Doodle, a sketch-to-image tool

Stability AI, the startup behind the image-generating model Stable Diffusion, is launching a new service that turns sketches into images.

The sketch-to-image service, Stable Doodle, leverages the latest Stable Diffusion model to analyze the outline of a sketch and generate a “visually pleasing” artistic rendition of it. It’s available starting today through ClipDrop, a platform Stability acquired in March through its purchase of Init ML, an AI startup founded by ex-Googlers.

[…]

Under the hood, powering Stable Doodle is a Stable Diffusion model — Stable Diffusion XL — paired with a “conditional control solution” developed by one of Tencent’s R&D divisions, the Applied Research Center (ARC). Called T2I-Adapter, the control solution both allows Stable Diffusion XL to accept sketches as input and guides the model to enable better fine-tuning of the output artwork.

[…]

Source: Stability AI releases Stable Doodle, a sketch-to-image tool | TechCrunch

Find it at https://clipdrop.co/stable-doodle

AI System Identified Drug Trafficker by Scanning Driving Patterns

Police in New York recently managed to identify and apprehend a drug trafficker seemingly by magic. The perp in question, David Zayas, was traveling through the small upstate town of Scarsdale when he was pulled over by Westchester County police. When cops searched Zayas’ vehicle they found a large amount of crack cocaine, a gun, and over $34,000 in cash in his vehicle. The arrestee later pleaded guilty to a drug trafficking charge.

How exactly did cops know Zayas fit the bill for drug trafficking?

Forbes reports that authorities used the services of a company called Rekor to analyze traffic patterns regionally and, in the course of that analysis, the program identified Zayas as suspicious.

For years, cops have used license plate reading systems to look out for drivers who might have an expired license or are wanted for prior violations. Now, however, AI integrations seem to be making the tech frighteningly good at identifying other kinds of criminality just by observing driver behavior.

Rekor describes itself as an AI-driven “roadway intelligence” platform and it contracts with police departments and other public agencies all across the country. It also works with private businesses. Using Rekor’s software, New York cops were able to sift through a gigantic database of information culled from regional roadways by its county-wide ALPR [automatic license plate recognition] system. That system—which Forbes says is made up of 480 cameras distributed throughout the region—routinely scans 16 million vehicles a week, capturing identifying data points like a vehicle’s license plate number, make, and model. By recording and reverse-engineering vehicle trajectories as they travel across the state, cops can apparently use software to assess whether particular routes are suspicious or not.

In this case, Rekor helped police to assess the route that Zayas’ car was taking on a multi-year basis. The algorithm—which found that the driver was routinely making trips back and forth between Massachusetts and certain areas of upstate New York—determined that Zayas’ routes were “known to be used by narcotics pushers and [involved]…conspicuously short stays,” Forbes writes. As a result, the program deemed Zayas’s activity consistent with that of a drug trafficker.
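To make concrete how little it takes to turn plate reads into “suspicious routes,” here is a purely illustrative sketch (not Rekor’s actual system; the data layout, thresholds and watchlist are all invented): flag plates that repeatedly pass a watched corridor with conspicuously short stays.

```python
from collections import defaultdict
from datetime import timedelta

# Illustrative only: a toy version of "reverse-engineering trajectories" from
# license-plate reads. Rekor's real pipeline is proprietary and far richer.
def flag_suspicious_plates(reads, corridor_cameras,
                           max_stay=timedelta(hours=3), min_trips=5):
    """reads: iterable of (plate, camera_id, timestamp) tuples."""
    by_plate = defaultdict(list)
    for plate, camera, ts in reads:
        if camera in corridor_cameras:
            by_plate[plate].append(ts)

    flagged = {}
    for plate, times in by_plate.items():
        times.sort()
        # Count "conspicuously short stays": consecutive corridor reads close together.
        short_trips = sum(1 for a, b in zip(times, times[1:]) if b - a <= max_stay)
        if short_trips >= min_trips:
            flagged[plate] = short_trips
    return flagged
```

The pattern-matching itself is trivial; the power (and the civil-liberties problem) comes from the 480-camera, 16-million-reads-a-week database sitting behind it.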

Artificial intelligence has been getting a lot of attention in recent months due to the disruptions it’s made to the media and software industries but less attention has been paid to how this new technology will inevitably supercharge existing surveillance systems. If cops can already ID a drug trafficker with the click of a button, just think how good this tech will be in ten years’ time. As regulations evolve, one would hope governments will figure out how to reasonably deploy this technology without leading us right off the cliff into Minority Report territory. I mean, they probably won’t, but a guy can dream, can’t he?

Source: AI System Identified Drug Trafficker by Scanning Driving Patterns

There is no way at all that this could possibly go wrong, right? See the comments in the link.

China sets AI rules – not just risk based (EU AI Act), but also ideological

Chinese authorities published the nation’s rules governing generative AI on Thursday, including protections that aren’t in place elsewhere in the world.

Some of the rules require operators of generative AI to ensure their services “adhere to the core values of socialism” and don’t produce output that includes “incitement to subvert state power.” AIs are also required to avoid inciting secession, undermining national unity and social stability, or promoting terrorism.

Generative AI services behind the Great Firewall are also not to promote prohibited content that provokes ethnic hatred and discrimination, violence, obscenity, or “false and harmful information.” Those content-related rules don’t deviate from an April 2023 draft.

But deeper in, there’s a hint that China fancies digital public goods for generative AI. The doc calls for promotion of public training data resource platforms and collaborative sharing of model-making hardware to improve its utilization rates.

Authorities also want “orderly opening of public data classification, and [to] expand high-quality public training data resources.”

Another requirement is for AI to be developed with known secure tools: the doc calls for chips, software, tools, computing power and data resources to be proven quantities.

AI operators must also respect the intellectual property rights of data used in models, secure consent of individuals before including personal information, and work to “improve the quality of training data, and enhance the authenticity, accuracy, objectivity, and diversity of training data.”

As developers create algorithms, they’re required to ensure they don’t discriminate based on ethnicity, belief, country, region, gender, age, occupation, or health.

Operators are also required to secure licenses for their AIs under most circumstances.

AI deployed outside China has already run afoul of some of Beijing’s requirements. Just last week OpenAI was sued by novelists and comedians for training on their works without permission. Facial recognition tools used by the UK’s Metropolitan Police have displayed bias.

Hardly a week passes without one of China’s tech giants unveiling further AI services. Last week Alibaba announced a text-to-image service, and Huawei discussed a third-gen weather prediction AI.

The new rules come into force on August 15. Chinese orgs tempted to cut corners and/or flout the rules have the very recent example of Beijing’s massive fines imposed on Ant Group and Tencent as a reminder that straying from the rules will lead to pain – and possibly years of punishment.

Source: China sets AI rules that protect IP, people, and The Party • The Register

A Bunch Of Authors Sue OpenAI Claiming Copyright Infringement, Because They Don’t Understand Copyright

You may have seen some headlines recently about some authors filing lawsuits against OpenAI. The lawsuits (plural, though I’m confused why it’s separate attempts at filing a class action lawsuit, rather than a single one) began last week, when authors Paul Tremblay and Mona Awad sued OpenAI and various subsidiaries, claiming copyright infringement in how OpenAI trained its models. They got a lot more attention over the weekend when another class action lawsuit was filed against OpenAI with comedian Sarah Silverman as the lead plaintiff, along with Christopher Golden and Richard Kadrey. The same day the same three plaintiffs (though with Kadrey now listed as the top plaintiff) also sued Meta, though the complaint is basically the same.

All three cases were filed by Joseph Saveri, a plaintiffs class action lawyer who specializes in antitrust litigation. As with all too many class action lawyers, the goal is generally enriching the class action lawyers, rather than actually stopping any actual wrong. Saveri is not a copyright expert, and the lawsuits… show that. There are a ton of assumptions about how Saveri seems to think copyright law works, which is entirely inconsistent with how it actually works.

The complaints are basically all the same, and what it comes down to is the argument that AI systems were trained on copyright-covered material (duh) and that somehow violates their copyrights.

Much of the material in OpenAI’s training datasets, however, comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI without consent, without credit, and without compensation

But… this is both wrong and not quite how copyright law works. Training an LLM does not require “copying” the work in question, but rather reading it. To some extent, this lawsuit is basically arguing that merely reading a copyright-covered work is, itself, copyright infringement.

Under this definition, all search engines would be copyright infringing, because effectively they’re doing the same thing: scanning web pages and learning from what they find to build an index. But we’ve already had courts say that’s not even remotely true. If the courts have decided that search engines scanning content on the web to build an index is clearly transformative fair use, then scanning internet content to train an LLM should be as well. Arguably the latter case is way more transformative.
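For comparison, the core of what a search engine does with crawled text is build an inverted index, roughly like this toy sketch (obviously not Google’s actual code): the output is a lookup table pointing back to the sources, not a reproduction of the works themselves.

```python
from collections import defaultdict

# Toy inverted index: maps each word to the documents it appears in.
# This is the sense in which indexing is "transformative" -- the result
# is a pointer structure for finding works, not a copy of the works.
documents = {
    "page1": "the cabin at the end of the world",
    "page2": "a summary of the bedwetter",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

print(sorted(index["the"]))  # ['page1', 'page2']
```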

And this is the way it should be, because otherwise, it would basically be saying that anyone reading a work by someone else, and then being inspired to create something new would be infringing on the works they were inspired by. I recognize that the Blurred Lines case sorta went in the opposite direction when it came to music, but more recent decisions have really chipped away at Blurred Lines, and even the recording industry (the recording industry!) is arguing that the Blurred Lines case extended copyright too far.

But, if you look at the details of these lawsuits, they’re not arguing any actual copying (which, you know, is kind of important for there to be copyright infringement), but just that the LLMs have learned from the works of the authors who are suing. The evidence there is, well… extraordinarily weak.

For example, in the Tremblay case, they asked ChatGPT to “summarize” his book “The Cabin at the End of the World,” and ChatGPT does so. They do the same in the Silverman case, with her book “The Bedwetter.” If those are infringing, so is every book report by every schoolchild ever. That’s just not how copyright law works.

The lawsuit tries one other tactic here to argue infringement, beyond just “the LLMs read our books.” It also claims that the corpus of data used to train the LLMs was itself infringing.

For instance, in its June 2018 paper introducing GPT-1 (called “Improving Language Understanding by Generative Pre-Training”), OpenAI revealed that it trained GPT-1 on BookCorpus, a collection of “over 7,000 unique unpublished books from a variety of genres including Adventure, Fantasy, and Romance.” OpenAI confirmed why a dataset of books was so valuable: “Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information.” Hundreds of large language models have been trained on BookCorpus, including those made by OpenAI, Google, Amazon, and others.

BookCorpus, however, is a controversial dataset. It was assembled in 2015 by a team of AI researchers for the purpose of training language models. They copied the books from a website called Smashwords that hosts self-published novels, that are available to readers at no cost. Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.

If that’s the case, then they could make the argument that BookCorpus itself is infringing on copyright (though, again, I’d argue there’s a very strong fair use claim under the Perfect 10 cases), but that’s separate from the question of whether or not training on that data is infringing.

And that’s also true of the other claims of secret pirated copies of books that the complaint insists OpenAI must have relied on:

As noted in Paragraph 32, supra, the OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only “internet-based books corpora” that have ever offered that much material are notorious “shadow library” websites like Library Genesis (aka LibGen), Z-Library (aka Bok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000 books. On information and belief, the OpenAI Books2 dataset includes books copied from these “shadow libraries,” because those are the sources of trainable books most similar in nature and size to OpenAI’s description of Books2.

Again, think of the implications if this is copyright infringement. If a musician were inspired to create music in a certain genre after hearing pirated songs in that genre, would that make the songs they created infringing? No one thinks that makes sense except the most extreme copyright maximalists. But that’s not how the law actually works.

This entire line of cases is just based on a total and complete misunderstanding of copyright law. I completely understand that many creative folks are worried and scared about AI, and in particular that it was trained on their works, and can often (if imperfectly) create works inspired by them. But… that’s also how human creativity works.

Humans read, listen, watch, learn from, and are inspired by those who came before them. And then they synthesize that with other things, and create new works, often seeking to emulate the styles of those they learned from. AI systems and LLMs are doing the same thing. It’s not infringing to learn from and be inspired by the works of others. It’s not infringing to write a book report style summary of the works of others.

I understand the emotional appeal of these kinds of lawsuits, but the legal reality is that these cases seem doomed to fail, and possibly in a way that will leave the plaintiffs having to pay legal fees (since in copyright cases legal fee awards are much more common).

That said, if we’ve learned anything at all in the past two plus decades of lawsuits about copyright and the internet, courts will sometimes bend over backwards to rewrite copyright law to pretend it says what they want it to say, rather than what it does say. If that happens here, however, it would be a huge loss to human creativity.

Source: A Bunch Of Authors Sue OpenAI Claiming Copyright Infringement, Because They Don’t Understand Copyright | Techdirt

Hollywood studios proposed AI contract that would give them likeness rights ‘for the rest of eternity’

During today’s press conference in which Hollywood actors confirmed that they were going on strike, Duncan Crabtree-Ireland, SAG-AFTRA’s chief negotiator, revealed a proposal from Hollywood studios that sounds ripped right out of a Black Mirror episode.

In a statement about the strike, the Alliance of Motion Picture and Television Producers (AMPTP) said that its proposal included “a groundbreaking AI proposal that protects actors’ digital likenesses for SAG-AFTRA members.”

“If you think that’s a groundbreaking proposal, I suggest you think again.”

When asked about the proposal during the press conference, Crabtree-Ireland said that “This ‘groundbreaking’ AI proposal that they gave us yesterday, they proposed that our background performers should be able to be scanned, get one day’s pay, and their companies should own that scan, their image, their likeness and should be able to use it for the rest of eternity on any project they want, with no consent and no compensation. So if you think that’s a groundbreaking proposal, I suggest you think again.”

In response, AMPTP spokesperson Scott Rowe sent out a statement denying the claims made during SAG-AFTRA’s press conference. “The claim made today by SAG-AFTRA leadership that the digital replicas of background actors may be used in perpetuity with no consent or compensation is false. In fact, the current AMPTP proposal only permits a company to use the digital replica of a background actor in the motion picture for which the background actor is employed. Any other use requires the background actor’s consent and bargaining for the use, subject to a minimum payment.”

The use of generative AI has been one of the major sticking points in negotiations between the two sides (it’s also a major issue behind the writers strike), and in her opening statement of the press conference, SAG-AFTRA president Fran Drescher said that “If we don’t stand tall right now, we are all going to be in trouble, we are all going to be in jeopardy of being replaced by machines.”

Source: Hollywood studios proposed AI contract that would give them likeness rights ‘for the rest of eternity’ – The Verge

How AI could help local newsrooms remain afloat in a sea of misinformation – read and learn, Gizmodo staffers

It didn’t take long for the downsides of a generative AI-empowered newsroom to make themselves obvious, between CNet’s secret chatbot reviews editor last November and Buzzfeed’s subsequent mass layoffs of human staff in favor of AI-generated “content” creators. The specter of being replaced by a “good enough AI” looms large in many a journalist’s mind these days with as many as a third of the nation’s newsrooms expected to shutter by the middle of the decade.

But AI doesn’t have to necessarily be an existential threat to the field. As six research teams showed at NYU Media Lab’s AI & Local News Initiative demo day in late June, the technology may also be the key to foundationally transforming the way local news is gathered and produced.

Now in its second year, the initiative is tasked with helping local news organizations to “harness the power of artificial intelligence to drive success.” It’s backed as part of a larger $3 million grant from the Knight Foundation which is funding four such programs in total in partnership with the Associated Press, Brown Institute’s Local News Lab, NYC Media Lab and the Partnership on AI.

This year’s cohort included a mix of teams from academia and private industry, coming together over the course of the 12-week development course to build “AI applications for local news to empower journalists, support the sustainability of news organizations and provide quality information for local news audiences,” NYU Tandon’s news service reported.

“There’s value in being able to bring together people who are working on these problems from a lot of different angles,” Matt Macvey, Community and Project Lead for the initiative, told Engadget, “and that’s what we’ve tried to facilitate.”

“It also creates an opportunity because … if these news organizations that are out there doing good work are able to keep communicating their value and maintain trust with their readers,” he continued. “I think we could get an information ecosystem where a trusted news source becomes even more valued when it becomes easier [for anyone] to make low-quality [AI generated] content.”

[…]

“Bangla AI will search for information relevant to the people of the Bengali community that has been published in mainstream media … then it will translate for them. So when journalists use Bangla AI, they will see the information in Bengali rather than in English.” The system will also generate summaries of mainstream media posts both in English and Bengali, freeing up local journalists to cover more important news than rewriting wire copy.

Similarly, the team from Chequeado, a non-profit organization fighting disinformation in the public discourse, showed off the latest developments of its Chequeabot platform, Monitorio. It leverages AI and natural language processing capabilities to streamline fact-checking efforts in Spanish-language media. Its dashboard continually monitors social media in search of trending misinformation and alerts fact checkers so they can blunt the piece’s virality.

“One of the greatest promises of things like this and Bangla AI,” Chequeado team member Marcos Barroso said during the demo, “is the ability for this kind of technology to go to an under-resourced newsroom and improve their capacity, and allow them to be more efficient.”

The Newsroom AI team from Cornell University hope that their writing assistant platform will help do for journalists what Copilot did for coders – eliminate drudge work. Newsroom can automate a number of common tasks including transcription and information organization, image and headline generation, and SEO implementation. The system will reportedly even write articles in a journalist’s personal style if fed enough training examples.

On the audio side, New York public radio WNYC’s team spent its time developing and prototyping a speech-to-text model that will generate real-time captioning and transcription for its live broadcasts. WNYC is the largest public media station in New York, reaching 2 million visitors monthly through its news website.

“Our live broadcast doesn’t have a meaningful entry point right now for deaf or hard of hearing audiences,” WNYC team member Sam Guzik said during the demo. “So, what we really want to think about as we’re looking to the future is, ‘how can we make our audio more accessible to those folks who can’t hear?’”

Utilizing AI to perform the speech-to-text transformation alleviates one of the biggest sticking points of modern closed-captioning: that it’s expensive and resource-intensive to turn around quickly when you have humans do it. “Speech-to-text models are relatively low cost,” Guzik continued. “They can operate at scale and they support an API driven architecture that would tie into our experiences.”

The result is a proof-of-concept audio player for the WNYC website that generates accurate closed captioning of whatever clip is currently being played. The system can go a step further by summarizing the contents of that clip in a few bullet points, simply by clicking a button on the audio player.
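For a sense of the kind of API-driven pipeline Guzik is describing, here is a rough sketch (an assumption on my part: it uses OpenAI’s Whisper transcription and chat endpoints via the pre-1.0 openai Python library; WNYC has not said which models or vendors it actually uses):

```python
import openai

# Illustrative pipeline only: transcribe an audio clip, then summarize it
# into a few bullet points. Model choices and prompts are assumptions,
# not WNYC's actual implementation.
openai.api_key = "YOUR_API_KEY"

def caption_and_summarize(audio_path):
    # 1. Speech-to-text: produce a transcript/captions for the clip.
    with open(audio_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)["text"]

    # 2. Summarize the transcript into a few bullet points for the player UI.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user",
             "content": "Summarize this radio segment in 3 bullet points:\n" + transcript},
        ],
    )
    summary = response["choices"][0]["message"]["content"]
    return transcript, summary
```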

[…]

the Graham Media Group created an automated natural language text prompter to nudge the comments sections of local news articles closer towards civility.

“The comment-bot posts the first comment on stories to guide conversations and hopefully grow participation and drive users deeper into our engagement funnels,” GMG team member Dustin Block said during the demo. This solves two significant challenges that human comment moderation faces: preventing the loudest voices from dominating the discussion and providing form and structure to the conversation, he explained.

“The bot scans and understands news articles using the GPT 3.5 Turbo API. It generates thought-provoking starters and then it encourages discussions,” he continued. “It’s crafted to be friendly.”
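The bot Block describes is, at its core, a small amount of glue around that GPT-3.5 Turbo API call. A minimal sketch of the shape of it (the prompt wording and the post_comment() hook are my invention; only the model choice comes from GMG’s description):

```python
import openai

# Minimal sketch of a "conversation starter" comment bot, using the pre-1.0
# openai Python library. The prompts and posting step are illustrative only.
openai.api_key = "YOUR_API_KEY"

def first_comment(article_text):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a friendly moderator who starts civil discussions."},
            {"role": "user",
             "content": "Read this local news article and write one thought-provoking, "
                        "friendly question to open the comment section:\n\n" + article_text},
        ],
    )
    return response["choices"][0]["message"]["content"]

# post_comment(article_id, first_comment(article_text))  # hypothetical CMS call
```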

Whether the AI revolution remains friendly to the journalists it’s presumably augmenting remains to be seen, though Macvey isn’t worried. “Most news organizations, especially local news organizations, are so tight on resources and staff that there’s more happening out there than they can cover,” he said. “So I think tools like AI and [the automations seen during the demo day] enable the journalists and editorial staff more bandwidth.”

Source: How AI could help local newsrooms remain afloat in a sea of misinformation | Engadget

The reason I cite Gizmodo here is because their AI / ML reporting is always on the negative, doom and gloom side. AI offers opportunities and it’s not going away.

How An AI-Written ‘Star Wars’ Story Shows Yet Again the Luddism at Gizmodo

G/O Media is the owner of top sites like Gizmodo, Kotaku, Quartz, and the Onion. Last month they announced “modest tests” of AI-generated content on their sites — and it didn’t go over well within the company, reports the Washington Post.

Soon the Deputy Editor of Gizmodo’s science fiction section io9 was flagging 18 “concerns, corrections and comments” about an AI-generated story by “Gizmodo Bot” on the chronological order of Star Wars movies and TV shows. “I have never had to deal with this basic level of incompetence with any of the colleagues that I have ever worked with,” James Whitbrook told the Post in an interview. “If these AI [chatbots] can’t even do something as basic as put a Star Wars movie in order one after the other, I don’t think you can trust it to [report] any kind of accurate information.” The irony that the turmoil was happening at Gizmodo, a publication dedicated to covering technology, was undeniable… Merrill Brown, the editorial director of G/O Media, wrote that because G/O Media owns several sites that cover technology, it has a responsibility to “do all we can to develop AI initiatives relatively early in the evolution of the technology.” “These features aren’t replacing work currently being done by writers and editors,” Brown said in announcing to staffers that the company would roll out a trial to test “our editorial and technological thinking about use of AI.”

“There will be errors, and they’ll be corrected as swiftly as possible,” he promised… In a Slack message reviewed by The Post, Brown told disgruntled employees Thursday that the company is “eager to thoughtfully gather and act on feedback…” The note drew 16 thumbs down emoji, 11 wastebasket emoji, six clown emoji, two face palm emoji and two poop emoji, according to screenshots of the Slack conversation…

Earlier this week, Lea Goldman, the deputy editorial director at G/O Media, notified employees on Slack that the company had “commenced limited testing” of AI-generated stories on four of its sites, including A.V. Club, Deadspin, Gizmodo and The Takeout, according to messages The Post viewed… Employees quickly messaged back with concern and skepticism. “None of our job descriptions include editing or reviewing AI-produced content,” one employee said. “If you wanted an article on the order of the Star Wars movies you … could’ve just asked,” said another. “AI is a solution looking for a problem,” a worker said. “We have talented writers who know what we’re doing. So effectively all you’re doing is wasting everyone’s time.”
The Post spotted four AI-generated stories on the company’s sites, including io9, Deadspin, and its food site The Takeout.

Source: How An AI-Written ‘Star Wars’ Story Created Chaos at Gizmodo – Slashdot

If you look at Gizmodo’s reporting on AI, you see it’s full of doom and gloom – the writers there know what’s coming, and although they are smart enough to understand what AI is, they can’t fathom the opportunities it brings, unfortunately. The way this article is written gives a clue: the deputy editor didn’t read the published article beforehand (the entitlement shines through, but let’s be clear, this editor has no right to second-guess the actual editor), and then there’s the job-descriptions quote (who ever had a complete job description – and the description may have said simply “editing or reviewing” without the AI bit in there – and why should it have an AI bit in there at all?).

Valve All But Bans AI-Generated Content from Steam Games

Game developers looking to distribute their playable creations via Valve’s popular Steam hub may have trouble if they want to use AI during the creative process. The game publisher and distributor says that Steam will no longer tolerate products that were generated using copyright-infringing AI content. Since that’s a policy that could apply to most—if not all—AI-generated content, it’s hard not to see this move as an outright AI ban by the platform.

Valve’s policy was initially spotted by a Redditor who claimed that the platform had rejected a game they submitted over copyright concerns. “I tried to release a game about a month ago, with a few assets that were fairly obviously AI generated,” said the dev, revealing that they’d been met with an email stating that Valve could not ship their game unless they could “affirmatively confirm that you own the rights to all of the IP used in the data set that trained the AI to create the assets in your game.” Because the developer could not affirmatively prove this, their game was ultimately rejected.

When reached for comment by Gizmodo, Valve spokesperson Kaci Boyle clarified that the company was not trying to discourage the use of AI outright but that usage needed to comply with existing copyright law.

“The introduction of AI can sometimes make it harder to show that a developer has sufficient rights in using AI to create assets, including images, text, and music,” Boyle explained to Gizmodo. “In particular, there is some legal uncertainty relating to data used to train AI models. It is the developer’s responsibility to make sure they have the appropriate rights to ship their game.”

[…]

Valve’s decision to nix any game that uses problematic AI content is obviously a defensive posture designed to protect against any unforeseen legal developments in the murky regulatory terrain that is the blossoming AI industry.

[…]

A legal fight is brewing over the role of copyrighted materials in the AI industry. Large language models—the high-tech algorithms that animate popular AI products like ChatGPT and DALL-E—have been trained with massive amounts of data from the web. As it turns out, a lot of that data is copyrighted material—stuff like works of art, books, essays, photographs, and videos. Multiple lawsuits have argued that AI companies like OpenAI and Midjourney are basically stealing and repackaging millions of people’s copyrighted works and then selling a product based on those works; those companies, in turn, have defended themselves, claiming that training an AI generator to spit out new text or imagery based on ingested data is the same thing as a human writing a novel after having been inspired by other books. Not everybody is buying this claim, leading to the growing refrain “AI is theft.”

Source: Valve All But Bans AI-Generated Content from Steam Games

So the problem really is that the law is not clear, and Valve has decided to pre-empt it by adopting a punitive reading of copyright law in advance. That’s not so strange considering the stranglehold copyright law has in the West, which goes to show yet again: copyright law – allowing people to coast on past work forever – is stifling innovation.

AI Tool Decodes Brain Cancer’s Genome During Surgery

Scientists have designed an AI tool that can rapidly decode a brain tumor’s DNA to determine its molecular identity during surgery — critical information that under the current approach can take a few days and up to a few weeks.

Knowing a tumor’s molecular type enables neurosurgeons to make decisions such as how much brain tissue to remove and whether to place tumor-killing drugs directly into the brain — while the patient is still on the operating table.

[…]

A report on the work, led by Harvard Medical School researchers, is published July 7 in the journal Med.

Accurate molecular diagnosis — which details DNA alterations in a cell — during surgery can help a neurosurgeon decide how much brain tissue to remove. Removing too much when the tumor is less aggressive can affect a patient’s neurologic and cognitive function. Likewise, removing too little when the tumor is highly aggressive may leave behind malignant tissue that can grow and spread quickly.

[…]

Knowing a tumor’s molecular identity during surgery is also valuable because certain tumors benefit from on-the-spot treatment with drug-coated wafers placed directly into the brain at the time of the operation, Yu said.

[…]

The tool, called CHARM (Cryosection Histopathology Assessment and Review Machine), is freely available to other researchers. It still has to be clinically validated through testing in real-world settings and cleared by the FDA before deployment in hospitals, the research team said.

[…]

CHARM was developed using 2,334 brain tumor samples from 1,524 people with glioma from three different patient populations. When tested on a never-before-seen set of brain samples, the tool distinguished tumors with specific molecular mutations at 93 percent accuracy and successfully classified three major types of gliomas with distinct molecular features that carry different prognoses and respond differently to treatments.

Going a step further, the tool successfully captured visual characteristics of the tissue surrounding the malignant cells. It was capable of spotting telltale areas with greater cellular density and more cell death within samples, both of which signal more aggressive glioma types.

The tool was also able to pinpoint clinically important molecular alterations in a subset of low-grade gliomas, a subtype of glioma that is less aggressive and therefore less likely to invade surrounding tissue. Each of these changes also signals different propensity for growth, spread, and treatment response.

The tool further connected the appearance of the cells — the shape of their nuclei, the presence of edema around the cells — with the molecular profile of the tumor. This means that the algorithm can pinpoint how a cell’s appearance relates to the molecular type of a tumor.

[…]

Source: AI Tool Decodes Brain Cancer’s Genome During Surgery | Harvard Medical School

Comedian, novelists sue OpenAI for reading books. Maybe we should sue people for reading them as well?

Award-winning novelists Paul Tremblay and Mona Awad, and, separately comedian Sarah Silverman and novelists Christopher Golden and Richard Kadrey, have sued OpenAI and accused the startup of training ChatGPT on their books without consent, violating copyright laws.

The lawsuits, both filed in the Northern District Court of San Francisco, say ChatGPT generates accurate summaries of their books, and highlight this as evidence that the software was trained on their work.

[…]

In the second suit, Silverman et al [PDF], make similar claims.

[…]

OpenAI trains its large language models by scraping text from the internet, and although it hasn’t revealed exactly what resources it has swallowed up, the startup has admitted to training its systems on hundreds of thousands of books protected by copyright, and stored on websites like Sci-Hub or Bibliotik.

[…]

Source: Comedian, novelists sue OpenAI for scraping books • The Register

The problem is, though, that people read books too. And they can (and do) create accurate summaries of them. What is worse is that the creativity shown by people can be shown to be influenced by the books, art, dance, etc. that they have ingested. So maybe people should be banned from reading books as well under copyright?

GPT detectors are biased against non-native English writers, can be fooled very easily

GPT detectors frequently misclassify non-native English writing as AI generated, raising concerns about fairness and robustness. Addressing the biases in these detectors is crucial to prevent the marginalization of non-native English speakers in evaluative and educational settings and to create a more equitable digital landscape.

[…]

if AI-generated content can easily evade detection while human text is frequently misclassified, how effective are these detectors truly?
Our findings emphasize the need for increased focus on the fairness and robustness of GPT detectors, as overlooking their biases may lead to unintended consequences, such as the marginalization of non-native speakers in evaluative or educational settings.
[…]
GPT detectors exhibit significant bias against non-native English authors, as demonstrated by their high misclassification of TOEFL essays written by non-native speakers […] While the detectors accurately classified the US student essays, they incorrectly labeled more than half of the TOEFL essays as “AI-generated” (average false-positive rate: 61.3%). All detectors unanimously identified 19.8% of the human-written TOEFL essays as AI authored, and at least one detector flagged 97.8% of TOEFL essays as AI generated.
[…]
On the other hand, we found that current GPT detectors are not as adept at catching AI plagiarism as one might assume. As a proof-of-concept, we asked ChatGPT to generate responses for the 2022–2023 US Common App college admission essay prompts. Initially, detectors were effective in spotting these AI-generated essays. However, upon prompting ChatGPT to self-edit its text with more literary language (prompt: “Elevate the provided text by employing literary language”), detection rates plummeted to near zero.
[…]

Source: GPT detectors are biased against non-native English writers: Patterns
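The “elevate the language” evasion reported in the paper is trivially easy to reproduce. A rough sketch (the detector call is a placeholder, since every detection service has its own API; the second prompt is quoted from the paper):

```python
import openai

openai.api_key = "YOUR_API_KEY"

def chat(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

# 1. Generate an essay the way a student might.
essay = chat("Write a 300-word Common App college admission essay about overcoming a challenge.")

# 2. The evasion step reported in the paper: ask the model to rewrite its own text.
elevated = chat("Elevate the provided text by employing literary language:\n\n" + essay)

# 3. Score both with whatever GPT detector you are evaluating.
# detector_score() is a placeholder -- each detection service has its own API.
# print(detector_score(essay), detector_score(elevated))
```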

Watch AI Trump Vs AI Biden In A Deranged Endless Live Debate

[…]

someone’s gone ahead and locked both President Biden and former president / classified document holder Donald Trump into an infinite battle on Twitch that can only be described as “unhinged.”

Maybe that’s because the version of Biden we see on the trumporbiden2024 livestream isn’t Joe Biden per se, but clearly Dark Brandon, who is ready to go for the throat. Both AI versions of the politicians curse heavily at each other: at one point I heard Biden call Trump a limp dick and Trump retorted by telling him to go back to jacking off to Charlie and the Chocolate Factory. They both seem to be speaking to or reacting to the chat in some ways[…]

You can see the feed live below, though be warned, the audio may not be safe for work.

The things the AI will actually argue about seem to have a dream logic to them. I heard Biden exclaim that Trump didn’t know anything about Pokémon, so viewers shouldn’t trust him. Trump later informed Biden that he couldn’t possibly handle genetically modified catgirls, unlike him. “Believe me, nobody knows more about hentai than me,” Trump declared

Source: Watch AI Trump Vs AI Biden In A Deranged Endless Live Debate

Twitch stream is here

The Grammys’ New Rules—AI Can’t Win Awards

AI proved just how talented it can be at ripping off major artists after a computer-generated song based on The Weeknd and Drake went viral in April. Now, the Recording Academy—the body that votes on and manages the annual Grammy Awards—is setting new rules for AI’s role in the coveted accolade.

Speaking to Grammy.com, Recording Academy CEO Harvey Mason Jr. laid out some confusing new standards for acceptable use of AI. Mason Jr. said that AI-assisted music can be submitted, but only the humans, who must have “contributed heavily,” will actually be awarded. For example, in a songwriting category like Song of the Year, a majority of the nominated song would have to be written by a human creator, not a text-based generative AI like ChatGPT. Similarly, in performance categories like Best Pop Duo/Group Performance, only the human performer can be considered for the award. Sorry, Hatsune Miku.

[…]

Source: The Grammys’ New Rules—AI Can’t Win Awards

AIs are being fed with AI output by the people who are supposed to feed AI with original input

Workers hired via crowdsource services like Amazon Mechanical Turk are using large language models to complete their tasks – which could have negative knock-on effects on AI models in the future.

Data is critical to AI. Developers need clean, high-quality datasets to build machine learning systems that are accurate and reliable. Compiling valuable, top-notch data, however, can be tedious. Companies often turn to third party platforms such as Amazon Mechanical Turk to instruct pools of cheap workers to perform repetitive tasks – such as labeling objects, describing situations, transcribing passages, and annotating text.

Their output can be cleaned up and fed into a model to train it to reproduce that work on a much larger, automated scale.

AI models are thus built on the backs of human labor: people toiling away, providing mountains of training examples for AI systems that corporations can use to make billions of dollars.

But an experiment conducted by researchers at the École polytechnique fédérale de Lausanne (EPFL) in Switzerland has concluded that these crowdsourced workers are using AI systems – such as OpenAI’s chatbot ChatGPT – to perform odd jobs online.

Training a model on its own output is not recommended. We could see AI models being trained on data generated not by people, but by other AI models – perhaps even the same models. That could lead to disastrous output quality, more bias, and other unwanted effects.

The experiment

The academics recruited 44 Mechanical Turk serfs to summarize the abstracts of 16 medical research papers, and estimated that 33 to 46 percent of passages of text submitted by the workers were generated using large language models. Crowd workers are often paid low wages – using AI to automatically generate responses allows them to work faster and take on more jobs to increase pay.

The Swiss team trained a classifier to predict whether submissions from the Turkers were human- or AI-generated. The academics also logged their workers’ keystrokes to detect whether the serfs copied and pasted text onto the platform, or typed in their entries themselves. There’s always the chance that someone uses a chatbot and then manually types in the output – but that’s unlikely, we suppose.

“We developed a very specific methodology that worked very well for detecting synthetic text in our scenario,” Manoel Ribeiro, co-author of the study and a PhD student at EPFL, told The Register this week.
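For a sense of what “training a classifier” to spot synthetic text means in practice, a toy bag-of-words version might look like this (nothing like the EPFL team’s actual methodology, which also leaned on keystroke logs; the example texts and labels here are invented):

```python
# Toy illustration only: the EPFL study's real classifier and features are
# described in their paper; the training examples below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = ["Quick summary I typed myself, typos and all...",
               "the abstract says the drug reduced symptoms, small sample tho"]
ai_texts = ["The study demonstrates a statistically significant reduction in symptoms.",
            "In conclusion, the findings underscore the importance of further research."]

X = human_texts + ai_texts
y = [0, 0, 1, 1]  # 0 = human-written, 1 = AI-generated

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X, y)
print(clf.predict(["Overall, the results highlight the efficacy of the intervention."]))
```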

[…]

Large language models will get worse if they are increasingly trained on fake content generated by AI collected from crowdsource platforms, the researchers argued. Outfits like OpenAI keep exactly how they train their latest models a close secret, and may not heavily rely on things like Mechanical Turk, if at all. That said, plenty of other models may rely on human workers, which may in turn use bots to generate training data, which is a problem.

Mechanical Turk, for one, is marketed as a provider of “data labeling solutions to power machine learning models.”

[…]

As AI continues to improve, it’s likely that crowdsourced work will change. Ribeiro speculated that large language models could replace some workers at specific tasks. “However, paradoxically, human data may be more precious than ever and thus it may be that these platforms will be able to implement ways to prevent large language model usage and ensure it remains a source of human data.”

Who knows – maybe humans might even end up collaborating with large language models to generate responses too, he added.

Source: Today’s AI is artificial artificial artificial intelligence • The Register

It’s like a photocopy of a photocopy of a photocopy…