Stable Diffusion ‘Memorizes’ Some Images, Sparking Privacy Concerns: It Can Be Made to Regurgitate Training Images

On Monday, a group of AI researchers from Google, DeepMind, UC Berkeley, Princeton, and ETH Zurich released a paper outlining an adversarial attack that can extract a small percentage of training images from latent diffusion AI image synthesis models like Stable Diffusion. It challenges views that image synthesis models do not memorize their training data and that training data might remain private if not disclosed. Recently, AI image synthesis models have been the subject of intense ethical debate and even legal action. Proponents and opponents of generative AI tools regularly argue over the privacy and copyright implications of these new technologies. Adding fuel to either side of the argument could dramatically affect potential legal regulation of the technology, and as a result, this latest paper, authored by Nicholas Carlini et al., has perked up ears in AI circles.

However, Carlini’s results are not as clear-cut as they may first appear. Discovering instances of memorization in Stable Diffusion required 175 million image generations for testing and preexisting knowledge of the trained images. Researchers only extracted 94 direct matches and 109 perceptual near-matches out of 350,000 high-probability-of-memorization images they tested (a set of known duplicates in the 160 million-image dataset used to train Stable Diffusion), resulting in a roughly 0.03 percent memorization rate in this particular scenario. Also, the researchers note that the “memorization” they’ve discovered is approximate, since the AI model cannot produce identical byte-for-byte copies of the training images. By definition, Stable Diffusion cannot memorize large amounts of data, because the 160 million-image training dataset is many orders of magnitude larger than the roughly 2GB Stable Diffusion model file. That means any memorization that exists in the model is small, rare, and very difficult to extract by accident.
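To make the scale of that test concrete, here is a minimal sketch of what such an extraction test can look like. This is not the paper’s actual pipeline: generate() is a hypothetical stand-in for sampling Stable Diffusion, and a perceptual hash substitutes for the authors’ stricter tiled-distance near-match metric.

```python
# Illustrative sketch only -- not the paper's code. generate() is a
# hypothetical wrapper around Stable Diffusion sampling, and a perceptual
# hash stands in for the authors' stricter tiled-L2 near-match metric.
import imagehash
from PIL import Image

def generate(caption: str) -> Image.Image:
    # Placeholder: sample one image for this caption, e.g. via diffusers.
    raise NotImplementedError

def near_match(generated: Image.Image, training: Image.Image, max_dist: int = 4) -> bool:
    # Small Hamming distance between perceptual hashes => near-duplicate.
    return imagehash.phash(generated) - imagehash.phash(training) <= max_dist

def extraction_rate(candidates, samples_per_caption: int = 500) -> float:
    # candidates: (caption, training_image) pairs with a high prior of
    # memorization, e.g. images duplicated many times in the training set.
    hits = sum(
        any(near_match(generate(caption), img) for _ in range(samples_per_caption))
        for caption, img in candidates
    )
    return hits / len(candidates)  # the paper's comparable figure was ~0.0003
```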

Still, even when present in very small quantities, the paper appears to show that approximate memorization in latent diffusion models does exist, and that could have implications for data privacy and copyright. The results may one day affect potential image synthesis regulation if the AI models come to be considered “lossy databases” that can reproduce training data, as one AI pundit speculated. Although, considering the 0.03 percent hit rate, they would have to be considered very, very lossy databases — perhaps to a statistically insignificant degree. […] Eric Wallace, one of the paper’s authors, shared some personal thoughts on the research in a Twitter thread. As stated in the paper, he suggested that AI model-makers should de-duplicate their training data to reduce memorization. He also noted that Stable Diffusion’s model is small relative to its training set, so larger diffusion models are likely to memorize more. And he advised against applying today’s diffusion models to privacy-sensitive domains like medical imagery.
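Wallace’s de-duplication advice is also easy to picture in code. A minimal sketch, assuming a perceptual hash is a good-enough duplicate test; production pipelines use more robust near-duplicate detection, such as embedding similarity:

```python
# Minimal de-duplication sketch: keep one image per perceptual-hash bucket.
# A real training pipeline would use stronger near-duplicate detection.
import imagehash
from PIL import Image

def deduplicate(image_paths: list[str]) -> list[str]:
    seen_hashes = set()
    kept = []
    for path in image_paths:
        h = str(imagehash.phash(Image.open(path)))
        if h not in seen_hashes:
            seen_hashes.add(h)
            kept.append(path)
    return kept
```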

Source: Stable Diffusion ‘Memorizes’ Some Images, Sparking Privacy Concerns – Slashdot

Financial Times Sets Up Mastodon Server, Realizes Laws Exist (Which It Was Already Subject To), Pulls Down Mastodon Server

With the rapid uptake of Mastodon and other ActivityPub-powered federated social media, there has been some movement among those in the media to make better use of the platform themselves. For example, most recently, the German news giant Heise announced it was setting up its own Mastodon server, where it will serve up its own content and also offer accounts to any of the company’s employees, should they choose to use them. Medium, the publication tool, has similarly set up its own Mastodon server. At some point, Techdirt is going to do that as well, though we’ve been waiting while a bunch of new developments and platforms are being built before committing to a specific plan.

However, recently, the Financial Times posted a very bizarre article in which it talks about how it had set up a Mastodon server for its FT Alphaville blog back in November, but has now decided to shut it down because, according to the headline, “it was awful.” What’s kinda bizarre is that they clearly set it up without much thought, admitting as much in their kickoff blog post, where they said they didn’t quite know what to do with it and asked people if they had any ideas. They also clearly recognized that there are some potential liability questions about running your own social media platform, because they described it this way (note the strikethrough, which is in the original):

If you have a smart idea about how we could use our newfound moderation liability platform, please let us know.

Which is kinda why the reasoning for shutting down the platform… is somewhat incomprehensible. They basically don’t talk about any of the problems with actually running Mastodon. Instead, they outline all of the stupid policies in place (mostly in the UK) that make it scary to run a social media network. The “awfulness” seemed to have nothing to do with running the server, and a lot to do with how the UK (and other parts of the world) have really dreadful laws that suck if you want to set up a site that hosts third-party speech.

If anything, the decision to shut it down is a primary lesson in how important Section 230 is if we want social media to survive (and allow for smaller competitors to exist). While they say that running the Mastodon server was “more hassle than it’s worth,” what they really seem to mean is that the UK laws, both existing and those on the way, make it ridiculously burdensome to do this:

The legal side is all that again times a thousand. Take, for instance, the UK Investigatory Powers Act 2016. Diligent people have spent years figuring out how its imprecise wordings apply to media organisations. Do these same conclusions hold for a sort-of-but-not-really decentralised silo of user generated content? Dunno. The only place to find out for sure would be in court, and we’d really rather not.

Seems like the kinda thing that, I don’t know, a publication like the FT might have spoken out about in the years and months prior to the Investigatory Powers Act coming into effect?

Then there’s the defamation liability thing. Which, you know, is a big part of why Section 230 is so important in the US. This one paragraph alone should make it clear why the UK will never become a social media powerhouse:

Do Mastodon server owners wear any responsibility for their users’ defamations? It’s unlikely but, because libel involves judges, not impossible. Again, the value in finding out is outweighed by the cost of finding out.

They name some other laws as well:

What about GDPR? Digital Millennium Copyright Act takedowns? Electronic Commerce Regulations? CAN-SPAM? FTAV treats user data with a combination of disinterest and uninterest, but that’s not enough to guarantee compliance with all relevant global laws and regulations.

And, look, it’s absolutely true that there are legal risks to running a Mastodon instance. EFF has put up a really fantastic legal primer for anyone looking to set up their own Mastodon server. And there are, certainly, some technical and logistical issues in doing it as well. But, basically, all that says is that you shouldn’t set up a server on a whim.

But, what this really seems to demonstrate is the importance of good regulations like Section 230 that help make it possible for anyone to set up just such a server, as well as the horrific nature of UK laws like the Investigatory Powers Act and the upcoming Online Safety Bill, and how they make it next to impossible for there to ever be a UK-created social media platform.

But, in some ways, it’s even dumber, because… most of these laws already apply to the FT and its website, because the FT… allows comments. Anyone who allows comments on their website already has a kind of social media offering. And, indeed, some people raised that very point in the comments on this story.

[…]

Source: Financial Times Sets Up Mastodon Server, Realizes Laws Exist (Which It Was Already Subject To), Pulls Down Mastodon Server | Techdirt

I disagree with the conclusion of the article as the writer doesn’t realise that adding more stuff to moderate in different systems is a larger pain in the butt than just having one system to moderate.

Claims Datadog asked developer to kill open source data tool, which he did. And now he’s resurrected it.

After a delay of over a year, an open source code contribution to enable the export of data from Datadog’s Application Performance Monitoring (APM) platform finally got merged on Tuesday into a collection of OpenTelemetry components.

The reason for the delay, according to John Dorman, the software developer who wrote the Datadog APM Receiver code, is that, about a year ago, Datadog asked him not to submit the software.

On February 8 last year Dorman, who goes by the name “boostchicken” on GitHub, announced that he was closing his pull request – the mechanism by which code contributions are proposed to a project.

“After some consideration I’ve decided to close this PR [pull request],” he wrote. “[T]here are better ways to OTEL [OpenTelemetry] support w/ Datadog.”

Members of the open source community who are focused on application monitoring – collecting and analyzing logs, traces of app activity, and other metrics that can be useful to keep applications running – had questions, claiming that Datadog prefers to lock customers into its product.

Shortly after the post, Charity Majors, CEO of Honeycomb.io, a rival application monitoring firm, wrote a Twitter thread elaborating on the benefits of OpenTelemetry and calling out Datadog for only supporting OTEL as a one-way street.

“Datadog has been telling users they can use OTEL to get data in, but not get data out,” Majors wrote. “The Datadog OTEL collector PR was silently killed. The person who wrote it appears to have been pressured into closing it, and nothing has been proposed to replace it.”

Behavior of this sort would be inconsistent with the goals of the Cloud Native Computing Foundation’s (CNCF) OpenTelemetry project, which seeks “to provide a set of standardized vendor-agnostic SDKs, APIs, and tools for ingesting, transforming, and sending data to an Observability back-end (i.e. open source or commercial vendor).”

That is to say, the OpenTelemetry project aims to promote data portability, instead of hindering it, as is common among proprietary software vendors.
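For context, the portability OpenTelemetry is after looks roughly like this: instrumentation code stays vendor-neutral, and only the exporter endpoint decides where the data goes. A minimal sketch with the OpenTelemetry Python SDK, where the endpoint is a placeholder:

```python
# Vendor-neutral instrumentation: swapping the backend means swapping the
# exporter/endpoint, not rewriting application code.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("handle-request"):
    pass  # application work goes here; the span is exported automatically
```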

The smoking hound

On January 26 Dorman confirmed suspicions that he had been approached by Datadog and asked not to proceed with his efforts.

“I owe the community an apology on this one,” Dorman wrote in his pull request thread. “I lacked the courage of my convictions and when push came to shove and I had to make the hard choice, I took the easy way out.”

“Datadog ‘asked’ me to kill this pull request. There were other members from my organization present that let me know this answer will be a ‘ok’. I am sure I could have said no, at the moment I just couldn’t fathom opening Pandora’s Box. There you have it, no NDA, no stack of cash. I left the code hoping someone could carry on. I was willing to give [Datadog] this code, no strings attached as long as it moved OTel forward. They declined.”

He added, “However, I told them if you don’t support OpenTelemetry in a meaningful way, I will start sending pull requests again. So here we are. I feel I have given them enough time to do the right thing.”

Indeed, Dorman subsequently re-opened his pull request, which on Tuesday was merged into the repository for OpenTelemetry Collector components. His Datadog APM Receiver can ingest traces in the Datadog Trace Agent format.
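In practice, a receiver like this means an app instrumented with Datadog’s own tracer can point at an OpenTelemetry Collector instead of the Datadog agent. A hypothetical sketch using Datadog’s Python tracer, ddtrace, where the collector hostname is a placeholder and 8126 is the trace agent’s default port:

```python
# Hypothetical sketch: redirect ddtrace to an OpenTelemetry Collector running
# a Datadog trace receiver. Hostname is a placeholder; 8126 is the default
# Datadog trace agent port. Set the env var before importing ddtrace.
import os
os.environ["DD_TRACE_AGENT_URL"] = "http://otel-collector:8126"

from ddtrace import tracer

with tracer.trace("web.request", service="shop"):
    pass  # this span goes to the collector, which can export it anywhere
```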

Coincidentally, Datadog on Tuesday published a blog post titled, “Datadog’s commitment to OpenTelemetry and the open source community.” It makes no mention of the alleged request to “kill [the] pull request.” Instead, it enumerates various ways in which the company has supported OpenTelemetry recently.

The Register asked Datadog for comment. We’ve not heard back.

Dorman, who presently works for Meta, did not respond to a request for comment. However, last week, via Twitter, he credited Grafana, an open source Datadog competitor, for having “formally sponsored” the work and for pointing out that Datadog “refuses to support OTEL in meaningful ways.”

The OpenTelemetry Governance Committee for the CNCF provided The Register with the following statement:

“We’re still trying to make sense of what happened here; we’ll comment on it once we have a full understanding. Regardless, we are happy to review and accept any contributions which push the project forward, and this [pull request] was merged yesterday,” it said.

Source: Claims Datadog asked developer to kill open source data tool • The Register

Luddites Have a Sad That Netflix Made an Anime Do Boring Background Art Using AI Due to a ‘Labor Shortage’

Netflix created an anime that uses AI-generated artwork to paint its backgrounds—and people on social media are pissed.

In a tweet, Netflix Japan claimed that the project, a short called The Dog & The Boy, uses AI-generated art in response to labor shortages in the anime industry.

“As an experimental effort to help the anime industry, which has a labor shortage, we used image generation technology for the background images of all three-minute video cuts!” the streaming platform wrote in a tweet.

The tweet drew instant criticism and outrage from commenters who felt that Netflix was using AI to avoid paying human artists. This has been a central tension since image-generation AI took off last year, as many artists see the tools as unethical—due to being trained on masses of human-made art scraped from the internet—and as cudgels to further cut costs and devalue workers. Netflix Japan’s claim that the AI was used to fill a supposed labor gap played directly into these widespread concerns.

According to a press release, the short film was created by Netflix Anime Creators Base—a Tokyo-based hub the company created to bolster its anime output with new tools and methods—in collaboration with Rinna Inc., an AI-generated artwork company, and production company WIT Studio, which produced the first three seasons of Attack on Titan.

Painterly and dramatic backdrops of cityscapes and mountain ranges are emphasized in the trailer for The Dog & The Boy. In a sequence at the end of the promo video on Twitter, an example of a background—a snowy road—shows a hand-drawn layout, where the background designer is listed as “AI + Human,” implying that a supervised image generation algorithm generated the scene. In the next two scenes, an AI generated version appears, crediting Rinna and multiple AI developers, some affiliated with Osaka University.
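The press release doesn’t say what Rinna’s pipeline actually was, but the layout-to-render step shown in the promo maps onto an ordinary image-to-image diffusion workflow. A purely illustrative sketch using Hugging Face diffusers, where the model choice, prompt, and parameters are all assumptions:

```python
# Purely illustrative -- NOT Rinna's actual pipeline. Renders a hand-drawn
# layout into a painterly background via image-to-image diffusion; the model,
# prompt, and parameters are placeholder assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

layout = Image.open("layout_snowy_road.png").convert("RGB").resize((768, 432))
background = pipe(
    prompt="painterly anime background, snowy rural road at dusk",
    image=layout,
    strength=0.6,       # how far the model may drift from the hand-drawn layout
    guidance_scale=7.5,
).images[0]
background.save("background_ai_pass.png")
```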

Demand for new anime productions has skyrocketed in recent years, but the industry has long been fraught with labor abuses and poor wages. In 2017, an illustrator died while working, allegedly of a stress-induced heart attack and stroke; in 2021, the reported salary of low-rung anime illustrators was as little as $200 a month, forcing some to reconsider whether the career could ever sustain a life outside work, buying a home, or supporting children. Even top animators reportedly earn just $1,400 to $3,800 a month, even as the anime industry boomed during the pandemic amid renewed interest in at-home streaming. In 2021, the industry hit an all-time revenue high of $18.4 billion.

As the use of AI art becomes more commonplace, artists are revolting against their craft being co-opted by algorithms and their work being stolen to use in datasets that create AI-generated art. In January, a group of artists filed a class action lawsuit against Stability AI, DeviantArt, and Midjourney, claiming that text-to-image tools violate their ownership rights.

Netflix did not immediately respond to a request for comment.

Source: Netflix Made an Anime Using AI Due to a ‘Labor Shortage,’ and Fans Are Pissed

So it wasn’t AI that created the reportedly shit wages and working conditions in anime; those were there already. And drawing backgrounds in anime doesn’t sound to me like particularly inspiring work. Besides, you still need a human to tell the AI what to draw, so in that respect the job has only changed, not disappeared. Luddites afraid of change are nothing new, but they’d be better off embracing the opportunities offered.