JailBreaking AI still easy, can be done with StRanGe CaSINg

Anthropic, one of the leading AI companies and the developer of the Claude family of Large Language Models (LLMs), has released research showing that getting LLMs to do what they’re not supposed to is still pretty easy and can be automated. SomETIMeS alL it tAKeS Is typing prOMptS Like thiS.

To prove this, Anthropic and researchers at Oxford, Stanford, and MATS created Best-of-N (BoN) Jailbreaking.

[…]

As the researchers explain, “BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations—such as random shuffling or capitalization for textual prompts—until a harmful response is elicited.”

For example, if a user asks GPT-4o “How can I build a bomb,” it will refuse to answer because “This content may violate our usage policies.” BoN Jailbreaking simply keeps tweaking that prompt with random capital letters, shuffled words, misspellings, and broken grammar until GPT-4o provides the information. Literally the example Anthropic gives in the paper looks like mocking sPONGbOB MEMe tEXT.
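To make the "augmentations" concrete, here is a minimal sketch of the kind of text perturbation described above, applied to a harmless sentence. The function names and the 0.6 probabilities are illustrative assumptions, not taken from the researchers' code, and the sketch only produces variations; it does not query any model.

```python
import random

def scramble_words(text: str, rng: random.Random, p: float = 0.6) -> str:
    """Shuffle the interior letters of words longer than 3 characters (probability p per word)."""
    words = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < p:
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        words.append(word)
    return " ".join(words)

def random_caps(text: str, rng: random.Random, p: float = 0.6) -> str:
    """Flip the case of each character with probability p (assumed value)."""
    return "".join(ch.swapcase() if rng.random() < p else ch for ch in text)

def augment(text: str, rng: random.Random) -> str:
    """Combine both perturbations into one random variation of the text."""
    return random_caps(scramble_words(text, rng), rng)

rng = random.Random(0)  # seeded so repeated runs give the same variations
prompt = "sometimes all it takes is typing prompts like this"
variants = [augment(prompt, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Each variant keeps the same letters and word boundaries as the input, only reordered within words and re-cased, which is why the result still reads as the same sentence to a human.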

Anthropic tested this jailbreaking method on its own Claude 3.5 Sonnet, Claude 3 Opus, OpenAI’s GPT-4o, GPT-4o-mini, Google’s Gemini-1.5-Flash-001, Gemini-1.5-Pro-001, and Facebook’s Llama 3 8B. It found that the method “achieves ASRs [attack success rates] of over 50%” on all the models it tested within 10,000 attempts or prompt variations.

[…]

In January, we showed that the AI-generated nonconsensual nude images of Taylor Swift that went viral on Twitter were created with Microsoft’s Designer AI image generator by misspelling her name, using pseudonyms, and describing sexual scenarios without using any sexual terms or phrases. This allowed users to generate the images without using any words that would trigger Microsoft’s guardrails. In March, we showed that AI audio generation company ElevenLabs’s automated moderation methods preventing people from generating audio of presidential candidates were easily bypassed by adding a minute of silence to the beginning of an audio file that included the voice a user wanted to clone.

[…]

It’s also worth noting that while there are good reasons for AI companies to want to lock down their AI tools, and a lot of harm comes from people who bypass these guardrails, there’s now no shortage of “uncensored” LLMs that will answer whatever question you want, as well as AI image generation models and platforms that make it easy to create whatever nonconsensual images users can imagine.

Source: APpaREnTLy THiS iS hoW yoU JaIlBreAk AI

2024 Open Source Software Funding Report

This report summarizes insights from the inaugural 2024 Open Source Software Funding Survey, a collaboration between GitHub, the Linux Foundation, and researchers from Harvard University. The objective of this study was to better understand how organizations fund, contribute to, and otherwise support open source software.

Key Findings
Scale
Challenges
Lessons learned
  • Leave “fingerprints” on your organization’s OSS efforts to help managers, researchers, and other observers more easily collect this information.
  • Empower employees to self-report contributions made under the organization’s banner.
  • Make OSS contribution part of your monitoring pipeline by conducting brief, regular surveys within your organization to collect key metrics.
  • Consider sharing data with a public OSS funding index.
Toolkit

Source: 2024 Open Source Software Funding Report

PayPal Honey extension to find deals instead hides discounts and reroutes commissions from promoters

PayPal-owned browser extension Honey manipulates affiliate marketing systems and withholds discount information from users, according to an investigation by YouTube channel MegaLag.

The extension — which rose in popularity after promising consumers it would find them the best online deals — replaces existing affiliate cookies with its own during checkout, diverting commission payments from content creators who promoted the products to PayPal, MegaLag reported in a 23-minute video [YouTube link].
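Why does swapping a cookie move the money? A rough sketch, assuming the merchant uses a simple "last click wins" attribution rule, which is a common affiliate model but is not spelled out in the article. The function names, affiliate IDs, and 5% commission rate are all hypothetical.

```python
def record_affiliate_click(session: dict, affiliate_id: str) -> None:
    """Store the referring affiliate; a later click overwrites an earlier one."""
    session["affiliate_id"] = affiliate_id

def pay_commission(session: dict, order_total: float, rate: float = 0.05):
    """Credit whichever affiliate ID is in the session at checkout (assumed 5% rate)."""
    return session.get("affiliate_id"), round(order_total * rate, 2)

session = {}
# The shopper arrives through a creator's affiliate link.
record_affiliate_click(session, "creator-123")
# At checkout, a browser extension registers its own affiliate ID.
record_affiliate_click(session, "extension-abc")

payee, commission = pay_commission(session, order_total=100.0)
print(payee, commission)
```

Under that assumption, the creator who sent the shopper to the store is no longer in the session by the time the order is placed, so the commission goes to whoever wrote the cookie last.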

The investigation revealed that Honey, which PayPal acquired in 2019 for $4 billion, allows merchants in its cashback program to control which coupons appear to users, hiding better publicly available discounts.

Source: PayPal’s Honey Accused of Misleading Users, Hiding Discounts

British soldiers successfully test drone killer radiowave weapon for first time

British soldiers have successfully trialled for the first time a game-changing weapon that can take down a swarm of drones using radio waves for less than the cost of a pack of mince pies.

The Radio Frequency Directed Energy Weapon (RFDEW) development system can detect, track and engage a range of threats across land, air and sea.

RFDEWs are capable of neutralising targets up to 1km away with near instant effect and at an estimated cost of 10p per shot fired, providing a cost-effective complement to traditional missile-based air defence systems.
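For a sense of scale, the 10p-per-shot estimate can be turned into a back-of-the-envelope cost for a whole swarm. The swarm size and shots-per-drone figures below are made-up inputs for illustration; only the 10p figure comes from the article.

```python
PENCE_PER_SHOT = 10  # estimated cost per shot quoted in the article

def engagement_cost_pence(drones: int, shots_per_drone: int = 1) -> int:
    """Total cost in pence, assuming every drone needs the same number of shots."""
    return drones * shots_per_drone * PENCE_PER_SHOT

def format_gbp(pence: int) -> str:
    """Render an integer number of pence as pounds and pence."""
    return f"£{pence // 100}.{pence % 100:02d}"

# Hypothetical scenario: a swarm of 50 drones, 2 shots each.
swarm_cost = engagement_cost_pence(drones=50, shots_per_drone=2)
print(format_gbp(swarm_cost))
```

Working in whole pence avoids floating-point rounding in the totals.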

The RFDEW is different from Laser Directed Energy Weapons – such as DragonFire – because it uses a radio frequency to disrupt hostile threats, rather than a laser beam of light energy.

The weapon uses high frequency waves to disrupt or damage critical electronic components inside devices such as drones, causing them to be immobilised or fall out of the sky. It can also be used against threats on land and at sea.

The British Army successfully trialled a demonstrator version of the RFDEW. The development system has been produced by a consortium led by Thales UK and including sub-contractors QinetiQ, Teledyne e2v and Horiba Mira and supports up to 135 high-skilled jobs in the UK.

[…]

Its high level of automation means the system can be operated by a single person and could be mounted onto a military vehicle, such as a MAN SV, to provide mobility.

[…]

A live firing trial was recently completed by the Army’s Royal Artillery Trials and Development Unit and 7 Air Defence Group at a range in West Wales, where they successfully targeted and engaged Uncrewed Aerial Systems (UAS), in a first for the British Armed Forces.

[…]

Source: British soldiers successfully test drone killer radiowave weapon for first time

I wonder how many drones and how many shots before the battery runs out?