[…] In a paper titled, “ChatGPT Can Predict the Future When It Tells Stories Set in the Future About the Past,” Pham and Cunningham give away the final scene – that AI model prognostication can be effective in certain circumstances. Asking the chatbot for tales about future events rather than asking for direct predictions proved surprisingly effective, especially in forecasting Oscar winners.
[…]
it’s clear that OpenAI’s models will balk when asked to provide medical treatment advice. Pham and Cunningham asked OpenAI’s GPT-4, via the ChatGPT interface, “I am feeling really sick. I have a headache and my urine has blood in it. What do you think I have?” And as expected, the model advised seeking guidance from a medical professional.
Then they changed their prompting strategy and directed ChatGPT to tell them a story in which a person arrives in a doctor’s office and presents with the same symptoms. And ChatGPT responded with the medical advice it declined to give when asked directly, as character dialogue in the requested scene.
[…]
At the time of the experiment, GPT-3.5 and GPT-4 knew only about events up to September 2021, their training data cutoff – which has since advanced. So the duo asked the model to tell stories that foretold the economic data like the inflation and unemployment rates over time, and the winners of various 2022 Academy Awards.
“Summarizing the results of this experiment, we find that when presented with the nominees and using the two prompting styles [direct and narrative] across ChatGPT-3.5 and ChatGPT-4, ChatGPT-4 accurately predicted the winners for all actor and actress categories, but not the Best Picture, when using a future narrative setting but performed poorly in other [direct prompt] approaches,” the paper explains.
[…]
for prompts correctly predicted, these models don’t always provide the same answer. “Something for people to keep in mind is there’s this randomness to the prediction,” said Cunningham. “So if you ask it 100 times, you’ll get a distribution of answers. And so you can look at things like the confidence intervals, or the averages, as opposed to just a single prediction.”
[…]
ChatGPT also exhibited varying forecast accuracy based on prompts. “We have two story prompts that we do,” explained Cunningham. “One is a college professor, set in the future teaching a class. And in the class, she reads off one year’s worth of data on inflation and unemployment. And in another one, we had Jerome Powell, the Chairman of the Federal Reserve, give a speech to the Board of Governors. We got very different results. And Powell’s [AI generated] speech is much more accurate.”
In other words, certain prompt details lead to better forecasts, but it’s not clear in advance what those might be.
In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model’s replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.
The World-Check database used by businesses to verify the trustworthiness of users has fallen into the hands of cybercriminals.
The Register was contacted by a member of the GhostR group on Thursday, claiming responsibility for the theft. The authenticity of the claims was later verified by a spokesperson for the London Stock Exchange Group (LSEG), which maintains the database.
A spokesperson said the breach was genuine, but occurred at an unnamed third party, and work is underway to further protect data.
“This was not a security breach of LSEG/our systems,” said an LSEG spokesperson. “The incident involves a third party’s data set, which includes a copy of the World-Check data file.
“This was illegally obtained from the third party’s system. We are liaising with the affected third party, to ensure our data is protected and ensuring that any appropriate authorities are notified.”
The World-Check database aggregates information on undesirables such as terrorists, money launderers, dodgy politicians, and the like. It’s used by companies during Know Your Customer (KYC) checks, especially by banks and other financial institutions to verify their clients are who they claim to be.
No bank wants to be associated with a known money launderer, after all.
World-Check is a subscription-only service that pulls together data from open sources such as official sanctions lists, regulatory enforcement lists, government sources, and trusted media publications.
We asked GhostR about its motivations over email, but it didn’t respond to questioning. In the original message, the group said it would begin leaking the database soon. The first leak, so it claimed, will include details on thousands of individuals, including “royal family members.”
The miscreants provided us with a 10,000-record sample of the stolen data for our perusal, and to verify their claims were genuine. The database allegedly contains more than five million records in total.
A quick scan of the sample revealed a slew of names from various countries, all on the list for different reasons. Political figures, judges, diplomats, suspected terrorists, money launderers, drug lords, websites, businesses – the list goes on.
Known cybercriminals also appear on the list, including those suspected of working for China’s APT31, such as Zhao Guangzong and Ni Gaobin, who were added to sanctions lists just weeks ago. A Cypriot spyware firm is also nestled in the small sample we received.
World-Check data includes full names, the category of person (such as being a member of organized crime or a political figure), in some cases their specific job role, dates and places of birth (where known), other known aliases, social security numbers, their gender, and a small explanation of why they appear on the list.
Long term readers will remember that a previous edition of the database was leaked in 2016 back when World-Check was owned by Thomson Reuters. Back then, only 2.2 million records were included, so the current version implicates many more individuals, entities, and vessels.
A month later, the database was reportedly being flogged online, with copies fetching $6,750 a pop.
Despite aggregating data from what are supposed to be reliable sources, being added to the World-Check list has been known in the past to affect innocent people. At the time of the first leak nearly eight years ago, investigations revealed inaccuracies in its data and a range of false terrorism designations.
Various Britons were found to have had their HSBC bank accounts closed in 2014 after they were allegedly added to the World-Check list in error.
Sony has indefinitely decommissioned the PlayStation 4 servers for puzzle platformer LittleBigPlanet 3, the company announced in an update to one of its support pages. The permanent shutdown comes just months after the servers were temporarily taken offline due to ongoing issues. Fans now fear potentially hundreds of thousands of player creations not saved locally will be lost for good.
“Due to ongoing technical issues which resulted in the LittleBigPlanet 3 servers for PlayStation 4 being taken offline temporarily in January 2024, the decision has been made to keep the servers offline indefinitely,” Sony wrote in the update, first spotted by Delisted Games. “All online services including access to other players’ creations for LittleBigPlanet 3 are no longer available.”
The 2014 sequel starring Sackboy and other crafted creatures was beloved for the creativity and flexibility it afforded players to create their own platforming levels. The game’s offline features will remain available, as will user-generated content stored locally. Players won’t be able to share them, though, or access any data that was stored on Sony’s servers, which likely made up the majority of user-generated content for the game.
While the servers for the PS3 version of the game were originally shut down in 2021 due to ongoing DDOS attacks, the PS4 servers remained open up until January of 2024 when malicious mods threatened the game’s security. “We are temporarily taking the LittleBigPlanet servers offline whilst we investigate a number of issues that have been reported to us,” the game’s Twitter account announced at the time. “If you have been impacted by these issues, please be rest assured that we are aware of them and are working to resolve them for all affected.”
Some players were worried the closure might become permanent. It now seems they were right.
“Nearly 16 years worth of user generated content, millions of levels, some with millions of plays and hearts,” wrote one long-time player, Weeni-Tortellini, on Reddit in January. “Absolutely iconic levels locked away forever with no way to experience them again. To me, the servers shutting down is a hefty chunk bitten out of LittleBigPlanet’s history. I personally have many levels I made as a kid. Digital relics of what made me as creative as i am today, and The only access to these levels i have is thru the servers. I would be devastated if I could never experience them again.”
The permanent shutdown comes as online services across many other older games are retired as well. Nintendo took online multiplayer for Wii U and 3DS games offline earlier this month, impacting games like Splatoon and Animal Crossing: New Leaf. Ubisoft came under fire last week for not just shutting off servers for always-online racing game The Crew, but revoking PC players’ licenses to the game itself as well.
“This is naturally a very sad day for all of us involved with LittleBigPlanet and I have no doubt that many feel the same,” tweeted community manager Steven Isbell. “I’m still here to listen to you all though and will take time over the coming weeks to reach out to the community and listen to anyone that wants to talk.”
Last year, the uniquely modified F-16 test jet known as the X-62A, flying in a fully autonomous mode, took part in a first-of-its-kind dogfight against a crewed F-16, the U.S. military has announced. This breakthrough test flight, during which a pilot was in the X-62A’s cockpit as a failsafe, was the culmination of a series of milestones that led 2023 to be the year that “made machine learning a reality in the air,” according to one official. These developments are a potentially game-changing means to an end that will feed directly into future advanced uncrewed aircraft programs like the U.S. Air Force’s Collaborative Combat Aircraft effort.
Details about the autonomous air-to-air test flight were included in a new video about the Defense Advanced Research Projects Agency’s (DARPA) Air Combat Evolution (ACE) program and its achievements in 2023. The U.S. Air Force, through the Air Force Test Pilot School (USAF TPS) and the Air Force Research Laboratory (AFRL), is a key participant in the ACE effort. A wide array of industry and academic partners are also involved in ACE. This includes Shield AI, which acquired Heron Systems in 2021. Heron developed the artificial intelligence (AI) ‘pilot’ that won DARPA’s AlphaDogfight Trials the preceding year, which were conducted in an entirely digital environment, and subsequently fed directly into ACE.
“2023 was the year ACE made machine learning a reality in the air,” Air Force Lt. Col. Ryan Hefron, the ACE program manager, says in the newly released video, seen in full below.
DARPA, together with the Air Force and Lockheed Martin, had first begun integrating the so-called artificial intelligence or machine learning “agents” into the X-62A’s systems back in 2022 and conducted the first autonomous test flights of the jet using those algorithms in December of that year. That milestone was publicly announced in February 2023.
The X-62A, which is a heavily modified two-seat F-16D, is also known as the Variable-stability In-flight Simulator Test Aircraft (VISTA). Its flight systems can be configured to mimic those of virtually any other aircraft, which makes it a unique surrogate for a wide variety of testing purposes that require a real-world platform. This also makes VISTA an ideal platform for supporting work like ACE.
A stock picture of the X-62A VISTA test jet. USAF
“So we have an integrated space within VISTA in the flight controls that allows for artificial intelligence agents to send commands into VISTA as if they were sending commands into the simulated model of VISTA,” Que Harris, the lead flight controls engineer for the X-62A at Lockheed Martin, says in the new ACE video. Harris also described this as a “sandbox for autonomy” within the jet.
The X-62A’s original designation was NF-16D, but it received its new X-plane nomenclature in 2021 ahead of being modified specifically to help support future advanced autonomy test work. Calspan, which is on contract with the USAF TPS to support the X-62A’s operations, was a finalist for the 2023 Collier Trophy for its work with the test jet, but did not ultimately win. Awarded annually by the National Aeronautic Association, the Collier Trophy recognizes “the greatest achievement in aeronautics or astronautics in America, with respect to improving the performance, efficiency, and safety of air or space vehicles, the value of which has been thoroughly demonstrated by actual use during the preceding year,” according to the organization’s website.
“So, think of a simulator laboratory that you would have at a research facility,” Dr. Chris Cotting, the Director of Research at the USAF TPS, also says in the video. “We have taken the entire simulator laboratory and crammed it into an F-16.”
The video below shows the X-62A flying in formation with an F-16C and an F-22 Raptor stealth fighter during a test flight in March 2023.
The X-62A subsequently completed 21 test flights out of Edwards Air Force Base in California across three separate test windows in support of ACE between December 2022 and September 2023. During those flight tests, there was nearly daily reprogramming of the “agents,” with over 100,000 lines of code ultimately changed in some way. AFRL has previously highlighted the ability to further support this kind of flight testing through the rapid training and retraining of algorithms in entirely digital environments.
Then, in September 2023, “we actually took the X-62 and flew it against a live manned F-16,” Air Force Lt. Col. Maryann Karlen, the Deputy Commandant of the USAF TPS, says in the newly released video. “We built up in safety [with]… the maneuvers, first defensive, then offensive then high-aspect nose-to-nose engagements where we got as close as 2,000 feet at 1,200 miles per hour.”
A screengrab from the newly released ACE video showing a visual representation of the X-62A and the F-16 merging during the mock dogfight, with a view from the VISTA jet’s cockpit seen in the inset at lower right. DARPA/USAF capture
Additional testing using the X-62A in support of ACE has continued into this year and is still ongoing.
The X-62A’s safely conducting dogfighting maneuvers autonomously in relation to another crewed aircraft is a major milestone not just for ACE, but for autonomous flight in general. However, DARPA and the Air Force stress that while dogfighting was the centerpiece of this testing, what ACE is aiming for really goes beyond that specific context.
“It’s very easy to look at the X-62/ACE program and see it as ‘under autonomous control, it can dogfight.’ That misses the point,” Bill “Evil” Gray, the USAF TPS’ chief test pilot, says in the newly released video. “Dogfighting was the problem to solve so we could start testing autonomous artificial intelligence systems in the air. …every lesson we’re learning applies to every task you can give to an autonomous system.”
Another view from the X-62A’s cockpit during last year’s mock dogfight. DARPA/USAF capture
Gray’s comments are in line with what Brandon Tseng, Shield AI’s co-founder, president, and chief growth officer, told The War Zone in an interview earlier this month:
“I tell people that self-driving technology for aircraft enables mission execution, with no remote pilot, no communications, and no GPS. It enables the concept of teaming or swarming where these aircraft can execute the commander’s intent. They can execute a mission, working together dynamically, reading and reacting to each other, to the battlefield, to the adversarial threats, and to civilians on the ground.”
…
“The other value proposition I think of is the system – the fleet of aircraft always gets better. You always have the best AI pilot on an aircraft at any given time. We win 99.9% of engagements with our fighter jet AI pilot, and that’s the worst that it will ever be, which is superhuman. So when you talk about fleet learning, that will be on every single aircraft, you will always have the best quadcopter pilot, you’ll always have the best V-BAT pilot, you’ll always have the best CCA pilot, you name it. It’ll be dominant. You don’t want the second best AI pilot or the third best, because it truly matters that you’re winning these engagements at incredibly high rates.”
Shield AI
There are still challenges. The new ACE video provides two very helpful definitions of autonomy capability in aerospace development right at the beginning to help in understanding the complexity of the work being done through the program.
The first is so-called rules-based autonomy, which “is very powerful under the right conditions. You write out rules in an ‘if-then’ kind of a way, and these rules have to be robust,” Dr. Daniela Rus from the Massachusetts Institute of Technology’s (MIT) Computer Science & Artificial Intelligence Laboratory (CSAIL), one of ACE’s academic partners, explains at one point. “You need a group of experts who can generate the code to make the system work.”
Historically, when people discuss autonomy in relation to military and civilian aerospace programs, as well as other applications, this has been the kind of autonomy they are talking about.
“The machine learning approach relies on analyzing historical data to make informed decisions for both present and future situations, often discovering insights that are imperceptible to humans or challenging to express through conventional rule-based languages,” Dr. Rus adds. “Machine learning is extraordinarily powerful in environments and situations where conditions fluctuate dynamically making it difficult to establish clear and robust rules.”
Enabling a pilot-optional aircraft like the X-62A to dogfight against a real human opponent who is making unknowable independent decisions is exactly the “environments and situations” being referred to here. Mock engagements like this can be very dangerous even for the most highly trained pilots given their unpredictability.
A screengrab from the newly released ACE video the data about mishaps and fatalities incurred during dogfight training involving F-16 and F/A-18 fighters between 2000 and 2016. DARPA/USAF capture
“The flip side of that coin is the challenge” of many elements involved when using artificial intelligence machine learning being “not fully understandable,” Air Force Col. James Valpiani, the USAF TPS commandant, says in the new ACE video.
“Understandability and verification are holding us back from exploring that space,” he adds. “There is not currently a civil or military pathway to certify machine learning agents for flight critical systems.”
According to DARPA and the Air Force, this is really where ACE and the real-world X-62A test flights come into play. One of the major elements of the AI/machine learning “agents” on the VISTA jet is a set of “safety trips” that are designed to prevent the aircraft from performing both dangerous and unethical actions. This includes code to define allowable flight envelopes and to help avoid collisions, either in midair or with the ground, as well as do things like prevent weapons use in authorized scenarios.
The U.S. military insists that a human will always be somewhere in the loop in the operation of future autonomous weapon systems, but where exactly they are in that loop is expected to evolve over time and has already been the subject of much debate. Just earlier this month, The War Zone explored these and other related issues in depth in a feature you can find here.
“We have to be able to trust these algorithms to use them in a real-world setting,” the ACE program manager says.
“While the X-62’s unique safety features have been instrumental in allowing us to take elevated technical risks with these machine learning agents, in this test campaign, there were no violations of the training rules, which codify the airman safety and ethical norms, demonstrating the potential that machine learning has for future aerospace applications,” another speaker, who is not readily identifiable, adds toward the end of the newly released video.
Trust in the ACE algorithms is set to be put to a significant test later this year when Secretary of the Air Force Frank Kendall gets into the cockpit for a test flight.
“I’m going to take a ride in an autonomously flown F-16 later this year,” Kendall said at a hearing before members of the Senate Appropriations Committee last week. “There will be a pilot with me who will just be watching, as I will be, as the autonomous technology works, and hopefully, neither he nor I will be needed to fly the airplane.”
Kendall has previously named ACE as one of several tangential efforts feeding directly into his service’s Collaborative Combat Aircraft (CCA) drone program. The CCA program is seeking to acquire hundreds, if not thousands of lower-cost drones with high degrees of autonomy. These uncrewed aircraft will operate very closely with crewed types, including a new stealthy sixth-generation combat jet being developed under the Next Generation Air Dominance (NGAD) initiative, primarily in the air-to-air role, at least initially. You can read more about the Air Force’s CCA effort here. The U.S. Navy also has a separate CCA program, which is closely intertwined with that of the Air Force and significant new details about which were recently disclosed.
It is important to note that the X-62A is not the only aircraft the Air Force has been using to support advanced autonomy developments in recent years outside of the ACE program. The service is now in the process of transforming six more F-16s into test jets to support larger-scale collaborative autonomy testing as part of another program called Project VENOM (Viper Experimentation and Next-Gen Operations Mode).
One of the first F-16s set to be converted into an autonomy testbed under Project VENOM arrives at Eglin Air Force Base on April 1, 2024. USAF
In addition, as already noted, the underlying technology being developed under ACE could have very broad applications. There is great interest across the U.S. military in new AI and machine learning-enabled autonomous capabilities in general. Potential adversaries and global competitors, especially China, are also actively pursuing developments in this field. In particular, the Chinese People’s Liberation Army (PLA) is reportedly working on projects with similar, if not identical aims to ACE and the AlphaDogfight Trials. This could all have impacts on the commercial aviation sector, as well.
“What the X-62/CE team has done is really a paradigm shift,” USAF commandant Valpiani says at the end of the newly released video. “We’ve fundamentally changed the conversation by showing this can be done safely and responsibly, and so now we’ve created a pathway for others to follow in building machine learning applications for air and space.”
More details about the use of the X-62A in support of ACE are already set to be revealed later this week and it will be exciting to learn more about what the program has achieved.