Computer scientists have developed a card-playing bot, called Pluribus, capable of defeating some of the world’s best players at six-person no-limit Texas hold’em poker, in what’s considered an important breakthrough in artificial intelligence.
Two years ago, a research team from Carnegie Mellon University developed a similar poker-playing system, called Libratus, which consistently defeated the world’s best players at one-on-one Heads-Up, No-Limit Texas Hold’em poker. The creators of Libratus, Tuomas Sandholm and Noam Brown, have now upped the stakes, unveiling a new system capable of playing six-player no-limit Texas hold’em poker, a wildly popular version of the game.
In a series of contests, Pluribus handedly defeated its professional human opponents, at a level the researchers described as “superhuman.” When pitted against professional human opponents with real money involved, Pluribus managed to collect winnings at an astounding rate of $1,000 per hour. Details of this achievement were published today in Science.
For the new study, Brown and Sandholm subjected Pluribus to two challenging tests. The first pitted Pluribus against 13 different professional players—all of whom have earned more than $1 million in poker winnings—in the six-player version of the game. The second test involved matches featuring two poker legends, Darren Elia and Chris “Jesus” Ferguson, each of whom was pitted against five identical copies of Pluribus.
The matches with five humans and Pluribus involved 10,000 hands played over 12 days. To incentivize the human players, a total of $50,000 was distributed among the participants, Pluribus included. The games were blind in that none of the human players were told who they were playing, though each player had a consistent alias used throughout the competition. For the tests involving a lone human and five Pluribuses, each player was given $2,000 for participating and a bonus $2,000 for playing better than their human cohort. Elia and Ferguson both played 5,000 separate hands against their machine opponents.
In all scenarios, Pluribus registered wins with “statistical significance,” and to a degree the researchers referred to as “superhuman.”
“We mean superhuman in the sense that it performs better than the best humans,” said Brown, who is completing his Ph.D. as a research scientist at Facebook AI. “The bot won by about five big blinds per hundred hands of poker (bb/100) when playing against five elite human professionals, which professionals consider to be a very high win rate. To beat elite professionals by that margin is considered a decisive win.
Before the competition started, Pluribus developed its own “blueprint” strategy, which it did by playing poker with itself for eight straight days.
“Pluribus does not use any human gameplay data to form its strategy,” explained Brown. “Instead, Pluribus first uses self-play, in which it plays against itself over trillions of hands to formulate a basic strategy. It starts by playing completely randomly. As it plays more and more hands against itself, its strategy gradually improves as it learns which actions lead to winning more money. This is all done offline before ever playing against humans.”
Armed with its blueprint strategy, the competitions could begin. After the first bets were placed, Pluribus calculated several possible next moves for each opponent, in a manner similar to how machines play chess and Go. The difference here, however, is that Pluribus was not tasked to calculate the entire game, as that would be “computationally prohibitive,” as noted by the researchers.
“In Pluribus, we used a new way of doing search that doesn’t have to search all the way to the end of the game,” said Brown. “Instead, it can stop after a few moves. This makes the search algorithm much more scalable. In particular, it allows us to reach superhuman performance while only training for the equivalent of less than $150 on a cloud computing service, and playing in real time on just two CPUs.”
Importantly, Pluribus was also programmed to be unpredictable—a fundamental aspect of good poker gamesmanship. If Pluribus consistently bet tons of money when it figured it had the best hand, for example, its opponents would eventually catch on. To remedy this, the system was programmed to play in a “balanced” manner, employing a set of strategies, like bluffing, that prevented Pluribus’ opponents from picking up on its tendencies and habits.