The AI, called Project Debater, appeared on stage in a packed conference room at IBM’s San Francisco office embodied in a 6ft tall black panel with a blue, animated “mouth”. It was a looming presence alongside the human debaters Noa Ovadia and Dan Zafrir, who stood behind a podium nearby.
Although the machine stumbled at many points, the unprecedented event offered a glimpse into how computers are learning to grapple with the messy, unstructured world of human decision-making.
For each of the two short debates, participants had to prepare a four-minute opening statement, followed by a four-minute rebuttal and a two-minute summary. The opening debate topic was “we should subsidize space exploration”, followed by “we should increase the use of telemedicine”.
In both debates, the audience judged Project Debater worse at delivery but better in terms of the amount of information it conveyed. And in spite of several robotic slip-ups, the audience voted the AI more persuasive (in terms of changing audience members' positions) than its human opponent, Zafrir, in the second debate.
It’s worth noting, however, that there were many members of IBM staff in the room and they may have been rooting for their creation.
IBM hopes the research will eventually enable a more sophisticated virtual assistant that can absorb massive and diverse sets of information to help build persuasive arguments and make well-informed decisions – as opposed to merely responding to simple questions and commands.
Project Debater was a showcase of IBM’s ability to process very large data sets, including millions of news articles across dozens of subjects, and then turn snippets of arguments into full flowing prose – a challenging task for a computer.
Once an AI is capable of persuasive arguments, it can be applied as a tool to aid human decision-making.
“We believe there’s massive potential for good in artificial intelligence that can understand us humans,” said Arvind Krishna, director of IBM Research.
One example of this might be corporate boardroom decisions, where there are lots of conflicting points of view. The AI system could, without emotion, listen to the conversation, take all of the evidence and arguments into account and challenge the reasoning of humans where necessary.
“This can increase the level of evidence-based decision-making,” said Reed, adding that the same system could be used for intelligence analysis in counter-terrorism, for example identifying if a particular individual represents a threat.
In both cases, the machine wouldn’t make the decision but would contribute to the discussion and act as another voice at the table.
Essentially, Project Debater assigns a confidence score to every piece of information it understands. As in: how confident is the system that it actually understands the content of what’s being discussed? “If it’s confident that it got that point right, if it really believes it understands what that opponent was saying, it’s going to try to make a very strong argument against that point specifically,” Welser explains.
"If it's less confident," he says, "it'll do its best to make an argument that'll be convincing as an argument even if it doesn't exactly answer that point. Which is exactly what a human does too, sometimes."
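Welser's description amounts to a simple gate on the system's understanding confidence. A minimal, purely illustrative sketch of that strategy (the function name and threshold are assumptions, not IBM's actual implementation):

```python
# Hypothetical sketch of the confidence-gated rebuttal strategy Welser
# describes. All names and threshold values here are illustrative
# assumptions, not IBM's actual code.

def choose_rebuttal(claim_confidence: float, threshold: float = 0.7) -> str:
    """Pick a rebuttal strategy based on how well the system believes
    it understood the opponent's claim."""
    if claim_confidence >= threshold:
        # High confidence: attack the specific point directly.
        return "targeted"
    # Low confidence: fall back to a generally persuasive argument
    # that doesn't hinge on the disputed point.
    return "generic"
```

The notable design choice, per Welser, is that the low-confidence branch doesn't stay silent; it still produces an argument, just one decoupled from the point it couldn't confidently parse.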
So: the human says that government should have specific criteria surrounding basic human needs to justify subsidization. Project Debater responds that space is awesome and good for the economy. A human might choose that tactic as a sneaky way to avoid debating on the wrong terms. Project Debater had different motivations in its algorithms, but not that different.
The point of this experiment wasn't to make me think that I couldn't trust a computer to argue in good faith (though it very much did that). No, the point is that IBM is showing off that it can train AI in new areas of research that could eventually be useful in real, practical contexts.
The first is parsing a lot of information in a decision-making context. The same technology that can read a corpus of data and come up with a bunch of pros and cons for a debate could be (and has been) used to decide whether or not a stock might be worth investing in. IBM’s system didn’t make the value judgement, but it did provide a bunch of information to the bank showing both sides of a debate about the stock.
As for the debating part, Welser says that it "helps us understand how language is used" by teaching a system to work in a rhetorical context that's more nuanced than the usual Hey Google give me this piece of information and turn off my lights. Perhaps it might someday help a lawyer structure their arguments, "not that Project Debater would make a very good lawyer," he joked. Another IBM researcher suggested that this technology could help identify fake news.
How close is this to being something IBM turns into a product? “This is still a research level project,” Welser says, though “the technologies underneath it right now” are already beginning to be used in IBM projects.
The system listened to four minutes of its human opponent's opening remarks, then parsed that data and created an argument that highlighted and attempted to debunk information shared by the opposing side. That's incredibly impressive because it has to understand not only the words but the context of those words. Parroting back Wikipedia entries is easy; taking data and creating a narrative based not only on raw data but also on what it's just heard? That's tough.
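The listen-parse-rebut loop described above can be sketched in outline. Everything below is a hypothetical stand-in for illustration; IBM has not published this code, and real argument mining over a live transcript is far more involved:

```python
# Illustrative outline of the listen -> parse -> rebut pipeline described
# above. Function names and logic are hypothetical assumptions, not IBM's API.

from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    confidence: float  # how sure the system is that it understood this claim

def extract_claims(transcript: str) -> list[Claim]:
    """Placeholder claim extraction: a real system would run speech-to-text
    and argument mining; here we just split the transcript into sentences."""
    return [Claim(sentence.strip(), 0.5)
            for sentence in transcript.split(".") if sentence.strip()]

def build_rebuttal(claims: list[Claim]) -> str:
    """Rebut the claims the system is most confident it understood."""
    strongest = sorted(claims, key=lambda c: c.confidence, reverse=True)
    return " ".join(f"On the claim '{c.text}': here is counter-evidence."
                    for c in strongest[:2])
```

The hard part, as the paragraph above notes, is hidden inside `extract_claims`: turning four minutes of spontaneous human speech into discrete, contextualized claims worth rebutting.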
In a world where emotion and bias color all our decisions, Project Debater could help companies and governments see through the noise of our life experiences and produce mostly impartial conclusions. Of course, the data set it pulls from is based on what humans have written, and that writing carries its own biases and emotions.
While the goal is an unbiased machine, during the discourse Project Debater wasn't completely sterile. In its rebuttal during the telemedicine debate, the system stated that Dan Zafrir had not told the truth in his opening statement about the increase in the use of telemedicine. In other words, it called him a liar.
When asked about the statement, Slonim said the system applies a confidence threshold during rebuttals. If it's feeling very confident, it creates a more complex statement; if it's feeling less confident, the statement is less impressive.