In the paper, DeepMind describes how a descendant of the AI program that first conquered the board game Go has taught itself to play a number of other games at a superhuman level. After eight hours of self-play, the program bested the AI that first beat the human world Go champion; and after four hours of training, it beat the current world champion chess-playing program, Stockfish. Then for a victory lap, it trained for just two hours and polished off one of the world’s best shogi-playing programs named Elmo (shogi being a Japanese version of chess that’s played on a bigger board).
One of the key advances here is that the new AI program, named AlphaZero, wasn’t specifically designed to play any of these games. In each case, it was given some basic rules (like how knights move in chess, and so on) but was programmed with no other strategies or tactics. It simply got better by playing itself over and over again at an accelerated pace — a method of training AI known as “reinforcement learning.
”Using reinforcement learning in this way isn’t new in and of itself. DeepMind’s engineers used the same method to create AlphaGo Zero; the AI program that was unveiled this October. But, as this week’s paper describes, the new AlphaZero is a “more generic version” of the same software, meaning it can be applied to a broader range of tasks without being primed beforehand.What’s remarkable here is that in less than 24 hours, the same computer program was able to teach itself how to play three complex board games at superhuman levels. That’s a new feat for the world of AI.