The bots learn from self-play, meaning two bots playing each other and learning from each side’s successes and failures. By using a huge stack of 256 graphics processing units (GPUs) with 128,000 processing cores, the researchers were able to speed up the AI’s gameplay so that they learned from the equivalent of 180 years of gameplay for every day it trained. One version of the bots were trained for four weeks, meaning they played more than 5,000 years of the game.
In a match, the OpenAI team initially gives each bot a mandate to do as well as it can on its own, meaning that the bots learned to act selfishly and steal kills from each other. But by turning up a simple metric, a weighted average of the team’s success, the bots soon begin to work together and execute team attacks quicker than humanly possible. The metric was dubbed by OpenAI as “team spirit.”
“They start caring more about team fighting, and saving one another, and working together in these skirmishes in order to make larger advances towards the group goal,” says Brooke Chan, an engineer at OpenAI.
Right now, the bots are restricted to playing certain characters, can’t use certain items like wards that allow players to see more of the map or anything that grants invisibility, or summon other units to help them fight with spells. OpenAI hopes to lift those restrictions by the competition in August.