Using a technique called reinforcement learning, a researcher at Google Brain has shown that virtual robots can redesign their body parts to help them navigate challenging obstacle courses—even if the solutions they come up with are completely bizarre.
Embodied cognition is the idea that an animal’s cognitive abilities are influenced and constrained by its body plan. This means a squirrel’s thought processes and problem-solving strategies will differ somewhat from the cogitations of octopuses, elephants, and seagulls. Each animal has to navigate its world in its own special way using the body it’s been given, which naturally leads to different ways of thinking and learning.
“Evolution plays a vital role in shaping an organism’s body to adapt to its environment,” David Ha, a computer scientist and AI expert at Google Brain, explained in his new study. “The brain and its ability to learn is only one of many body components that is co-evolved together.”
Using the OpenAI Gym framework, Ha was able to provide an environment for his walkers. This framework looks a lot like an old-school, 2D video game, but it uses sophisticated virtual physics to simulate natural conditions, and it’s capable of randomly generating terrain and other in-game elements.
As for the walker, it was endowed with a pair of legs, each consisting of an upper and lower section. The bipedal bot had to learn how to navigate through its virtual environment and improve its performance over time. Researchers at DeepMind conducted a similar experiment last year, in which virtual bots had to learn how to walk from scratch and navigate through complex parkour courses. The difference here is that Ha’s walkers had the added benefit of being able to redesign their body plan—or at least parts of it. The bots could alter the lengths and widths of their four leg sections to a maximum of 75 percent of the size of the default leg design. The walkers’ pentagon-shaped head could not be altered, serving as cargo. Each walker used a digital version of LIDAR to assess the terrain immediately in front of it, which is why (in the videos) they appear to shoot a thin laser beam at regular intervals.
Using reinforcement-learning algorithms, the bots were given around a day or two to devise their new body parts and come up with effective locomotion strategies, which together formed a walker’s “policy,” in the parlance of AI researchers. The learning process is similar to trial-and-error, except the bots, via reinforcement learning, are rewarded when they come up with good strategies, which then leads them toward even better solutions. This is why reinforcement learning is so powerful—it speeds up the learning process as the bots experiment with various solutions, many of which are unconventional and unpredictable by human standards.
For the first test (above), Ha placed a walker in a basic environment with no obstacles and gently rolling terrain. Using its default body plan, the bot adopted a rather cheerful-looking skipping locomotion strategy. After the learning stage, however, it modified its legs such that they were thinner and longer. With these modified limbs, the walker used its legs as springs, quickly hopping across the terrain.
The introduction of more challenging terrain (above), such as having to walk over obstacles, travel up and down hills, and jump over pits, introduced some radical new policies, namely the invention of an elongated rear “tail” with a dramatically thickened end. Armed with this configuration, the walkers hopped successfully around the obstacle course.
By this point in the experiment, Ha could see that reinforcement learning was clearly working. Allowing a walker “to learn a better version of its body obviously enables it to achieve better performance,” he wrote in the study.
Not content to stop there, Ha played around with the idea of motivating the walkers to adopt some design decisions that weren’t necessarily beneficial to its performance. The reason for this, he said, is that “we may want our agent to learn a design that utilizes the least amount of materials while still achieving satisfactory performance on the task.”
So for the next test, Ha rewarded an agent for developing legs that were smaller in area (above). With the bot motivated to move efficiently across the terrain, and using the tiniest legs possible (it no longer had to adhere to the 75 percent rule), the walker adopted a rather conventional bipedal style while navigating the easy terrain (it needed just 8 percent of the leg area used in the original design).
But the walker really struggled to come up with a sensible policy when having to navigate the challenging terrain. In the example shown above, which was the best strategy it could muster, the walker used 27 percent of the area of its original design. Reinforcement learning is good, but it’s no guarantee that a bot will come up with something brilliant. In some cases, a good solution simply doesn’t exist.