On September 17th, OpenAI released their latest project; an AI that can play hide and seek. By playing hundreds of millions of games of hide and seek, two opposing teams of AI characters developed complex strategies. The characters learned to use tools and team work, basically they became toddlers, and learned to take advantage of loopholes in their environment its creators didn’t even know existed.
The AI was trained using a technique called reinforcement learning, that works much the way animals (and toddlers) are trained. The AI characters are given “rewards” for exhibiting a behavior. After millions of games the AI learned the best way to maximize winning.
In the AI’s games of hide and seek, the opposing teams created a range of complex hiding and seeking strategies. The above video illustrates this well.
The researchers designed a virtual environment comprised of an enclosed space and various objects like blocks and ramps, to create barricades. The rewards for the teams where simple, hiders were rewarded for avoiding the seekers, and seekers were rewarded for finding the hiders. The characters were also incentivized to explore with count-based exploration. The AI keeps a count of states it’s visited and is incentivized to go to infrequently visited states. The hiders were given ahead start and the researchers provided no further instructions.
“We’ve shown that agents can learn sophisticated tool use in a high fidelity physics simulator; however, there were many lessons learned along the way to this result. Building environments is not easy and it is quite often the case that agents find a way to exploit the environment you build or the physics engine in an unintended way.” said the blog post announcing the paper.
An AI learning to jump or surf on a block in a game of hide and seek is fairly non threatening. But the possibility of a self learning AI exhibiting other, more sinister, unseen behavior is something many people fear. I think of AI like the Genie in Aladdin, the Genie grants wishes but they may come with unforeseeable consequences.