Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
In this project, OpenAI built a hide-and-seek game for their AI agents to play.
While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other, and hopefully see some interesting emergent behaviors.
And, boy, did they do some crazy stuff.
The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt.
Kind of like an arms race situation, and it also resembles generative adversarial networks a little.
And the results are magnificent, amusing, weird – you’ll see in a moment.
These agents learn from previous experiences, and to the surprise of no one, for the first few million rounds, we start out with…pandemonium.
Everyone just running around aimlessly.
With no proper strategy and only semi-random movements, the seekers are favored and hence win the majority of the games.
Nothing to see here.
Then, over time, the hiders learned to lock out the seekers by blocking the doors off with these boxes and started winning consistently.
I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration.
They cannot win alone and hence, they are forced to learn to work together.
Which they did, quite well.
But then, something happened.
Did you notice this pointy, doorstop-shaped object? Are you thinking what I am thinking? Well, probably, and not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and be used as a ramp, and, tadaa! Got’em! The seekers started winning more again.
So, the ball is now back in the court of the hiders.
Can you defend this? If so, how? Well, these resourceful little critters learned that there is a little time at the start of the game when the seekers are frozen, and apparently, during this time, the seekers cannot see them, so why not just sneak out, steal the ramp, and lock it away from them?
Look at those happy eyes as they are carrying that ramp.
And, you think it all ends here? No, no, no.
Not even close.
It gets weirder.
When playing a different map, a seeker noticed that it can use a ramp to climb on top of a box, and, this happens.
Do you think couchsurfing is cool? Give me a break! This is box surfing! And, the scientists were quite surprised by this move as this was one of the first cases where the seeker AI seems to have broken the game.
What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn’t ever happen does happen here.
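To make the box surfing issue a bit more concrete, here is a minimal toy sketch of the idea just described. It is not OpenAI's actual physics code: the `Agent` class, the `step` function, and the `require_floor` flag are all hypothetical names made up for illustration. The agent moves by applying a force to itself, and the bug is simply that nothing checks whether it is standing on the floor.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical 1D agent; not taken from the actual environment."""
    x: float = 0.0          # position
    vx: float = 0.0         # velocity
    on_floor: bool = True   # False when standing on top of a box


def step(agent, force, dt=0.1, mass=1.0, require_floor=False):
    """Advance the agent one time step under a self-applied force.

    With require_floor=False (the behavior described in the video), the
    force is applied even when the agent is not on the floor. With
    require_floor=True, the self-applied force is ignored in that case.
    """
    if require_floor and not agent.on_floor:
        force = 0.0  # the "additional check" that was missing
    agent.vx += (force / mass) * dt
    agent.x += agent.vx * dt
    return agent


# A seeker standing on a box can still push itself along: box surfing.
surfer = step(Agent(on_floor=False), force=1.0)

# With the extra check, the same push does nothing.
grounded_only = step(Agent(on_floor=False), force=1.0, require_floor=True)
```

In this toy version, `surfer` ends up moving while `grounded_only` stays put, which is the whole difference between a working exploit and a patched one.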
And, we’re still not done yet, this paper just keeps on giving.
A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes.
Dear Fellow Scholars, this is proper box surfing defense… then, lock down the remaining tools and build a shelter.
Note how well rehearsed and executed this strategy is: there is not a second of time left until the seekers take off.
I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying “yeah, see this here? There is not a single thing you can do about it”.
In a few isolated cases, other interesting behaviors also emerged, for instance, the hiders learned to exploit the physics system and just chuck the ramp away.
After that, the seekers go “what?” “what just happened?”.
But don’t despair, and at this point, I would also recommend that you hold on to your papers because there was a crazy case where a seeker learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders.
Man, what a paper.
This system can be extended and modded for many other tasks too, so expect to see more of these fun experiments in the future.
We get to do this for a living, and we are even being paid for this.
I can’t believe it.
In this series, my mission is to showcase beautiful works that light a fire in people.
And this is, no doubt, one of those works.
Great idea, interesting, unexpected results, crisp presentation.
Bravo OpenAI! Love it.
So, did you enjoy this? What do you think? Make sure to leave a comment below.
Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect their environment around them, and more.
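For the curious, here is a rough sketch of what a circular convolution could look like. This is not the paper's implementation: the function name `circular_conv1d` is hypothetical, and it assumes the agent's sensor readings are arranged in a ring around it, so the first and last readings are neighbors. The only trick is that the indices wrap around instead of stopping at the ends of the list.

```python
def circular_conv1d(signal, kernel):
    """Slide an odd-length kernel over a ring of readings.

    Strictly speaking this is a cross-correlation (the kernel is not
    flipped), which is what neural network "convolutions" usually compute.
    Indices wrap around via the modulo, so every reading has neighbors
    on both sides, including the first and the last one.
    """
    n = len(signal)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        total = 0.0
        for j, weight in enumerate(kernel):
            # (i + j - half) % n wraps past either end of the ring.
            total += weight * signal[(i + j - half) % n]
        out.append(total)
    return out


# One object detected at position 0 of a 4-reading ring.
readings = [1, 0, 0, 0]
smoothed = circular_conv1d(readings, [1, 1, 1])
# smoothed == [1.0, 1.0, 0.0, 1.0]: the last position also "sees" the
# object, because it sits right next to position 0 on the ring.
```

With ordinary zero padding, the last position would get 0 here, so the wrap-around is what lets the agent treat its surroundings as a full circle.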
Thanks for watching and for your generous support, and I'll see you next time!