DeepMind: The Hanabi Card Game Is the Next Frontier for AI Research


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Now get this, after defeating Chess, Go, and
making incredible progress in Starcraft 2,
scientists at DeepMind just published a paper
where they claim that Hanabi is the next frontier
in AI research.
And we shall stop… right here.
I hear you asking me, Károly, after defeating
all of these immensely difficult games, now,
you are trying to tell me that somehow this
silly card game is the next step?
Yes, that’s exactly what I am saying.
Let me explain.
Hanabi is a card game where two to five players
cooperate to build five card sequences and
to do that, they are only allowed to exchange
very little information.
This is also an imperfect information game,
which means the players don’t have all the
knowledge available needed to make a good
decision.
They have to work with what they have and
try to infer the rest.
For instance, Poker is also an imperfect information
game because we don’t see the cards of the
other players and the game revolves around
our guesses as to what they might have.
In Hanabi, interestingly, it is the other
way around, so we see the cards of the other
players, but not our own ones.
The players have to work around this limitation
by relying on each other, working out communication
protocols, and inferring intent in order to win
the game.
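To make the "we see everyone's cards but our own" idea concrete, here is a minimal sketch in Python. It is not taken from the paper or its open source system; the `Card` type and the `observe` function are hypothetical names made up for illustration, and a real game would also track hints, discards, and the cards already played.

```python
from typing import List, Optional, Tuple

# Hypothetical card representation: (color, rank), e.g. ("red", 1).
Card = Tuple[str, int]


def observe(hands: List[List[Card]], player: int) -> List[Optional[List[Card]]]:
    """Return what `player` can see: every hand except their own.

    The player's own seat is replaced by None, which stands in for
    "I know I hold cards here, but I cannot see what they are".
    """
    return [None if seat == player else list(hand)
            for seat, hand in enumerate(hands)]


# Illustrative two-player deal (made-up cards).
hands = [
    [("red", 1), ("blue", 3)],     # seat 0
    [("green", 2), ("white", 5)],  # seat 1
]

# Seat 0 sees seat 1's cards, but only a hidden placeholder for its own hand.
view = observe(hands, player=0)
print(view)
```

Everything a player learns about the hidden hand has to come from teammates, which is why the communication conventions matter so much.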
Like in many of the best games, these simple
rules conceal a vast array of strategies,
all of which are extremely hard to teach to
current learning algorithms.
In the paper, a free and open source system
is proposed to facilitate further research
and to assess the performance of existing
techniques.
The difficulty level of this game can also
be made easier or harder at will from both
inside and outside the game.
And by inside I mean that we can set parameters
like the number of allowed mistakes that can
be made before the game is considered lost.
The outside part means that two main game
settings are proposed. One is self-play: this
is the easier case, where the AI plays with
copies of itself and therefore knows quite
a bit about its teammates. The other is ad-hoc
teams, where a set of agents that are not
familiar with each other need to cooperate.
This is immensely difficult.
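As a rough illustration of these two kinds of difficulty knobs, here is a small Python sketch. All names in it (`GameSettings`, `max_mistakes`, `is_lost`, `build_team`) are hypothetical and are not the API of the system from the paper. The default of three mistakes and the rule that the game is lost once that many mistakes have been made are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameSettings:
    """'Inside' knobs: parameters of the game itself (hypothetical names)."""
    players: int = 2        # the game supports two to five players
    max_mistakes: int = 3   # assumed default, not taken from the paper

    def __post_init__(self) -> None:
        if not 2 <= self.players <= 5:
            raise ValueError("Hanabi is played by two to five players")


def is_lost(mistakes_made: int, settings: GameSettings) -> bool:
    """Assumption: the game is lost once `max_mistakes` mistakes were made."""
    return mistakes_made >= settings.max_mistakes


def build_team(mode: str, agents: List[str], size: int) -> List[str]:
    """'Outside' knob: who sits at the table.

    self-play: every seat is a copy of the same agent.
    ad-hoc:    each seat is a different agent that has not trained
               with the others.
    """
    if mode == "self-play":
        return [agents[0]] * size
    if mode == "ad-hoc":
        distinct = list(dict.fromkeys(agents))  # drop repeats, keep order
        if len(distinct) < size:
            raise ValueError("not enough distinct agents for an ad-hoc team")
        return distinct[:size]
    raise ValueError(f"unknown mode: {mode}")


# Lowering max_mistakes makes the game harder from the inside.
strict = GameSettings(players=3, max_mistakes=1)
print(is_lost(1, strict))

# Agent names here are placeholders.
print(build_team("self-play", ["agent_a"], strict.players))
print(build_team("ad-hoc", ["agent_a", "agent_b", "agent_c"], strict.players))
```

In the self-play team every seat shares the same learned conventions, while in the ad-hoc team nothing guarantees that a hint means the same thing to the player giving it and the player receiving it.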
When I looked at the paper, I expected that as
we have many powerful learning algorithms,
they would rip through this challenge with
ease, but surprisingly, I found out that
even in the easier self-play variant, they severely
underperform compared to the best human players
and handcrafted bots.
There is plenty of work to be done here, and
luckily, you can also run it yourself at home
and train some of these agents on a consumer
graphics card.
Note that it is possible to create a handcrafted
program that plays this game well, as we
humans already know good strategies. However,
this project is about getting several instances
of an AI to learn new ways to communicate
with each other effectively.
Again, the goal is not to get a computer program
that plays Hanabi well, the goal is to get
an AI to learn to communicate effectively
and work together towards a common goal.
Much like Chess, Starcraft 2 and DOTA, Hanabi
is still a proxy to be used for measuring
progress in AI research.
Nobody wants to spend millions of dollars
to play card games at work, so the final goal
of DeepMind is to reuse this algorithm for
other applications where even we humans falter.
I have included some more materials on this
game in the video description, make sure to
have a look.
Thanks for watching and for your generous
support, and I’ll see you next time!
