Yesterday, at The International 2017, Valve and OpenAI showed off a brand-new bot created entirely by training against itself in isolation. In a segment between matches, the bot took on Dendi, one of the most popular professional Dota 2 players, in a 1v1-mid match. It won, Elon Musk and OpenAI claimed a huge victory for AI, and Dota 2 players everywhere began bowing to their new AI overlords. Now that the hype has died down a bit, though...how big of an achievement is this really? And was this contest even sound science to begin with?
Organizations like Valve, Blizzard, and DARPA that give researchers an opportunity and a platform for solving interesting AI challenges are awesome. I was a huge fan of DARPA's Cyber Grand Challenge, I'm looking forward to seeing what DeepMind and Blizzard can do with StarCraft II, and I thought Valve's AI segment at The International 2017 yesterday was really cool. I also think the engineers and researchers at OpenAI, DeepMind, and ForAllSecure are doing very valuable work.
But I'm concerned about the experiments themselves, rather than the work behind the AI (which, again, is awesome). Unfortunately, there's not a whole lot online right now about OpenAI's bot (this page and this page are really all I can find on it, and neither has any technical detail). So, while I may be off the mark about some of the technical details, I think the overall argument below is still sound. My concerns are similar to the criticisms I leveled at the DEFCON 24 Finals, if you're familiar with my previous thoughts on the matter.
Contest Fairness
Most man vs. machine matches thus far have been in games like Chess or Go. The way I see it, these games share three very positive characteristics:
- These games are played out on a fixed time scale.
- Doing well at these games requires a significant amount of planning and foresight.
- Each state in these games consists of a small amount of very precise information.
While Chess and Go are incredibly complex with very large possible state spaces (which may favor humans), they also give no significant advantage to inhumanly fast decision-making (which may favor machines) and afford perfect information to both players equally at all times. Each game, by design, removes many possible confounding factors. At the end of the day, if a machine consistently beats top human players in Chess or Go, we can confidently say the machine is actually better at solving problems in the context of that game.
Dota 2, by contrast, has none of those characteristics. It is played on a continuous time scale where actions may occur at any time, take variable amounts of time, and can occur far faster than human reaction time. Doing well requires planning and foresight (creep blocking, good positioning, high ground advantage, and so forth), but is also heavily weighted toward punishing opponents' mistakes (which is reactive, not proactive). Furthermore, each state consists of a large amount of information that is not presented precisely to a human. It's also, critically, not a game of perfect information.
So, here's the problem. In order for Dendi to make an action in Dota 2, he has to go through the following high-level process:
- Process the images on the screen
- Determine what the individual images on the screen are
- Estimate their relative distances and current values
- Understand what state the individual objects are in (Is a creep about to die? Is the opponent about to attack?)
- Formulate a decision
- Physically move the mouse and/or keyboard
- Click the mouse and/or a key on the keyboard to input the desired action
The bot, by contrast:
- Receives exact positioning data from an API
  - As far as I can tell, this is positioning data on all objects everywhere, including ones that don't fit on the screen and perhaps even ones a regular player wouldn't be able to see.
- Constructs a new "current state" from that information based on exact values
- Uses the current and relevant previous states to predict the best possible next action to take
- Tells an API what action it wants to take next (which then instantly acts on its behalf)
Hopefully, this illustrates that the bot is not playing the same game as the human. The bot is playing a game where information is precise and complete. The human is playing a game where neither of those things is true. The bot is also able to react to game state changes instantaneously, whereas the human must take time to make physical inputs.
In short: The contest isn't fair. The bot doesn't need high APM: It's able to calculate the statistically best possible next state and act upon it with precision humans simply can't. And it can do it consistently faster than humans can.
Why it Matters
In scientific experiments, it's extremely important to remove confounding factors. If we aren't making an apples-to-apples comparison, we can't draw accurate conclusions about the results of the experiment. If DeepMind's StarCraft II bot defeats Jaedong, but does it with perfectly reactive micro in every single engagement (or, even worse, complete and precise knowledge of every unit on the map at all times), can we really say it's a better StarCraft II player? I'd assert we can't.
This is the position we find ourselves in today with OpenAI's Dota 2 bot. Articles like this Business Insider article are claiming, thanks to people like Elon Musk, that AI has claimed an important victory over professional e-sports players. But, I'd argue we simply don't know. OpenAI's Dota 2 bot might be better than Dendi at 1v1 mid. Until the two are able to play on equal footing, though, there's no way to tell.
The most obvious solution here is that we need to build a Dota 2 robot! It needs to take in all input from a screen with visual sensors and it needs to produce all output from robotic hands that physically manipulate a real keyboard and mouse. And all the robotics need to conform closely to what human eyes and hands are capable of achieving.
I'm not sure that's the best solution, though. I think the way OpenAI designed their bot is actually great engineering. Why solve non-AI problems like building an accurate representation of a human that your code needs to interface with when you can solve "real" problems like teaching an AI to play 1v1 mid?
Personally, I'd just be happy if future efforts tried. There must be some way to create a middle-man that prevents the API from sending the bot information that wouldn't be "on screen". There must also be some way to delay the information the bot receives so that it approximates human reaction time. (Ideally, the bot also shouldn't know why the game is in a certain state until after the game. The bot should need to "watch the replay" like the human in order to learn things it couldn't see during gameplay.)
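To make the middle-man idea a bit more concrete, here's a rough sketch in Python. To be clear, I don't know what the real bot API looks like, so everything here is made up for illustration: the `Entity` and `Viewport` types, the `visible_to_player` flag, and the idea that the game hands the bot a list of entities once per tick are all assumptions. The point is only that filtering to "what's on screen" and delaying delivery by a fixed number of ticks are both small pieces of code.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical game object as reported by an assumed bot API."""
    name: str
    x: float
    y: float
    visible_to_player: bool  # assumed flag: False if hidden by fog of war


@dataclass(frozen=True)
class Viewport:
    """Axis-aligned rectangle standing in for the human player's screen."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, entity: Entity) -> bool:
        return (self.x_min <= entity.x <= self.x_max
                and self.y_min <= entity.y <= self.y_max)


class HumanLikeObserver:
    """Middle-man between the (assumed) game API and the bot.

    Drops entities a human could not see, and hands each observation to
    the bot only after `delay_ticks` further ticks have gone by.
    """

    def __init__(self, delay_ticks: int) -> None:
        if delay_ticks < 0:
            raise ValueError("delay_ticks must be non-negative")
        self._delay_ticks = delay_ticks
        self._pending: Deque[List[Entity]] = deque()

    def observe(self, raw: List[Entity],
                viewport: Viewport) -> Optional[List[Entity]]:
        # Keep only what is both un-fogged and inside the screen rectangle.
        filtered = [e for e in raw
                    if e.visible_to_player and viewport.contains(e)]
        self._pending.append(filtered)
        # Until enough ticks have passed, the bot gets nothing at all.
        if len(self._pending) <= self._delay_ticks:
            return None
        # Release the oldest queued observation (delay_ticks ticks old).
        return self._pending.popleft()
```

The bot would call `observe` once per tick instead of reading the raw API data. What value to pick for `delay_ticks` depends on the game's tick rate and on whatever reaction time you want to model, so I've left both as parameters rather than guess at numbers.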
I think (what I assume are) relatively minor tweaks like these would go a long way toward ensuring that when a bot wins, it really is better than the human it beat. This isn't something AlphaGo has to contend with - it's playing a different game. But it does nothing for the development of AI to win in an unfair contest. Just like I pointed out for DARPA's Cyber Grand Challenge, while the progress here is very real, it's important that the achievements of AI are being accurately reported.