In game theory, generalists sometimes beat specialists | MIT News

Whether you’re playing poker against a single opponent or finding yourself in a bidding war with another potential buyer of a home, you’re operating under conditions of imperfect information. You know what cards you hold in a poker game, and you know how far above the asking price you can afford to bid on a house, but you don’t know your opponent’s hand or how high the other buyer is willing to go.

A paper by MIT researchers, presented in April at the International Conference on Learning Representations (ICLR) in Rio de Janeiro, won’t tell you specifically what to do in these situations. But it provides new insights into so-called games of imperfect information between two competitors in a “zero-sum” setting, where one player’s gain is the other player’s loss.

MIT researchers on the project include Sobhan Mohammadpour, a PhD student in MIT’s Department of Electrical Engineering and Computer Science (EECS) and the Laboratory for Information and Decision Systems (LIDS); and Gabriele Farina, an assistant professor in EECS and a principal investigator at LIDS. Additional co-authors include Max Rudolph of the University of Texas at Austin (UT); Nathan Lichtlé of the University of California at Berkeley (UCB); Alexandre Bayen of UCB; J. Zico Kolter of Carnegie Mellon University (CMU); Amy X. Zhang ’11, MNG ’12 of UT; Eugene Vinitsky of New York University; and Samuel Sokota of CMU.

The focus of the new work is on algorithms that can be used to train neural networks to play games of imperfect information. The long-held assumption in the field was that, in this setting, algorithms built on game-theoretic principles would clearly outperform a more general-purpose class of algorithms called policy gradient methods, which were first developed for decision-making problems in the 1990s. The word “policy” in this context basically means strategy, and “gradient” refers to the direction of steepest change, like the steepest way up (or down) a hill. Policy gradient methods train neural networks to make decisions that move, in small, sequential steps, toward a specific goal (reaching a peak, figuratively speaking), with continual adjustments made along the way to bring the agent closer to the target.
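
To make the “small, sequential steps” idea concrete, here is a minimal sketch of a softmax policy gradient on a one-step decision with three actions. The rewards, step size, and number of steps are illustrative assumptions, not values from the paper, and because the rewards are known exactly the sketch uses the exact gradient rather than the sampled estimate a real training run would use.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into a probability distribution (the policy)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_ascent(rewards, lr=0.5, steps=200):
    """Exact policy gradient for a softmax policy on a one-step decision.

    Expected reward is J(theta) = sum_a pi_a * r_a, and its gradient with
    respect to each logit is dJ/dtheta_a = pi_a * (r_a - J).
    """
    logits = [0.0] * len(rewards)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        grads = [p * (r - expected) for p, r in zip(probs, rewards)]
        # Take one small step uphill on expected reward.
        logits = [z + lr * g for z, g in zip(logits, grads)]
    return softmax(logits)

# Hypothetical rewards: action 0 is the best choice.
final_policy = policy_gradient_ascent([1.0, 0.2, 0.5])
print(final_policy)
```

Each update nudges probability toward actions that do better than the current average, so the policy drifts gradually toward the best action instead of jumping there in one move.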

Although strategy games were not on the original agenda when policy gradient methods were invented in the early 1990s, the authors of the new paper wondered how this class of algorithms might fare in two-player games. These methods become more difficult to analyze in multi-agent settings, according to Farina. “There’s still a direction you can move in to improve your position, but, because of the other player’s actions, that direction can change constantly during the game. And those shifts can be quick.”
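
Farina’s point about a shifting direction can be seen in matching pennies, a tiny two-player zero-sum game. The sketch below is a toy illustration, not the setup used in the paper: it assumes each player directly adjusts its probability of playing Heads with a gradient step clipped to [0, 1], and the starting probabilities and step size are arbitrary choices.

```python
def clip_probability(p):
    """Keep a probability inside [0, 1] after a gradient step."""
    return min(1.0, max(0.0, p))

def simultaneous_gradient_play(x, y, lr=0.05, steps=100):
    """Matching pennies: x and y are the probabilities that player 1 and
    player 2 play Heads. Player 1 wins +1 on a match and loses 1 otherwise,
    so its expected payoff is u = (2x - 1)(2y - 1); player 2 receives -u.
    Returns player 1's gradient at every step."""
    grads_for_player1 = []
    for _ in range(steps):
        grad_x = 2 * (2 * y - 1)   # du/dx depends only on the opponent's y
        grad_y = -2 * (2 * x - 1)  # d(-u)/dy: player 2 pushes the other way
        grads_for_player1.append(grad_x)
        # Both players step at the same time, each uphill on its own payoff.
        x = clip_probability(x + lr * grad_x)
        y = clip_probability(y + lr * grad_y)
    return grads_for_player1

grads = simultaneous_gradient_play(x=0.9, y=0.6)
print(grads[:4])
```

Player 1’s gradient starts out positive (playing Heads more often helps while player 2 leans toward Heads), but within a few steps player 2 has moved away and the same gradient turns negative: the uphill direction for one player depends on where the other player currently stands.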

“It was taken for granted that specialized game-theoretic algorithms were the right approach for this setting,” Sokota said. “Our research has shown that policy gradient methods can work better than these specialized algorithms, and that the specialized algorithms may not work as well as people thought, which raises an interesting sociological question about why this went unnoticed for so long. Part of the answer is that the field had not done the engineering work needed to rigorously test the algorithms, so it was hard to say what worked.”

The main contribution of this work, therefore, is a rigorous way to measure how well different algorithms teach agents (that is, neural networks) to compete in games of imperfect information. “We’re taking a different approach,” Rudolph commented. “Unlike many papers published in this field, we are not proposing a new algorithm that can beat other algorithms. We are proposing a benchmark that can test these algorithms.”

Simply put, a benchmark consists of software designed to measure the performance of algorithms. “What we offer are test platforms, or playgrounds, where people can take their algorithms, train them for a task, and see how well they do,” Farina said.

The team measures a player’s performance using a concept called exploitability, which captures how well a player performs against a “worst-case opponent,” Sokota explained. “In a game like poker, the opponent wouldn’t know what my hand was, but they would know how I would play any hand.” An exploitability of zero means perfect play, while a high exploitability score indicates weak play that an opponent can readily take advantage of.
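
A small worked example may help pin down what exploitability measures. The sketch below uses rock-paper-scissors, a one-shot game whose value is known to be zero, as a stand-in for the much larger sequential games in the study; the function name and the `game_value` parameter are illustrative assumptions. The worst-case opponent knows the player’s mixed strategy (but not the outcome of any particular random draw) and picks whichever response hurts most.

```python
# Payoffs to the row player in rock-paper-scissors.
# Rows and columns are ordered rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def exploitability(strategy, payoffs, game_value=0.0):
    """How far a best-responding opponent, who knows the row player's mixed
    strategy, can push the row player's expected payoff below the game value."""
    # Expected payoff to the row player against each pure opponent response.
    payoff_vs_response = [
        sum(p * payoffs[i][j] for i, p in enumerate(strategy))
        for j in range(len(payoffs[0]))
    ]
    # The worst-case opponent chooses the response that is worst for us.
    return game_value - min(payoff_vs_response)

print(exploitability([1 / 3, 1 / 3, 1 / 3], RPS))  # uniform mix: unexploitable
print(exploitability([1.0, 0.0, 0.0], RPS))        # always rock: beaten by paper
```

Playing each move a third of the time scores zero (perfect play here), while always playing rock scores 1, because an opponent who knows that habit simply plays paper every time.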

The team ran its experiments on five games: two versions of Phantom Tic-Tac-Toe, in which players can’t see their opponents’ moves; two variants of an imperfect-information version of the board game Hex; and the bluffing game Liar’s Dice.

The biggest challenge the researchers faced was computing exploitability for games of this size, which can have about 30 billion states. A “state” in this case covers not only every possible board position but also the entire history of the game, including all the moves and missteps along the way.
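
Why does counting histories rather than board positions inflate the numbers so quickly? The sketch below enumerates ordinary tic-tac-toe, a perfect-information stand-in for Phantom Tic-Tac-Toe chosen only because it is small enough to walk exhaustively, and compares the number of complete move sequences with the number of distinct boards those sequences pass through. The function names are hypothetical.

```python
# The eight winning lines on a 3x3 board, with cells indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def enumerate_games(board=(" ",) * 9, player="X", boards_seen=None):
    """Walk the full game tree. Returns (number of complete move histories,
    set of distinct board positions encountered along the way)."""
    if boards_seen is None:
        boards_seen = set()
    boards_seen.add(board)
    if winner(board) is not None or " " not in board:
        return 1, boards_seen  # one complete history ends here
    histories = 0
    opponent = "O" if player == "X" else "X"
    for i, cell in enumerate(board):
        if cell == " ":
            child = board[:i] + (player,) + board[i + 1:]
            count, _ = enumerate_games(child, opponent, boards_seen)
            histories += count
    return histories, boards_seen

histories, boards = enumerate_games()
print(histories, len(boards))
```

Even in this tiny game, the 5,478 distinct boards are reached through 255,168 different complete move sequences. Hidden information makes the bookkeeping heavier still, because a player must reason about every history consistent with what it has observed.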

“It’s like looking into a dark room full of things you can’t see,” Mohammadpour said. “Somehow, you need to find out where these things are and how they got there.” Previous researchers, Mohammadpour adds, typically computed exploitability only for small games, about 100,000 times smaller than those analyzed in this study.

Across these five games, neural networks trained with policy gradient algorithms achieved better (lower) exploitability scores than networks trained with game-theoretic algorithms. In a second round of head-to-head matches, the networks trained with policy gradient methods again beat their game-theory-trained opponents. “Those results were reassuring, because they make us more confident in our measurement method,” Rudolph says.

The team has made its benchmarking software freely available and easy to use. “You don’t need a big computer,” Mohammadpour said. “You can use it on a regular laptop. And all you have to do is add one line of code to a commonly used suite of simulation software called OpenSpiel.”

Although their tests involve very obscure games, Farina would like to place this work in a broader context. “Remember that the word ‘game’ really applies to any multi-agent strategic interaction,” he says. “Therefore, the lessons we learn from this research are not limited to entertainment games.”

Vinitsky agrees. “Hidden information is everywhere in the real world,” he says. “The idea that we can do better in these games suggests that we can do better in those other settings as well.”

Ian Gemp, a computer scientist and game theorist at Google DeepMind who was not involved in this research, finds these results encouraging. “This work is a timely reminder that general-purpose tools [like policy gradient methods] remain among the most effective ways to solve complex strategic problems,” he says.
