How Game Theory Changed Poker

Professional players are learning from computers how to make their play more unpredictable and harder to beat / WSJ.
Michael Bowling, a computer scientist at the University of Alberta, keeps a tidy office, unlike many of his colleagues, whose spaces overflow with technological detritus. Prof. Bowling’s only clutter is the dense, inscrutable formulas and graphs scrawled with technicolor markers on a wall-size whiteboard. He needs the elaborate mathematics because he is trying to make sense of a very complex world: the game of poker. “Even the smallest variant of poker has a billion billion decision points,” he told me.

The Computer Poker Research Group at the university was formed in 1996, following Garry Kasparov’s chess matches with IBM’s Deep Blue that year. Poker’s mathematical complexity rivals that of chess—or exceeds it, depending on the variant—and poker adds randomness and hidden information, bringing it closer to the “real world” that AI researchers so badly want to influence. The researchers in the poker group aren’t interested in conquering the game per se. They see it as a testing ground for doing good science.

Their work hasn’t just been an academic exercise, however; it has transformed how poker professionals approach the game. In search of ways to improve their odds, pros began poaching Prof. Bowling’s talent. “I regularly get contacted by poker pros asking if I can help them with something. I would say every one of the top 10 pros in the world is paying a poker programmer to do something,” he said. “The pros caught wind of what we were doing,” Richard Gibson, a former doctoral student of Prof. Bowling’s, told me. Prof. Gibson’s dissertation was titled “Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents.”

Regret is a formal mathematical concept in decision-making under uncertainty: it’s the difference between the payoff of the optimal decision and the payoff of the decision actually made. Minimizing regret is an important ingredient in many modern poker-playing algorithms. “It seemed like it was worth a lot to them,” Prof. Gibson said of the pros. “They were paying me pretty good money.” Programmers are hired to analyze a player’s game data, finding “leaks” or mistakes in their play, and to perform game-theoretical analyses, calculating what plays are optimal in any of the countless situations a poker player might face.
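
To make the idea of regret concrete, here is a minimal sketch in Python using rock, paper, scissors rather than poker. It is an illustration only, not code from the Alberta group or from any commercial poker program. The payoff table, the function names and the "regret matching" update (a simple textbook rule that plays each action in proportion to its accumulated positive regret) are all assumptions chosen for the example.

```python
ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[mine][theirs] is my payoff: +1 for a win, 0 for a tie, -1 for a loss.
PAYOFF = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def regret(chosen, opponent):
    """Regret of one decision: best achievable payoff minus the payoff received."""
    best = max(PAYOFF[a][opponent] for a in ACTIONS)
    return best - PAYOFF[chosen][opponent]


def strategy_from_regrets(cumulative):
    """Regret matching: weight each action by its positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in cumulative.items()}
    total = sum(positive.values())
    if total == 0:
        # No action looks better than what was played so far: mix uniformly.
        return {a: 1 / len(ACTIONS) for a in ACTIONS}
    return {a: p / total for a, p in positive.items()}


def train(opponent_mix, rounds=1000):
    """Repeatedly adjust a strategy against a fixed (hypothetical) opponent mix."""
    cumulative = {a: 0.0 for a in ACTIONS}
    for _ in range(rounds):
        strategy = strategy_from_regrets(cumulative)
        # Expected payoff of each pure action against the opponent's mix.
        value = {
            a: sum(opponent_mix[b] * PAYOFF[a][b] for b in ACTIONS)
            for a in ACTIONS
        }
        played = sum(strategy[a] * value[a] for a in ACTIONS)
        for a in ACTIONS:
            # How much better action a would have done than the current mix.
            cumulative[a] += value[a] - played
    return strategy_from_regrets(cumulative)


if __name__ == "__main__":
    print(regret("rock", "paper"))  # lost when a win was available
    rock_heavy = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}
    print(train(rock_heavy))
```

Against an opponent who throws rock too often, the accumulated regrets push the learner toward paper. Real poker programs face vastly larger decision trees, so this shows only the core bookkeeping.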

Even an off-the-shelf poker program can be fairly expensive—the pro version of one popular program is $475. But it’s worth it because of how dramatically it can change a player’s game. The key is to make your playing more unpredictable and thus less exploitable. In the children’s game of rock, paper, scissors, playing each move at random with a one-third chance makes it impossible for your opponent to guess your pattern and beat you. In poker, this can be accomplished with hand ranges and mixed strategies.
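
The rock, paper, scissors claim can be checked with a few lines of Python. The sketch below is an illustration under stated assumptions: a win is scored +1, a loss -1 and a tie 0, and "exploitability" is a hypothetical helper name for the best average payoff available to an opponent who knows your mix.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 if my move beats theirs, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def exploitability(my_mix):
    """Best average payoff an opponent can get with a single counter-move."""
    return max(
        sum(my_mix[m] * payoff(counter, m) for m in MOVES)
        for counter in MOVES
    )


uniform = {m: Fraction(1, 3) for m in MOVES}
rock_heavy = {
    "rock": Fraction(1, 2),
    "paper": Fraction(1, 4),
    "scissors": Fraction(1, 4),
}

if __name__ == "__main__":
    print(exploitability(uniform))     # no counter-move gains anything
    print(exploitability(rock_heavy))  # always playing paper gains on average
```

With the one-third mix, every counter-move breaks even. Once the mix leans toward rock, an opponent who always plays paper wins a quarter of a point per game on average.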

If other players know, for instance, that when I have the “small blind”—the obligatory bet made at the beginning of the hand by the player sitting to the dealer’s left—I will only raise $100 when I have a pair of aces, that is a fact that can be exploited. Instead, the programs advise, I should bundle my hands into ranges, raising $100 when I have not just aces but also, say, kings and queens. Moreover, I shouldn’t always do the same thing with the same range of hands—I should mix my strategy and randomize.

Maybe two-thirds of the time that I have a pair of aces I raise, and a third of the time I call, matching the last bet made. Some poker players have even been known to use a wristwatch as a randomization device. For example, to decide whether to raise or call, you simply look at the second hand: in the first two-thirds of the minute you raise, in the last third you call.
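
The wristwatch trick amounts to a tiny randomizer. A minimal Python sketch follows; the function names and the two-thirds default are assumptions taken from the example above, not from any poker software.

```python
from datetime import datetime


def action_from_second(second, raise_fraction=2 / 3):
    """Map the position of the second hand (0-59) to 'raise' or 'call'."""
    if not 0 <= second < 60:
        raise ValueError("second must be between 0 and 59")
    cutoff = round(60 * raise_fraction)  # 40 when raising two-thirds of the time
    return "raise" if second < cutoff else "call"


def glance_at_watch():
    """Use the current clock second as the source of randomness."""
    return action_from_second(datetime.now().second)


if __name__ == "__main__":
    print(glance_at_watch())
```

The clock is only as random as the moment you happen to look at it, which is good enough at a card table as long as opponents cannot see the watch.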

Poker players today call this style of play “GTO”—game-theory optimal. Its practitioners are free to cocoon themselves beneath hoodies and big headphones because if you embrace these tenets fully, you can all but ignore the other players at your table. Their specific identities and quirks are immaterial. All that matters is that eventually they will err, and you will profit. And that’s why GTO players hire the programmers, to tweak these ranges and percentages, to find and eliminate every exploitable sliver of their games.

In the summer of 2019, I sat down with my laptop on a sunny afternoon to play no-limit Hold ’Em against DeepStack, a program that Prof. Bowling helped to develop. The computer and I each started with 20,000 chips, and the blinds, the mandatory bets that begin each hand, started at 50 and 100 chips and increased every 10 hands. Whenever one player won all the chips, he (or it) tallied a point and the process started again.

Over the course of a few days, DeepStack exhibited a peculiar style of play. It was ferociously aggressive at the “pre-flop” stage, when the only cards a player can see are the two in his own private hand. It raised and re-raised early with just about anything and sometimes launched early and enormous all-in bets; it almost never folded in the small blind. But after the flop, it calmed considerably, as if having taken a digital Xanax, and played what seemed to me like a passive game.

I did what I could to exploit what I saw as the program’s tendencies, but this was a machine specifically designed and trained not to be exploited—to abide by the mathematical maxims found in game theory and the game’s elemental geometries. To my surprise, I managed to grind out some wins. I stopped the match when I had a lead of 15 to 14 games. Much as Kasparov did after playing Deep Blue for the first time, I stared at the ceiling for a long time after the match, relieved that I had beaten the machine.

The feeling didn’t last. Shortly after our match, Prof. Bowling sent me an email, debriefing my performance against his creation by analyzing which parts of my success came from skill and which parts had emerged from the thick fog of randomness inherent in no-limit poker. He wrote, “You should expect to win 42% (margin of error of 5%) of your matches against DeepStack. While you won 15 and lost 14, your play (after removing luck) suggests you should have won 12 matches and lost 17.”
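
One rough way to see how much room luck has in a 29-match series is a back-of-the-envelope binomial calculation. The Python sketch below assumes each match is an independent coin flip with a fixed 42% win rate. That is a simplification for illustration and not the method Prof. Bowling used to separate skill from luck.

```python
from math import comb


def prob_at_least(wins, matches, win_rate):
    """Probability of winning at least `wins` of `matches` independent games."""
    return sum(
        comb(matches, k) * win_rate**k * (1 - win_rate) ** (matches - k)
        for k in range(wins, matches + 1)
    )


if __name__ == "__main__":
    # Chance that a 42% underdog still wins 15 or more of 29 matches.
    print(f"{prob_at_least(15, 29, 0.42):.3f}")
```

Under these assumptions, a winning record for the underdog over so few matches is far from rare.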