Transcript of Tuomas Sandholm: Poker and ...

[00:00:00]

The following is a conversation with Thomas Sandhamn. He's a professor simu and cocreator of Labrador's, which is the first A.I. system to be top human players in the game of heads up no limit Texas Hold'em. He has published over 450 papers on game theory and machine learning, including a best paper in 2017 at Nips, now renamed to New Reps, which is where I caught up with him for this conversation. His research and companies have had wide reaching impact in the real world, especially because he and his group not only proposed new ideas, but also build systems to prove that these ideas work in the real world.

[00:00:42]

This conversation is part of the MIT course on artificial general intelligence and the artificial intelligence podcast if you enjoy it.

[00:00:49]

Subscribe on YouTube, iTunes or simply connect with me on Twitter at Lex Friedman spelled F Outride. And now here's my conversation with Thomas Sandhamn.

[00:01:19]

Can you describe. At the high level, the game of poker, Texas Hold'em has a Texas Hold'em for people who might not be familiar this card game. Yeah, happy to.

[00:01:30]

So heads up, no limit. Texas Hold'em has really emerged in the aid community as a main benchmark for testing the application, independent algorithms for imperfect information, game solving. And this is a game that's actually played by humans. You don't see that much on TV or casinos because, well, for various reasons, but you do see it in some expert level casinos and you see it in the best poker movies of all time. It's actually an event in the World Series of Poker, but mostly it's played online and typically for pretty big sums of money.

[00:02:07]

And this is a game that usually only experts play. So if you go to your home game on a Friday night, it probably is not going to be heads up. No limit Texas hold it might be no limit Texas Hold'em in some cases, but typically for a big group and it's not as competitive while heads up means it's to play. So it's really like me against you and my better or you better, much like chess or or go in that sense.

[00:02:34]

But then an imperfect information game, which makes it much harder because I have to deal with issues of you knowing things that they don't know. And I know things that you don't know instead of pieces being nicely laid on the board for both of us to see.

[00:02:47]

So in Texas Hold'em, there's a two cards that you only see the one to you. Yeah, and there is they gradually lay out some cards that overall to five cards that everybody can see. Yeah, the imperfect nature of the information is the two cards that you're holding up front.

[00:03:05]

Yeah. So as you said, you know, you first get to cards in private each and then there's a betting.

[00:03:11]

Right. Then you get three cards in public on the table, then there's a betting card, then you get the fourth card in public on the table betting around, then you get the fifth card on the table. There's a betting, right.

[00:03:22]

So there's a total of four betting rounds and four tranches of information, revelation, if you will, that only the first tranche is private and then it's public from there.

[00:03:32]

And this is probably probably by far the most popular game in high and just the general public in terms of imperfect information. So that's probably the most popular spectator game to watch. Right. So which is why is a super exciting tackle. So it's on the order of chess, I would say, in terms of popularity, in terms of A.I. setting it as the bar of what is intelligence. So in twenty seventeen Liberator's, how do you pronounce the Liberato Labrador's Labrador's beats Latin.

[00:04:10]

They're a little bit Latin.

[00:04:12]

Labrador's beat a few for expert human players. Can you describe that event, what you learn from it? What was it like? What was the process in general for people who have not read the papers and the.

[00:04:26]

Yeah, so the event was that we invited four of the top ten players. Were these are specialists, players in heads up no limit Texas Hold'em, which is very important because this game is actually quite different than the multiplayer version. We brought them in to Pittsburgh to play at the Rivers Casino for twenty days. We wanted to get 120000 hands in because we wanted to get statistical significance. So it's a lot of hands for him to play, even for these top pros who play fairly quickly normally.

[00:04:59]

So we couldn't just have one of them play so many hands.

[00:05:02]

Twenty days they were playing basically morning to evening and I raised two hundred thousand is a little incentive for them to play. And the setting was so that they didn't all get 50000. We actually paid them out based on how they did against each. So they had an incentive to play as hard as they could, whether they're way ahead the way behind or right at the mark of beating the high.

[00:05:30]

And you don't make any money. Unfortunately, right now we can't make any money. So originally, a couple of years earlier, we actually explored whether we could actually play for money, because that would be, of course, interesting as well to play against the top people for money. But the Pennsylvania Gaming Board said no. So so we couldn't. So this is much like an exhibit like for a musician or a boxer or something like that.

[00:05:56]

Nevertheless, you are keeping track of the money and brought us one close to two million dollars. I think so. So if if it was for real money, if you were able to earn money, that was a quite impressive and inspiring achievement. Just a few details.

[00:06:13]

What what were the players looking at and were they behind a computer? What was the interface like?

[00:06:18]

Yes, there they were playing much like the. Normally do these top players, when they play this game, they play mostly online, so they're used to playing through UI. Yes, and they did the same thing here. So there was this layout.

[00:06:31]

You could imagine there's a table on the screen, this the human sitting there, and then there's the guy sitting there and the screen shows everything that's happening, the cards coming out, and so is the bets being made. And we also had the betting history for the humans or if the human forgot what had happened in the hand so far, they could actually reference back and so forth.

[00:06:53]

Is there a reason they were given access to the betting history for.

[00:06:57]

Well, we just it's it didn't really matter that they wouldn't have forgotten anyway. These are top quality people, but we just wanted to put out there. So it's not a question of the human forgetting and the AAA somehow trying to get that advantage of better memory. So what was that like?

[00:07:14]

I mean, that was an incredible accomplishment. So what did it feel like before the event?

[00:07:19]

Did you have doubt, hope, or where was your confidence at?

[00:07:24]

Yeah, that's great. Great question.

[00:07:26]

So 18 months earlier, I had organized a similar brains versus a competition with a previous ethical cloud and we could have beat the teams. So this time around, it was only 18 months later. And I knew that this new Liberator's was way stronger. But it's hard to say how you'll do against the top humans before you try. So I thought we had about a 50/50 shot and the international betting sites put us up us as a four to one or five to one underdog.

[00:07:58]

So it's kind of interesting that people really believe in people over 80, not just people. People don't just believe over, believe in themselves, but they have overconfidence in other people as well compared to the performance of A.I.. And yeah, so we were offered to one to five to one underdog.

[00:08:15]

And even after three days of beating the humans in a row, we were still 50/50 on the international betting sites.

[00:08:23]

Do you think there's something special and magical about poker in the way people think about it? In the sense you have? I mean, even in chess, there's no Hollywood movies. Poker is the star of many movies, and there's this feeling that certain human facial expressions and body language, eye movement, all these towels are critical to poker.

[00:08:50]

You can look into somebody's soul and understand their betting strategy and so on.

[00:08:54]

So that's probably why the possibility. Do you think that is why people have confidence that humans will outperform? Because A.I. systems cannot in this construct perceive these kinds of tools. They're only looking at betting patterns and nothing else, the betting patterns and and statistics.

[00:09:15]

So what's more important to you if you step back and human players, human versus human, what's the role of these tells of these ideas that we romanticize? Yeah, so I'll split it into two parts.

[00:09:33]

So one is, why do humans trust humans more than 80 and all have overconfidence in humans?

[00:09:39]

Yes, I think that's that's not really related to the question. It's just that they've seen these top players, how good they are and they're really fantastic. So it's just hard to believe that the NBA could beat them. Yes, I think that's where that comes from. And that's actually maybe a more general lesson about A.I. that until you've seen it over perform a human, it's hard to believe that it could.

[00:10:02]

But then the tail is a lot of these top players, they're so good at hiding tails that among the top players, it's actually not really worth it for them to invest a lot of effort trying to find Talison each other because they're so, so good at hiding them. So, yes, at the Friday evening game, fellas are going to be a huge thing. You can read other people, and if you're a good reader, you'll you'll read them like an open book.

[00:10:32]

But at the top levels of poker. Now, that tells become a lesson, a much, much smaller and smaller aspect of the game.

[00:10:38]

As you go to the top levels, the amount of strategies, the demands of possible actions is very large. Ten to the power of hundred plus. So there has to be some I read a few of papers related to has it has to form some abstractions of various hands and actions. So what kind of abstractions are effective for the game of poker.

[00:11:05]

Yeah, so you're exactly right. So when you go from a game three, that's ten to the one hundred and sixty one, especially in an imperfect information game, it's way too large to solve directly, even with our fastest equilibrium finding algorithms.

[00:11:19]

So. You want to abstract it first and abstraction in games is much trickier than abstraction in MDP or other single agent settings.

[00:11:31]

Because you have these abstraction pathologies that if I have a fine-grained abstraction, the strategy that I can get from that for the real game might actually be worse than the strategy I can get from the coarse grand abstraction.

[00:11:43]

So you have to be very careful now the kinds of abstraction you just assume we're talking about. There's the hands, abstractions, and then there's betting strategies, wording actions, type beating action.

[00:11:55]

So there's information, abstraction to talk about general information, abstraction, which is the abstraction of what chance does. And this would be the cards in the case of poker. And then there's action abstraction, which is abstracting the actions of the actual players, which would be bets in the case of poker yourself and the other players.

[00:12:17]

Yes, yourself and other players.

[00:12:20]

And for information abstraction, we were completely automated. So these were these are algorithms which they do what we call potential aware abstraction, where we don't just look at the value of the hand, but also how it might materialize into good or bad hands over time. And it's a certain kind of Bottom-Up process with integer programming there and clustering and various aspects. How do you build this abstraction?

[00:12:47]

And then in the action abstraction there, it's largely based on how humans, other and other ideas have played this game in the past. But in the beginning we actually use an automated action abstraction technology, which is provably convergent, that it finds the optimal combination of size, but it's not very scalable. So we couldn't use it for the whole game, but we use it for the first couple of betting actions.

[00:13:16]

So what's more important, the strength of the hand so that the information abstraction or the how you play them, the actions does it? You know, the romanticized notion, again, is that it doesn't matter what happens. You have that the actions, the betting may be the way you win, no matter what hands you have.

[00:13:37]

So that's why you have to play a lot of hands so that the role of luck gets smaller so you could otherwise get lucky and get some good hands and then you're going to win the match. Even with thousands of hands.

[00:13:50]

You can get lucky because there's so much variance in no limit Texas Hold'em, because if we both go all in, it's a huge stack of variance. So there are these massive swings in no limit Texas Hold'em. So that's why you have to play not just thousands, but over 100000 hands. Look at statistical significance in the mask.

[00:14:12]

Another way, this question, if you didn't even look at your hands.

[00:14:18]

But they didn't know that the opponents didn't know that. How well would you be able to do? That's a good question.

[00:14:24]

There's actually I heard the story that there's this Norwegian female poker player called Annette Overstored who actually won a tournament by doing exactly that. But that would be extremely rare. So I can you cannot really play well that way.

[00:14:40]

OK, so the hands do have some role to play.

[00:14:43]

OK, yes. So Labrador's does not use, as far as I understand, use learning methods, deep learning. Is there room for learning in? You know, there's no reason why Liberalist doesn't, you know, combined with now Fugo type approach for estimating the quality for function estimate. What are your thoughts on this? Maybe as compared to another algorithm, which I'm not that familiar with, deep stack, the engine that does use deep learning. That is unclear how well it does, but nevertheless uses deep learning.

[00:15:19]

So what are your thoughts about learning methods to aid in the way that Celebrators plays the game of poker?

[00:15:25]

Yeah, so as you said, Liberato did not use learning methods and played very well without them.

[00:15:32]

Since then, we have actually actually here we have a couple of papers on things that do use learning techniques, SEXON so and deep learning in particular, and sort of the way you're talking about where it's learning and evaluation function, but in imperfect information games unlike, let's say, Engo or not. Now also in chess and shogi, it's not sufficient to learn an evaluation for a state because the. Value of an information, it depends not only on the exact state, but it also depends on both players beliefs, like if I have a bad hand, I'm much better off if the opponent thinks I have a good hand and vice versa.

[00:16:22]

If I have a good hand, I'm much better off if the opponent believes I have a bad hand.

[00:16:27]

So the value of a state is not just a function of the cards.

[00:16:32]

It depends on, if you will, the path of play, but only to the extent that is captured in the belief distributions. So. So that's why it's not as simple as as it is imperfect information games. And I don't want I'd say it's simple there either. It's of course, very complicated computationally there too. But at least conceptually, it's very straightforward. There's a state, there's an evaluation function. You can try to learn it here.

[00:16:57]

You have to do something more. And what we do is in one of these papers we're looking at allowing where we allow the opponent to actually take different strategies at the leaf of the search surgery, if you will.

[00:17:12]

And that is a different way of doing it. And it doesn't assume, therefore, a particular way that the opponent plays, but it allows the opponent to choose from a set of different continuation strategies, and that forces us to not be too optimistic in our lookahead search. And that's that's one way you can do sound lookahead search in imperfect information games, which is very difficult. And in us, you were asking about deep STAC. What they did was very different than what we do either in Liberator's or in this new work.

[00:17:48]

They were randomly generating various situations in the game, then they were doing it to look ahead from there to the end of the game as if that was the start of a different game. And then they were using deep learning to learn those values of those states. But the states were not just the physical states. They include belief distributions.

[00:18:09]

When you talk about look ahead of the SEC or when the US doesn't mean considering every possibility that the game can involve.

[00:18:19]

Are we talking about extremely sort of this exponential growth of a tree? Yes.

[00:18:24]

So we're talking about exactly that. Much like you do in AlphaBeta search or want to crawl to research, but with different techniques, so there's a different search algorithm and then we have to deal with the leaves differently. So if you think about what Liberator's did, we didn't have to worry about this because we only did it at the end of the game. So we would always terminate into a real situation and we would know what to pay out. This it didn't do this depth limited locates.

[00:18:53]

But now in this new paper, which is called Depth Limited, I think it's called depth limited, search for Imperfect Information games, we can actually do sound depth, limited lookahead. So we can actually started with a look ahead from the beginning of the game on, because that's too complicated to do for this whole long game.

[00:19:11]

So in Liberator's, we were just doing it for the end.

[00:19:14]

So and then the other side, the belief distribution. So is it explicitly modeled what kind of believes that the opponent might have?

[00:19:23]

Yeah, yeah, it is explicitly modeled, but it's not assumed. The beliefs are actually output, not input. Of course, the starting beliefs are input, but they just fall from the rules of the game because we know that the dealer deals uniformly from the deck. So I know that every pair of cards that you might have is equally likely. I know that for a fact that just follows from the rules of the game, of course. Except that they have.

[00:19:51]

I know you don't have those yet. You have to take that into account. That's called card removal. And that's very important.

[00:19:57]

Is the dealing always coming from a single deck in heads up so you can assume a single deck that you'd know that if somebody if if I have the ace of spades, I know you don't have an ace of spades.

[00:20:10]

So in the beginning, your belief is basically the fact there's a fair dealing of hands.

[00:20:15]

But how do you start to adjust their belief?

[00:20:19]

Well, that's where this beauty of game theory comes from. So Nash Equilibrium, which John Nash introduced in 1950, introduces what rational play is when you have more than one player.

[00:20:32]

And these are pairs of strategies where strategies are contingency plans, one for each player so that neither player wants to deviate to a different strategy, given that the other doesn't deviate. But that's a side effect.

[00:20:47]

You get the beliefs from base rule. So Nash, equilibrium really isn't just deriving in this imperfect information games. Nash equilibrium doesn't just define strategies. It also defines believes for both of us and finds beliefs for each state so that each state, each individual call information sits at each information set in the game. There's a set of different states that we might be in, but I don't know which one we're in. Nash Equilibrium tells me exactly what is a probability distribution over those real states.

[00:21:23]

In my mind, how does Nash equilibrium give you that distribution?

[00:21:27]

So while I'll do a simple example so you know the game, rock, paper, scissors, so we can draw it as player one moves first and the player tumbles. But of course, it's important that Player two doesn't know what player one moved otherwise player to would win every time. So we can draw that as an information set where player one makes one or three moves first, and then there's an information set for player to player to who doesn't know which of those nodes the world is in.

[00:21:58]

But once we know the strategy for Player one, Nash equilibrium will say that you play one third rock, one food paper, one third Caesar's. From that, they can derive my beliefs on the information. Said that they're one third, one third, one third. There's a base gives you that you.

[00:22:14]

But is that specific to a particular player or is it is it something you quickly update with those that the game theory isn't really player specific?

[00:22:25]

So that's also why we don't need any data. We don't need any history how these particular humans played in the past or how any A.I. or human had played before.

[00:22:33]

It's all about rationality. So we just think he just thinks about what would a rational opponent do and what would I do if I were I am rational.

[00:22:44]

And what's that? That's the idea of game theory. So it's really a data free opponent free approach.

[00:22:52]

So comes from the design of the game as opposed to the design of the player. Exactly.

[00:22:57]

There's no opponent modelling persay. I mean, we've done some work on combining aborning modelling with game theory so you can exploit weak players even more. But that's another strand. And in libraries, we didn't turn that on because I decided that these players are too good. And when you start to exploit an opponent, you're typically open yourself up self up to exploitation. And these guys have so few holes to exploit and the world's leading experts in counter exploitation. So I decided that we're not going to turn that stuff on.

[00:23:25]

Actually, I saw a few. Exploiting opponents sound very interesting to explore. Do you think there's room for exploitation generally outside of the Broaddus? Is there a subject or people differences that could be exploited, maybe not just in poker, but in general interactions, negotiations, all these other domains that you're considering?

[00:23:50]

Yeah, I definitely we've done some work on that, and I really like the work at Hybridize. Is that two. So you figure out what would a rational opponent do? And by the way, that's safe in these Zero-Sum games to a player, Zero-Sum games, because if the opponent does something irrational, yes, it might also throw off my beliefs. But the amount that the player can gain by throwing out of my belief is always less than they lose by playing poorly.

[00:24:18]

So so it's safe. But still, if somebody is weak as a player, you might want to play differently to exploit them more so that you can think about it this way.

[00:24:28]

A game theoretic strategies are unbeatable, but it doesn't maximally beat the other opponent. So the winnings per hand might be better with a different strategy and the hybridised that you start from a game theoretic approach. And then as you gain data from about the opponent in certain parts of the game three, then in those parts of the game three, you start to tweak your strategy more and more towards exploitation while still staying fairly close to the game through the strategy so as to not open yourself up to exploitation too much.

[00:25:03]

How how do you do that? Do you try to vary up strategies, make it unpredictable? It's like, what is it, tit for tat strategies in prisoner's dilemma or.

[00:25:17]

Well, that that's a repeated game kind of prisoner's dilemma. Repeated games. But but even there is no proof that says that that's a best thing. But experimentally, it actually does does well.

[00:25:29]

So what kind of games are there? First of all, I don't know if this is something that you could just summarize. There's perfect information games or all the information is on the table. There is imperfect information games. There's repeated games that you play over and over. There's zero sum games. There's not zero sum games. Yeah. And then there's a really important distinction you're making to player versus more players.

[00:25:57]

So what are what other games are there and what's the difference, for example, with this to play player game versus more players. Yeah. What are the key differences. You so let me start from the.

[00:26:10]

The basic so. A repeated game is a game where the same exact game is played over and over in this extensive form games where I think about three form, maybe with this information says to represent incomplete information, you can have kind of repetitive interactions. Even repeated games are a special case of that, by the way. But the game doesn't have to be exactly the same. It's like insourcing, sorting objects. Yes, we're going to see the same supply base year to year.

[00:26:43]

But what I'm buying is a little different every time, and the supply base is a little different every time and so on. So it's not really repeated. So to find a purely repeated game is actually very rare in the world.

[00:26:55]

So there really are very coarse model of what's going on. Then if you move up from that just repeated, simple, repeated Matrix games, not all the way to extensive formal games, but in between the stochastic games where, you know, there's these you think about it like this little Matrix games. And when you take an action and you're taking an action, they determine not which next state I'm going to, next game I'm going to, but the distribution over next games where I might be going.

[00:27:29]

So that's the stochastic game. But it's like Matrix games, repeated stochastic games, extensive form games that is from less to more general. And poker is an example of the last one.

[00:27:42]

So it's really in the most general setting, extensive form games. And that's kind of what the AA community has been working on and being benchmarked on with this heads up, no limit Texas Hold'em.

[00:27:54]

Can you describe extensive form games? What was the model here?

[00:27:58]

Yeah, so if you're familiar with the tree form, so it's really the tree form, like in chess there's a search tree versus a matrix is a matrix here. And that's the matrix. It's called the matrix form or by matrix form or normal form game. And here you have the tree form. So you can actually do certain types of reasoning there that you lose the information when you go the normal form. There's a certain form of equivalence, like if you go from three form and you say every possible contingency plan is a strategy, then I can actually go back to the normal form.

[00:28:31]

But they lose some information from the lack of sequence reality then the multiplayer is to play a distinction is an important one. So two player games in zero sum are conceptually easier and computationally easier. There's still huge like this, this one, but they're conceptually easier and computationally easier in that. Conceptually, you don't have to worry about which equilibrium is the other guy going to play when there are multiple because any equilibrium strategy is a best response to any other equilibrium strategy.

[00:29:08]

So I can play a different equilibrium from you and we'll still get the right values of the game that falls apart even with two players.

[00:29:15]

When you have a general or someone gets even with our cooperation, just even without cooperation. So there's a big gap from two player, zero sum to two player general sum or even the three players zero sum.

[00:29:27]

That's that's a big gap, at least in theory. Can you may be non mathematically providing tuition while falls apart with three or more players. It seems like you should still be able to have a Nash equilibrium that. Yeah, that's instructive, that holds OK, so it is true that all finite games have a Nash equilibrium. So this is what John Nash actually proved. So they do have a Nash equilibrium, that's not the problem. The problem is that there can be many and then there's a question of which equilibrium to select.

[00:30:06]

So and if you select your strategy from a different equilibrium when I select mine.

[00:30:14]

Then what does that mean? And in this non-zero some games, we may lose some. Joint benefit by being just simply stupid. We could actually both be better off if we did something else. Yes. And in three player, you get other problems also like collusion, like maybe you and I can gang up on a third player and we can do radically better by colluding. So there are lots of issues that come up there.

[00:30:38]

So no Brown student you work with on this has mentioned. I looked through the AMA on Reddit. He mentioned that the ability of poker players to collaborate would make the game. He was asked the question of how would you make the game of poker?

[00:30:54]

Or both of you were asked the question, how would you make the game of poker beyond being solvable by current methods?

[00:31:03]

And he said that there's not many ways of making poker more difficult, but a collaboration or cooperation between players would make it extremely difficult. So can you provide the intuition behind why that is, if you agree with that idea? Yeah.

[00:31:22]

So I've done a lot of work on coalitional games and we actually have a paper here with my other student, Gabriela Farina, and some other collaborators on at neeps on that. I actually just came back from the poster session where we presented that. So when you have a collusion, it's a different problem and it typically gets even harder than. Even the game representations, some of the game representations don't really allow. We're computation, so we actually introduced a new game representation for that, is that kind of cooperation part of the model?

[00:31:58]

Are you do you have do you have information about the fact that other players are cooperating or is it just this chaos? There were nothing is known. So there's some some things unknown. Can you give an example of a collusion type game or is it, you know, like bridge?

[00:32:14]

So think about bridge. It's like when you and I are on a team, our pay offs are the same. The problem is that we can't talk. So.

[00:32:23]

So when I get my cards, I can't whisper to you what my cards are that would not be allowed.

[00:32:28]

So we have to somehow coordinate our strategies ahead of time and only ahead of time. And then there's certain signals we can talk about, but they have to be such that the other team also understands that.

[00:32:43]

So so that's that's an example where the coordination is already built into the rules of the game. But in many other situations like auctions or negotiations or diplomatic relationships, poker, it's not really built in, but it still can be very helpful for the colluders.

[00:33:01]

I've read you read somewhere in the negotiations you come to the table with prior like a strategy that like that you're willing to do and are willing to do those kinds of things.

[00:33:14]

So how do you start to now moving away from poker, moving beyond poker into other applications like negotiations?

[00:33:22]

How do you start applying this to other to other domains? Yeah, real world domains that you've worked on.

[00:33:29]

Yeah, I actually have to start up companies doing exactly that. One is called Strategic Machine, and that's going to be Indian applications, gaming, sports, all sorts of things like that. Any applications of this to business and to sports and or gaming to various types of things in finance, electricity markets and so on.

[00:33:50]

And the other is called a strategy robot, where we are taking these to military security, cybersecurity and intelligence applications.

[00:33:59]

I think you worked a little bit in how you put it advertisement as sort of suggesting ads kind of thing.

[00:34:11]

Yeah, and that's another company, optimized markets.

[00:34:14]

But that's much more about a combinatorial market and optimization based technology that's not using these game theoretic reasoning technologies, I think.

[00:34:23]

OK, so what sort of high level do you think about our ability to use game theoretic concepts to model human behavior? Do you think do you think human behavior is amenable to this kind of modeling? So outside of the poker games and where have you seen it done successfully in your work?

[00:34:44]

I'm not sure the goal really is modeling. Humans are like, for example, if I'm playing a zero sum game. Yes, I don't really care that the opponent is actually following my model of rational behavior, because if they're not, that's even better for me. Right.

[00:35:03]

So so see, with the opponent's games, there's a the prerequisite is that you formalize the interaction in some way that can be amenable to analysis. And you've done this amazing work with mechanism design, designing games that have certain outcomes.

[00:35:26]

But so I'll tell you an example from my from my world of autonomous vehicles. We're studying pedestrians and pedestrians and cars negotiating in nonverbal communication. There's this weird game dance of tension where pedestrians are basically saying, I trust that you won't kill me. And so as a jaywalker, I will step onto the road even though I'm breaking the law. And there's this tension.

[00:35:51]

And the question is, we really don't know how to model that well in trying to model intent. And so people sometimes bring up ideas of game theory and so on.

[00:36:01]

Do you think that aspect of human behavior can use these kinds of imperfect information approaches modeling? How do we how do you start to attack a problem like that when you don't even know how the game to design the game, to describe the situation in order to solve it?

[00:36:20]

OK, so I haven't really thought about jaywalking, but one thing that I think would be a good application in an autonomous vehicles is the following.

[00:36:29]

So let's say that you have fleets of autonomous cars operating by different companies. So maybe here's the way Marfleet and here's that Uber fleet. If you think about the rules of the road, they define certain legal rules, but that still leaves a huge strategy space open, like a simple example when cars merge.

[00:36:49]

Documents, you know, they slow down and look at each other and try to try to merge, wouldn't it be better if these situations would already be renegotiated so we can actually merge at full speed? And we know this is a situation, this is how we do it, and it's all going to be faster. But there are way too many situations to negotiate manually. So you could use automated negotiation. This is the idea. At least you could use automated negotiation to negotiate all of these situations or many of them in advance.

[00:37:20]

And of course, it might be that, hey, maybe you're not going to always let me go first. Maybe you said, OK, well, in these situations I'll let you go first. But in exchange, you're going to give me two hours. You're going to let me go first in these situations. So it's this huge combinatorial negotiation.

[00:37:37]

And do you think there's room in that example of merging to model this whole situation is an imperfect information game, or do you really want to consider it to be a perfect.

[00:37:46]

That's a good question. Yeah, that's a good question.

[00:37:49]

Do you pay the price of assuming that you don't know everything? Yeah, I don't know. It's certainly much easier. Games with perfect information are much easier. So if you can get away with it, you should. But if the real situation is of imperfect information, then you're going to have to deal with imperfect information.

[00:38:11]

Great.

[00:38:12]

So what lessons have you learned in your computer poker competition? An incredible accomplishment of you know, you look at the history of deep blue Fargo, these kind of moments when I stepped up in an engineering effort and a scientific effort combined to beat the best, the human player.

[00:38:32]

So what do you take away from this whole experience? What have you learned about designing as systems that play these kinds of games? And what does that mean for sort of A.I. in general, for the future of AI development? Yeah, so that's a good question.

[00:38:49]

There's so much to say about it. I do like this type of performance oriented research, although in my group we go all the way from idea to theory to experimental big system fielding the commercialization. So we spend that spectrum by.

[00:39:04]

But I think that in a lot of situations in A.I., you really have to build the big systems and evaluate them at scale before you know what works and doesn't. And we've seen that in the computational game theory community, that there are a lot of techniques that look good in the small, but then this is to look good in the large. And we've also seen that there are a lot of techniques that look superior in theory and I really mean of convergence rates better like first or methods better convergence rates like the CFR based algorithms.

[00:39:37]

Yet the CFR based telegram's are the fastest in practice. So it really tells me that you have to test this. In reality, the theory isn't tight enough, if you will, to tell which algorithms are better than the others. And you have to look at these things that in the litch, because any sort of projections you draw from the small can, at least in this domain, be very misleading. So that's kind of from from a kind of science and engineering perspective.

[00:40:04]

From a personal perspective, it's been just a wild experience in that with the first Pulcher competition, the first branes versus apemen machine competition that we organized, there had been, by the way, for all the poker games, there had been previous competitions, but this was four heads up, no limit. This was the first and I probably became the most hated person in the world of poker. And I didn't mean to.

[00:40:28]

I sized that character in the game for a yeah, it was a lot of people felt that it was a real threat to the whole game, the whole existence of the game. If A.I. becomes better than humans, people would be scared to play poker because there are these superhuman ace running around taking their money and, you know, all of that. So so I just it's just really aggressive. The comments were super aggressive. I got everything just short of death threats.

[00:40:58]

Do you think the same was true for chess? Because right now they just completed the world championships and chess and humans just started ignoring the fact that there is A.I. systems now that outperform humans and they still enjoy the game is still a beautiful game. That's what I think.

[00:41:13]

And I think the same thing happens in poker as so I didn't think of myself as somebody was going to kill the game and I don't think I did. Yeah, I've really learned to love this game. I wasn't the poker player before, but I learned so many nuances about it from these A's. And they've really changed how the game is played, by the way. So they have these very machine ways of playing poker and the top humors are now incorporating those types of strategies into their own play.

[00:41:37]

So if anything, to me, our work has made poker a richer, more interesting game for humans to play, not something that is going to steer us away from. Entirely. Just a quick comment and something you said, which is. If I may say so in academia's, a little bit rare, sometimes it's pretty brave to put your ideas to the test in the way you described, saying that sometimes good ideas don't work when you actually try to apply them at scale.

[00:42:09]

So where does that come from? I mean, what if you could do, um, advice for people? What drives you in that sense where you always this way?

[00:42:18]

I mean, it takes a brave person, I guess, is what I'm saying, to test their ideas and to see if this thing actually works against human top human players and so on.

[00:42:28]

I don't know about brave, but it takes a lot of work. It takes a lot of work and a lot of time to organize, make something big and to organize an event and stuff like that.

[00:42:39]

And what drives you in that effort? Because you could still, I would argue, get a best paper award at Nips, as you did in 17 without doing this. That's right. Yes.

[00:42:48]

Yes. And so so in general, I believe it's very important to do things in the real world and at scale.

[00:42:58]

And that's really where the the pudding, if you will, proof is in the pudding that that's where it is. In this particular case, it was kind of a competition between different groups and for many years as to who can be the first one to beat the top human heads up, no limit Texas Hold'em. So the game became kind of like a competition. Who can get there? Yeah.

[00:43:26]

So a little friendly competition can be can do wonders for progress.

[00:43:30]

Yes. So the topic of mechanism design, which is really interesting, also kind of new to me, except as an observer of, I don't know, politics and any I'm an observer of mechanisms.

[00:43:44]

But you write in your paper an automated mechanism design that I quickly read. Some mechanism design is designing the rules of the game so you get a certain desirable outcome and you have to work on doing so in an automatic fashion as opposed to fine tuning it.

[00:44:03]

So what have you learned from those efforts? If you look say, I don't know, it's complex, just like our political system. Can we design our political system to have in an automated fashion to have outcomes that we want? Can we design something like traffic lights to be smart where it gets outcomes that we want?

[00:44:28]

So what are the lessons that you draw from that work?

[00:44:31]

Yeah, so I still very much believe in the automated mechanism. Design direction. Yes. But it's not a panacea. There are impossibility results in mechanism design, saying that there is no mechanism that accomplishes Objective X in Class C. So so it's not going up. There's no way using any mechanism designed to tools, manual or automated to do certain things in the mechanism design.

[00:44:59]

Korkin described that again. So meaning there it's impossible to achieve that.

[00:45:04]

Yeah, yeah. This is certainly more likely impossible.

[00:45:08]

So it's always these are these are not statements about human ingenuity. It all might come up with something smart. These are proofs that if you want to accomplish propolis X in class C, that is not doable with any mechanism. The good thing about automated mechanism design is that we're not really designing for a class. We're designing for specific settings at the time.

[00:45:30]

So even if there's an impossible result for the whole class, it just doesn't mean that all of the cases in the class are impossible. It just means that some of the cases are impossible. So we can actually carve these islands of possibility within this not only possible classes and we've actually done that. So one of the famous results in mechanism design is a Myerson's Satterthwaite Theorem by Roger Myerson and Mark Satterthwaite from 1983. It's an impossibility of efficient trade under imperfect information.

[00:46:01]

We show that. You can in many settings avoid that and get the efficient trade anyway, depending on how you design the game. OK, so depending how you design the game, of course it's not it doesn't in any way, any way contradict the impossibility of the impossibility.

[00:46:18]

Results are still there, but it just finds sports within this impossible class where in those sports you don't have that possibility.

[00:46:28]

Sorry if I'm going a bit philosophical, but what lessons do you draw towards, like I mentioned, politics or human interaction and designing mechanisms for outside of just these kinds of trading or auctioning or purely formal games?

[00:46:48]

Are human interaction like a political system?

[00:46:50]

What how do you think it's applicable to politics or to business, to negotiations, these kinds of things, designing rules that have certain outcomes?

[00:47:05]

Yeah, yeah, I do think so.

[00:47:07]

Have you seen success that successfully done the I didn't really mean mechanism design or automated automated mechanisms, but so mechanism design itself has had.

[00:47:19]

Fairly limited success so far. There are certain cases, but most of the real world situations are actually not sound from a mechanism design perspective, even in those cases where they've been designed by very knowledgeable mechanism design people, the people are typically just taking some insights from the theory and applying those insights into the real world rather than applying the mechanisms directly. So one famous example of is the FCC spectrum auctions.

[00:47:49]

So I've also had a small role in that. And our very good economists have been were excellent economists have been working on that or know game theory.

[00:48:00]

Yet the rules that are designed in practice there, there are such that bidding truthfully is not the best strategy. Usually mechanism design. We try to make things easy for the participants. So telling the truth is the best strategy. But even in those very high stakes auctions where you have tens of billions of dollars worth of spectrum being auctioned. Truthtelling is not the best strategy. And by the way, nobody knows even a single optimal beating strategy for those auctions.

[00:48:30]

What's the challenge of coming up with an optimal? Because there's a lot of players and there's not a lot of players, but many items for sale. And these mechanisms are such that even with just two items or one item, bidding truthfully wouldn't be the best strategy.

[00:48:47]

If you look at the history of A.I., it's marked by seminal events and Africa being a world champion human go player, I would put librettists winning their heads up.

[00:48:58]

No limit hold'em is one such event. Thank you.

[00:49:02]

And what do you think? Is the next such event, whether it's in your life or in broadly, a community that you think might be out there, that would surprise the world.

[00:49:18]

So that's a great question. And I don't really know the answer in terms of game solving. Heads up, no limit. Texas Hold'em really was the one remaining widely agreed upon benchmark. So that was the big milestone. Now, are there other things? Yes, certainly there are. But there is not one that the community has kind of focused on. So what could be other things? There are groups working on Starcraft. There are groups working on Dota two.

[00:49:46]

These are video games.

[00:49:47]

Yes. Or you could have like diplomacy or Hanabi, you know, things like that. These are like recreational games, but none of them are really acknowledged as kind of the main next challenge problem like chess or go or heads up. No limit. Texas Hold'em was so I don't really know in the game solving space what is or what will be the next benchmark.

[00:50:11]

I kind of hope that there will be a next benchmark because the really the different groups working on the same problem really drove these application independent techniques forward very quickly over 10 years. Do you think there's an open problem that excites you, that you start moving away from games into real world games like, say, the stock market trading?

[00:50:33]

Yeah, so that's that's kind of how I am.

[00:50:35]

So I am probably not going to work as hard on these recreational benchmarks.

[00:50:44]

I'm going to start up on game solving technology, strategic machine and strategy robot. Then we're really interested in pushing this stuff into practice. What do you think would be really.

[00:50:57]

You know, a powerful result that would be surprising, that would be if you can say, I mean, you know, five years, 10 years from now, something that statistic you say is not very likely.

[00:51:14]

But if there's a breakthrough, would achieve. Yeah.

[00:51:18]

So I think that overall we're in a very different situation in game theory than we are in, let's say, machine learning. Yes.

[00:51:28]

So in machine learning, it's a fairly mature technology and it's very broadly applied and proven success in the real world in game solving. There are almost no applications yet. We have just become super human, which machine learning, you could argue, happened in the 90s, if not earlier, and at least unsupervised learning a certain complex, supervised learning applications.

[00:51:54]

Now, I think a next challenge. I know you're not asking about this where you're you're asking about the technology breakthrough, but I think that the big, big breakthrough is to be able to show it. Hey, maybe most of, let's say, military planning or most of business strategy will actually be done strategically using computational game theory that that's what I would like to see as a next five or 10 year goal.

[00:52:14]

Maybe you can explain to me again, forgive me if this is an obvious question, but, you know, machine learning methods, neural networks are suffer from not being transparent, not being explainable. A game theoretic methods, you know, Nash equilibria. Do they generally when you see the different solutions, are they when you talk about military operations, are they once you see the strategies, do they make sense of the explainable or do they suffer from the same problems as neuro networks do?

[00:52:44]

So that's that's a good question. I would say a little bit yes and no.

[00:52:48]

And what I mean by that is that these games are the strategies, let's say Nash equilibrium. It has provable properties.

[00:52:57]

So it's unlike, let's say, deep learning where you kind of cross your fingers. Hopefully it'll work. And then after the fact, when you have the weights, you still crossing your fingers. And I hope it will work here.

[00:53:09]

You know that the solution quality's there. Disprovable or solution quality guarantees. Now, that doesn't necessarily mean that the strategies are human. Understandable. That's a whole other problem. So I think that deep learning and computational game theory are in the same boat in that sense that both are difficult to understand. But at least the game theoretic techniques they have this get guarantees of guarantee quality.

[00:53:36]

So did you see business operation, strategic operations or even military in the future being at least the strong candidates being proposed by automated systems? Do you see that? Yeah, I do. I do.

[00:53:51]

But that's more of a belief than an unsubstantiated fact, depending on where you land and optimism or pessimism.

[00:53:58]

That's a relief to me.

[00:54:00]

That's an exciting future, especially if they're provable things in terms of optimality.

[00:54:07]

So looking into the future, there's a few folks worried about the especially you look at the game of poker, which is probably one of the last benchmarks in terms of games being solved.

[00:54:22]

The worry about the future and the existential threats of artificial intelligence. So the negative impact in whatever form on society, is that something that concerns you? As much or are you more optimistic about the positive impacts of I or I am much more optimistic about the positive impacts? So just in my own work, what we've done so far, but we run the nationwide kidney exchange. Hundreds of people are walking around alive today. Who would it be? And it's increased employment.

[00:54:52]

You have you have a lot of people now running kidney exchanges and at the transplant centers interacting with the kidney exchange, you have extra surgeons, nurses, anesthesiologists, hospitals, all of that sort. So employment is increasing from that and the world is becoming a better place. Another example is combinatorial sourcing auctions.

[00:55:15]

We did 800 Large-Scale combinatorial sourcing auctions from 2001 to 2010 in a previous startup or combine it and we increase the supply chain efficiency on that 60 billion dollars of spend by twelve point six percent. So that's over six billion dollars of efficiency improvement in the world. And this is not like shifting value from somebody to somebody else, just efficiency improvement like in trucking, less empty driving, so there's less waste, less carbon footprint and so on.

[00:55:50]

The huge positive impact in the near term. But sort of to stay in it for for a little longer because I think game theory has a role to play here.

[00:55:59]

Well, let me actually come back and that is one thing I think is also going to make the world much safer.

[00:56:06]

So, so so that's another aspect that often gets overlooked. Well, let me ask this question.

[00:56:11]

Maybe you can speak to the the safer. So I talked to Max Tegmark, Stewart Russell, who are very concerned about existential threat, severe. And often the concern is about value, misalignments or A.I. systems, basically working uprating towards goals that are not the same as human civilization, human beings.

[00:56:34]

So it seems like game theory has a role to play there to to make sure the values are aligned with human beings.

[00:56:44]

I don't know if that's how you think about it. If not, how do you think I might help with this problem? How do you think I make the world safer? Yea, I think this value misalignment is a fairly theoretical worry, and I haven't really seen it in it because I do a lot of real applications. I don't see it anywhere. The closest I've seen it was the following type of mental exercise, really, where I had this argument in the late 80s when we were building these transportation optimisation systems and somebody had heard that it's a good idea to have high utilization of assets.

[00:57:24]

So they told me, hey, why don't you put that as objective? And we didn't even put it as an objective because I just showed him at you know, if you had that as your objective, the solution would be to load your trucks full and drive in circles. Nothing would ever get delivered. You'd have 100 percent utilization.

[00:57:41]

So, yeah, I know this phenomenon. I've known this for over 30 years, but I've never seen it actually be a problem. Reality in reality. And yes, if you have the wrong objective, the will optimize that to the hilt. And it's going to hurt more than some human who's kind of trying to solve it a half baked way with some human insight, too. But I just haven't seen that materialize in practice.

[00:58:05]

There's this gap that you actually put your finger on very clearly just now between theory and reality. That's very difficult to put into words. I think it's what you can theoretically imagine. The worst possible case or even. Yeah, I mean, bad cases and what usually happens in reality. So, for example, to me, maybe it's something you can comment on, having grown up in. I grew up in the Soviet Union.

[00:58:35]

You know, there's currently 10000 nuclear weapons in the world. And for many decades, it's theoretically surprising to me that the nuclear war has not broken out.

[00:58:47]

Do you think about this aspect from a game theoretic perspective in general? Why is that true? Why in theory, you could see how things would go terribly wrong and somehow yet they have not.

[00:59:00]

Yeah. How do you think it's all so?

[00:59:02]

I do think that about that a lot. I think the biggest threats that we're facing as mankind one is climate change and the other is nuclear war. So as I said, those are my main two worries that that worry about then I've tried to do something about climate, thought about trying to do something for climate change twice. Actually, for two of my startups, I've actually commissioned studies of what we could do on those things. And we didn't really find a sweet spot, but I'm still keeping an eye out on that.

[00:59:29]

If there's something where we could actually provide a market solution or optimization solution or some other technology solution to problems right now, like, for example, pollution, credit markets was what we were looking at then. And it was much more the lack of political will by those markets were not so successful rather than better market design. So I could go in and make a better market design, but that wouldn't really move the needle on the world very much if there's no political will.

[00:59:57]

And in the US, you know, the market, at least the Chicago market was just shut down. And so once and then it doesn't really help a great deal.

[01:00:06]

Market design was so new nucleoside, it's more so global warming is more encroaching. Problem, you know, nuclear weapons have been here, it's an obvious problem that's just been sitting there. So how do you think about what is the mechanism design there that just made everything seem stable?

[01:00:28]

And are you still extremely worried?

[01:00:31]

I am still extremely worried. So you probably know the simple game theory of mad. So, so so this was a mutually assured destruction. And it's like it doesn't require any computation with small matrices. You can actually convince yourself that the game is such that nobody wants to initiate. Yeah, that's a very cross-grained analysis.

[01:00:51]

And it really works in a situation where you have two superpowers or a small number of superpowers. Now, things are very different. You have a smaller nuke. So the threshold of initiating is smaller and you have smaller countries and non non nation actors who may get nukes and so on. So it's I think it's riskier now than it was maybe ever before.

[01:01:14]

And what idea application of A.I. you've talked about a little bit, but what is the most exciting to you right now? I mean, you're here at NEBs NeuroPace now.

[01:01:28]

You have a few excellent piece of work, but what are you thinking into the future with several companies are doing? What's the most exciting thing or one of the exciting things?

[01:01:37]

The number one thing for me right now is coming up with these scalable techniques for game solving and applying them into the real world that I'm still very interested in market design as well. And we're doing that in the optimized markets. But I'm most interested if no one right now is strategic machine strategy, robot, getting that technology out there and seeing as you were in the trenches doing applications, what needs to be actually filled with technology gaps still need to be filled.

[01:02:06]

So it's so hard to just put your feet on the table and imagine what needs to be done. But when you're actually doing real applications, the applications tell you what needs to be done. And I really enjoy that interaction.

[01:02:17]

Is it a challenging process to apply some of the state of the art techniques you're working on and having the various players in industry or the military or people who could really benefit from it actually use it? What's that process like of, you know, in advanced vehicles or work with automotive companies?

[01:02:40]

And they're in in many ways are a little bit old fashioned. It's difficult. They really want to use this technology. There's clearly will have a significant benefit, but the systems aren't quite in place to easily have them integrated in terms of data, in terms of compute, in terms of all these kinds of things. So deuced is that one of the bigger challenges that you're facing and how do you tackle that challenge?

[01:03:06]

Yeah, I think that's always the challenge there. That's kind of slowness and inertia, really. Let's do things the way we've always done it. You just have to find the internal champions that the customer who understand that he thinks can't be the same way in the future. Otherwise bad things are going to happen. And it's in autonomous vehicles. It's actually very interesting that the car makers are doing that. Then they're very traditional. But at the same time, you have tech companies who have nothing to do with cars or transportation like Google and Baidu really pushing on autonomous cars.

[01:03:38]

I find that fascinating.

[01:03:39]

Clearly, you're super excited about actually these ideas having an impact in the world of in terms of the technology, in terms of ideas and research, other directions that you're also excited about, whether that's on the some of the approaches you talked about with imperfect information games, whether it's applying deep learning to some of these problems. Is there something that you're excited in in the research side of things? Yeah, yeah.

[01:04:06]

Lots of different things in the game. Solving so solving even bigger games, games where you have more heat in action of the play, your actions as well.

[01:04:18]

Poker is a game where really the transactions are hidden or some of them are hidden, but the player actions are public. It multiplayer games of various sorts, collusion, opponent exploitation, all, all and even longer games, some games that basically go forever, but they're not repeated so extensive on games that go forever.

[01:04:44]

What what would that even look like? How do you represent that? How do you solve that?

[01:04:48]

What's an example of a game like that?

[01:04:50]

This is some of the success against the business strategy and not just modeling like a particular interaction, but thinking about the business from here to eternity or I see or let's let's see military strategy.

[01:05:05]

So it's not like war is going to go away. How do you think about military strategy that's going to go forever?

[01:05:13]

How do you even model that? How do you know whether a move was good that you or somebody made and so on?

[01:05:21]

So that that's kind of one direction. I'm also very interested in learning much more scalable techniques for Inderjit programming. So we had a nice paper this summer on that for the first automated algorithm configuration paper that has a theoretical generalization guarantees. So if I see this many training examples and I told my algorithm in this way, it's going to have good performance on the real distribution, which I have not seen. So of which is kind of interesting that algorithm configuration has been going on now for.

[01:05:55]

At least 17 years seriously, and there has not been any generalization there before.

[01:06:02]

Well, this is really exciting and has been it's a huge honor to talk to you. Thank you so much to us. Thank you for bringing brought us to the world and all the great work you're done. Well, thank you very much. It's been fun. Good questions.