The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence to sequence learning, audio generation, image captioning, neural machine translation and, of course, reinforcement learning.
He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of Starcraft. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now here's my conversation with Oriol Vinyals.
You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at Starcraft.
So you have an incredible wealth of work in deep learning and a bunch of fields.
But let's talk about Starcraft first. Let's go back to the very beginning. Even before AlphaStar, before DeepMind, before deep learning. First, what came first for you? A love for programming or a love for video games?
I think for me, the drive to play video games definitely came first.
I really liked computers. I didn't really code much. But what I would do is I would just mess with the computer, break it and fix it. That was the level of skills, I guess, that I gained in my very early days.
I mean, when I was 10 or 11. And then I really got into video games, especially Starcraft, actually the first version. I spent most of my time just playing kind of pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not very mainstream, unlike what's nowadays called eSports.
Right, of course, in the 90s. So how did you get into Starcraft? What was your favorite race? How did you develop your skill? What was your strategy?
All that kind of thing. So as a player, I tended to try to play not many games, so as not to disclose the strategies that I developed. And I liked to play random, actually, not in competitions, but just to... I think in Starcraft there's three main races and I found it very useful to play with all of them. And so I would choose random many times, even sometimes in tournaments, to gain skill on the three races, because it's not only how you play against someone, but also if you understand the race because you played it, you also understand what's annoying,
what to do, when you're on the other side, to annoy that person, to try to gain advantages here and there and so on. So I actually played random, although I must say in terms of favorite race, I really liked Zerg. I was probably best at Zerg and that's probably what I tended to use towards the end of my career before starting university.
So let's step back a little bit. Could you try to describe Starcraft to people that may never have played video games, especially the massively online variety like Starcraft?
So Starcraft is a real time strategy game. And the way to think about Starcraft, perhaps if you understand a bit of chess, is that there's a board, which is called the map, where people play against each other. There's obviously many ways you can play. But the most interesting one is the one versus one setup where you just play against someone else or even the built-in AI. Blizzard put in a system that can play the game reasonably well if you don't know how to play.
And then in this board, you have, again, pieces like in chess, but these pieces are not there initially like they are in chess. You actually need to decide to gather resources and decide which pieces to build. So in a way, you're starting almost with no pieces. You start gathering resources. In Starcraft, there's minerals and gas that you can gather, and then you must decide how much you want to focus, for instance, on gathering more resources or starting to build units or pieces.
And then once you have enough pieces or maybe a good attacking position, then you go and attack the other side of the map. And now the other main difference with chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. So as a result, you must not only decide trading off economy versus building your own units, but you also must decide whether you want to scout to gather information.
But also by scouting, you might be giving away some information that you might be hiding from the enemy. So there's a lot of complex decision making, all in real time. Also, unlike chess, this is not a turn based game.
You play basically all the time, continuously, and thus some skill in terms of speed and accuracy of clicking is also very important. And people that train for this really play this game at an amazing skill level. I have seen this many times, and if you can witness this live, it's really, really impressive. So in a way, it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources to basically get some money to build other buildings, pieces, technology and so on.
From the perspective of the human player, the difference between that and chess, or maybe that and a turn based strategy game like Heroes of Might and Magic, is that there's an anxiety, because
you have to make these decisions really quickly, and if you are not actually aware of what decisions work, it's a very stressful balancing. Everything you describe is actually quite stressful and difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, like if they're fully aware of what they have to do. But at the amateur level, there's this anxiety: oh, crap, I'm being attacked. Oh, crap, I have to build up resources.
Oh, I have to probably expand, and all these things. The real time strategy aspect is really stressful and computationally, I'm sure, difficult. We'll get into it.
But for me, Battle.net... so Starcraft was released in '98, twenty years ago, which is hard to believe.
And Blizzard's Battle.net with Diablo in '96 came out, and to me, it might be a narrow perspective, but it changed online gaming and perhaps society forever.
Yeah, but I may have a way too narrow viewpoint. From your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important is this line of games?
Right. So I think I kind of was an active gamer whilst this was developing, the Internet, online gaming. So for me, the way it came was I played other games, strategy related.
I played a bit of Command and Conquer, and then I played Warcraft II, which is from Blizzard. But at the time I didn't understand what Blizzard was or anything. Warcraft was just a game, which was actually very similar to Starcraft in many ways. It's also a real time strategy game where there's orcs and humans. So there's only two races.
But it was offline.
Right. So I remember a friend of mine came to school and said, oh, there's this new cool game called Starcraft. And I just said, oh, this sounds like just a copy of Warcraft II, until I kind of installed it. And at the time, I am from Spain, so we didn't have very good Internet. Right. So for us, Starcraft became first kind of an offline experience where you kind of start to play these missions.
Right. You play against some sort of scripted things to develop the story of the characters in the game. And then later on, I started playing against the built-in AI, and I thought it was impossible to defeat it. Then eventually you defeat one and you can actually play against seven AIs at the same time, which also felt impossible. But actually, it's not that hard to beat seven AIs at once. So once we achieved that, also we discovered that we could play, as I said, Internet wasn't that great, but we could play over LAN, right, basically against each other if we were in the same place, because you could just connect machines with, like, cables.
Right. So we started playing in LAN mode, as a group of friends. And it was really much more entertaining than playing against the AIs.
And later on, as the Internet was starting to develop and being a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because of the fact that you can play the game against anyone in the world, but you can also get to know more people. You just get exposed now to this vast variety. It's kind of a bit like when the chats came about, right.
There was a chat system. You could play against people, but you could also chat with people, not only about Starcraft, but about anything. And that became a way of life for kind of two years. And obviously then it kind of exploded for me, and I started to play more seriously, going to tournaments and so on, so forth.
Do you have a sense, on a societal, sociological level, what's this whole part of society that many of us are not aware of? And it's a huge part of society, which is gamers. I mean, every time I come across that on YouTube or streaming sites, I mean, there's a huge number of people who play games religiously.
Do you have a sense of those folks, especially now that you've returned to that realm a little bit on the AI side?
Yeah. So in fact, even after Starcraft, I actually played World of Warcraft, which is maybe the main sort of online world and presence where you get to interact with lots of people. So I played that for a little bit. To me it was a bit less stressful than Starcraft because winning was kind of a given. You're just put in this world and you can always complete missions.
But I think it was actually the social aspect of especially Starcraft first and then games like World of Warcraft that really shaped me in a very interesting way, because you get to experience just people you wouldn't usually interact with. Right. So even nowadays, I still have many Facebook friends from the era when I played online, and their ways of thinking, even political... we don't interact in the real world, but we were connected by basically fiber.
And that way, I actually got to understand a bit better that we live in a diverse world. And these were just connections that were made because, you know, I happened to go into a city, in a beautiful city, as a priest. And I met this, you know, this warrior, and we became friends and then we started playing together. Right. So I think it's transformative.
And more and more people are aware of it.
I mean, it's becoming quite mainstream. But back in the day, as you were saying, in 2000, 2005 even, it was still a very strange thing to do, especially in Europe. I think there were exceptions like Korea, for instance. It was amazing that everything happened so early in terms of cyber cafes. Like if you go to Seoul, it's a city where, back in the day, you could be a celebrity by playing Starcraft, but this was like '99,
2000. It's not like recently. So, yeah, it's quite interesting to look back. And yeah, I think it's changing society the same way, of course, like technology and social networks and so on are also transforming things.
And a quick tangent, let me ask. You're also one of the most productive people in your particular chosen passion and path in life, and yet you also appreciate and enjoy video games.
Do you think it's possible to enjoy video games in moderation?
Someone told me that you could choose two out of three. When I was playing video games, you could choose having a girlfriend, playing video games, or studying. And I think for the most part, it was relatively true. These things do take time. Games, if you take the game pretty seriously and you want to study it, then you obviously will dedicate more time to it. And I definitely took gaming and obviously studying very seriously. I love learning science, et cetera.
So to me, especially when I started university undergrad, I kind of stepped off Starcraft. I actually fully stopped playing. And then World of Warcraft was a bit more casual. You could just connect online. And I mean, it was fun. But as I said, that was not as much time investment as it was for me in Starcraft.
OK, so let's get into AlphaStar. You're behind the team, so DeepMind has been working on Starcraft and released a bunch of cool open source agents and so on in the past few years.
But AlphaStar really is the moment where, for the first time, you beat a world
class player. So what are the parameters of the challenge in the way that AlphaStar took it on, and how did you and David and the rest of the DeepMind team get into it? Did you consider that you could even beat the best in the world or top players?
I think it all started back in 2015. Actually, I'm lying. I think it was 2014 when DeepMind was acquired by Google. And I at the time was at Google Brain, which was in California, is still in California. We had this summit where we got together. The two groups, Google Brain and Google DeepMind, got together and we gave a series of talks. And given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, which I had developed at Berkeley, this thing which we call Berkeley Overmind, which is really just a Starcraft one
bot. Right. So I talked about that and I remember that Demis just came to me and said, well, maybe not now. It's perhaps a bit too early, but you should just come to DeepMind and do this again with deep reinforcement learning.
Right. And at the time, it sounded very science fiction for several reasons. But then in 2016, when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent, because of the AlphaGo moment and kind of Blizzard reaching out to us to say, hey, like, do you want the next challenge? And also me being full time at DeepMind, sort of all these came together. And then I went to Irvine in California, to the Blizzard headquarters, to just chat with them and try to explain how it would all work before you do anything.
And the approach has always been about the learning perspective, right. So at Berkeley, we did a lot of rule-based, you know, conditioning: if you have more than three units, then go attack. And if the other has more units than me, I retreat, and so on, so forth. And of course, the point of deep reinforcement learning, deep learning, machine learning in general is that all these should be learned behavior.
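The hand-written rules described here can be sketched in a few lines. This is only an illustration of the rule-based style, not the actual Berkeley Overmind code: the names `ArmyView` and `scripted_action`, the `"hold"` fallback, and the order in which the two rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ArmyView:
    """Hypothetical summary of what a scripted bot knows at one moment."""
    my_units: int
    enemy_units: int


def scripted_action(view: ArmyView, attack_threshold: int = 3) -> str:
    """Pick an action from fixed, hand-written rules (nothing is learned)."""
    # Rule from the conversation: retreat if the other side has more units.
    if view.enemy_units > view.my_units:
        return "retreat"
    # Rule from the conversation: attack once we have more than three units.
    if view.my_units > attack_threshold:
        return "attack"
    # Assumed fallback when neither rule fires.
    return "hold"
```

A learned policy would replace the body of `scripted_action` with a trained model, so thresholds like `3` come from data rather than from the programmer.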
So that kind of was the DNA of the project since its inception in 2016, when we didn't even have an environment to work with. And so that's how it all started, really.
So if you go back to that conversation with Demis, or even in your own head, how far away did you... because we're talking about Atari games.
We're talking about Go, which is kind of, if you're honest about it, really far away from Starcraft. Well, now that you've beaten it, maybe you could say it's close, but it seems like Starcraft is way harder than Go, philosophically and mathematically speaking.
So how far away did you think you were? Did you think in 2019 or '18 you could be doing as well as you have?
Yeah. When I kind of thought about, OK, I'm going to dedicate a lot of my time and focus on this... And obviously I do a lot of different research in deep learning. So spending time on it, I mean, I really had to kind of think there's going to be something good happening out of this. So really I thought, well, this sounds impossible, and it probably is impossible to do the full thing, like the full game where you play one versus one and it's only a neural network playing and so on.
So it really felt like I just didn't even think it was possible. But on the other hand, I could see some stepping stones like towards that goal. Clearly you could define some problems in Starcraft and sort of dissect it a bit and say, OK, here is a part of the game. Here is another part. And also obviously the fact. So this was really also critical to me, the fact that we could access human replays. Right.
So Blizzard was very kind. And in fact, they open sourced these for the whole community, where you can just go, and it's not every single Starcraft game ever played, but it's a lot of them, you can just go and download. And every day you can just query a dataset and say, well, give me all the games that were played today.
And given my kind of experience with language and sequences and supervised learning, I thought, well, that's definitely going to be very helpful and something quite unique now, because never before had we had such a large data set of replays of people playing the game at this scale of such a complex video game.
Right. So that to me was a precious resource. And as soon as I knew that Blizzard was able to kind of give these to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, like, really no rules, no single line of code that tries to say, well, I mean, if you see this, you need to build a detector. All these, not having any of these specializations, seemed really, really, really difficult to me intuitively.
I do also like that Blizzard was teasing or even trolling you, sort of almost, yeah, pulling you into this really difficult challenge. Did they have any awareness? What's the interest from the perspective of Blizzard, except just curiosity?
Yeah, I think Blizzard has really understood and really brought forward this competitiveness of eSports in games. Starcraft really kind of sparked a lot of something that almost was never seen, especially, as I was saying, back in Korea.
So they just probably thought, well, this is such a pure one versus one setup that it would be great to see if something that can play Atari or Go and then later on chess could even tackle this kind of complex, real time strategy game. Right. So for them, they wanted to see first, obviously, whether it was possible, if the game they created was in a way solvable to some extent. And I think, on the other hand, they also are a pretty modern company that innovates a lot.
So just starting to understand AI, for them, how to bring AI into games... It's not AI for games, but games for AI, right? I mean, both ways I think can work. We obviously at DeepMind use games for AI, right, to drive AI progress. But Blizzard might actually be able to, and many other companies, start to understand and do the opposite. So I think that is also something they can get out of this.
And definitely we have brainstormed a lot about this. Right.
But one of the interesting things to me about Starcraft and Diablo and these games that Blizzard has created is the task of balancing classes, for example, sort of making the game fair from the starting point and then letting the skill determine the outcome.
I mean, can you first comment, there's three races, Zerg, Protoss and Terran. I don't know if I've ever said that out loud. Is that how you pronounce it, Terran?
Yeah. Yeah. I don't think I've ever in person interacted with anybody about Starcraft.
It's funny. So they seem to be pretty balanced. I wonder if the AI, the work that you're doing with AlphaStar, would help balance them even further. Is that something you think about? Is that something that Blizzard is thinking about? Right.
So balancing when you add a new unit or a new spell type is obviously possible, given that you can always train or retrain at scale some agent that might start using that in unintended ways. But I think actually, if you understand how Starcraft has kind of coevolved with players, in a way, I think it's actually very cool, the ways that many of the things and strategies that people came up with... Right. So I think we've seen it over and over in Starcraft
that Blizzard comes up with maybe a new unit and then some players get creative and do something kind of unintentional, or something that Blizzard designers just simply didn't test or think about. And then after that becomes kind of mainstream in the community, Blizzard patches the game and then they kind of maybe weaken that strategy or make it actually more interesting, but a bit more balanced. So this kind of continual talk between players and Blizzard is kind of what has defined them, actually, in most games, in Starcraft, but also in World of Warcraft.
They would do that. There are several classes and it would be not good that everyone plays absolutely the same race and so on.
Right. So I think they do care about balancing, of course, and they do a fair amount of testing. But it's also beautiful to see how players get creative anyways. And I mean, whether they can be more creative at this point, I don't think so. Right. I mean, it's just sometimes something so amazing happens. Like I remember back in the days, like you have these drop ships that could drop the reavers, and that was actually not thought about, that
you could drop this unit that has what's called splash damage, that would basically eliminate all the enemy's workers at once. No one thought that you could actually put them in really early game, do that kind of damage, and then, you know, things change in the game. But I don't know. I think it's quite an amazing exploration process from both sides, players and Blizzard alike.
Well, it's almost like a reinforcement learning exploration. But I mean, the scale of humans that play Blizzard games is almost on the scale of a large scale DeepMind RL experiment. I mean, if you look at the numbers, I mean, you're talking about, I don't know how many games, but hundreds of thousands of games, probably, a month. Yeah. I mean, so it's almost the same as running RL agents.
What aspect of the problem of Starcraft do you think is the hardest? Is it, like you said, the imperfect information? Is it the fact that you have to do long term planning? Is it the real time aspect, where you have to do stuff really quickly?
Is it the fact that there's a large action space, you can do so many possible things? Or is it, you know, in the game theoretic sense, there is no Nash equilibrium, or at least you don't know what the optimal strategy is because there's way too many options. Right. Is there something that stands out as just the hardest, the most annoying thing?
So when we sort of looked at the problem and started to define the parameters of it, what are the observations, what are the actions, it became very apparent that, you know, the very first barrier that one would hit in Starcraft would be because of the action space being so large and us not being able to search like you could in chess or Go, even though the search space is vast.
The main problem that we identified was that of exploration. Right. So without any sort of human knowledge or human prior, if you think about Starcraft and you know how deep reinforcement learning algorithms work, which is essentially by issuing random actions and hoping that they will get some wins sometimes so they could learn.
So if you think of the action space in Starcraft, almost anything you can do in the early game is bad, because any action involves taking workers which are mining minerals for free. That's something that the game does automatically, sends them to mine. And you would immediately just take them out of mining and send them around. So just thinking how is it going to be possible to get to understand these concepts. But even more, like expanding, right.
There's these buildings you can place in other locations in the map to gather more resources, but the location of the building is important and you have to select a worker, send it walking to that location,
build a building, wait for the building to be built, and then put extra workers there so they start mining. That just feels impossible if you just randomly click to produce that state, that desirable state that then you could hope to learn from, because eventually that may lead to an extra win. Right. So for me, the exploration problem, due to the action space and the fact that there's not really turns, there's so many turns, because the game essentially ticks at 22 times per second.
I mean, that's how they discretize sort of time. Obviously, you always have to discretize time. There's no such thing as real time, but it's really a lot of time steps of things that could go wrong. And that definitely felt a priori like the hardest. You mentioned many good ones, I think partial observability, the fact that there is no perfect strategy because of the partial observability, those are very interesting problems. We start seeing more and more now in terms of as we solved the previous ones.
But the core problem to me was exploration, and solving it has been basically kind of the focus and how we saw the first breakthroughs.
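A rough back-of-the-envelope calculation shows why random exploration looks hopeless here. The 22 steps per second figure is from the conversation; the 10 minute game length, the count of 100 available actions, and the function names are illustrative assumptions, and real Starcraft action counts are far larger.

```python
import math

STEPS_PER_SECOND = 22  # tick rate quoted in the conversation


def decision_steps(minutes: float) -> int:
    """Number of game steps in a game lasting `minutes` minutes."""
    return round(minutes * 60 * STEPS_PER_SECOND)


def log10_chance_of_sequence(num_actions: int, length: int) -> float:
    """log10 of the probability that uniformly random play emits one
    specific sequence of `length` actions out of `num_actions` choices."""
    return -length * math.log10(num_actions)


# A hypothetical 10 minute game has 13,200 steps at 22 steps per second.
steps_in_game = decision_steps(10)

# With an assumed 100 choices per step, hitting one specific 5-step
# sequence (select worker, walk, build, wait, send workers) by chance
# has probability 10 ** -10.
log_chance = log10_chance_of_sequence(num_actions=100, length=5)
```

Even under these generous assumptions, the expansion sequence described above is essentially never produced by chance, which is why human replays matter as a starting point.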
So exploration in a multi-hierarchical way. So like, 22 times a second, exploration has a very different meaning than it does in terms of should I gather resources early or should I wait, and so on. So how do you solve the long term?
Let's talk about the internals of AlphaStar.
So first of all, how do you represent the state of the game as an input? Right.
How do you then do the long term sequence modeling?
How do you build a policy? Right. Or what was the architecture like?
So AlphaStar has obviously several components, but everything passes through what we call the policy, which is a neural network. And that's kind of the beauty of it. I could just now give you a neural network and some weights. And if you had the right observations and you understood the actions the same way we do, you would have basically the agent playing the game. There's absolutely nothing else needed other than those weights that were trained. Now, the first step is observing the game.
And we've experimented with a few alternatives. The one that we currently use mixes both spatial sort of images that you would process from the game, that is the zoomed out version of the map and also a zoomed in version of the camera or the screen, as we call it. But also we give to the agent the list of units that it sees, more as a set of objects that it can operate on. It is not necessarily required to use it.
And we have versions of the game that play well without this set vision, which is a bit not like how humans perceive the game, but it certainly helps a lot because it's a very natural way to encode the game, by just looking at all the units that there are. They have properties like health, position, type of unit, whether it's my unit or the enemy's. And that sort of is kind of the summary of the state of the game, that list of units or set of units that you see all the time.
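The "set of units" observation can be pictured as a list of small records, each turned into a feature vector. This is a minimal sketch, not AlphaStar's actual observation format: the `Unit` fields follow the properties mentioned in the conversation (with `health` being a best reading of a garbled word in the transcript), while the tiny `UNIT_TYPES` vocabulary and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, tiny vocabulary of unit types for illustration only.
UNIT_TYPES = ["worker", "zergling", "reaver"]


@dataclass(frozen=True)
class Unit:
    unit_type: str
    x: float
    y: float
    health: float
    is_mine: bool  # True for my unit, False for the enemy's


def encode_unit(unit: Unit) -> List[float]:
    """One-hot unit type followed by position, health and ownership."""
    one_hot = [1.0 if unit.unit_type == name else 0.0 for name in UNIT_TYPES]
    return one_hot + [unit.x, unit.y, unit.health, 1.0 if unit.is_mine else 0.0]


def encode_units(units: List[Unit]) -> List[List[float]]:
    """Encode every visible unit; the order of the list carries no meaning."""
    return [encode_unit(unit) for unit in units]
```

Because position is stored inside each record, the units can be listed in any order without losing information about where they are on the map.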
But that's pretty close to the way humans see the game. Why do you say it's not? Are you saying the exactness of it is not?
Yeah, the exactness of it is perhaps not the problem. I guess maybe the problem, if you look at it from how humans actually play the game, is that they play with a mouse and a keyboard and a screen, and they don't see sort of a structured object with all the units. What they see is what they see on the screen. Right. So you remember that there's a certain interrupt.
There's a plot that you showed with camera base where you do exactly that, right, move around, and that seems to converge to similar performance.
Yeah, I think that's what we're kind of experimenting with, what's necessary or not, but using the set...
So actually, if you look at research in computer vision, where it makes a lot of sense to treat images as two dimensional arrays, there is actually a very nice paper from Facebook. I think I forgot who the authors are, but I think it's part of Kaiming He's group. And what they do is they take an image, which is this two-dimensional signal, and they actually take pixel by pixel and scramble the image as if it was just a list of pixels.
And crucially, they encode the position of the pixels with the x Y coordinates.
And this is just kind of a new architecture, which we incidentally also use in Starcraft called the Transformer, which is a very popular paper from last year, which yielded very nice results in machine translation.
And if you actually believe in this kind of, oh, it's actually a set of pixels, as long as you encode X Y, it's OK, then you could argue that the list of units that we see is precisely that, because we have each unit as a kind of pixel, if you will, and then their X Y coordinates. So in that perspective, we, without knowing it, used the same architecture that was shown to work very well on Pascal and ImageNet and so on.
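The claim that an image can be treated as a scrambled set of pixels, provided each pixel carries its x, y coordinates, can be checked with a toy example. This sketch only demonstrates that no information is lost; it is not the architecture from the paper or from AlphaStar, and the function names are made up for illustration.

```python
import random


def image_to_set(image):
    """Flatten a 2D image into (x, y, value) triples."""
    return [(x, y, value)
            for y, row in enumerate(image)
            for x, value in enumerate(row)]


def set_to_image(pixels, width, height):
    """Rebuild the 2D image from (x, y, value) triples in any order."""
    image = [[None] * width for _ in range(height)]
    for x, y, value in pixels:
        image[y][x] = value
    return image


image = [[1, 2, 3],
         [4, 5, 6]]
pixels = image_to_set(image)
random.Random(0).shuffle(pixels)  # scramble the order of the pixels
restored = set_to_image(pixels, width=3, height=2)  # identical to `image`
```

A model that consumes such triples, such as a Transformer, sees the same information regardless of the order in which they arrive, which is the sense in which a list of units with coordinates resembles a list of pixels.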
So the interesting thing here is, putting it in that way, it starts to move it towards the way you usually work with language.
So, and especially with your expertise and work in language, it seems like there's echoes of a lot of the way you would work with natural language in the way you approach AlphaStar, right.
Does that help with the long term sequence modeling there somehow?
Exactly. So now that we understand what an observation for a given time step is, we need to move on to say, well, there's going to be a sequence of such observations, and an agent will need to, given all that it's seen, not only the current step, but all the way, because there is partial observability... We must remember whether we saw a worker going somewhere, for instance. Right. Because then there might be an expansion on the top right of the map.
So given that what you must then think about is there is the problem of given all the observations, you have to predict the next action and not only given all the observations, but given all the observations and given all the actions you've taken, predict the next action.
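The prediction problem stated here, where everything observed and done so far is used to predict the next action, can be written down as a way of slicing a replay into supervised training pairs. This is a sketch under assumptions: a replay is taken to be two equal-length lists, and `make_training_pairs` is a hypothetical helper, not part of any released AlphaStar code.

```python
from typing import List, Sequence, Tuple

Prefix = Tuple[tuple, tuple]


def make_training_pairs(
    observations: Sequence, actions: Sequence
) -> List[Tuple[Prefix, object]]:
    """Turn one replay into (prefix, next_action) examples.

    At step t the input is every observation up to and including t plus
    every action taken before t; the target is the action taken at t.
    """
    if len(observations) != len(actions):
        raise ValueError("expected one action per observation")
    pairs = []
    for t in range(len(actions)):
        prefix = (tuple(observations[: t + 1]), tuple(actions[:t]))
        pairs.append((prefix, actions[t]))
    return pairs
```

This is the same shape as language modeling, where a prefix of words is used to predict the next word; in practice a recurrent or attention-based model summarizes the prefix rather than storing it explicitly.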
And that sounds exactly like machine translation where and that's exactly how kind of I saw the problem, especially when you are given supervised data or replays from humans, because the problem is exactly the same.
You're translating essentially a prefix of observations and actions onto what's going to happen next, which is exactly how you would train a model to translate or to generate language as well. Right. You have a certain prefix. You must remember everything that comes in the past because otherwise you may start having non coherent text. And the same architectures, we're using LSTMs and Transformers to operate across time to kind of integrate all that's happened in the past. Those architectures that work so well in translation or language modeling are exactly the same as what the agent is using to issue actions in the game, and the way we train it,
moreover, for imitation, which is step one of AlphaStar, is to take all the human experience and try to imitate it, much like you try to imitate translators that translated many pairs of sentences from French to English, say. That sort of principle applies exactly the same. It's almost the same code, except that instead of words, you have slightly more complicated objects, which are the observations, and the actions are also a bit more complicated than a word. Is there a self play component,
too? So once you run out of imitation?
Right. So indeed you can bootstrap from human replays, but then the agents you get are actually not as good as the humans you imitated. Right. So how do we imitate? Well, we take humans from three thousand MMR and higher. Three thousand MMR is just a metric of human skill, and three thousand MMR might be like 50th percentile. Right. So it's just an average human. What's that? So maybe a quick pause: MMR is a ranking scale, the matchmaking rating for players.
So, I remember there's like a master and a grandmaster. What's three thousand?
So three thousand is is pretty bad. I think it's kind of gold level.
It just sounds really good relative to chess I think. Oh yeah. Yeah, I know the ratings. The best in the world are at seven thousand. Seven thousand. So three thousand. It's a bit like elo indeed. Right. So three thousand five hundred just allows us to not filter a lot of the data. So we like to have a lot of data in deep learning as as you probably know. So we take these kind of three thousand five hundred and above, but then we do a very interesting trick, which is we tell the neural network what level they are imitating.
So we say, this replay you are going to try to imitate, to predict the next action for all the actions that you're going to see, is a four thousand MMR replay. This one is a six thousand MMR replay. And what's cool about this is then we take this policy that is being trained from humans, and then we can ask it to play like a three thousand MMR player by setting a bit saying, well, OK, play like a three thousand MMR player, or play like a six thousand MMR player, and you actually see how the policy behaves differently.
It gets a worse economy if you play like a gold level player. It does fewer actions per minute, which is the number of clicks or number of actions that you issue in a whole minute. And it's very interesting to see that it kind of imitates the skill level quite well. But if we ask it to play like a six thousand MMR player, we tested, of course, these policies to see how well they do. They actually beat all the built-in AIs that Blizzard put in the game, but they're nowhere near 6000 MMR players.
They may be around gold level, platinum perhaps. So there's still a lot of work to be done for the policy to truly understand what it means to win. So far, we only ask them, OK, here is the screen, and that's what's happened in the game until this point. What would the next action be, as if we asked a pro to now say, oh, you're going to click here or here or there. And the point is, experiencing wins and losses is very important to then start to refine. Otherwise the policy can get loose, can just go off policy, as we call it.
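The conditioning trick described above, telling the network which skill level it is imitating, can be sketched as one extra input feature appended to the observation. This is an assumption-laden illustration: the conversation only says a "bit" is set, so the function name `conditioned_input`, the normalization, and the 3000 to 7000 range (taken from the ratings mentioned in the conversation) are illustrative choices rather than AlphaStar's actual encoding.

```python
from typing import List, Sequence


def conditioned_input(
    obs_features: Sequence[float],
    mmr: float,
    mmr_min: float = 3000.0,  # assumed lower end of the replay data
    mmr_max: float = 7000.0,  # assumed rating of the very best players
) -> List[float]:
    """Append a normalized skill-level feature to the observation features."""
    skill = (mmr - mmr_min) / (mmr_max - mmr_min)
    skill = min(1.0, max(0.0, skill))  # clamp to [0, 1]
    return list(obs_features) + [skill]


# Training: use the MMR recorded for the replay being imitated.
train_x = conditioned_input([0.5, 1.0], mmr=4000)
# Play time: the same policy is simply asked for a different level.
play_x = conditioned_input([0.5, 1.0], mmr=6000)
```

Because the skill feature is an ordinary input, the requested level can be changed at play time without retraining, which matches the behavior described in the conversation.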
That's so interesting, that you can at least hope eventually to be able to control a policy approximately to be at some MMR level.
That's so interesting, especially given that you have ground truth for a lot of these cases. Right. Can I ask you a personal question?
What's your MMR?
Well, I haven't played Starcraft two, so I am unranked, which is the kind of lowest league. OK, so I used to play Starcraft,
the first one, and...
But you haven't seriously played?
So the best player we have at DeepMind is about five thousand MMR, which is high masters. It's not at grandmaster level. Grandmaster level would be the top 200 players in a certain region like Europe or America or Asia.
But for me it would be hard to say. I am very bad at the game. I actually played AlphaStar a bit too late and it beat me. I remember the whole team was like, Oriol, you should play. And I was like, it looks like it's not so good yet. And then I remember I kind of got busy and waited an extra week, and I played, and it really beat me very badly.
Was that, I mean, how did that feel? Isn't that amazing?
That's amazing. Yeah. I mean, obviously I tried my best and I tried to also impress, because I actually played the first game. So I'm still pretty good at the micromanagement. The problem is, I just don't understand Starcraft two. I understand Starcraft. And when I played Starcraft, I probably was consistently, like for a couple of years, top thirty two in Europe. So I was decent.
But at the time we didn't have these kind of MMR systems as well established. So it would be hard to know what it was back then.
So what's the difference in interface between AlphaStar and Starcraft, and a human player and Starcraft?
Are there any significant differences between the way they both see the game?
I would say, in the way they see the game, there's a few things that are just very hard to simulate. The main one, perhaps, which is obvious in hindsight, is what's called cloaked units, which are invisible units. So in Starcraft, you can make some units that you need to have a particular kind of unit to detect. So these units are invisible. If you cannot detect them, you cannot target them. So they would just destroy your buildings or kill your workers.
But despite the fact you cannot target the unit, there's a shimmer that as a human, you observe. I mean, you need to train a little bit. You need to pay attention.
But you would see this kind of space-time-like distortion, and you would know, OK, there are... Yeah, yeah.
There's like a wave thing, called distortion. I like it.
That's really, like, the Blizzard term is shimmer.
And so this shimmer, professional players actually can see it immediately. They understand it very well, but it's still something that requires a certain amount of attention, and it's kind of a bit annoying to deal with. Whereas for AlphaStar, in terms of vision, it's very hard for us to simulate sort of, oh, are you looking at these pixels on the screen and so on. So the only thing we can do is say, there is a unit that's invisible over there. So AlphaStar would know that immediately.
Obviously it still obeys the rules. You cannot attack the unit, you must have a detector and so on. But it's kind of one of the main things where it just doesn't feel like there's a very proper way.
I mean, you could imagine, oh, you don't have hypers. Maybe you don't know exactly where it is, or sometimes you see it, sometimes you don't. But it's just really, really complicated to get it so that everyone would agree that's the best way to simulate this.
So, you know, it seems like a perception problem.
It is a perception problem. So the only problem is, if you ask what's the difference between how humans perceive the game, I would say they wouldn't be able to tell a shimmer immediately as it appears on the screen, whereas AlphaStar in principle sees it very sharply.
Right. It sees that the bit turned from zero to one, meaning there is now a unit there, although you don't know the unit, or you know that you cannot attack it,
and so on. So from a vision standpoint, that probably is the one that is kind of the most obvious one. Then there are things humans cannot do perfectly, even professionals, which is they may miss a detail or they may have not seen a unit. And obviously, as a computer, if there is a corner of the screen that turns green because a unit enters the field of view, that can go into the memory of the agent, the LSTM, and persist there for a while, for however long is relevant.
And in terms of action, it seems like the rate of action from AlphaStar is comparable to, if not slower than, professional players. But is it more precise as well?
So that's really probably the one that is causing us more issues, for a couple of reasons. Right. The first one is, Starcraft has been an AI environment for quite a few years.
In fact, I mean, I was participating in the very first competition back in 2010, and there's really not been a very clear set of rules on what the actions per minute, the rate of actions that you can issue, should be. And as a result, these agents or bots that people build, in a kind of almost very cool way, they do like twenty thousand, forty thousand actions per minute.
Now, to put this in perspective, a very good professional human may do three hundred to eight hundred actions per minute. They may not be as precise. That's why the range is a bit tricky to identify exactly. I mean, three hundred actions per minute, precisely, is probably realistic. Eight hundred is probably not. But you see humans doing a lot of actions because they warm up and they kind of select things and spam and so on, just so that when they need it, they have the accuracy.
So we came into this by not having kind of a standard way to say, well, how do we measure whether an agent is at human level or not? On the other hand, we had a huge advantage, which is, because we do imitation learning, agents turned out to act like humans in terms of rate of actions, even precisions and imprecisions of actions. In the supervised policy, you could see all these. You could see how agents like to spam click to move here.
If you played, especially the... you would know what I mean. I mean, you just like spam: move here, move here, move here. You're doing literally like maybe five actions in two seconds, but these actions are not very meaningful.
One would have sufficed. So on the one hand, we start from this imitation policy that is in the ballpark of the actions per minute of humans, because it's actually statistically trying to imitate humans. So we see this very nicely in the curves that we showed in the blog post, like these actions per minute, and the distribution looks very human-like.
But then, of course, as self-play kicks in, and that's the part we haven't talked too much about yet, but of course the agent must play against itself to improve, then there's almost no guarantee that these actions will not become more precise, or even that the rate of actions is not going to increase over time. So what we did, and this is probably the first attempt that we thought was reasonable, is we looked at the distribution of actions for humans, for certain windows of time.
And just to give a perspective, because I guess I mentioned that some of these agents that are programmatic, let's call them, they do forty thousand actions per minute. Professionals, as I said, do three hundred to 800.
So what we look at is the distribution over professional gamers, and we took reasonably high actions per minute, but we kind of identified certain cutoffs after which, even if the agent wanted to act, these actions would be dropped.
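The cutoff idea described above, where actions beyond a limit within a window of time are simply dropped, can be sketched as a sliding-window rate limiter. This is a minimal sketch under assumptions: the class name `ActionRateLimiter` and the limit of three actions per second are made up for illustration, and the conversation does not give the actual cutoffs or window lengths used for AlphaStar.

```python
from collections import deque


class ActionRateLimiter:
    """Drop any action that would exceed `max_actions` within a sliding window."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted actions

    def allow(self, now: float) -> bool:
        # Forget accepted actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_actions:
            self._timestamps.append(now)
            return True
        return False  # over the cutoff: the action is dropped


# Hypothetical cutoff: at most 3 actions in any 1-second window.
limiter = ActionRateLimiter(max_actions=3, window_seconds=1.0)
results = [limiter.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
```

In this run the fourth action is dropped because three were already accepted within the last second, and the fifth goes through once the oldest one has left the window.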
Hmm. But the problem is, this cutoff is probably set a bit too high. And what ends up happening, even though when we ask the professionals and the gamers, by and large they feel like it's playing human-like, is that there are some agents that developed maybe slightly too high APMs, which is actions per minute, combined with the precision, which made people sort of start discussing a very interesting issue, which is: should we have limited this? Should we just let it loose and see what cool things it can come up with?
Right. Interesting. So this is in itself an extremely interesting question, but the same way that modeling the shimmer would be so difficult, modeling absolutely all the details about muscles and precision and tiredness of humans would be quite difficult. Right. So we are really here kind of innovating in this sense of, OK, what could be maybe the next iteration of putting more rules that make the agents more human-like in terms of restrictions.
Yeah, putting constraints, more constraints. That's really interesting, that's really innovative. So one of the constraints you put on yourself, or at least focused on, is the Protoss race, as far as I understand. Can you tell me about the different races and how they, so Protoss, Terran and Zerg, how do they compare? How do they interact? Why did you choose Protoss? Yeah, the dynamics of the game seen from a strategic perspective.
So, Protoss. So in Starcraft there are three races indeed. In the demonstration we saw only the Protoss race.
So maybe let's start with that one. Protoss is kind of the most technologically advanced race. It has units that are expensive but powerful. Right. So in general, you want to kind of conserve your units as you go attack, and then you want to utilize these tactical advantages of very fancy spells and so on and so forth. And at the same time, people say they're a bit easier to play, perhaps, but that I actually didn't know.
I mean, I just talk now a lot to the players that we work with, TLO and MaNa. And they said, oh, yeah, Protoss is actually, people think, one of the easiest races. So perhaps the easiest, but that doesn't mean that it's... Obviously professional players excel at the three races, and there's never like a race that dominates for a very long time anyway.
So if you look at the top, I don't know, one hundred in the world is there one race that dominates the list?
It would be hard to know because it depends on the regions. I think it's pretty equal in terms of distribution. And Blizzard wants it to be equal. Right? They wouldn't want one race like Protoss to not be represented in the top places. Right. So definitely they try to keep it balanced. Right. So then maybe the opposite race of Protoss is Zerg. Zerg is a race where you just kind of expand and take over as many resources as you can, and they have a very high capacity to regenerate their units.
So if you have an army, it's not that valuable, in terms of losing the whole army is not a big deal as Zerg, because you can then rebuild it. And given that you generally accumulate a huge bank of resources, Zerg typically play by applying a lot of pressure, maybe losing their whole army, but then rebuilding it quickly. Although, of course, every race, I mean, they're pretty diverse. I mean, there are some units in Zerg that are technologically advanced and they do some very interesting spells, and there's some units in Protoss that are less valuable.
And you could lose a lot of them and rebuild them and it wouldn't be a big deal.
All right. So maybe I'm missing out. Maybe I'm going to say some dumb stuff, but as a summary of strategy: so first, there's the collection of a lot of resources, right? That's one option. The other one is expanding, so building other bases. Then the other is obviously attacking, or building units and attacking with those units. And then I don't know what else there is.
Maybe there is the different timing of attacks, like do you attack early, attack late. What are the different strategies that emerge that you've learned about? I've read a bunch of people are super happy that AlphaStar apparently has discovered that it's really good to, what is it, saturate?
Oh, yeah, the mineral line. Yeah, yeah, yeah.
And for greedy amateur players like myself, that's always been a good strategy. You just build up a lot of money and it just feels good to just accumulate and accumulate. So thank you for discovering that. Yeah.
Validating all of us. But are there other strategies that you discovered, interesting, unique to this game?
Yeah. So if you look at the kind of, I'm not a Starcraft two player, but of course Starcraft and Starcraft two and real time strategy games in general are very similar. I would classify perhaps the openings of the game. They're very important. And generally I would say there's two kinds of openings. One is a standard opening. That's generally how players find sort of a balance between risk and economy, building some units early on so that they could defend, but they're not too exposed, basically, but also expanding quite quickly.
So this would be kind of a standard opening, and within a standard opening, what you do choose generally is what technology you are aiming towards. So there's a bit of rock, paper, scissors: you could go for spaceships, or you could go for invisible units, or you could go for like massive units that are strong against certain kinds of units but weak against others. So standard openings themselves have some choices, like rock, paper, scissors style.
Of course, if you scout and you're good at guessing what the opponent is doing, then you can gain an advantage.
Because if I know you're going to play rock, I mean, I'm going to play paper, obviously. So you can imagine that normal standard games in Starcraft look like a continuous rock, paper, scissors game, where you guess what the distribution of rock, paper and scissors is from the enemy and react accordingly to try to beat it, or, you know, put the paper out before he kind of changes his mind from rock to scissors, and then you would be in a weak position.
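The rock, paper, scissors picture above, guessing the enemy's distribution and reacting to it, can be sketched as estimating move frequencies from what has been scouted and choosing the reply with the best expected payoff. This is a toy illustration only: the function names, the add-one smoothing, and the win/lose payoffs of +1 and -1 are assumptions, and real Starcraft openings are far richer than three moves.

```python
from collections import Counter
from typing import Dict, List

# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def estimate_distribution(observed_moves: List[str]) -> Dict[str, float]:
    """Estimate the opponent's move distribution with add-one smoothing."""
    counts = Counter(observed_moves)
    total = sum(counts[m] + 1 for m in BEATS)
    return {m: (counts[m] + 1) / total for m in BEATS}


def expected_payoff(my_move: str, dist: Dict[str, float]) -> float:
    """+1 for a win, -1 for a loss, 0 for a tie, averaged over `dist`."""
    win = dist[BEATS[my_move]]
    lose = sum(p for move, p in dist.items() if BEATS[move] == my_move)
    return win - lose


def best_response(dist: Dict[str, float]) -> str:
    return max(BEATS, key=lambda move: expected_payoff(move, dist))


# Hypothetical scouting record: the opponent has mostly shown "rock".
scouted = ["rock", "rock", "scissors", "rock"]
dist = estimate_distribution(scouted)
choice = best_response(dist)
```

With mostly "rock" observed, the best reply is "paper", and the reply shifts as the estimated distribution changes, which is the "continuous" aspect of the game described in the conversation.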
So I didn't realize this, though I know it's true of poker, and I looked at Libratus.
You're also estimating, trying to guess the distribution, trying to better estimate the distribution of what the opponent is likely to be doing. Yeah.
I mean, as a player, you definitely want to have a belief state over what's up on the other side of the map. And when your belief state becomes inaccurate, when you start having serious doubts whether he's going to play something that you must know about, that's when you scout. You want to then gather information.
Is improving the accuracy of the belief, or improving the belief state, part of the loss that you're trying to optimize?
Or is it just a side effect?
It's implicit, but you could explicitly model it, and it would be quite good at probably predicting what's on the other side of the map. But so far it's all implicit. There's no additional reward for predicting the enemy. So there's these standard openings, and then there's what people call cheese, which is very interesting. And AlphaStar sometimes really likes this kind of cheese. These cheeses, what they are is kind of an all-in strategy. You're going to do something sneaky.
You're going to hide your own buildings close to the enemy base, or you're going to go for hiding your technological buildings so that you do invisible units and the enemy just cannot react to detect it and thus loses the game. And there's quite a few of these cheeses and variants of them. And there is where actually the belief state becomes even more important, because if I scout your base and I see no buildings at all, any human player knows something's up.
They might think, well, you're hiding something close to my base. Should I build suddenly a lot of units to defend? Should I actually block my ramp with workers so that you cannot come and destroy my base? So all of this is happening, and defending against cheeses is extremely important. And in the AlphaStar League, many agents actually develop some cheesy strategies. And in the games we saw against TLO and MaNa, two out of the ten agents were actually doing these kinds of strategies, which are cheesy strategies.
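The scouting example above, seeing no buildings and concluding that something is up, is a belief-state update, and it can be written out as one application of Bayes' rule. The sketch below is purely illustrative: the two hypotheses, the prior, and the likelihoods are made-up numbers, and, as said in the conversation, AlphaStar's belief is implicit rather than modeled explicitly like this.

```python
from typing import Dict


def update_belief(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    observation: str,
) -> Dict[str, float]:
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(observation | h)."""
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Assumed numbers: cheese is rare, but an empty base is much more likely under it.
prior = {"standard": 0.9, "cheese": 0.1}
likelihood = {
    "standard": {"buildings_in_base": 0.95, "no_buildings_in_base": 0.05},
    "cheese": {"buildings_in_base": 0.20, "no_buildings_in_base": 0.80},
}
posterior = update_belief(prior, likelihood, "no_buildings_in_base")
```

Under these assumed numbers, a single scout that finds an empty base moves the belief in "cheese" from 0.1 to 0.64, which is the kind of jump that would justify building defenses or blocking the ramp.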
And then there's a variant of cheesy strategy which is called all-in.
So an all-in strategy is not perhaps as drastic as, oh, I'm going to build cannons on your base and then bring all my workers and try to just disrupt your base and game over. But in Starcraft, there's these kind of very cool things that you can align precisely at a certain time mark. So, for instance, you can generate exactly a ten-unit composition that is perfect, five of this type, five of this other type, and align the upgrade so that at four minutes and a half, let's say, you have these ten units and the upgrade just finished.
And at that point that army is really scary. And unless the enemy really knows what's going on, if you push, you might then have an advantage, because maybe the enemy is doing something more standard.
It expanded too much, it took more economy, and it traded off badly against having defenses, and the enemy will lose. But it's called all-in because if you don't win, then you're going to lose. So you see players that do these kinds of strategies, if they don't succeed, the game is not over. I mean, they still have a base and they're still gathering minerals, but they will just GG out of the game because they know, well, the game is over.
I gambled and I failed. So if we start entering the game theoretic aspects of the game, it's really rich. And that's why it also makes it quite entertaining to watch. Even if I don't play, I still enjoy watching the game. But the agents are trying to do these mostly implicitly. But one element that we improved in self-play is creating the AlphaStar League, and the AlphaStar League is not pure self-play. It's trying to create different personalities of agents so that some of them will become cheese agents.
Some of them might become very economical, very greedy, like getting all the resources, but then being maybe early on, they're going to be weak. But later on they're going to be very strong.
And by creating these personalities of agents, which sometimes just happens naturally, you can see kind of an evolution of agents that, given the previous generation, train against all of them and then generate kind of the perfect counter to that distribution. But these agents, you must have them in the population, because if you don't have them, you're not covered against these things. Right. You want to create all sorts of the opponents that you will find in the wild, so you can be exposed to these cheeses, early aggression, later aggression, more expansions, dropping units in your base from the side.
All these things, and pure self-play is getting a bit stuck at finding some subset of these, but not all of these. So the AlphaStar League is a way to kind of do an ensemble of agents that are all playing in a league, much like people play on Battle.net, right? You play against someone who does a new cool strategy and you immediately think, oh, my God, I want to try it. I want to play again.
And this to me was another critical part of the problem, which was, can we create a Battle.net for agents?
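The contrast drawn above between pure self-play and a league that trains against the whole population can be illustrated on rock, paper, scissors. This is a toy stand-in only: the functions `self_play` and `league`, the uniform mixture over past agents, and the use of exact best responses are assumptions for illustration and are not how the AlphaStar League actually matches or trains agents.

```python
from collections import Counter
from fractions import Fraction
from typing import List

# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a: str, b: str) -> int:
    """+1 if a beats b, -1 if b beats a, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def best_response(opponents: List[str]) -> str:
    """Best pure strategy against a uniform mixture over `opponents`."""
    counts = Counter(opponents)
    n = len(opponents)

    def value(move: str) -> Fraction:
        return sum(
            (Fraction(c, n) * payoff(move, opp) for opp, c in counts.items()),
            Fraction(0),
        )

    return max(BEATS, key=value)


def self_play(start: str, generations: int) -> List[str]:
    """Each new agent only counters the most recent agent."""
    history = [start]
    for _ in range(generations):
        history.append(best_response([history[-1]]))
    return history


def league(start: str, generations: int) -> List[str]:
    """Each new agent counters the whole population so far."""
    population = [start]
    for _ in range(generations):
        population.append(best_response(population))
    return population


cycle = self_play("rock", 3)
pop = league("rock", 2)
```

Pure self-play here chases its own tail (rock, paper, scissors, rock), forgetting how to beat older agents, while the league's second newcomer still has to account for "rock" and so does not simply jump to the counter of the latest agent.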
Yeah, that's kind of what the AlphaStar League is. Fascinating, and they stick to their different strategies.
Yeah. Wow. That's really, really interesting. So but that said, you were fortunate enough or just skilled enough to win five zero. And so how hard is it to win? I mean, that's not the goal, I guess. I don't know what the goal is.
The goal should be to win a majority, not five zero.
But how hard is it in general to win all matchups, in one v one?
So that's a very interesting question, because once you see AlphaStar, superficially you think, well, OK, it won. If you sum all the games, it's like ten to one, right? It lost the game that it played with the camera interface. You might think, well, that's done, right. It's superhuman at the game. And that's not really the claim we really can make, actually.
The claim is we beat a professional gamer for the first time. Starcraft has really been a thing that's been going on for a few years, but a moment like this had not occurred before. But are these agents impossible to beat? Absolutely not.
So that's a bit what's, you know, kind of the difference. The agents play at grandmaster level. They definitely understand the game enough to play extremely well. But are they unbeatable?
Do they play perfect? No.
And actually in Starcraft, because of these sneaky strategies, it's always possible that you may take a huge risk sometimes, but you might get wins right out of this.
So I think that as a domain, it still has a lot of opportunities, not only because, of course, we want to learn with less experience. I mean, if I learn to play Protoss, I can play Terran and learn it much quicker than AlphaStar can. Right. So there are obvious, interesting research challenges as well. But even as the raw performance goes, really the claim here can be that we are at pro level or at high grandmaster level.
But obviously the players also did not know what to expect. Their prior distribution was a bit off because they played this kind of new, like alien brain, as they like to say. Right. And that's what makes it exciting for them. But also, I think if you look at the games closely, you see there were weaknesses in some points. Maybe AlphaStar did not scout, or if it had had invisible units going against it at certain points, it wouldn't have known and it would have been bad.
So there's still quite a lot of work to do, but it's really a very exciting moment for us to be seeing.
Wow. A single neural net on a GPU is actually playing against these guys who are amazing. I mean, you have to see them play live. They're really, really amazing players.
Yeah, I'm sure there must be a guy in Poland somewhere right now training his butt off to make sure that this never happens again with AlphaStar. So that's really exciting in terms of AlphaStar having some holes to exploit, which is great. And then we build on top of each other, and it feels like Starcraft, unlike Go, even if you win, it's still not... There's so many different dimensions in which you can explore.
So that's really, really interesting. Do you think there's a ceiling to Alpha Star?
You've said that it hasn't reached you know, it's this is a big.
Well, let me actually just pause for a second.
How did it feel to come here to this point, to beat a top professional player, like that night? I mean, you know, Olympic athletes have their gold medal, right? This is your gold medal in a sense. Sure, you're cited a lot. You've published a lot of prestigious papers, whatever. But this is like a win. How did it feel?
I mean, it was for me, it was unbelievable because first the win itself.
I mean, it was so exciting. I mean, so looking back to those last days of twenty eighteen, really, that's when the games were played, I'm sure I'll look back at that moment and say, oh, my God, I want to be in a project like that. It's like I already feel the nostalgia of, like, yeah, that was huge in terms of the energy and the team effort that went into it.
And so in that sense, as soon as it happened, I already knew it was kind of, I was losing it a little bit. So it is almost like sad that it happened, and
oh, my God. But on the other hand, it also verifies the approach. But to me also, there's so many challenges and interesting aspects of intelligence that even though we can train a neural network to play at the level of the best humans, there's still so many challenges. So for me, it's also like, well, this is really an amazing achievement. But I already was also thinking about next steps. I mean, as I said, these agents play Protoss versus Protoss, but they should be able to play a different race much quicker.
Right. So that would be an amazing achievement. Some people call this meta reinforcement learning, meta learning and so on. Right.
So there's so many possibilities after that moment. But the moment itself, it really felt great. We had this bet. So I'm kind of a pessimist in general. So I kind of sent an email to the team. I said, OK, against TLO first, what's going to be the result? And I really thought we would lose like five zero. Right. We had some calibration made against the 5000 MMR player. TLO was much stronger than that player, even if he played Protoss, which is his off race.
But yeah, I was not imagining we would win.
So for me that was just kind of a test run or something, and then it really kind of, he was really surprised. And unbelievably, we went to this bar to celebrate, and Dave tells me, well, why don't we invite someone who is a thousand MMR stronger in Protoss, like an actual Protoss player? That ended up being MaNa. Right. And, you know, we had some drinks and I said, sure, why not?
But then I thought, well, that's really going to be impossible to beat. I mean, even because it's just so much. A thousand MMR is really like ninety nine percent probability that MaNa would beat TLO as Protoss versus Protoss. Right. So we did that. And to me the second game was much more important, even though a lot of uncertainty kind of disappeared after we kind of beat TLO. I mean, he's a professional player.
So that was kind of, oh, that's really a very nice achievement. But MaNa really was at the top, and you could see he played much better, but our agents got much better, too.
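The "ninety nine percent" figure quoted above for a thousand-point rating gap can be sanity-checked against the standard Elo expected-score formula, since MMR was described earlier in the conversation as "a bit like Elo". This is a rough check under a loud assumption: Starcraft's MMR is treated here as if it followed the chess Elo curve with its conventional scale of 400, which the conversation does not confirm.

```python
def expected_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Elo expected score for the stronger player, given the rating gap.

    Assumes the chess convention: a gap of `scale` points means 10-to-1 odds.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# A gap of one thousand points, as in the TLO versus MaNa comparison.
p = expected_win_probability(1000)
```

Under that assumption the stronger player wins with probability just above 0.99, in line with the figure quoted in the conversation.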
And then after the first game, I said, if we take a single game, at least we can say we beat a game. I mean, even if we don't beat the series, for me, that was a huge relief. And I mean, I remember hugging Demis. And I mean, it was really like, this moment for me will resonate forever as a researcher, and I mean as a person. And yeah, it's a really, like, great accomplishment.
And it was great also to be there with the team in the room. I don't know if you saw, like, this.
So it was really like, I mean, from my perspective, the other interesting thing is, just like watching Kasparov, now watching MaNa was also interesting, because he was kind of at a loss for words. I mean, whenever you lose, I've done a lot of sports, you sometimes make excuses. You look for reasons. And he couldn't really come up with reasons.
And so with TLO, for Protoss, you could say it felt awkward, it wasn't... But here he was just beaten. And it was beautiful to look at a human being being superseded by the system. I mean, it's a beautiful moment for researchers.
So, yeah, for sure it was. It was, I mean, probably the highlight of my career so far because of its uniqueness and coolness. And I don't know.
I mean, obviously, as you said, you can look at paper citations and so on. But this really is like a testament of the whole machine learning approach and using games to advance technology. I mean, it really was, everything came together at that moment. That's really the summary.
Also, on the other side, it's a popularization of AI, too, because it's just like traveling to the moon and so on. I mean, this is where a very large community of people that don't really know AI get to really interact with it, which is very important. I mean, it's extremely important.
We must, you know, writing papers helps our peers, researchers, to understand what we're doing. But I think AI is becoming mature enough that we must sort of try to explain what it is, and perhaps through games, this is an obvious way, because these games always had built-in AI. So it may be that everyone's experience of AI is playing a video game, even if they don't know it, because there's always some scripted element, and some people might even call that AI already, right.
So what are other applications of the approaches underlying AlphaStar that you see happening? There's a lot of echoes of, you said, Transformers, of language modeling and so on.
Have you already started thinking where the breakthroughs in AlphaStar get expanded to other applications?
Right. So I thought about a few things for like kind of next month, next year.
The main thing I'm thinking about actually is what's next as a kind of a grand challenge. Because for me, like, we've seen Atari, and then there's like the sort of three dimensional worlds where we've seen also pretty good performance from these capture the flag agents that also some people at DeepMind and elsewhere are working on. We've also seen some amazing results on, for instance, Dota two, which is also a very complicated game. So for me, like, the main thing I'm thinking about is what's next in terms of challenge.
So as a researcher, I see sort of two tensions between research and then applications or areas or domains where you apply them.
So on the one hand, thanks to the application, Starcraft being very hard, we developed some techniques, some new research that now we could look at elsewhere, like, are there other applications where we can apply these? And the obvious ones,
absolutely, you can think of feeding back to sort of the community we took from, which was mostly sequence modeling or natural language processing. So we've developed and extended things from the Transformer, and we use pointer networks. We combine LSTMs and Transformers in interesting ways. So that's perhaps the kind of lowest hanging fruit, of feeding back to now different fields of machine learning that are not playing video games.
Let me go old school and jump to Mr. Alan Turing. Yeah. So the Turing test. You know, it's a natural language test, just a conversational test. What's your thought of it as a test for intelligence? Do you think it is a grand challenge that's worthy of undertaking? And maybe, if it is, would you reformulate it or phrase it somehow differently?
So I really love the Turing test because I also like sequences and language understanding. And in fact, some of the early work we did in machine translation, we tried to apply to kind of a neural chatbot, which obviously would never pass the Turing test because it was very limited. But it is a very fascinating idea that you could really have an AI that would be indistinguishable from humans in terms of asking or conversing with it. Right.
So I think the test itself seems very nice and it's kind of well defined, actually, like the the passing it or not, I think there's quite a few rules that feel like pretty simple.
And, you know, you could really, like, I mean, I think they have these competitions every year. Yeah. So the Loebner Prize. But I don't know if you've seen the kind of bots that emerge from that competition.
They're not quite as what you would think.
It feels like there are weaknesses with the way Turing formulated it. It needs to be that the definition of a genuine, rich, fulfilling human conversation needs to be something else. Like the Alexa Prize, which I'm not as familiar with, has tried to define that more, I think, by saying you have to keep up a conversation for 30 minutes, something like that.
So basically forcing the agent not to just fool, but to have an engaging conversation kind of thing is that I mean, is this have you thought about this problem originally?
And if if you have in general, how far away are we from you worked a lot on language understanding language generation.
But the full dialogue, the conversation. You know, just sitting at the bar, having a couple of beers for an hour, that kind of conversation. Have you thought about that?
Yeah. So I think you touched here on the critical point, which is feasibility. Right. So there's a great sort of essay by Hamming, which describes sort of grand challenges of physics. And he argues that, well, OK, for instance, teleportation or time travel are great grand challenges of physics, but there's no attack.
We really don't know or cannot kind of make any progress. So that's why most physicists and so on, they don't work on these in their Ph.D. and as part of their careers.
So I see the Turing test, as in the full Turing test, as a bit still too early.
I think, especially with the current trend of deep learning language models, we've seen some amazing examples, I think GPT-2 being the most recent one, which is very impressive. But to fully solve passing, or fooling a human to think that there's a human on the other side? I think we're quite far. So as a result, I don't see myself, and I probably would not recommend people, doing a study on solving the Turing test, because it just feels it's kind of too early or too hard of a problem.
Yeah, but that said, you said the exact same thing about Starcraft, but a few years ago. So to them. So you'll probably also be the person who passes the Turing Test in three years.
I mean, I think I think the. Yeah, so so we have this on record. This is nice. True. I mean, the it's true that progress sometimes is a bit unpredictable.
I really wouldn't have not even six months ago, I would not have predicted the level that we see that these agents can deliver at grandmaster level.
But I have worked on language enough. And basically my concern is not that something could happen, a breakthrough could happen that would bring us to solving or passing the Turing test. It's that I just think the statistical approach to it is not going to cut it.
So we need a breakthrough, which is great for the community. But given that, I think there's quite a bit more uncertainty, whereas for Starcraft, I knew what the steps would be to kind of get us there. I think it was clear that using the imitation learning part and then using this Battle.net for agents were going to be key. And it turned out that this was the case, and a little more was needed, but not much more.
For the Turing test, I just don't know what the plan or execution plan would look like. So that's why, for me, working on it as a grand challenge is hard. But there are quite a few challenges that are related, where you could say, well, I mean, what if you create a great assistant? Like Google already has the Google Assistant. So can we make it better, and can we make it fully neural and so on? There I start to believe maybe we're reaching a point where we should attempt these challenges.
I like this conversation so much because it echoes very much the Starcraft conversation. It's exactly how you approached Starcraft. Let's break it down into small pieces, solve those, and you end up solving the whole game.
Great. But that said, you're behind some of the biggest pieces of work in deep learning in the last several years. So you mentioned some limits. What do you think of the current limits of deep learning and how do we overcome those limits?
So if I had to actually use a single word to define the main challenge in deep learning, it is a challenge that probably has been the challenge for many years, and it is that of generalization. So what that means is that all that we are doing is fitting functions to data. And when the data we see is not from the same distribution, or even sometimes when it is very close to the distribution, because of the way we train with limited samples, we then get to this stage where we just don't see generalization as much as we humans can generalize.
And I think adversarial examples are a clear example of this. But if you study the machine learning literature, you know, the reason why SVMs became very popular was because they were dealing with, and had some guarantees about, generalization, which is about unseen data or out of distribution. Or even within distribution, where you take an image, add a bit of noise, and these models fail.
So I think, really, I don't see a lot of progress on generalization in the strong generalization sense of the word. I think with our neural networks, you can always find designed examples that will make their outputs arbitrary, which is not good, because we humans will never be fooled by these kinds of images or manipulations of the image. And if you look at the mathematics, you kind of understand: this is a bunch of matrices multiplied together.
There's probably numerical instability, so that you can just find corner cases. So I think that's really the underlying topic many times we see, even at the grand stage of, like, Turing test generalization. I mean, passing the Turing test, should it be in English or should it be in any language? Right. I mean, as a human, if you are asked something in a different language, you actually will go and do some research and try to translate it and so on.
Should the Turing test include that?
Right. And it's really a difficult problem and very fascinating and very mysterious, actually.
Yeah, absolutely. But do you think, if you were to try to solve it, can you not grow the size of data intelligently in such a way that the distribution of your training set does include the entirety of the testing set? I think, is that one path?
The other path is a totally new methodology, not statistical.
So a path that has worked well, and it worked well in Starcraft, in machine translation and in languages, is scaling up the data and the model. And that's kind of been maybe the only single formula that still delivers today in deep learning. Right? It's that scale, data scale and model scale, really does more and more of the things that we thought, oh, there's no way we can generalize to these, or there's no way we can generalize to that. But I don't think fundamentally it will be solved with this.
And for instance, I'm really liking some style or approach that would not only have neural networks, but would have programs or some discrete decision making, because that is where I feel there's a bit more. Like, I mean, the best example, I think, for understanding this, and I also worked a bit on it, is that we can learn an algorithm with a neural network. Right. So you give it many examples and it is going to sort the input numbers or something like that.
But really strong generalization is: you give me some numbers, or you ask me to create an algorithm that sorts numbers, and instead of creating a neural net, which will be fragile because it's going to go out of range at some point, you're going to give it numbers that are too large, too small and whatnot, if you just create a piece of code that sorts the numbers, then you can prove that that will generalize to absolutely all the possible inputs you could give.
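To make the contrast concrete, here is a minimal sketch. The `MemorizingSorter` class is a hypothetical stand-in for a fragile learned model (it is not a neural network, and nothing like it is described in the conversation); it only answers for inputs it saw during training. The hand-written `insertion_sort` is an explicit program and works for any list of comparable numbers, including values far outside the training range.

```python
from typing import Dict, List, Optional, Tuple


def insertion_sort(values: List[float]) -> List[float]:
    """Explicit program: correct for any list of comparable numbers."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


class MemorizingSorter:
    """Hypothetical stand-in for a model that only fits its training data."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[float, ...], List[float]] = {}

    def fit(self, examples: List[List[float]]) -> None:
        for example in examples:
            self.table[tuple(example)] = sorted(example)

    def predict(self, values: List[float]) -> Optional[List[float]]:
        # Returns None for any input that was not in the training set.
        return self.table.get(tuple(values))


train = [[3, 1, 2], [2, 1], [5, 4, 6]]
model = MemorizingSorter()
model.fit(train)

in_distribution = [3, 1, 2]
out_of_range = [10**12, -7, 0.5]

print(model.predict(in_distribution))  # an input seen during training
print(model.predict(out_of_range))     # an unseen input
print(insertion_sort(out_of_range))    # the program handles it regardless
```

The memorizing model is deliberately extreme; a real network degrades more gradually, but the point about provable behavior on all inputs applies only to the explicit program.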
So I think the problem comes with some exciting prospects. I mean, scale is a bit more boring, but it really works. And then maybe programs and discrete abstractions are a bit less developed. But clearly, I think they're quite exciting in terms of the future for the field.
Do you draw any insight, wisdom from the eighties and expert systems and symbolic systems, symbolic computing? Do you ever go back to those sort of reasoning, that kind of logic? Do you think that might make a comeback? Do you have to dust off those books?
Yeah, I actually love adding more inductive biases. To me, the problem really is, what are you trying to solve? If what you're trying to solve is so important that you try to solve it no matter what, then absolutely use rules, use domain knowledge, and then use a bit of the magic of machine learning to empower the system, to make it the best system that will detect cancer or detect weather patterns.
Right. Or in terms of Starcraft, it also was a very big challenge. So I was definitely happy that if we had to cut a corner here and there, it could have been interesting to do. And in fact, in Starcraft, we started thinking about expert systems, because it's very, you know, you can define them. I mean, people actually build Starcraft bots by thinking about those principles, you know, state machines and rule-based systems. And then you could think of combining a bit of a rule-based system that also has neural networks incorporated, to make it generalize a bit better.
So absolutely. I mean, we should definitely go back to those ideas and anything that makes the problem simpler. As long as your problem is important, that's OK. And that's research driven by a very important problem. And on the other hand, if you want to really focus on the limits of reinforcement learning, then of course you must try not to look at imitation data, or to look for some rules of the domain that would help a lot, or even feature engineering.
So these these are tension that depending on what you do, I think both both ways are definitely fine. And I would never not do one or the other if you're as long as what you're doing is important and needs to be solved. Right.
So there's a bunch of different ideas that that you developed that I really enjoy.
But one is image captioning, translating from image to text. Just another beautiful idea, I think, that resonates throughout your work, actually, sort of the underlying nature of reality being language.
Yes, somehow. So what's the connection between images and text or rather the visual world and the world of language, in your view?
Right. So I think there's a piece of research that's been central to, I would say, even extending into Starcraft, and it's this idea of sequence to sequence learning. What we really meant by that is that you can now really input anything to a neural network as the input X, and then the neural network will learn a function F that will take X as an input and produce any output Y. And these X and Y don't need to be static, or like a feature, like a single, fixed vector or anything like that.
They could really be sequences, or now beyond that, like data structures. Right.
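The shape of that idea, a function F that reads an input X of any length and writes an output Y of any length, can be sketched without any learning at all. In the toy below, `encode_step` and `decode_step` are hand-written rules standing in for what a trained network would learn, and all names are assumptions made for illustration. The state here is a growing tuple, whereas a real model would compress the input into learned vectors.

```python
from typing import Callable, List, Sequence, Tuple

END = "<end>"  # hypothetical end-of-sequence marker

State = Tuple[str, ...]


def seq2seq(
    encode_step: Callable[[State, str], State],
    decode_step: Callable[[State], Tuple[State, str]],
    inputs: Sequence[str],
    max_len: int = 100,
) -> List[str]:
    """Read a variable-length input, then emit a variable-length output."""
    state: State = ()
    for token in inputs:           # encoder: consume the input step by step
        state = encode_step(state, token)
    outputs: List[str] = []
    for _ in range(max_len):       # decoder: emit tokens until END
        state, token = decode_step(state)
        if token == END:
            break
        outputs.append(token)
    return outputs


def encode_step(state: State, token: str) -> State:
    # Toy rule: remember every token that was read.
    return state + (token,)


def decode_step(state: State) -> Tuple[State, str]:
    # Toy rule: emit the remembered tokens in reverse order.
    if not state:
        return state, END
    return state[:-1], state[-1]


print(seq2seq(encode_step, decode_step, list("abc")))
print(seq2seq(encode_step, decode_step, list("hello")))
```

Swapping the two step functions changes the task while the outer loop stays the same, which is the sense in which one line of code could move a system from translation to captioning.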
So that paradigm was tested in a very interesting way when we moved from translating French to English to translating an image to its caption. But the beauty of it is that really, and that's actually how it happened, I changed a line of code in this thing that was doing machine translation. And I came in the next day and I saw how it was producing captions that seemed like, oh, my God, this is really, really working.
And the principle is the same. Right. So I think I don't see text, vision, speech, waveforms as something different, as long as you basically learn a function that will vectorize, you know, these. And then after we vectorize it, we can then use transformers, LSTMs, whatever the flavor of the month of the model is. And then as long as we have enough supervised data, really this formula will work and will keep working, I believe, to some extent, modulo these generalization issues that I mentioned before.
But the task there is to vectorize, to sort of form a representation that's meaningful. And your intuition now, having worked with all this media, is that once you are able to form that representation, you can basically take anything, any sequence.
Going back to Starcraft, are there limits on the length? So we didn't really touch on the long term aspect. How did you overcome the whole really long term aspect of things here?
Are there some tricks, or what's the main trick?
So Starcraft, if you look at absolutely every frame, you might think it's quite a long game. So we would have to multiply 22 frames per second times sixty seconds per minute times maybe at least 10 minutes per game on average. So there are quite a few frames. But the trick really was to only observe, in fact, which may be seen as a limitation, but it is also a computational advantage,
only observe when you act. And then what the neural network decides is what the gap is going to be until the next action.
And if you look at most Starcraft games that we have in the data set that Blizzard provided, it turns out that most games are actually only, I mean, it is still a long sequence, but it's maybe like a thousand to fifteen hundred actions. Which, if you start looking at LSTMs, transformers, it's not that difficult, especially if you have supervised learning. If you had to do it with reinforcement learning, the credit assignment problem, what is it in this game that made you win?
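The arithmetic and the observe-only-when-you-act trick can be sketched as follows. The frame rate and game length are the figures quoted above; `fixed_delay_policy` and its 10-frame gap are hypothetical placeholders, since in the setup described the network itself chooses the gap.

```python
from typing import Callable, Tuple

FRAMES_PER_SECOND = 22   # figure quoted in the conversation
SECONDS_PER_MINUTE = 60
MINUTES_PER_GAME = 10    # "at least 10 minutes per game on average"

total_frames = FRAMES_PER_SECOND * SECONDS_PER_MINUTE * MINUTES_PER_GAME
print(total_frames)  # 13200

# With roughly 1,000 to 1,500 actions per game, the average gap between
# actions works out to about 9 to 13 frames.
print(total_frames / 1500, total_frames / 1000)  # 8.8 13.2

Policy = Callable[[int], Tuple[str, int]]


def run_episode(policy: Policy, n_frames: int) -> int:
    """Count decision points when the agent only observes when it acts."""
    frame = 0
    decisions = 0
    while frame < n_frames:
        _action, delay = policy(frame)  # observe and act at this frame only
        decisions += 1
        frame += max(1, delay)          # skip ahead; nothing observed between
    return decisions


def fixed_delay_policy(frame: int) -> Tuple[str, int]:
    # Hypothetical policy: always wait 10 frames before acting again.
    return "noop", 10


print(run_episode(fixed_delay_policy, total_frames))  # 1320
```

The sequence the model has to handle is the number of decisions, not the number of frames, which is why a game of over thirteen thousand frames becomes a sequence of about a thousand steps.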
That would be really difficult. But thankfully, because of imitation learning, we didn't kind of have to deal with these directly. Although if we had to, we tried it, and what happened is you just take all your workers and attack with them. And that sort of is kind of obvious in retrospect, because you start trying random actions. One of the actions will be a worker that goes to the enemy base. And because of self-play, it's not going to know how to defend, because it basically doesn't know almost anything.
And eventually what you develop is this take-all-workers-and-attack strategy, because the credit assignment issue in RL is really, really hard. I do believe we could do better. And that's maybe a research challenge for the future. But yeah, even in Starcraft, the sequences are maybe a thousand, which I believe is within the realm of what transformers can do.
Yeah, I guess the difference between Starcraft and Go is, in Go and chess, stuff starts happening right away.
Right. So there's not. Yeah, it's pretty easy to self-play. Not easy, but with self-play it's possible to develop reasonable strategies quickly, as opposed to Starcraft.
I mean, in Go there's only four hundred actions, but one action is what people would call the God action. That would be, if you had expanded the whole search tree, that's the best action, if you did minimax or whatever algorithm you would do if you had the computational capacity. But in Starcraft, 400 is minuscule. Like, with 400 you couldn't even click on the pixels around a unit. Right. So I think the problem there, in terms of action space size, is way harder. So search is impossible. So there's quite a few challenges indeed that make this kind of a step up in terms of machine learning.
For humans, maybe playing Starcraft seems more intuitive because it looks real. I mean, like the graphics and everything moves smoothly, whereas, I don't know, I mean, Go is a game that I would really need to study. It feels quite complicated. But for machines, kind of maybe it's the reverse. Yes.
Which shows you the gap, actually, between deep learning and however the heck our brains work.
So you developed a lot of really interesting ideas. It's interesting to just ask, what's your process of developing new ideas? Do you like brainstorming with others? Do you like thinking alone? Do you like, like, it was Ian Goodfellow who said he came up with GANs after a few beers. He thinks beers are essential to come up with new ideas.
We had beers to decide to play another game of Starcraft after a week. So it's really similar to that story.
Actually, I explained this in a DeepMind retreat. And I said, this is the same as the GAN story. I mean, we were in a bar and we decided, let's play again next week. And that's what happened.
I feel like we're giving the wrong message to young undergrads. Yeah, no, but in general, like, do you like brainstorming? Do you like thinking alone, working stuff out?
And so I think throughout the years also things change. Right. So initially I was very fortunate to be with great minds like Geoff Hinton, Jeff Dean, Ilya Sutskever. I was really fortunate to join Brain at a very good time. So at that point, ideas, I was just kind of brainstorming with my colleagues and learned a lot. And keep learning is actually something you should never stop doing.
Right. So learning implies reading papers and also discussing ideas with others. It's very hard at some point to not communicate that being reading a paper from someone or actually discussing. Right.
So definitely that communication aspect needs to be there, whether it's written or oral nowadays.
I'm also trying to be a bit more strategic about what research to do. So I was describing a little bit this sort of tension between research for the sake of research, and then you have, on the other hand, applications that can drive the research. Right. And honestly, the formula that has worked best for me is: just find a hard problem and then try to see how research fits into it, how it doesn't fit into it, and then you must innovate.
So I think machine translation drove sequence to sequence. Then maybe, like, learning algorithms, like combinatorial algorithms, led to pointer networks. Starcraft led to really scaling up imitation learning and the AlphaStar League. So that's been a formula that I personally like. But the other one is also valid, and I've seen it succeed a lot of the times, where you just want to investigate model-based RL as a kind of research topic, and then you must start to think, well, how are the tests?
How are you going to test these ideas? You need kind of a minimal environment to try things. You need to read a lot of papers and so on. And that's also very fun to do, and something I've also done quite a few times, both at Brain, at DeepMind and obviously as a PhD student. So I think besides the ideas and discussions, I think it's important also because you start sort of guiding not only your own goals but other people's goals to the next breakthrough.
So you must really kind of understand these, you know, feasibility also, as we were discussing before.
Right. Whether this domain is ready to be tackled or not. And you don't want to be too early. You obviously don't want to be too late. So it's really interesting, this strategic component of research, which I think as a grad student I just had no idea about. You know, I just read papers and discussed ideas. And I think this has been maybe the major change. And I recommend people kind of fast-forward to success, what it looks like, and try to backtrack, rather than just kind of looking at, oh, this looks cool, this looks cool. And then you do a bit of random work, and sometimes you stumble upon some interesting things. But in general, it's also good to plan a bit.
Yeah, I like it, especially your approach of taking a really hard problem, stepping right in and then being super skeptical about being able to solve the problem.
I mean, there's a balance of both, right?
There's a silly optimism and a and a critical sort of skepticism that's good to balance, which is why it's good to have a team of people. Exactly. That's the balance that you don't do that on your own.
You have both mentors that have seen or you obviously want to chat and discuss whether it's the right time. I mean, Demis came in 2014 and he said maybe in a bit we'll do Starcraft and maybe he knew. And that's and I'm just following his lead, which is great because he's he's brilliant. Right. So these these things are obviously quite important that you want to be surrounded by people who, you know, are diverse. They have their knowledge.
There's also important to I mean, I've learned a lot from people who actually have an idea that I might not think it's good, but if I give them the space to try it, I've been proven wrong many, many times as well.
So that's that's great.
I think it's your colleagues are more important than yourself.
I think so. Now, let's real quick talk about another impossible problem. AGI, right?
What do you think it takes to build a system that's human level intelligence? We talked a little bit about the Turing test. After all, these have echoes of general intelligence. But if you think about just something that you would sit back and say, wow, this is really something that resembles human level intelligence. What do you think it takes to build it?
So I find that AGI oftentimes is maybe not very well defined. So what I'm trying to then come up with for myself is what would a result look like that you would start to believe that you would have agents or neural nets that no longer sort of overfit to a single task,
but actually kind of learn the skill of learning, so to speak.
And that actually is a field that I am fascinated by, which is learning to learn, or meta-learning, which is about no longer learning about a single domain. So you can think about it as, the learning algorithm itself is general. Right. So the same formula we applied for AlphaStar or Starcraft, we can now apply to kind of almost any video game, or you could apply to many other problems and domains. But the algorithm is what's kind of generalizing. But the neural network, the weights, those weights are useless
even to play another race. I train a network to play very well at Protoss versus Protoss. I need to throw away those weights if I want to play now Terran versus Terran. I would need to retrain a network from scratch with the same algorithm. That's beautiful, but the network itself would not be useful. So I think if I see an approach that can observe or start solving new problems without the need to kind of restart the process, I think that to me would be a nice way to define some form of AGI.
Again, I don't know about the grandiose, like, I mean, should it be solved before AGI? I mean, I don't know. I think concretely I would like to see clearly that meta-learning happen, meaning that there is an architecture or a network that, as it sees the new problem or new data, it solves it.
And to make it kind of a benchmark, it should solve it at the same speed that we solve new problems. When I define for you a new object and you have to recognize it, or when you start playing a new game. You played all the Atari games, but now you play a new Atari game. Well, you're going to be pretty quickly pretty good at the game. So that's perhaps it. What's the domain and what's the exact benchmark is a bit difficult.
I think as a community, we may need to do some work to define it. But I think this first step, I could see it happen relatively soon. But then the whole what AGI means, that's where I am a bit more confused. I think people mean different things.
And there's an emotional, psychological level
that, even with the Turing test, passing the Turing test is something that we just pass judgment on as human beings. What does it mean to be, you know, is a dog an AGI system? Yeah, like what level?
What does it mean? Right. Yeah, what does it mean? But I like the generalization.
And maybe as a community we converge towards a group of domains that are sufficiently far away, that it will be really damn impressive if we're able to generalize. So perhaps not as close as Protoss and Zerg, but that could be a good start and a really good step. But then, like, from Starcraft to Wikipedia. Yeah. And back. Yeah. That kind of thing.
And that feels also quite hard and far. But I think as long as you put the benchmark out, as we discovered, for instance, with ImageNet, then tremendous progress can be had. So I think maybe there's a lack of benchmark, but I'm sure we'll find one.
The community will then work towards that. And then beyond what AGI might mean or would imply, I really am hopeful to see basically machine learning or A.I. just scaling up and helping, you know, people that might not have the resources to hire an assistant, or that might not even know what the weather is like.
But, you know, so I think in terms of the impact, the positive impact of A.I., I think that's maybe what we should also not lose focus on.
The research community building AGI, I mean, that's a really nice goal. But I think the way that DeepMind puts it is, and then use it to solve everything else. Right. So I think we should realize, yeah,
we shouldn't forget about all the positive things that are actually coming out of A.I. already and are going to be coming out of it.
Right. But that said, let me ask, relative to the popular perception, do you have any worry about the existential threat of artificial intelligence in the near or far future that some people have?
I think in the near future I'm skeptical. So I hope I'm not wrong. But I'm not concerned. But I appreciate the ongoing efforts, and even like a whole research field on A.I. safety emerging, and in conferences and so on.
I think that's great. In the long term, I really hope we can simply have the benefits outweigh the potential dangers. I am hopeful for that. But also we must remain vigilant to kind of monitor and assess whether the tradeoffs are there, and that we also have enough lead time to prevent or to redirect our efforts if need be. Right.
So but I'm quite I'm quite optimistic about the technology and definitely more fearful of other threats in terms of planetary level at this point. But obviously, that's the one I kind of have more like power on. So clearly I do start thinking more and more about this. And it's kind of it's grown in me, actually, to to start reading more about safety, which is a field that so far I have not really contributed to. But maybe there's something to be done then as well.
I think it's really important. You know, I talk about the few folks, but it's important to ask you and shove it in your head because you're at the leading edge of actually what people are excited about now. I mean, the work with it, it's arguably at the very cutting edge of the kind of thing that people are afraid of. And so you speaking to that fact and that we're actually quite far away to the kind of thing that people might be afraid of, but it's still worthwhile to think about.
And it's also good that you're that you're not as worried and you're also open to.
Yeah, I mean, there's two aspects. I mean, me not being worried, but obviously we should prepare for the worst. Right. For things that could go wrong, misuse of the technologies, as with any technologies. Right. So I think there's always trade-offs, and as a society, we've kind of solved these to some extent in the past. So I'm hoping that by having the researchers and the whole community brainstorm and come up with interesting solutions to the new things that will happen in the future, we can still also push the research to the avenue that I think is kind of the greatest avenue, which is
to understand intelligence, right? How are we doing what we're doing? And obviously, from a scientific standpoint, that is kind of my personal driver of all the time I spend doing what I'm doing. Really.
Where do you see deep learning as a field heading? What do you think the next big breakthrough might be?
So I think deep learning, I discussed a little of this before, deep learning has to be combined with some form of discretization, program synthesis. I think that, as research in itself, is an interesting topic to expand and start doing more research on. And then as kind of what will deep learning enable us to do in the future,
I don't think that's going to be what's going to happen this year. But also this idea of starting not to throw away all the weights, this idea of learning to learn and really having these agents not having to restart their weights. And you can have an agent that is kind of solving or classifying images on ImageNet, but also generating speech if you ask it to generate some speech. And it should really be kind of almost the same network. But it may not be a neural network.
It might be a neural network with a optimization algorithm attached to it. But I think this idea of generalization to new task is something that we first must define good benchmarks. But then I think that's going to be exciting and I'm not sure how close we are. But I think there's if you have a very limited domain, I think we can start doing some progress.
And much like how we made a lot of progress in computer vision, we should start thinking, and I really liked a talk that Léon Bottou gave at ICML a few years ago, which is that this train-test paradigm should be broken. We should stop thinking about a training set and a test set as closed things that are untouchable. I think we should go beyond these. And in meta-learning, we call these the meta-training set
and the meta-test set, which is really thinking about, if I know about ImageNet, why would that network not work on MNIST, which is a much simpler problem? But right now it really doesn't. But it just feels wrong. Right.
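One way to read the meta-training set and meta-test set idea is that the split happens over whole tasks rather than over examples within one task. The sketch below assumes that reading; the task names and the `split_tasks` helper are hypothetical and only illustrate the bookkeeping, not any particular meta-learning method.

```python
import random
from typing import List, Sequence, Tuple


def split_tasks(
    task_names: Sequence[str], test_fraction: float, seed: int
) -> Tuple[List[str], List[str]]:
    """Split whole tasks into a meta-training set and a meta-test set."""
    rng = random.Random(seed)
    shuffled = sorted(task_names)  # sort first so the seed is reproducible
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical pool of tasks; each would carry its own train and test data.
tasks = ["imagenet", "mnist", "speech", "translation", "atari"]
meta_train, meta_test = split_tasks(tasks, test_fraction=0.4, seed=0)

print("meta-train tasks:", meta_train)
print("meta-test tasks:", meta_test)
```

A learner is then judged on how quickly it handles the held-out tasks, instead of on held-out examples from a task it has already been trained on.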
So I think on the application or the benchmark side, we probably will see quite a bit more interest and progress, and hopefully people defining new and exciting challenges. Really.
Do you have any hope or interest in knowledge graphs within this context? Kind of like constructing graphs. So going back perhaps. Yeah, well, neural networks and graphs, but I mean a different kind of knowledge graph, sort of like semantic graphs, where there's concepts. Yeah.
So I think the idea of graphs is, so I've been quite interested in sequences first, and then more interesting or different data structures like graphs. And I've studied graph neural networks in the last three years or so. I found these models just very interesting from a deep learning standpoint. But then, why do we want these models and why would we use them? What's the application? What's kind of the killer application of graphs? And perhaps,
if we could extract a knowledge graph from Wikipedia automatically, that would be interesting, because then these graphs have this very interesting structure that also is a bit more compatible with this idea of programs and deep learning kind of working together, jumping between neighborhoods and so on. You could imagine defining some primitives to go around graphs, right.
So I think I really like the idea of a knowledge graph. And in fact, as part of the research we did for StarCraft, I thought, wouldn't it be cool to give the graph of, you know, all these buildings that depend on each other and units that have prerequisites of being built by that. And so this is information that the network can learn and extract, but it would have been great to see, or to think of it really as a giant graph where, as the game evolves, you kind of start taking branches and so on.
And we did a bit of research on these, nothing too relevant, but I really like the idea.
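To make the idea of buildings and units with prerequisites concrete, here is a minimal sketch of a dependency graph and two simple primitives for moving around it. The tech tree is a simplified, illustrative assumption and not the representation used in the research discussed; the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified tech tree: each entry maps a building or unit
# to the things that must already exist before it can be built.
TECH_TREE = {
    "Nexus": set(),
    "Pylon": {"Nexus"},
    "Gateway": {"Pylon"},
    "Cybernetics Core": {"Gateway"},
    "Zealot": {"Gateway"},
    "Stalker": {"Gateway", "Cybernetics Core"},
}


def all_prerequisites(target, tree=TECH_TREE):
    """Return every direct and indirect prerequisite of `target`."""
    seen = set()
    stack = list(tree[target])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree[node])
    return seen


def build_order(target, tree=TECH_TREE):
    """Return one valid order in which to build everything needed for `target`."""
    needed = all_prerequisites(target, tree) | {target}
    # Restrict the graph to the nodes that matter for this target.
    subgraph = {name: tree[name] & needed for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


stalker_prereqs = all_prerequisites("Stalker")
stalker_order = build_order("Stalker")
```

Choosing a different target corresponds to taking a different branch of the graph as the game evolves.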
And it has elements, which is something you also worked with in terms of visualizing neural networks, of being human interpretable: being able to generate knowledge representations that are human interpretable, that maybe human experts can then tweak or at least understand. So there are a lot of interesting aspects there. And for me personally, I'm just a huge fan of Wikipedia, and it's a shame that our neural networks aren't taking advantage of all the structured knowledge that's on the web.
What's next for you? What's next for DeepMind? What are you excited about? What's next for AlphaStar?
Yeah. So I think the obvious next step would be to apply AlphaStar to other races. I mean, that sort of shows that the algorithm works, because we wouldn't want to have created by mistake something in the architecture that happens to work for Protoss but not for other races, right? So as verification, I think that's an obvious next step that we are working on.
And then I would like to see... so agents and players kind of specialize in different skill sets that allow them to be very good. I think we've seen AlphaStar understanding very well when to take battles and when not to do that. It's also very good at micromanagement and moving the units around and so on, and also very good at producing nonstop and trading off economy with building units.
But I have not perhaps seen as much as I would like of this poker idea that you mentioned, right? I'm not sure AlphaStar has developed a very deep understanding of what the opponent is doing, and reacting to that, and sort of trying to trick the player into doing something else. This kind of reasoning I would like to see more of. So I think, purely from a research standpoint, there are perhaps also quite a few things to be done there in the domain of StarCraft.
Yeah, in the domain of games, I've seen some interesting work, even in auctions, on manipulating other players, sort of forming a belief net and just messing with people.
Yeah, it's called theory of mind. Yeah, yeah, yeah. It's fascinating. Theory of mind and StarCraft, they're really made for each other. Yeah. So that would be very exciting, to see those techniques applied to StarCraft, or perhaps StarCraft driving new techniques. As I said, this is always the tension between the two.
Well, Oriol, thank you so much for talking today. Awesome.
It was great to be here. Thanks.