[00:00:00]

The following is a conversation with Pieter Abbeel. He's a professor at UC Berkeley and the director of the Berkeley Robot Learning Lab. He's one of the top researchers in the world working on how to make robots understand and interact with the world around them, especially using imitation and deep reinforcement learning. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, please subscribe on YouTube, iTunes, or your podcast provider of choice, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D.

[00:00:36]

And now, here's my conversation with Pieter Abbeel.

[00:00:58]

You've mentioned that if there was one person you could meet, it would be Roger Federer. Let me ask, when do you think we'll have a robot that fully autonomously can beat Roger Federer at tennis — a Roger Federer-level tennis player?

[00:01:14]

Well, first, if you can make it happen for me to meet Roger, let me know. Now, getting a robot to beat him at tennis is kind of an interesting question, because for a lot of the challenges we think about in AI, the software is really the missing piece, but for something like this, the hardware is nowhere near either. To really have a robot that can physically run around — the Boston Dynamics robots are starting to get there — but still not really human-level ability to run around and then swing a racket.

[00:01:53]

So you think that's a hardware problem? I don't think it's a hardware problem only. I think it's a hardware and software problem; I think it's both, and I think they'll have independent progress. I'd say the hardware, maybe in 10, 15 years. And is this on grass? I mean, grass, sliding... I'm not sure what's harder, grass or clay. The clay involves sliding, which might be harder to master, actually.

[00:02:22]

But you're not limited to building it like a person — I mean, I'm sure they can build a machine; it's a whole different question. Of course, if you can say, OK, this robot can be on wheels, it can move around on wheels and can be designed differently, then I think that can be done sooner, probably, than a full humanoid type of setup. What about swinging a racket? So you've worked on basic manipulation.

[00:02:48]

How hard do you think the task of swinging a racket is? To be able to hit a nice backhand or a forehand. Let's say we just set it up stationary, a nice robot arm — let's say, you know, a standard industrial arm — and it can watch the ball come in and swing the racket. It's a good question. I'm not sure it would be super hard to do. I mean, if we do it with reinforcement learning, it would require a lot of trial and error; it's not going to swing it right the first time around.

[00:03:21]

But yeah, I don't see why it couldn't eventually swing it the right way. I think it's learnable. If you set up a ball machine, let's say, on one side and then a robot with a tennis racket on the other side, I think it's learnable, maybe with a little bit of pre-training in simulation. Yeah, I think that's feasible. I think the swinging of the racket is feasible. It would be very interesting to see how much precision it can get.

[00:03:48]

Because, I mean, that's where some of the human players can hit it on the lines, which is very high precision, with spin. The spin is interesting — whether RL can learn to put a spin on the ball. Well, you got me interested. Maybe someday we'll see it. Sure. But the answer is basically, OK, for this problem.

[00:04:10]

It sounds fascinating, but for the general problem of a tennis player, we might be a little bit farther away. What's the most impressive thing you've seen a robot do in the physical world?

[00:04:20]

So physically, for me, it's the Boston Dynamics videos that always hit home; I'm just super impressed. Recently, the robot running up the stairs, doing the parkour-type thing — I mean, yes, we don't know what's underneath; they don't really write a lot of detail. But even if it's hardcoded underneath, which it might or might not be, just the physical ability to do that parkour — that's very impressive.

[00:04:49]

So have you met Spot Mini or any of those robots in person? Spot Mini, last year in April, at the MARS event that Jeff Bezos organizes. They brought it out there, and it was nicely following around Jeff — when Jeff left the room, it would follow him along, which is pretty impressive.

[00:05:08]

So the psychology of it — knowing with some confidence that there's no learning going on on those robots, or if there's any learning going on, it's very limited. I met Spot Mini earlier this year.

[00:05:23]

And knowing everything that's going on, having one-on-one interactions — I got to spend some time alone with it — there's immediately a deep connection on the psychological level, even though you know the fundamentals of how it works. There's something magical. So do you think about the psychology of interacting with robots in the physical world? Even when you just showed me the PR2 robot, it had a little bit of something like a face — something that immediately draws you to it.

[00:05:57]

Do you think about that aspect of the robotics problem? Well, it's very hard with BRETT here — we gave him a name, the Berkeley Robot for the Elimination of Tedious Tasks — it's very hard to not think of

[00:06:11]

the robot as a person. It seems like everybody calls him a "he" for whatever reason, but that also makes it more of a person than if it were an "it." And it seems pretty natural to think of it that way. This past weekend it really struck me: I've seen Pepper many times in videos, but then I was at an event — this was organized by Fidelity — and they had scripted Pepper to help moderate some sessions, and scripted Pepper to have the personality of a child, a little bit.

[00:06:43]

And it was very hard to not think of it as its own person in some sense, because it would just jump into the conversation and try to moderate — it would jump in with "Hold on, how about me? Can I participate in this?" It was just like, this is like a person. And that was 100 percent scripted. And even then it was hard not to have that sense of somehow there is something there.

[00:07:07]

So as we have robots interact in this physical world, is that a signal that could be used in reinforcement learning? You've worked a little bit in this direction, but do you think that psychology can be somehow pulled in?

[00:07:21]

That's a question, I would say, a lot of people ask, and I think part of why they ask it is they're thinking about how unique we really still are as people. After they see some results — they see a computer play Go, they see a computer do this or that — they're like, OK, but can it really have emotion? Can it really interact with us in that way? And then once you're around robots, you already start feeling it.

[00:07:46]

And I think maybe the way I think of it is: if you run something like reinforcement learning, it's about optimizing some objective, and there's no reason that the objective couldn't be tied into how much a person likes interacting with this system. Why couldn't the reinforcement learning system optimize for the robot being fun to be around? And why wouldn't it then naturally become more and more interactive, and more and more maybe like a person or like a pet?

[00:08:20]

I don't know what exactly, but more and more have those features and acquire them automatically.

[00:08:25]

As long as you can formalize an objective of what it means to like something — how do you exhibit it, what's the ground truth? How do you get the reward from the human? Because you have to somehow collect that information from the human. But you're saying that if you can formulate it as an objective, it can be learned.

[00:08:44]

There is no reason it cannot emerge through learning. And maybe one way to formulate it as an objective: you wouldn't have to necessarily score it explicitly. Standard rewards are numbers, and numbers are hard to come by — is this a 1.5 or a 10.7 on some scale? That's very hard for a person to assign. But it's much easier for a person to say, OK, what you did the last five minutes was much nicer than what you did the previous five minutes.

[00:09:07]

And that now gives a comparison. And in fact, there have been some results on that. For example, Paul Christiano and collaborators at OpenAI had the Hopper — the MuJoCo Hopper, a one-legged robot — do backflips purely from feedback: "I like this better than that," "these are about equally good." And after a bunch of interactions, it figured out what the person was asking for, namely a backflip. And so I think the same thing.

[00:09:32]

It really wasn't trying to do a backflip.

[00:09:35]

It was just getting a comparison score from the person, based on the person having in their own mind that they wanted it to do a backflip. But the robot didn't know what it was supposed to be doing. It just knew that sometimes the person said, this is better, this is worse. And then the robot figured out that what the person was actually after was a backflip. And I imagine the same would be true for things like more interactive robots — that the robot would figure out over time,

[00:10:01]

Oh, this kind of thing apparently is appreciated more than this other kind of thing.
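For reference, a minimal sketch of how such pairwise comparisons can train a reward model, in the spirit of the preference-learning approach described above; the network size, dimensions, and data format are illustrative assumptions, not the actual implementation discussed here:

```python
# Sketch: learn a reward model from pairwise human preferences
# (in the spirit of learning from "this was better than that" feedback).
# Dimensions, data source, and network size are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a single (state, action) pair to a scalar reward estimate."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, human_prefers_a):
    """Bradley-Terry-style loss: the preferred segment should get higher total reward.
    seg_a and seg_b are (obs, act) tensor pairs for two behavior segments."""
    ret_a = reward_model(*seg_a).sum()
    ret_b = reward_model(*seg_b).sum()
    logits = torch.stack([ret_a, ret_b])
    target = torch.tensor(0 if human_prefers_a else 1)
    return nn.functional.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
```

The learned reward model can then stand in for a hand-designed numeric reward when running ordinary reinforcement learning.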

[00:10:07]

So when I first picked up Sutton and Barto's reinforcement learning book — before this deep learning era, before the reemergence of neural networks as a powerful mechanism for machine learning — RL seemed to me like magic. It was beautiful. It seemed like that's what intelligence is: RL, reinforcement learning.

[00:10:32]

So how do you think we can possibly learn anything about the world when the reward for the actions is so delayed, so sparse? Why do you think RL works? Why do you think you can learn anything under such sparse rewards, whether it's regular reinforcement learning or deep reinforcement learning? What's your intuition? The counterpart of that question is: why does RL need so many samples, so many experiences, to learn from? Because really what's happening is, when you have a sparse reward, you do something for maybe, I don't know, 100 actions, and then you get a reward — maybe a score of three.

[00:11:16]

And you're like, OK, three — not sure what that means. You go again and you get two. And now you know that the sequence of a hundred actions you did the second time around was somehow worse than the sequence of actions you did the first time around, but that's not enough to know which of those actions were better or worse — some might have been good and some bad in either one. And that's why you need so many experiences.

[00:11:36]

But once you have enough experience, effectively RL is using it to tease apart: what is consistently there when you get a higher reward, and what is less consistently there when you get a lower reward. And then the magic of something like the policy gradient update is to say: now let's update the neural network to make the actions that were present when things were good more likely, and make the actions that were present when things were not as good less likely.
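As a concrete illustration of that update, here is a minimal vanilla policy-gradient (REINFORCE) sketch; the CartPole environment, network size, learning rate, and the classic gym API are illustrative assumptions rather than anything specific from the conversation:

```python
# Sketch: a vanilla policy-gradient (REINFORCE) update.
# Actions from higher-return episodes are made more likely, and vice versa.
# Uses the classic gym API (pre-gymnasium); adjust reset/step for newer versions.
import torch
import torch.nn as nn
import gym

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)
    # Whole-episode return: push up the log-probability of every action taken
    # in good episodes, push it down for bad ones.
    episode_return = sum(rewards)
    loss = -episode_return * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```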

[00:12:01]

So that is the counterpoint. But it seems like you would need to run it a lot more than you actually do. Even though right now people would say that RL is very inefficient, it seems way more efficient than one would imagine on paper — that the simple updates to the policy, the policy gradient, which as you just said learns which common actions seem to produce good results, can somehow learn anything at all.

[00:12:29]

It seems counterintuitive, at least. Is there some intuition behind it? So I think there are a few ways to think about this. The way I tend to think about it goes back to when we started working on deep reinforcement learning here at Berkeley, which was maybe 2011, '12, '13, around that time — John Schulman was a student initially kind of driving it forward here.

[00:12:56]

And the way we thought about it at the time was: if you think about rectified linear units, or kind of rectifier-type networks, what do you get? You get something that's piecewise linear feedback control. And if you look at the literature, linear feedback control is extremely successful and can solve many, many problems surprisingly well. I remember, for example, when we did helicopter flight: if you're in a stationary flight regime — sticking with a stationary flight regime like hover — you can use linear feedback control to stabilize the helicopter. It's a very complex dynamical system, but the controller is relatively simple.

[00:13:35]

And so I think that's a big part of it: if you do feedback control, even though the system you control can be very, very complex, often relatively simple control architectures can already do a lot. But just linear is not good enough. And so one way you can think of these networks is that they tile the space — which people were already trying to do more by hand or with finite state machines: this linear controller here, that linear controller there — except the whole network learns to put one linear controller here and another linear controller there.

[00:14:05]

But it's more subtle than that. It's benefiting from the linear control aspect, it's benefiting from the tiling, but it's tiling it one dimension at a time, because if, let's say, you have a two-layer network, each unit in the hidden layer makes a transition from active to inactive, or the other way around — that's essentially one direction, not axis-aligned, but one direction along which things change. And so you have this kind of very gradual tiling of the space,

[00:14:32]

with a lot of sharing between the linear controllers that tile the space. And that was always my intuition as to why to expect that this might work pretty well: it's essentially leveraging the fact that linear feedback control is so good — but of course not enough on its own — and it's a gradual tiling of the space with linear feedback controllers that share a lot of expertise across them.
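A small sketch of that intuition (the network and state here are arbitrary illustrations): inside one ReLU activation region, the policy is exactly a linear feedback law u = Kx + b, and the local gain K can be read off as the Jacobian of the network:

```python
# Sketch: a small ReLU policy is piecewise linear in its input, so locally it
# behaves like a linear feedback controller u = K x + b. The network, state
# dimensions, and perturbation size are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(4)        # current state
u = policy(x)             # control output at that state

# Local gain matrix K: the Jacobian of the output w.r.t. the state.
K = torch.autograd.functional.jacobian(policy, x)

# Within the ReLU activation region containing x, the policy is exactly a
# linear feedback law; crossing a region boundary switches to a neighboring
# linear controller that shares most of its weights with this one.
delta = 1e-3 * torch.randn(4)
u_linear = u + K @ delta
u_true = policy(x + delta)
print(torch.allclose(u_linear, u_true, atol=1e-5))  # True while staying in-region
```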

[00:14:53]

So that's a really nice intuition. But do you think that scales to more and more general problems — when you start going up in the number of control dimensions, when you start going down in how often you get a clean reward signal? Does that intuition carry forward to those crazier, weirder worlds that we think of as the real world?

[00:15:20]

So I think where things get really tricky in the real world, compared to the things we've looked at so far with great success in reinforcement learning, is the time scales, which take it to an extreme. When you think about the real world — I mean, I don't know, maybe some student decided to do a PhD here, right? OK, that's a decision, a very high-level decision. But if you think about their life, any person's life, it's a sequence of muscle fiber contractions and relaxations.

[00:15:55]

And that's how you interact with the world. And that's a very high frequency control thing. But it's ultimately what you do and how you affect the world until I guess we have brain readings and you can maybe do it differently. But typically, that's how you affect the world. And the decision of doing a Ph.D. is like so abstract relative to what you're actually doing in the world. And I think that's where credit assignment becomes just completely beyond what any current algorithm can do.

[00:16:23]

And we need hierarchical reasoning at a level that is just not available at all yet.

[00:16:29]

Where do you think we can pick up hierarchical reasoning? By which mechanisms? Yeah, so maybe let me highlight what I think the limitations are of what was already done twenty, thirty years ago. In fact, you'll find reasoning systems that reason over relatively long horizons, but the problem is that they were not grounded in the real world. People would have to hand-design some kind of logical, dynamical description of the world, and that didn't tie in to perception.

[00:17:03]

And so it didn't tie in to real objects and so forth, and that was a big gap. Now, with deep learning, we're starting to have the ability to really see with sensors, process that, and understand what's in the world. So it's a good time to try to bring these things together, and I see a few ways of getting there. One way would be to say deep learning can get bolted on somehow to some of these more traditional approaches.

[00:17:29]

Now, bolted on would probably mean you need to do some kind of end-to-end training, where you say my deep learning processing somehow leads to a representation that in turn feeds into some kind of traditional underlying dynamical system that can be used for planning. And that's, for example, the direction Aviv Tamar has been pushing with the causal InfoGAN architecture, and of course other people too. That's one way: can we somehow force it into a form factor that

[00:17:59]

is amenable to reasoning. Another direction we'd been thinking about for a long time and didn't make much progress on was more information-theoretic approaches. The idea there was that what it means to take a high-level action is to choose a latent variable now that tells you a lot about what's going to be the case in the future, because that's what it means to take a high-level action. Say I decide I'm going to navigate to the gas station because I need to get gas for my car.

[00:18:32]

Well, it'll now take five minutes to get there, but the fact that I'll get there you could already tell from the high-level action I took much earlier. We had a very hard time getting success with that — I'm not saying it's a dead end necessarily, but we had a lot of trouble getting it to work. And then we started revisiting the notion of what we're really trying to achieve. What we're trying to achieve is not hierarchy per se.

[00:18:57]

We could think about what hierarchy would give us. We hope it would give us better credit assignment. And what does better credit assignment give us? It gives us faster learning, right? And faster learning is ultimately maybe what we're after. So that's how we ended up with the RL² paper on learning to reinforcement learn, which at the time Rocky Duan led, and that's exactly the meta-learning approach. We said, OK, we don't know how to design hierarchy.

[00:19:31]

We know what we want to get from it; let's just optimize end-to-end for what we want to get from it and see if it might emerge. And we saw things emerge. The maze navigation had consistent motion down hallways, which is what you want a high-level controller to do — say, I want to go down this hallway, and then when there is an option to take a turn, I can decide whether to take the turn or not, and repeat. It even had a notion of whether it had been somewhere before or not, so as not to revisit places it had been before.

[00:19:56]

It still didn't scale to the kind of real-world scenarios I think you had in mind.

[00:20:02]

But it was some sign of life that maybe you can learn these hierarchical concepts. I mean, it seems like these meta-learning concepts get at what I think is one of the hardest and most important problems of AI, which is transfer learning — so, generalization. How far along this journey toward building general systems are we in being able to do transfer learning well? There are some signs that you can generalize a little bit, but do you think we're on the right path, or are totally different breakthroughs needed to be able to transfer knowledge between different learning models?

[00:20:47]

Yeah, I'm pretty torn on this. I think there are some very impressive results already, right? I would say even with the initial big breakthrough in 2012 with AlexNet — the initial thing was, OK, great, this does better on ImageNet, on image recognition. But then immediately thereafter there was, of course, the notion that, wow, what was learned on ImageNet — if you now want to solve a new task,

[00:21:21]

you can fine-tune AlexNet for the new task. And that was often found to be the even bigger deal — that you learned something that was reusable, which was not often the case before. Usually in machine learning, you learned something for one scenario and that was it.
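A minimal sketch of that kind of reuse; the 10-class head, the choice to freeze the feature extractor, and the training setup are illustrative assumptions rather than any specific recipe from the conversation:

```python
# Sketch: reuse ImageNet features by fine-tuning a pretrained AlexNet on a
# new task. The 10-class head and dataset are illustrative placeholders.
import torch
import torch.nn as nn
import torchvision

# (Newer torchvision versions use the weights= argument instead of pretrained=True.)
model = torchvision.models.alexnet(pretrained=True)

# Freeze the ImageNet-learned feature extractor...
for param in model.features.parameters():
    param.requires_grad = False

# ...and replace only the final classification layer for the new task.
num_new_classes = 10
model.classifier[6] = nn.Linear(4096, num_new_classes)

# Train as usual, optimizing only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```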

[00:21:36]

And that's really exciting. I mean, that's just a huge application. That's probably the biggest success of transfer learning to date in terms of scope and impact.

[00:21:44]

That was a huge breakthrough. And then recently, I feel like something similar — by scaling things up, it seems like this has been expanded upon: people training even bigger networks find they transfer even better. If you look at, for example, some of the OpenAI results on language models, and some of the recent Google results on language models — they're trained for just prediction, and then they get reused for other tasks. So I think there is something there where, somehow, if you train a big enough model on enough things, it seems to transfer. Also some DeepMind results that I thought were very impressive — the UNREAL results, where it was learning to navigate mazes in a way where it wasn't just doing reinforcement learning but had other objectives it was optimizing for.

[00:22:33]

So I think there's a lot of interesting results already.

[00:22:36]

I think maybe what's hard to wrap my head around is: to what extent, or when, do we call something generalization? Or what are the levels of generalization involved in these different tasks?

[00:22:51]

So, by the way, just to frame things: I've heard you say somewhere it's the difference between learning to master versus learning to generalize — it's a nice line to think about. And I guess you're saying there's a gray area between what learning to master is and where learning to generalize starts. I might have heard it from you, or I might have heard it somewhere else.

[00:23:14]

And I think it might have been in one of your interviews, maybe the one with Yoshua, but I'm not a hundred percent sure. I like the example, though I'm not sure who it was from. The example was essentially: if you use current deep learning techniques to predict, let's say, the relative motion of our planets, it would do pretty well. But now, if a massive new object enters our solar system,

[00:23:45]

it would not predict what will happen. Right. And that's a different kind of generalization — a generalization that relies on the simplest explanation we have available today for the motion of planets, whereas just pattern recognition could predict our current solar system's motion pretty well, no problem. So I think that's an example of a kind of generalization that's a little different from what we've achieved so far, and it's not clear if just regularizing more and forcing it to come up with a simpler and simpler explanation would get you there. But that's what physics researchers do.

[00:24:22]

Right — to say, can I make this even simpler? How simple can I get? What is the simplest equation that can explain everything — the master equation for the entire dynamics of the universe? We haven't really pushed that direction as hard in deep learning, I would say. I'm not sure if it should be pushed, but it seems there's a kind of generalization you get from that that you don't get with our current methods so far.

[00:24:44]

So I just talked to Vladimir Vapnik, for example, who comes from statistical learning theory, and he kind of dreams of creating the E = mc² of learning, the general theory of learning. Do you think that's a fruitless pursuit in the near term, within the next several decades? I think that's a really interesting pursuit, in the following sense: there is a lot of evidence that the brain is pretty modular. So I wouldn't think of it as a theory so much — maybe there's an underlying theory — but more as a kind of principle.

[00:25:27]

There have been findings where people who are blind will use the part of the brain usually used for vision for other functions. And even after some kind of injury, people get rewired in some way and might be able to reuse parts of their brain for other functions. What that suggests is some kind of modularity, and I think it's a pretty natural thing to strive for, to see: can we find that modularity? Can we find this thing?

[00:25:59]

Of course, not every part of the brain is exactly the same; not everything can be rewired arbitrarily. But if you think of things like the neocortex, which is a pretty big part of the brain, that seems fairly modular from the findings so far. Can you design something equally modular? And if it can just grow, does it become more capable? Probably. I think that would be the kind of interesting underlying principle to shoot for, and it's not unrealistic.

[00:26:26]

Do you prefer math or empirical trial and error for discovering the essence of what it means to do something intelligent? Reinforcement learning embodies both camps, right? You prove that something converges, prove bounds, and at the same time a lot of the successes are, well, let's try this and see if it works. Which do you gravitate toward? How do you think of those two parts of your brain?

[00:26:56]

So maybe I would prefer that we could make the progress with mathematics, and the reason is that often, if you have something you can mathematically formalize, you can leapfrog a lot of experimentation — and experimentation takes a long time to get through. It's a lot of trial and error, kind of a reinforcement learning process in your own research, and you need to do a lot of trial and error before you get to success. So if we can leapfrog that, to my mind, that's what the math is about.

[00:27:27]

And hopefully, once you do a bunch of experiments, you start seeing a pattern and can do some derivations that leapfrog some experiments. But I agree with you — in practice, a lot of the progress has been such that we have not been able to find the math that allows us to leapfrog ahead, and we're kind of making gradual progress one step at a time: a new experiment here, a new experiment there that gives us new insights, gradually building up, but not yet getting to the point where we can just say, OK,

[00:27:53]

here is an equation that explains what would otherwise have taken two years of experimentation to reach — this tells us what the results are going to be. Unfortunately, not so much yet. But your hope is there. In trying to teach robots or systems to do everyday tasks, or even in simulation, what are you more excited about: imitation learning or self-play? Letting robots learn from humans, or letting robots try to figure out their own way and eventually interact with humans, or whatever the problem is? Which is more exciting to you?

[00:28:38]

Which do you think is the more promising research direction? So, when we look at self-play, what's so beautiful about it is — back to the challenges of reinforcement learning — the challenge of reinforcement learning is getting signal, and if you never succeed, you don't get any signal. In self-play, you're on both sides, so one of you succeeds, and the beauty is that one of you also fails. And so you see the contrast: you see the one version of me that is better than the other version.

[00:29:10]

And so every time you play yourself, you get a signal. And whenever you can turn something into self-play, you're in a beautiful situation where you can naturally learn much more quickly than in most other reinforcement learning environments. So I think if somehow we can turn more reinforcement learning problems into self-play formulations, that would go really, really far. So far, self-play has been largely around games, where there are natural opponents. But if we could do self-play for other things — let's say a robot learns to build a house; I mean, that's a pretty advanced thing for a robot to try to do, but maybe try to build a hut or something —

[00:29:48]

if that can be done through self-play, it would learn a lot more quickly, if somebody can figure that out. And I think that would be something closer to the kind of mathematical leapfrogging, where somebody figures out a formalism to say: OK, any RL problem, by applying this and this idea, you can turn into a self-play problem where you get signal a lot more easily. The reality is that for many problems, we don't know how to turn them into self-play.
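A toy sketch of the self-play recipe: one policy plays both sides of a game, so every decisive round produces both a win and a loss signal. Rock-paper-scissors and the specific update used here are illustrative stand-ins, not a method from the conversation:

```python
# Sketch: self-play -- the same policy plays both sides, so every decisive game
# yields a learning signal (a win for one copy, a loss for the other) without
# any externally designed reward. Rock-paper-scissors is a toy stand-in.
import torch
import torch.nn as nn

BEATS = {0: 2, 1: 0, 2: 1}          # rock beats scissors, paper beats rock, ...
logits = nn.Parameter(torch.zeros(3))
optimizer = torch.optim.Adam([logits], lr=0.05)

for step in range(2000):
    dist = torch.distributions.Categorical(logits=logits)
    a, b = dist.sample(), dist.sample()   # the same policy plays both sides
    if a == b:
        continue                          # draw: no signal this round
    reward_a = 1.0 if BEATS[a.item()] == b.item() else -1.0
    # Reinforce the winner's move, discourage the loser's move.
    loss = -(reward_a * dist.log_prob(a) + (-reward_a) * dist.log_prob(b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # roughly hovers near the uniform equilibrium
```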

[00:30:12]

So either we need to provide a detailed reward that doesn't just reward achieving the goal but rewards making progress, and that becomes time-consuming. And once you start doing that — let's say you want a robot to do something, and you need to give all this detailed reward — well, why not just give a demonstration? Why not just show the robot? And now the question is, how do you show the robot? One way to show it is to teleoperate the robot.

[00:30:35]

Then the robot really experiences things, and that's nice because there's a really high signal-to-noise ratio there. We've done a lot of that, and you can teach a robot skills in just 10 minutes — you can teach a robot a new basic skill like, OK, pick up the bottle, place it somewhere else. That's a skill, no matter where the bottle starts; maybe it always goes onto a target or something. That's fairly easy to teach a robot with teleop.
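A minimal sketch of the supervised-learning core of that approach — behavioral cloning on logged teleoperation data; the dimensions and the random placeholder arrays are assumptions for illustration:

```python
# Sketch: behavioral cloning from teleoperated demonstrations -- plain
# supervised learning on recorded (state, action) pairs. The demonstration
# arrays and dimensions below are illustrative placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 4
demo_obs = torch.randn(500, obs_dim)   # states logged while a human teleoperated
demo_act = torch.randn(500, act_dim)   # the commands the human gave

policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)  # imitate the demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This is the purely reactive form of imitation that gets contrasted later in the conversation with goal-aware approaches like inverse reinforcement learning.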

[00:30:56]

Now, what's even more interesting is if you can teach a robot through third-person learning, where the robot watches you do something — it doesn't experience it, it just watches — and says, OK, well, if you're showing me that, that means I should be doing this; I'm not going to be using your hand, because I don't get to control your hand, but I'm going to use my hand, and I do that mapping. And that's where I think one of the big breakthroughs happened this year.

[00:31:20]

This was led by Chelsea Finn here. It's almost like learning a machine translation for demonstrations, where you have a human demonstration and the robot learns to translate it into what it means for the robot to do the same thing. And that was a meta-learning formulation: learn from one to get the other. And that, I think, opens up a lot of opportunities to learn a lot more quickly.

[00:31:41]

So my focus is on autonomous vehicles. Do you think autonomous driving is amenable to this kind of third-person watching approach?

[00:31:50]

So for autonomous driving, I would say third person is slightly easier, and the reason I'm going to say it's slightly easier to do third person is because the car dynamics are very well understood. So easier than first person, you mean, or easier overall?

[00:32:12]

So I think the distinction between third person and first person is not a very important distinction for autonomous driving. They're very similar, because the distinction is really about who turns the steering wheel — or maybe, let me put it differently, how to get from the point where you are now to a point, let's say, a couple of meters in front of you. That's a problem that's very well understood, and that's the only distinction between third and first person there.

[00:32:38]

Whereas with robot manipulation, the interaction forces are very complex, and it's still a very different thing. For autonomous driving, I think there is still the question of imitation versus RL. Imitation gives you a lot more signal. I think where imitation is lacking and needs some extra machinery is that, in its normal format, it doesn't think about goals or objectives. And of course there are versions of imitation learning — inverse reinforcement learning types of imitation — which also think about goals.

[00:33:11]

I think then we're getting much closer. But I think it's very hard to imagine a fully reactive agent generalizing well if it really doesn't have a notion of objectives — to generalize in the way you would want, you'd want more than just the reactivity that you get from behavioral cloning's supervised learning. So a lot of the work, whether it's self-play or even imitation learning, would benefit significantly from effective simulation, and you're doing a lot of stuff in the physical world and in simulation.

[00:33:46]

Do you have hope for the power of simulation growing greater and greater, eventually to where most of what we need to operate in the physical world could be simulated to a degree that's directly transferable to the physical world? Or are we still very far away from that? So I think we could even rephrase that question in some sense. Please. So, the power of simulation, right — our simulators get better and better, of course, simulation becomes stronger, and we can learn more in simulation. But there's also another version, where you say the simulator doesn't even have to be that precise as long as it's somewhat representative.

[00:34:35]

Instead of trying to get one simulator that is sufficiently precise to learn in and transfer really well to the real world, I'm going to build many simulators — an ensemble of simulators. No single one of them is sufficiently representative of the real world such that it would work if you trained only in there. But if you train in all of them, then there is something that's good in all of them, and the real world will just be, you know, another one of them — not identical to any one of them, but just another one of them.
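A minimal sketch of that ensemble-of-simulators idea, often called domain randomization; the randomized CartPole parameters, the random stand-in policy, and the classic gym API are illustrative assumptions:

```python
# Sketch: the ensemble-of-simulators idea (domain randomization). Each episode
# samples a simulator with perturbed physics, so whatever policy is trained has
# to work across the whole family, with the real world treated as one more draw.
# CartPole parameters and the random stand-in policy are placeholders
# (classic gym API; adjust reset/step for gymnasium).
import random
import gym

def make_randomized_sim():
    env = gym.make("CartPole-v1")
    env.unwrapped.gravity = random.uniform(8.0, 12.0)    # nominal 9.8
    env.unwrapped.force_mag = random.uniform(5.0, 15.0)  # nominal 10.0
    return env

for episode in range(3):
    env = make_randomized_sim()             # a fresh draw from the ensemble
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()  # stand-in for the learning policy
        obs, reward, done, _ = env.step(action)
```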

[00:35:07]

Now, it's a sample from the distribution of simulators. We do live in a simulation, so this is just one other one. I'm not sure about that, but it's definitely a very advanced simulator if it is. Yeah, it's pretty good.

[00:35:22]

I've talked to Stuart Russell — it's something you think about a little bit too. Of course, you're really trying to build these systems, but do you think about the future of AI?

[00:35:30]

A lot of people have concerns about safety. How do you think about AI safety as you build robots that are operating in the physical world? How do you approach this problem in an engineering kind of way, in a systematic way? So when a robot is doing things, there are a few notions of safety to worry about. One is that the robot is physically strong and, of course, could do a lot of damage.

[00:35:59]

Same for cars, which we can think of as robots in a way. And this could be completely unintentional — so it's not the kind of long-term safety concern of, OK, the AI is smarter than us and now what do we do? It could be just very practical: OK, this robot, if it makes a mistake, what are the results going to be? Of course, simulation comes in a lot there, too — you test in simulation.

[00:36:23]

It's a difficult question, and I always wonder — let's go back to driving, because a lot of people know driving, of course. What do we do to test somebody for driving, to get a driver's license? What do they really do? You fill out some written test and then you drive. And, I mean, in some suburban California towns, the driving test is just: you drive around the block, pull over,

[00:36:51]

you need to handle a stop sign successfully, then you pull over again, and you're pretty much done. And you're like, OK, if a self-driving car did that, would you trust that it can drive? You'd be like, no, that's not enough for me to trust it. But somehow for humans, we've figured out that somebody being able to do that is representative of them being able to do a lot of other things. So I think for humans, we've somehow figured out representative tests of what it means

[00:37:20]

that if you can do this, you can really do a lot of other things. Of course, a human can't be tested at all times, but self-driving cars or robots can be tested more often — you can have replicas that get tested and are known to be identical, using the same neural net and so forth. But still, I feel like we don't have the kind of unit tests or proper tests for robots. And I think there's something very interesting to be thought about there, especially as you update things and your software improves.

[00:37:46]

You have a better self-driving car software suite, you update it — how do you know it's indeed more capable on everything than what you had before, and that no bad things crept into it? So I think that's a very interesting direction of research where there is no real solution yet, except that somehow for humans it works, because we say, OK, you took a driving test, you passed, you can go on the road now — and then humans have accidents only every, like, one million or ten million miles, which is something pretty phenomenal compared to that short test.

[00:38:16]

Yeah, that is being done.
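A minimal sketch of what such a check might look like — rerun a fixed scenario suite with the old and new software and require no per-scenario regression; the scenario names and the evaluation function are hypothetical placeholders, not a real testing framework discussed here:

```python
# Sketch: a regression-style check for an updated driving policy -- re-run a
# fixed suite of simulated scenarios and require no per-scenario regression.
# Scenario names, evaluate(), and the policies are hypothetical placeholders.
SCENARIOS = ["stop_sign", "unprotected_left", "pedestrian_crossing", "merge"]

def evaluate(policy, scenario, n_trials=100):
    """Hypothetical: run the policy in simulation, return a success rate in [0, 1]."""
    raise NotImplementedError

def safe_to_deploy(old_policy, new_policy, margin=0.0):
    for scenario in SCENARIOS:
        old_score = evaluate(old_policy, scenario)
        new_score = evaluate(new_policy, scenario)
        if new_score < old_score - margin:  # got worse on this scenario
            return False
    return True
```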

[00:38:18]

So let me ask: you've mentioned that Andrew Ng, by example, showed you the value of kindness. Do you think the space of

[00:38:30]

policies — good policies, for humans and for AI — is populated by policies with kindness, or ones that are the opposite: exploitation, even evil? If you look at the sea of policies we operate under as human beings, or that an AI system would have to operate under in this real world, do you think it's easy to find policies that are full of kindness, that we naturally fall into them? Or is it a very hard optimization problem? I mean, there are kind of two optimizations happening for humans. There's the very long-term optimization, which evolution has done for us.

[00:39:13]

And we're kind of predisposed to like certain things. That's in some sense what makes our learning easier, because we know things like pain and hunger and thirst, and the fact that we know about those is not something we were taught — that's innate. When we're hungry, we're unhappy; when we're thirsty, we're unhappy; when we have pain, we're unhappy. And ultimately, evolution built that into us, to care about those things.

[00:39:39]

So I think there is a notion that somehow humans evolved, in general, to prefer to get along in some ways, but at the same time also to be very territorial and kind of centered on their own tribe. It seems like that's the kind of space we converged down to. I mean, I'm not an expert in anthropology, but it seems like we're pretty good within our own tribe, but need to be taught to be nice to other tribes.

[00:40:11]

Well, if you look at Steven Pinker, he highlights this pretty nicely in The Better Angels of Our Nature, where he talks about violence decreasing consistently over time. So whatever tension, whatever tribes we pick, it seems that the long arc of history goes toward us getting along more and more.

[00:40:33]

So do you think it's possible to teach RL-based robots this kind of kindness, this kind of ability to interact with humans, this kind of policy? Even — let me ask a follow-on — do you think it's possible to teach an RL-based robot to love a human being, and to inspire that human to love the robot back? So, an RL-based algorithm that leads to a happy marriage?

[00:41:04]

That's an interesting question. Maybe I'll answer it with another question — but I'll come back to it. The other question is: how close can some people's happiness get from interacting with just a really nice dog? I mean, dogs — you come home, and that's what dogs do, they greet you, they're excited, it makes you happy. When you come home to your dog, you're just like, OK, this is exciting.

[00:41:33]

They're always happy when I'm here. And if they don't greet you — because maybe, whatever, your partner took them on a trip or something — you might not be nearly as happy when you get home, right? And so it seems like the level of reasoning a dog has is pretty sophisticated, but still not at the level of human reasoning. So it seems like we don't even need to achieve human-level reasoning to get very strong affection with humans.

[00:41:58]

And so my thinking is, why not? Why couldn't we, with an AI, achieve the kind of level of affection that humans feel among each other, or with friendly animals and so forth? Whether that's a good thing for us or not is another question — but I don't see why not. So Elon Musk says love is the answer. Maybe he should say love is the objective function, and then RL is the answer.

[00:42:32]

Oh, maybe. Pieter, thank you so much. I don't want to take up more of your time. Thank you so much for talking today. Well, thanks for coming by. Great to have you visit.