Editor's Note: This transcript was automatically transcribed, so mistakes are inevitable.
The following is a conversation with Elon Musk. He's the CEO of Tesla, SpaceX, Neuralink, and a co-founder of several other companies. This conversation is part of the Artificial Intelligence Podcast. The series includes leading researchers in academia and industry, including CEOs and CTOs of automotive, robotics, AI, and technology companies. This conversation happened after the release of the paper from our group at MIT on driver functional vigilance during use of Tesla's autopilot. The Tesla team reached out to me, offering a podcast conversation with Mr. Musk.
I accepted with full control of questions I could ask and the choice of what is released publicly. I ended up editing out nothing of substance. I've never spoken with Elon before this conversation, publicly or privately. Neither he nor his companies have any influence on my opinion, nor on the rigor and integrity of the scientific method that I practice in my position at MIT. Tesla has never financially supported my research, and I've never owned a Tesla vehicle. I've never owned Tesla stock.
This podcast is not a scientific paper. It is a conversation. I respect Elon as I do all other leaders and engineers I've spoken with. We agree on some things and disagree on others. My goal always with these conversations is to understand the way the guest sees the world. One particular point of disagreement in this conversation was the extent to which camera-based driver monitoring will improve outcomes and for how long it will remain relevant for AI-assisted driving.
As someone who works on and is fascinated by human-centered artificial intelligence, I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit in both the short term and the long term. In contrast, Elon's and Tesla's focus is on the improvement of autopilot such that its statistical safety benefits override any concern of human behavior and psychology. Elon and I may not agree on everything, but I deeply respect the engineering and innovation behind the efforts that he leads.
My goal here is to catalyze a rigorous, nuanced and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes for a safer and better world. And now here's my conversation with Elon Musk.
What was the vision, the dream of autopilot in the beginning, the big picture system level, when it was first conceived and started being installed in 2014, the hardware in the cars? What was the vision, the dream?
I wouldn't characterize it as a vision or dream, simply that there are obviously two massive revolutions in the automobile industry. One is the transition to electrification, and then the other is autonomy. And it became obvious to me that in the future, any car that does not have autonomy would be about as useful as a horse, which is not to say that there's no use, it's just rare and somewhat idiosyncratic if somebody has a horse at this point. It's just obvious that cars will drive themselves completely.
It's just a question of time. And if we did not participate in the autonomy revolution, then our cars would not be useful to people relative to cars that are autonomous.
I mean, an autonomous car is arguably worth five to ten times more than a car which is not autonomous.
In the long term?
Depends what you mean by long term, but let's say at least for the next five years, perhaps 10 years.
So there are a lot of very interesting design choices with autopilot early on. First is showing on the instrument cluster, or in the Model 3 on the center stack display, what the combined sensor suite sees. What was the thinking behind that choice? Was there a debate? What was the process?
The whole point of the display is to provide a health check on the vehicle's perception of reality. So the vehicle is taking information from a bunch of sensors, primarily cameras, but also radar and ultrasonics, GPS and so forth. And then that information is rendered into vector space, you know, with a bunch of objects with properties like lane lines and traffic lights and other cars. And then that vector space is rendered onto your display, so you can confirm whether the car knows what's going on or not by looking out the window.
Right. I think that's an extremely powerful thing for people to get an understanding, sort of become one with the system and understanding what the system is capable of.
Now, have you considered showing more? So if we look at the computer vision, you know, like road segmentation, lane detection, vehicle detection, object detection underlying the system, there is at the edges some uncertainty. Have you considered revealing the parts, the uncertainty in the system, the sort of probabilities associated with, say, image recognition or something like that?
So right now it shows the vehicles in the vicinity as a very clean, crisp image. And people do confirm that there's a car in front of me and the system sees there's a car in front of me. But to help people build an intuition of what computer vision is, by showing some of the uncertainty?
Well, I think in my car, I always look at the sort of the debug view, and there's two debug views.
One is augmented vision, which I'm sure you've seen, where basically we draw boxes and labels around objects that are recognized. And then there's what we call the visualizer, which is basically the vector space representation, summing up the input from all sensors. That does not show any pictures, but it basically shows the car's view of the world in vector space. But I think this is very difficult for normal people to understand.
They would not know what they were looking at.
So it's almost an HMI challenge. The current things that are being displayed are optimized for the general public's understanding of what the system is capable of.
It's like if you have no idea how computer vision works or anything, you can still look at the screen and see if the car knows what's going on. And if you're a development engineer, or if you have the development build like I do, then you can see all the debug information. But those would just be like total gibberish to most people.
What's your view on how to best distribute effort? So there's three, I would say, technical aspects of autopilot that are really important. There's the underlying algorithms, like the neural network architecture, there's the data that it's trained on, and then there's the hardware development.
There may be others, but so look, algorithm, data, hardware. You only have so much money, only have so much time. What do you think is the most important thing to allocate resources to? Or do you see it as pretty evenly distributed between those three?
We automatically get vast amounts of data because all of our cars have eight external facing cameras and radar and usually 12 ultrasonic sensors, GPS, obviously, and IMU. And so we basically have a fleet, and we've got about 400,000 cars on the road that have that level of data. I think you keep quite close track of it, actually.
Yes.
Yeah. So we're approaching half a million cars on the road that have the full sensor suite.
I'm not sure how many other cars on the road have this sensor suite, but I would be surprised if it's more than 5,000, which means that we have 99 percent of all the data.
So there's this huge inflow of data.
Absolutely, massive inflow of data. And then it's taken about three years, but now we've finally developed our full self-driving computer, which can process an order of magnitude as much as the NVIDIA system that we currently have in the cars. And to use it, you just unplug the NVIDIA computer and plug the Tesla computer in, and that's it.
And in fact, we're still exploring the boundaries of its capabilities. We're able to run the cameras at full frame rate, full resolution, not even crop the images. And it's still got headroom even on one of the systems.
The full self-driving computer is really two computers, two systems on a chip that are fully redundant. So you could put a bullet through basically any part of that system and it still works.
The redundancy, are they perfect copies of each other or...?
Yeah.
So it's purely for redundancy, as opposed to an arguing machines kind of architecture where they're both making decisions? This is purely for redundancy?
If you like. It's like if you have a twin engine aircraft, a commercial aircraft. The system will operate best if both systems are operating, but it's capable of operating safely on one. But as it is right now, we haven't even hit the edge of performance, so there's no need to actually distribute functionality across both SoCs. We can actually just run a full duplicate on each one.
So you haven't really explored or hit the limit of the system?
No, not yet hit the limit.
So the magic of deep learning is that it gets better with data. You said there's a huge inflow of data. But the thing about driving is that the really valuable data to learn from is the edge cases.
So how do you, I mean, I've heard you talk somewhere about autopilot disengagement being an important moment of time to use. Are there other edge cases, or perhaps can you speak to those edge cases? What aspects of them might be valuable, or if you have other ideas, how to discover more and more edge cases in driving?
Well, there's a lot of things that are learned. There are certainly edge cases where, say, somebody is on autopilot and they take over, and then that's a trigger that goes off to our system that says, OK, did they take over for convenience, or did they take over because the autopilot wasn't working properly? There's also, let's say we're trying to figure out what is the optimal spline for traversing an intersection.
Then the ones where there are no interventions are the right ones. So you then say, OK, when it looks like this, do the following, and then you get the optimal spline for navigating a complex intersection.
So that's for this kind of the common case. You're trying to capture a huge amount of samples of a particular intersection, when things went right. And then there's the edge case where, as you said, not for convenience, but something didn't go exactly right.
Somebody took over, somebody asserted manual control from autopilot. And really, the way to look at this is to view all input as error. If the user had to do input at all, all input is error.
That's a powerful line to think of it that way, because it may very well be error. But if you want to exit the highway, or if it's a navigation decision that autopilot is not currently designed to do, then the driver takes over. How do you know the difference?
That's going to change with Navigate on Autopilot, which we've just released, and without stalk confirm. So the navigation, like lane change based, like asserting control in order to do a lane change or exit a freeway or do a highway interchange, the vast majority of that will go away with the release that just went out.
Yeah. So that, I don't think people quite understand how big of a step that is.
Yeah, they don't. If you drive the car, then you do.
So you still have to keep your hands on the steering wheel currently when it does the automatic lane change. So there's these big leaps through the development of autopilot, through its history. What stands out to you as the big leaps?
I would say this one, Navigate on Autopilot without having to confirm, is a huge leap.
It is a huge leap.
It also automatically overtakes slow cars. So it's both navigation and seeking the fastest lane. So it'll overtake slower cars and exit the freeway and take highway interchanges.
And then we have traffic light recognition, which is introduced initially as a warning. I mean, on the development version that I'm driving, the car fully stops and goes at traffic lights.
So those are the steps, right? You just mentioned some things that are an inkling of a step towards full autonomy.
What would you say are the biggest technological roadblocks to full self-driving?
Actually, I don't think, I think we just, the full self-driving computer, what we call the FSD computer, that's now in production.
So if you order any Model S or X, or any Model 3 that has the full self-driving package, you'll get the FSD computer. That's important, to have enough base computation. Then refining the neural net and the control software. But all of that can just be provided as an over-the-air update. The thing that's really profound, and what I'll be emphasizing at the Investor Day that we're having focused on autonomy, is that the cars currently being produced, with the hardware currently being produced, are capable of full self-driving.
But capable is an interesting word because...
The hardware is. And as we refine the software, the capabilities will increase dramatically, and then the reliability will increase dramatically, and then it will receive regulatory approval. So essentially buying a car today is an investment in the future. I think the most profound thing is that if you buy a Tesla today, I believe you are buying an appreciating asset, not a depreciating asset.
So that's a really important statement there, because if the hardware is capable enough, that's the hard thing to upgrade, usually.
Yes, exactly.
So then the rest is a software problem.
Yes, software has no marginal cost, really.
But what's your intuition on the software side? How hard are the remaining steps to get it to where, you know, the experience, not just the safety, but the full experience is something that people would enjoy?
Well, I think people will enjoy it very much on the highways. It's a total game changer for quality of life, using Tesla autopilot on the highways. So it's really just extending that functionality to city streets, adding in traffic light recognition, navigating complex intersections, and then being able to navigate complicated parking lots. So the car can exit a parking space and come and find you, even if it's in a complete maze of a parking lot.
And then it can just drop you off and find a parking spot by itself.
Yeah. In terms of enjoyability and something that people would actually find a lot of use from, the parking lot is a really, you know, it's rich of annoyance when you have to do it manually.
So there's a lot of benefit to be gained from automation there. So let me start injecting the human into this discussion a little bit. So let's talk about full autonomy. If you look at the current level four vehicles being tested on road, like Waymo and so on, they're only technically autonomous.
They're really level two systems with just a different design philosophy, because there's always a safety driver in almost all cases, and they're monitoring the system.
Do you see Tesla's full self-driving as still, for a time to come, requiring supervision of the human being? So its capabilities are powerful enough to drive, but it nevertheless requires a human to still be supervising, just like a safety driver is in other fully autonomous vehicles?
I think it will require detecting hands on wheel for at least six months or something like that from here. It really is a question of, like, from a regulatory standpoint, how much safer than a person does autopilot need to be for it to be OK to not monitor the car? You know, and this is a debate that one can have. And then you need a large sample, a large amount of data, so you can prove with high confidence, statistically speaking, that the car is dramatically safer than a person and that adding in the person monitoring does not materially affect the safety.
So it might need to be like two or three hundred percent safer than a person.
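Editor's Note: The point about needing a large amount of data to show a safety difference with high confidence can be illustrated with a small sketch. It assumes incident counts behave like Poisson counts and uses a rough normal approximation for a 95 percent interval. The function names, the mileage, and the counts are hypothetical and are not Tesla or regulatory figures.

```python
import math


def rate_with_ci(events, miles, z=1.96):
    """Return (rate, low, high) in events per million miles.

    Assumes the count is Poisson, so its standard deviation is
    sqrt(events). The normal approximation is crude for small counts.
    """
    per_million = 1e6 / miles
    rate = events * per_million
    half_width = z * math.sqrt(events) * per_million
    return rate, max(0.0, rate - half_width), rate + half_width


def relative_uncertainty(events, z=1.96):
    """Interval half-width as a fraction of the estimate: z / sqrt(events)."""
    return z / math.sqrt(events)


# Hypothetical exposure and counts, chosen only to show the scaling.
MILES = 1_000_000_000
rare = rate_with_ci(4, MILES)       # a rare kind of incident
common = rate_with_ci(400, MILES)   # a hundred times more frequent

for label, (rate, low, high) in (("rare", rare), ("common", common)):
    print(f"{label}: {rate:.4f} per million miles, interval {low:.4f} to {high:.4f}")
```

Under these assumptions, four events give an interval half-width of about 98 percent of the estimate, while four hundred events bring it down to about 10 percent, which is why a rate built on few events needs far more miles before a comparison becomes convincing.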
And how do you prove that?
Incidents per mile.
Incidents per mile, so crashes and fatalities?
Yeah, fatalities would be a factor, but there are just not enough fatalities to be statistically significant at scale. But there are enough crashes. You know, there are far more crashes than there are fatalities. So you can assess what is the probability of a crash. Then there's a separate probability of injury, and probability of permanent injury, and probability of death. And all of those need to be much better than a person by at least, perhaps, two hundred percent.
And you think there's the ability to have a healthy discourse with the regulatory bodies on this topic?
I mean, there's no question that regulators pay a disproportionate amount of attention to that which generates press. This is just an objective fact.
And Tesla generates a lot of press. So, you know, in the United States there's, I think, almost 40,000 automotive deaths per year. But if there are four in Tesla, they'll probably receive a thousand times more press than anyone else.
So the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things. So myself and our team at MIT recently released a paper on functional vigilance of drivers while using autopilot.
This is work we've been doing since autopilot was first released publicly over three years ago, collecting video of driver faces and driver body.
So I saw that you tweeted a quote from the abstract so I can at least guess that you've glanced at it. Yeah. Can I talk you through what we found? Sure.
OK, so it appears that in the data that we've collected, drivers are maintaining functional vigilance, such that we're looking at 18,000 disengagements from autopilot, 18,900, and annotating: were they able to take over control in a timely manner? So they were there, present, looking at the road to take over control.
OK, so this goes against what many would predict from the body of literature on vigilance with automation.
Now, the question is, do you think these results hold across the broader population? So ours is just a small subset. One of the criticisms is that, you know, there's a small minority of drivers that may be highly responsible, where their vigilance decrement would increase with autopilot use?
I think this is all really going to be swept. I mean, the system's improving so much, so fast, that this is going to be a moot point very soon. Where vigilance is, like, if something's many times safer than a person, then adding a person, the effect on safety is limited. And in fact, it could be negative.
That's really interesting. So the fact that a human may, some percent of the population may exhibit a vigilance decrement, will not affect overall statistics, numbers of safety?
No. In fact, I think it will become very, very quickly, maybe even towards the end of this year, but I'd say I'd be shocked if it's not next year at the latest, that having a human intervene will decrease safety. Decrease. It's like, imagine if you're in an elevator. It used to be that there were elevator operators, and you couldn't go in an elevator by yourself and work the lever to move between floors. And now nobody wants an elevator operator, because the automated elevator that stops at the floors is much safer than the elevator operator.
And in fact, it would be quite dangerous to have someone with a lever that can move the elevator between floors.
So that's a really powerful statement and a really interesting one. But I also have to ask, from a user experience and from a safety perspective, one of the passions for me algorithmically is camera-based detection of, sensing the human, but detecting what the driver is looking at, cognitive load, body pose. On the computer vision side, that's a fascinating problem. But there's many in industry who believe you have to have camera-based driver monitoring.
Do you think there could be benefit gained from driver monitoring?
If you have a system that's at or below human level reliability, then driver monitoring makes sense. But if your system is dramatically better, more reliable than a human, then driver monitoring does not help much.
And like I said, just like you wouldn't want someone in the elevator. If you're in an elevator, do you really want someone with a big lever, some random person operating the elevator between floors? I wouldn't trust that. I'd rather have the buttons.
OK, you're optimistic about the pace of improvement of the system, from what you've seen with the full self-driving computer.
The rate of improvement is exponential.
So one of the other very interesting design choices early on that connects to this is the operational design domain of autopilot.
So where autopilot is able to be turned on. In contrast, another vehicle system that we're studying is the Cadillac Super Cruise system. That's, in terms of ODD, very constrained to particular kinds of highways, well mapped, tested, but it's much narrower than the ODD of Tesla vehicles.
It's like ADD.
Yeah. That's good, that's a good line.
What was the design decision in that different philosophy of thinking, where there's pros and cons? What we see with a wide ODD is Tesla drivers are able to explore more the limitations of the system, at least early on. And together with the instrument cluster display, they start to understand what are the capabilities. So that's a benefit. The con is you're letting drivers use it basically anywhere.
So anywhere that it could detect lanes with confidence. Was there a philosophy, design decisions that were challenging, that were being made there? Or from the very beginning, was that done on purpose, with intent?
Well, I mean, I think, frankly, it's pretty crazy letting people drive a two-ton death machine manually. That's crazy.
Like, in the future, people will be like, I can't believe anyone was just allowed to drive one of these two-ton death machines, and they just drive wherever they wanted. Just like elevators, you could just move the elevator with the lever wherever you want. It can stop at halfway between floors if you want. It's pretty crazy. So it's going to seem like a mad thing in the future that people were driving cars.
So I have a bunch of questions about the human psychology, about behavior and so on, that would become moot, because you have faith in the AI system. Not faith, but both on the hardware side and the deep learning approach of learning from data will make it just far safer than humans.
Yeah, exactly.
Recently, there were a few hackers who tricked autopilot to act in unexpected ways with adversarial examples. So we all know that neural network systems are very sensitive to minor disturbances, these adversarial examples, on input. Do you think it's possible to defend against something like this, for the industry?
Can you elaborate on the confidence behind that answer?
Well, you know, a neural net is just like a bunch of matrix math. You have to be like a very sophisticated somebody who really understands neural nets and, like, basically reverse engineer how the matrix is being built, and then create a little thing that just exactly causes the matrix math to be slightly off. But it's very easy to then block that by having, basically, anti-negative recognition. It's like, if the system sees something that looks like a matrix hack, exclude it.
It's such an easy thing to do.
So learn both on the valid data and the invalid data. So basically learn on the adversarial examples to be able to exclude them.
Yeah, like you basically want to both know what is a car and what is definitely not a car. And you train for, this is a car, and this is definitely not a car. Those are two different things. People have no idea about neural nets, really. They probably think neural nets involve, like, you know, a fishing net or something.
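Editor's Note: One standard technique that matches "learn on the adversarial examples" is adversarial training, sketched below on a toy problem. This is not a description of Tesla's system. It is a minimal illustration that assumes a two-feature logistic classifier, made-up data points, and a fast-gradient-sign style perturbation; every name and number in it is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that feature vector x is a car."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Shift each feature by eps in the direction that increases the loss.

    For logistic loss the gradient with respect to x is (p - y) * w.
    """
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, eps=0.5, lr=0.1, epochs=200):
    """Stochastic gradient descent on clean and perturbed examples."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            # Train on the clean example and on its adversarial twin,
            # both carrying the true label.
            for xs in (x, fgsm(w, b, x, y, eps)):
                err = predict(w, b, xs) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, xs)]
                b -= lr * err
    return w, b


# Hypothetical features: label 1 is "car", label 0 is "definitely not a car".
DATA = [([2.0, 2.0], 1), ([3.0, 2.0], 1), ([2.0, 3.0], 1),
        ([-2.0, -2.0], 0), ([-3.0, -2.0], 0), ([-2.0, -3.0], 0)]

w, b = train(DATA)
for x, y in DATA:
    attacked = fgsm(w, b, x, y, 0.5)
    print(y, predict(w, b, x) > 0.5, predict(w, b, attacked) > 0.5)
```

The design choice here is that the perturbed inputs keep their true labels during training, so the model is pushed to give the same answer for an input and for its worst-case nearby variant. Real perception networks are far larger, and whether this kind of defense is sufficient in practice is an open question that the sketch does not settle.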
So, as you know, taking a step beyond just Tesla and autopilot, current deep learning approaches still seem in some ways to be far from general intelligence systems. Do you think the current approaches will take us to general intelligence, or do totally new ideas need to be invented?
I think we're missing a few key ideas for general intelligence, artificial general intelligence. But it's going to be upon us very quickly. And then we'll need to figure out what shall we do, if we even have that choice. But it's amazing how people can't differentiate between, say, the narrow AI that allows a car to figure out what a lane line is and navigate streets, versus general intelligence. Like, these are just very different things. Like your toaster and your computer are both machines, but one's much more sophisticated than another.
You're confident with Tesla you can create the world's best toaster?
The world's best toaster, yes. The world's best self-driving. I'm, yes. To me, right now, this seems game, set, match. I don't mean that to be complacent or overconfident, but that's what it appears. That is just literally how it appears right now. I could be wrong, but it appears to be the case that Tesla is vastly ahead of everyone.
Do you think we will ever create an AI system that we can love, and that loves us back in a deep, meaningful way, like in the movie Her?
I think AI will be capable of convincing you to fall in love with it very well.
And that's different than us humans?
You know, we start getting into a metaphysical question of, like, do emotions and thoughts exist in a different realm than the physical? And maybe they do. Maybe they don't. I don't know.
But from a physics standpoint, I tend to think of things, you know, like physics was my main sort of training. And from a physics standpoint, essentially, if it loves you in a way that you can't tell whether it's real or not, it is real.
That's a physics view of love.
Yeah. If you cannot prove that it does not, if there's no test that you can apply that would allow you to tell the difference, then there is no difference.
Right. And it's similar to seeing our world as a simulation. There may not be a test to tell the difference between the real world and the simulation, and therefore, from a physics perspective, it might as well be the same thing.
Yes. And there may be ways to test whether it's a simulation. There might be, I'm not saying there aren't. But you could certainly imagine that a simulation could correct that: once an entity in the simulation found a way to detect the simulation, it could either pause the simulation, start a new simulation, or do one of many other things that then corrects for that error.
So when maybe you or somebody else creates an AGI system and you get to ask her one question, what would that question be?
What's outside the simulation?
Elon, thank you so much for talking today. It was a pleasure.
All right, thank you.