#104 – David Patterson: Computer Architecture and Data Storage
Lex Fridman Podcast
- 27 Jun 2020
David Patterson is a Turing award winner and professor of computer science at Berkeley. He is known for pioneering contributions to RISC processor architecture used by 99% of new chips today and for co-creating RAID storage. The impact that these two lines of research and development have had on our world is immeasurable. He is also one of the great educators of computer science in the world. His book with John Hennessy “Computer Architecture: A Quantitative Approach” is how I first learned about and was humbled by the inner workings of machines at the lowest level. Support this podcast by
The following is a conversation with David Patterson, Turing Award winner and professor of computer science at Berkeley. He's known for pioneering contributions to RISC processor architecture used by 99 percent of new chips today and for co-creating RAID storage. The impact that these two lines of research and development have had on our world is immeasurable.
He's also one of the great educators of computer science in the world. His book with John Hennessy, Computer Architecture: A Quantitative Approach, is how I first learned about and was humbled by the inner workings of machines at the lowest level. Quick summary of the ads. Two sponsors: the Jordan Harbinger Show and Cash App. Please consider supporting the podcast by going to jordanharbinger.com/lex and downloading Cash App and using code LexPodcast. Click on the links, buy the stuff. It's the best way to support this podcast and, in general, the journey I'm on in my research and startup.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at lexfridman, spelled without the E, just F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation. This episode is supported by the Jordan Harbinger Show. Go to jordanharbinger.com/lex.
It's how he knows I sent you. On that page, there's links to subscribe to it on Apple Podcasts, Spotify, and everywhere else. I've been bingeing on this podcast. It's amazing. Jordan is a great human being. He gets the best out of his guests, dives deep, calls them out when it's needed. It makes the whole thing fun to listen to. He's interviewed Kobe Bryant, Mark Cuban, Neil deGrasse Tyson, Garry Kasparov, and many more. I recently listened to his conversation with Frank Abagnale, author of Catch Me If You Can, one of the world's most famous con men.
Perfect podcast length and topic for a recent long-distance run that I did. Again, go to jordanharbinger.com/lex to give him my love and to support this podcast. Subscribe also on Apple Podcasts, Spotify, and everywhere else. This show is presented by Cash App, the greatest sponsor of this podcast ever and the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to buy Bitcoin,
let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend The Ascent of Money as a great book on this history. Also, the audiobook is amazing. Debits and credits on ledgers started around 30,000 years ago, the US dollar was created over 200 years ago, and the first decentralized cryptocurrency was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money.
So, again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get ten dollars, and Cash App will also donate ten dollars to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with David Patterson. Let's start with the big historical question. How have computers changed in the past 50 years, at both the fundamental architectural level and in general, in your eyes?
Well, the biggest thing that happened was the invention of the microprocessor. Computers that used to fill up several rooms could fit inside your cell phone. And not only did they get smaller, they got a lot faster.
So they're a million times faster than they were 50 years ago, and they're much cheaper, and they're ubiquitous. You know, there's 7.8 billion people on this planet. Probably half of them have cell phones.
It's just remarkable. There are probably more microprocessors than there are people. Sure, I don't know what the ratio is, but I'm sure it's above one. Maybe it's ten to one or some number like that.
What is a microprocessor? So a way to say what a microprocessor is, is to tell you what's inside a computer. A computer has classically had five pieces. There's input and output, which, kind of naturally, as you'd expect: input is like speech or typing, and output is displays. There's a memory, and, like the name sounds, it remembers things. So it's integrated circuits whose job is that you put information in, and when you ask for it, it comes back out.
That's memory. And the third part is the processor, which is where the term microprocessor comes from. And that has two pieces as well: the control, which is kind of the brain of the processor, and what's called the arithmetic unit.
It's kind of the brawn of the computer. So if you think of it as a human body, the arithmetic unit, the thing that does the number crunching, is the body, and the control is the brain. So those five pieces, input, output, memory, arithmetic unit, and control, have been in computers since the very dawn, and the last two are considered the processor. So a microprocessor simply means a processor that fits on a microchip. And that was invented about, you know, 40 years ago, the first microprocessor.
It's interesting that you refer to the arithmetic unit as, like, connected to the body, and the control as the brain. I guess I never thought of it that way. That's a nice way to think of it, because most of the actions the microprocessor does, in terms of literally computation... the microprocessor does computation, it processes information, and most of the things it does are basic arithmetic operations. What are the operations, by the way? It's a lot like a calculator.
So there are add instructions, subtract instructions, multiply and divide. And kind of the brilliance of the invention of the computer, or the processor, is that it performs very trivial operations, but it just performs billions of them per second. And what we're capable of doing is writing software that can take these very trivial instructions and have them create tasks that can do things better than human beings can do today.
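To make the five-piece picture concrete, here is a minimal Python sketch of a toy machine. Everything in it is hypothetical: the instruction names (`in`, `out`, `load`, `store`, `add`, `sub`, `mul`, `div`), the four registers, and the 16-word memory are illustrative choices, not any real instruction set. The `arithmetic_unit` method plays the "brawn," and the `run` loop plays the control, or "brain," stepping through trivial operations one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical toy computer with the five classic pieces:
    input, output, memory, an arithmetic unit, and control."""
    memory: list = field(default_factory=lambda: [0] * 16)    # memory
    registers: list = field(default_factory=lambda: [0] * 4)  # scratch storage in the processor
    outputs: list = field(default_factory=list)               # output

    def arithmetic_unit(self, op, a, b):
        # The "brawn": it only crunches numbers when told to.
        if op == "add":
            return a + b
        if op == "sub":
            return a - b
        if op == "mul":
            return a * b
        if op == "div":
            return a // b  # integer division, as a simple integer unit would do
        raise ValueError(f"unknown arithmetic op {op!r}")

    def run(self, program, inputs):
        # The "brain": control steps through the program one instruction at a
        # time and tells the other pieces what to do.
        pending = list(inputs)  # input
        for op, *args in program:
            if op == "in":        # input -> register
                self.registers[args[0]] = pending.pop(0)
            elif op == "out":     # register -> output
                self.outputs.append(self.registers[args[0]])
            elif op == "load":    # memory -> register
                self.registers[args[0]] = self.memory[args[1]]
            elif op == "store":   # register -> memory
                self.memory[args[1]] = self.registers[args[0]]
            else:                 # arithmetic: rd = rs1 <op> rs2
                rd, rs1, rs2 = args
                self.registers[rd] = self.arithmetic_unit(
                    op, self.registers[rs1], self.registers[rs2])
        return self.outputs


program = [
    ("in", 0), ("in", 1),  # read two numbers
    ("add", 2, 0, 1),      # r2 = r0 + r1
    ("store", 2, 0),       # memory[0] = r2
    ("load", 3, 0),        # r3 = memory[0]
    ("mul", 3, 3, 0),      # r3 = r3 * r0
    ("out", 3),            # emit r3
]
print(ToyMachine().run(program, [6, 7]))  # (6 + 7) * 6
```

Each step is trivial on its own; a real processor's advantage is doing billions of such steps per second.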
Just looking back through your career, did you anticipate how good we would be able to get at doing these small basic operations?
How many surprises were there along the way, where you just kind of sat back and said, wow, I didn't expect it to go this fast, this good? Well, the fundamental driving force is what's called Moore's Law, which was named after Gordon Moore, who's a Berkeley alumnus.
And he made this observation very early in what are called semiconductors. Semiconductors are this idea that you can build these very simple switches and put them on these microchips. And he made this observation over 50 years ago. He looked at a few years and said, I think what's going to happen is the number of these little switches, called transistors, is going to double every year for the next decade. And he said this in 1965. In 1975, he said, well, maybe it's going to double every two years. And that, which other people have since named Moore's Law, guided the industry.
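The two versions of the prediction differ only in the doubling period, but the gap compounds quickly. A minimal Python sketch, with the starting transistor count normalized to 1 (an assumption for illustration, not historical data) and a function name of our own choosing:

```python
def projected_transistors(start_count, start_year, year, doubling_period_years):
    """Project a transistor count, assuming it doubles every doubling_period_years."""
    doublings = (year - start_year) / doubling_period_years
    return start_count * 2 ** doublings


# Growth over one decade under each version of the prediction (start normalized to 1).
yearly = projected_transistors(1, 1965, 1975, 1)    # 10 doublings
biennial = projected_transistors(1, 1975, 1985, 2)  # 5 doublings
print(f"doubling every year:      x{yearly:,.0f}")
print(f"doubling every two years: x{biennial:,.0f}")
```

A decade of yearly doubling is a 1,024x increase, while a decade of doubling every two years is 32x, which shows how sensitive any projection is to the assumed doubling period.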
And when Gordon Moore made that prediction, he wrote a paper back in, I think, the 60s, and said not only is this going to happen, he wrote what the implications of that would be. And in this article from 1965, he shows ideas like computers being in cars and computers being in something that you would buy in the grocery store and stuff like that. So he kind of not only called his shot, he called the implications of it.
So if you were in the computing field and if you believed Moore's prediction, he kind of said what would be happening in the future. So in one sense, this is what was predicted, and you could imagine it was easy to believe that Moore's Law was going to continue, and so these would be the implications. On the other side, there are these shocking events in your life. Like, I remember driving in Marin, across the bay from San Francisco, and seeing a billboard at a local civic center, and it had a URL on it.
And for the people at the time, these first URLs, that's the www stuff with the periods, people thought it looked like alien writing. Right? They'd see these advertisements and commercials or billboards that had this alien writing on it. So for the lay people, it's like, what the hell is going on here? And for those people in the industry, it's, oh my God, this stuff is getting so popular, it's actually leaking out of our nerdy world and into the real world.
So, I mean, there were events like that. I think another one was, I remember in the early days of the personal computer, when we started seeing advertisements in magazines for personal computers, like, it's so popular that it made the newspapers. So on one hand, you know, Gordon Moore predicted it and you kind of expected it to happen, but when it really hit and you saw it affecting society, it was shocking.
So maybe taking a step back and looking at it from both an engineering and a philosophical perspective, what do you see as the layers of abstraction in the computer? Do you see a computer as a set of layers of abstractions? I think that's one of the computer science fundamentals: these things are really complicated, and the way we cope with complicated software and complicated hardware is these layers of abstraction. And that simply means that we, you know, suspend disbelief and pretend that the only thing you know is that layer, and you don't know anything about the layer below it.
And that's the way we can make very complicated things. And probably it started with hardware, that that's the way it was done, but it's been proven extremely useful. And, you know, I would think in a modern computer today, there might be 10 or 20 layers of abstraction. And they're all trying to kind of enforce this contract: all you know is this interface. There's a set of commands that you are allowed to use, and you stick to those commands, and we will faithfully execute them.
And it's like peeling the layers of an onion. You get down, there's a new set of layers, and so forth. So for people who want to study computer science, the exciting part about it is you can keep peeling those layers. You take your first course and you might learn to program in Python, and then you can take a follow-on course and get down to a lower-level language like C, and then, if you want to, you can start getting into the hardware layers, and you keep getting down all the way to that transistor that I talked about that Gordon Moore predicted.
And you can understand all those layers all the way up to the highest level application software.
So it's a very kind of magnetic field. If you're interested, you can go into any depth and keep going. In particular, what's happening right now, or what's happened in software in the last 20 years and recently in hardware, is that there are going to be open source versions of all of these things. So what open source means is that what the engineer, the programmer, designs is not secret, belonging to a company; it's out there on the World Wide Web, so you can see it.
So for lots of pieces of software that you use, you can see exactly what the programmer does if you want to get involved.
That used to stop at the hardware.
Recently, there's been an effort to make open source hardware and those interfaces open, so you can see that. So whereas before you had to stop at the hardware, you can now start going layer by layer below that and see what's inside there. So it's a remarkable time, where the interested individual can really see in great depth what's really going on in the computers that power everything that we see around us.
Are you thinking also, when you say open source at the hardware level, is this going to the design architecture, the instruction set level, or is it going to literally, you know, the manufacture of the actual hardware, of the actual chips, whether that's specialized to a particular domain or general?
So let's talk about that a little bit.
So when you get down to the bottom layer of software, the way software talks to hardware is in a vocabulary, and the words of that vocabulary we call instructions. And the technical term for the vocabulary is instruction set.
So those instructions are like we talked about earlier: they can be instructions like add, subtract, multiply, divide. There are instructions to put data into memory, which is called a store instruction, and to get data back, which is called a load instruction.
And those simple instructions go back to the very dawn of computing. You know, in 1950, the commercial computer had these instructions. So that's the instruction set that we're talking about. So up until, I'd say, ten years ago, these instruction sets were all proprietary. So a very popular one is owned by Intel, the one that's in the cloud and in all the PCs in the world. Intel owns that instruction set. It's referred to as the x86.
There have been a sequence of them; the first number was called the 8086. And since then there's been a lot of numbers, but they all end in 86, so there's been that kind of family of instruction sets, and that's proprietary. The other one that's very popular is from ARM, which kind of powers all the cell phones in the world, all the iPads in the world, and a lot of things that are so-called Internet of Things devices.
And that one is also proprietary. ARM will license it to people for a fee, but they own that. So the new idea got started at Berkeley, kind of unintentionally, 10 years ago. Early in my career, we pioneered a way to do these vocabularies, instruction sets, that was very controversial at the time.
At the time, in the 1980s, conventional wisdom was that these vocabularies, instruction sets, should have, you know, powerful instructions. So polysyllabic kind of words, you can think of it that way.
And so instead of just add, subtract, and multiply, they would have polynomial divide or sort a list. And the hope was that those powerful vocabularies would make it easier for software.
So we thought that didn't make sense for microprocessors. There were people at Berkeley and Stanford and IBM who argued the opposite, and what we called that was a reduced instruction set computer, and the abbreviation was RISC, and, typical for computer people, we use the abbreviations by pronouncing them. So RISC was... So we said, for microprocessors, with Gordon Moore's Law changing things really fast, we think it's better to have a pretty simple set of instructions, a reduced set of instructions; that would be a better way to build microprocessors, since they're going to be changing so fast due to Moore's Law.
And then we'll just use standard software to generate more of those simple instructions. And one of the pieces of software in that software stack, going between these layers of abstraction, is called a compiler, and it basically translates between levels. We said the translator will handle it. So the technical question was...
Well, since they're these reduced instructions, you have to execute more of them. Yeah, that's right. But maybe you could execute them faster. Yeah, that's right. They're simpler, so they could go faster, but you have to do more of them. So what does that tradeoff look like? And it ended up that we were executing maybe 50 percent more instructions, maybe a third more instructions, but they ran four times faster. So these controversial RISC ideas proved to be maybe factors of three or four better.
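The tradeoff described here reduces to one division: overall speedup equals how much faster each instruction runs, divided by how many more instructions you execute. A minimal Python sketch plugging in the figures from the conversation (50 percent or a third more instructions, each running four times faster); the function name is our own:

```python
def overall_speedup(instruction_ratio, per_instruction_speedup):
    """Speedup of the simpler design over the complex one for a whole program.

    instruction_ratio: instructions executed by the simple design / by the complex design
    per_instruction_speedup: how many times faster each simple instruction completes
    """
    return per_instruction_speedup / instruction_ratio


for extra in (0.5, 1 / 3):
    ratio = 1 + extra
    print(f"{extra:.0%} more instructions, each 4x faster -> "
          f"{overall_speedup(ratio, 4):.2f}x faster overall")
```

With these inputs the simpler design comes out roughly 2.7x to 3x ahead, close to the factor of three Patterson mentions; the exact figure depends on which instruction ratio holds.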
I love that this idea was controversial and almost kind of rebellious. So that's in the context of what was more conventional, complex instruction set computing. So how would you pronounce that? CISC. CISC versus RISC.
And, believe it or not, this sounds very, very, you know, who cares about this, right? It was violently debated at several conferences.
It's like, what's the right way to go? And people thought RISC was, you know, a devolution: we're going to make software worse by making the instructions simpler. And there were fierce debates at several conferences in the 1980s, and then later in the 80s it kind of settled on these benefits.
It's not completely intuitive to me why RISC has, for the most part, won.
So why did that happen? Yeah. Yeah. And maybe I can sort of say a bunch of dumb things that could lay the land for further commentary. So to me, this is kind of an interesting thing. If you look at C++ versus C, with modern compilers you really could write faster code with C++, so you're relying on the compiler to reduce your complicated code into something simple and fast. So to me, comparing RISC, maybe this is a dumb question, but why is it that focusing the design of the instruction set on very few simple instructions
in the long run provides faster execution, versus coming up with, like you said, a ton of complicated instructions and then, over time, you know, years, maybe decades, coming up with compilers that can reduce those into simple instructions for you? Yes. Let's try and split that into two pieces. So if the compiler can do that for you, if the compiler can take, you know, a complicated program and produce simpler instructions, then the programmer doesn't care.
Programmers don't care; it's just: how fast is the computer I'm using? How much does it cost? And so what happened, kind of, in the software industry is that right around before the 1980s, critical pieces of software were still written not in languages like C or C++; they were written in what's called assembly language, where humans are writing exactly at the instructions, at the level that a computer can understand. So they were writing add, subtract, multiply, you know, instructions.
It's very tedious. But the belief was that this lowest level of software that people use, which is called operating systems, had to be written in assembly language, because these high-level languages were just too inefficient: they were too slow, or the programs would be too big.
So that changed with a famous operating system called Unix, which is kind of the grandfather of all the operating systems today. Unix demonstrated that you could write something as complicated as an operating system in a language like C. So once that was true, we could hide the instruction set from the programmer. And so it didn't really matter. The programmer didn't have to write lots of these simple instructions; that was up to the compiler.
So that was part of our argument for RISC: if you were still writing assembly language, there's maybe a better case for CISC instructions. But if the compiler can do it, it's done once. The compiler translates it once, and then every time you run the program, it runs these potentially simpler instructions. And so that was the debate.
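The division of labor described here, where a compiler translates a high-level expression once into many simple instructions that then run every time the program runs, can be sketched in a few dozen lines of Python. The instruction names (`load`, `li`, `add`, `sub`, `mul`), the register naming, and the tuple encoding are hypothetical, not RISC-V or any other real instruction set, and the sketch does no register allocation or optimization.

```python
from itertools import count


def compile_expr(expr, code, fresh):
    """Append simple instructions for expr to code; return the register holding its value.

    expr is a variable name (str), an integer constant (int),
    or a tuple (op, left, right) with op in {"add", "sub", "mul"}.
    """
    if isinstance(expr, str):
        reg = f"r{next(fresh)}"
        code.append(("load", reg, expr))  # fetch a variable from memory
        return reg
    if isinstance(expr, int):
        reg = f"r{next(fresh)}"
        code.append(("li", reg, expr))    # load an immediate constant
        return reg
    op, left, right = expr
    a = compile_expr(left, code, fresh)
    b = compile_expr(right, code, fresh)
    reg = f"r{next(fresh)}"
    code.append((op, reg, a, b))          # one simple three-register operation
    return reg


def compile_program(expr):
    """Translate once: returns the flat list of simple instructions."""
    code = []
    compile_expr(expr, code, count())
    return code


def execute(code, memory):
    """Run the simple instructions; return the value of the last register written."""
    alu = {"add": lambda x, y: x + y,
           "sub": lambda x, y: x - y,
           "mul": lambda x, y: x * y}
    regs, last = {}, None
    for name, dest, *operands in code:
        if name == "load":
            regs[dest] = memory[operands[0]]
        elif name == "li":
            regs[dest] = operands[0]
        else:
            regs[dest] = alu[name](regs[operands[0]], regs[operands[1]])
        last = dest
    return regs[last]


# a * x + b: compiled once, then run on different data without recompiling.
code = compile_program(("add", ("mul", "a", "x"), "b"))
for instr in code:
    print(instr)
print(execute(code, {"a": 3, "x": 4, "b": 5}))   # 3 * 4 + 5
print(execute(code, {"a": 2, "x": 10, "b": 1}))  # 2 * 10 + 1
```

The programmer writes one expression; the five simple instructions it becomes are the compiler's concern, which is the point of hiding the instruction set behind a high-level language.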
Right, because people would acknowledge that the simpler instructions could lead to a faster computer. You can think of monosyllabic instructions: if you think of reading, you probably read them faster, say them faster, than long instructions. The same analogy works pretty well for hardware. And as long as you didn't have to read a lot more of those instructions, you could win. So that's the basic idea for RISC. But it's interesting that in that discussion of Unix and C, there's only one step of abstraction from the code that's closest to the machine to the code that's written by a human.
At least to me, again, perhaps a dumb intuition, but it feels like there might have been more layers, sort of different kinds of humans stacked on top of each other.
So what you said is both true and not true. There are several layers of software. Let's just talk about two layers. That would be the operating system, like you get from Microsoft or from Apple, like iOS or the Windows operating system, and, let's say, applications that run on top of it, like Word or Excel. Both the operating system and the application could be written in C, but you could construct those two layers, and the applications absolutely do call upon the operating system.
And the change was that both of them could be written in higher-level languages. So it's one step of translation, but you can still build many layers of abstraction of software on top of that. And that's how things are done today. So still today, many of the layers that you'll deal with, you may deal with debuggers, you may deal with linkers, there's libraries; many of those today will be written in C++, even though that language is pretty ancient. And even the Python interpreter is probably written in C or C++. So lots of layers there are probably written in these old-fashioned, efficient languages that still take one step to produce these instructions, to produce RISC instructions, but they're composed.
Each layer of software invokes another through these interfaces, and you can get 10 layers of software that way. So in general, RISC was developed here at Berkeley. There were kind of three places that were these radicals advocating for this against the rest of the community: IBM, Berkeley, and Stanford. You're one of these radicals.
How radical did you feel? How confident did you feel? How doubtful were you that RISC might be the right approach?
Because you can also intuit that it's kind of taking a step back into simplicity, not forward into simplicity.
Yeah, no, it was easy to make the argument against it. Well, this was my colleague John Hennessy at Stanford and me, and we were both assistant professors. And for me, I just believed in the power of our ideas. I thought what we were saying made sense. Moore's Law is going to move fast. The other thing that I didn't mention is one of the surprises of these complex instruction sets.
You could certainly use these complex instructions if the programmer was writing them themselves, but it turned out to be kind of difficult for the compiler to generate those complex instructions. Kind of ironically, you'd have to find the right circumstances that exactly fit this complex instruction. It was actually easier for the compiler to generate these simple instructions. So not only did these complex instructions make the hardware more difficult to build, often the compiler wouldn't even use them. And so it's harder to build.
The compiler doesn't use them that much. The simple instructions go better with Moore's Law, with the number of transistors doubling every two years. So you want to reduce the time to design the microprocessor. That may be more important than the number of instructions. So I think we believed at the time that we were right, that this was the best idea.
Then the question became, in these debates, well, yeah, that's a good technical idea, but in the business world, this doesn't matter. There's other things that matter. It's like arguing that there's a standard width for railroad tracks and you've come up with a better width, but the whole world is covered in railroad tracks, so your ideas have no chance of commercial success. It was technically right, but commercially it'll be insignificant.
Yeah, it's kind of sad that this world, the history of human civilization, is full of good ideas that are lost because somebody else came along first with a worse idea. And it's good that in the computing world, at least some of these have won.
Well, you could, I mean, there's probably still some people that say, yeah, well... And what was interesting is that a bunch of the companies with CISC instruction sets, vocabularies, gave up, but not Intel. What Intel did, to its credit: Intel's vocabulary was in the personal computer, and so that was a very valuable vocabulary, because the way we distribute software is in those actual instructions. It's in the instructions of that instruction set.
So you don't get the source code, what the programmers wrote; you get it after it's been translated into the lowest level. If you were to get a floppy disk or download software, it's in the instructions of that instruction set. So the instruction set was very valuable. So what Intel did, cleverly and amazingly, is they had their chips do a translation step in the hardware. They would take these complex instructions and translate them into essentially RISC instructions, in hardware, on the fly, you know, at gigahertz clock speeds.
And then any good idea the RISC people had, they could use, and they could still be compatible with this really valuable PC software base, which also had very high volumes, you know, a hundred million personal computers per year. So the CISC architecture actually won in the business world in this PC era.
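The on-the-fly translation described here can be illustrated, very loosely, in Python: each incoming complex instruction is "cracked" into a short sequence of simple operations before execution. The `add_mem` instruction, its tuple format, and the temporary register `tmp` are all hypothetical; real x86 decoders work on binary encodings in hardware and are far more involved.

```python
def crack(instruction):
    """Split one complex, memory-operand instruction into simple micro-ops.

    Hypothetical formats (not real x86 encodings):
      ("add_mem", addr, reg)  means  memory[addr] = memory[addr] + reg
      ("add", rd, rs1, rs2)   is already simple and passes through unchanged
    """
    if instruction[0] == "add_mem":
        _, addr, reg = instruction
        return [("load", "tmp", addr),        # bring the memory operand into a register
                ("add", "tmp", "tmp", reg),   # do the arithmetic register-to-register
                ("store", "tmp", addr)]       # write the result back to memory
    return [instruction]


def run_micro_ops(program, regs, memory):
    """Translate each instruction on the fly, then execute only simple micro-ops."""
    for instruction in program:
        for uop in crack(instruction):
            if uop[0] == "load":
                regs[uop[1]] = memory[uop[2]]
            elif uop[0] == "store":
                memory[uop[2]] = regs[uop[1]]
            elif uop[0] == "add":
                regs[uop[1]] = regs[uop[2]] + regs[uop[3]]
            else:
                raise ValueError(f"unsupported micro-op {uop!r}")
    return regs, memory


regs = {"r1": 5, "r2": 2, "tmp": 0}
memory = {100: 10}
program = [("add_mem", 100, "r1"),     # complex: memory[100] += r1
           ("add", "r3", "r1", "r2")]  # already simple: r3 = r1 + r2
run_micro_ops(program, regs, memory)
print(crack(program[0]))
print(memory[100], regs["r3"])
```

The software still ships in the old, valuable vocabulary (`add_mem` here), while the execution core only ever sees simple load, add, and store operations.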
So just going back to the time of designing RISC: when you design an instruction set architecture, do you think like a programmer? Do you think like a microprocessor engineer? Do you think like an artist, a philosopher? Do you think in software and hardware? I mean, is it art or science?
Yeah, I'd say I think designing a good instruction set is an art.
And I think you're trying to balance the simplicity and speed of execution with how easy it will be for compilers to use it. You're trying to create an instruction set where everything in there can be used by compilers, and there's nothing missing that'll make it difficult for the program to run efficiently. But you want it to be easy to build as well. So you're thinking, I'd say, you're thinking hardware, trying to find a hardware-software compromise that will work well.
And, you know, it's a matter of taste, right? It's kind of fun to build instruction sets. It's not that hard to build an instruction set, but to build one that catches on and people use, you know, you have to be fortunate to be in the right place at the right time, or have a design that people really like. Are you using metrics?
Is it quantifiable? Because you kind of have to anticipate the kind of programs that people will write ahead of time. So can you use numbers, can you use metrics, can you quantify something ahead of time? Or is this, again, the art part, where you're kind of... A big, big change is kind of what happened.
I think from Hennessy's and my perspective in the 1980s, what happened was going from kind of really, you know, taste and hunches to quantifiable. And in fact, he and I wrote a textbook at the end of the 1980s called Computer Architecture: A Quantitative Approach. I've heard of that.
And it had a pretty big impact in the field, because we went from textbooks that kind of listed: here's what this computer does, here's the pros and cons, and here's what this computer does, and pros and cons, to something where there were formulas and equations where you could measure things. So specifically for instructions...
What we do, and what some other fields do, is we agree upon a set of programs, which we call benchmarks, a suite of programs. And then you develop both the hardware and the compiler, and you get numbers on how well your computer does, given its instruction set, how well you implemented it in your microprocessor, and how good your compilers are. In computer architecture, using professors' terms, we grade on a curve rather than on an absolute scale.
So when you say, you know, these programs run this fast, well, that's kind of interesting, but how do you know it's better? Well, you compare it to other computers at the same time. So the best way we know how to make it into more of a science, experimental and quantitative, is to compare yourself to other computers of the same era that have the same access to the same kind of technology, on commonly agreed benchmark programs. So maybe to toss up two possible directions we can go: one is, what are the different tradeoffs in designing architectures?
We've been talking about RISC, but maybe a little bit more detail in terms of specific features that you were thinking about. And the other side is, what are the metrics that you're thinking about when looking at these tradeoffs?
Yeah, well, let's talk about the metrics. So during these debates, we actually had kind of a hard time explaining and convincing people of the ideas, and partly that was because we didn't have a formula to explain it. And a few years into it, we hit upon a formula that helped explain what was going on.
And I think we can do this, if I can do a formula orally. So fundamentally, the way you measure performance is how long it takes a program to run. If you have 10 programs, and typically these benchmarks are suites because you'd want to have 10 programs so they could represent lots of different applications, then for these 10 programs, how long would it take to run each one? Now, when you're trying to explain why it took so long, you could factor how long it takes a program to run into three factors.
The first one is how many instructions it took to execute. So that's what we've been talking about: how many instructions did it take? All right. The next question is, how long did each instruction take to run on average? So you multiply the number of instructions by how long each took to run, and that's the time. OK, but now let's look at this metric of how long the instructions take to run.
Well, it turns out the way we build computers today is they all have a clock. And you've seen this when you buy a microprocessor; it'll say 3.1 gigahertz or 2.5 gigahertz, and more gigahertz is good.
Well, what that is, is the speed of the clock. So 2.5 gigahertz turns out to be four-tenths of a nanosecond per clock tick. So that's the clock cycle time. But there's another factor, which is the average number of clock cycles it takes per instruction. So it's the number of instructions, times the average number of clock cycles per instruction, times the clock cycle time. So in these RISC-versus-CISC debates, they would concentrate on: but RISC needs to execute more instructions.
And we'd argue that maybe the clock cycle is faster.
But what the real big difference was, was the number of clock cycles per instruction, as I say.
What about the mess, the beautiful mess, of parallelism in the whole picture? Parallelism, which has to do with, say, how many instructions could execute in parallel and things like that, you could think of as affecting the clock cycles per instruction, because it's the average clock cycles per instruction. So when you're running a program, if it took one hundred billion instructions, and on average it took two clock cycles per instruction, and each cycle was 0.4 nanoseconds, you could multiply that out and see how long it took to run.
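The three-factor formula spelled out orally here is often called the "iron law" of processor performance: time per program = instructions × clock cycles per instruction × clock cycle time. A minimal Python sketch of the worked example, assuming the 2.5 GHz clock mentioned earlier (a 0.4 ns clock cycle); the function name is our own:

```python
def time_per_program(instructions, cycles_per_instruction, clock_rate_hz):
    """Iron law: instructions x CPI x clock cycle time, in seconds."""
    clock_cycle_time = 1.0 / clock_rate_hz  # 2.5 GHz -> 0.4 ns per cycle
    return instructions * cycles_per_instruction * clock_cycle_time


cycle_ns = 1e9 / 2.5e9  # nanoseconds per cycle at 2.5 GHz
seconds = time_per_program(100e9, 2, 2.5e9)
print(f"clock cycle: {cycle_ns:.1f} ns")
print(f"100 billion instructions at 2 cycles each: {seconds:.0f} s")
```

One hundred billion instructions at two cycles each and 0.4 ns per cycle comes to about 80 seconds. Because the three factors multiply, halving any one of them halves the run time, which is why the RISC/CISC debate could be framed as trading one factor against another.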
And there are all kinds of tricks to try to reduce the number of clock cycles per instruction.
But it turned out that the way they would do these complex instructions is that they would actually build what we would call an interpreter, a very simple hardware interpreter. And it turned out that for the CISC instructions, if you had to use one of those interpreters, it would be like ten clock cycles per instruction, whereas the RISC instructions could be two. So there'd be this factor-of-five advantage in clock cycles per instruction. We have to execute, say, 25 or 50 percent more instructions.
So that's where the win would come. And then you could make an argument about whether the clock cycle times are the same or not, but we pointed out that we could divide the benchmark results, time per program, into three factors. And the biggest difference between RISC and CISC was the clock cycles per instruction: you execute a few more instructions, but the clock cycles per instruction is much lower.
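Plugging the rounded numbers from this argument into the same equation shows why the extra instructions don't cancel the CPI advantage. This sketch assumes, as the conversation leaves open, that both machines run at the same clock rate, so the cycle time cancels out; the function name is hypothetical.

```python
def risc_speedup(extra_instructions: float, cisc_cpi: float = 10.0, risc_cpi: float = 2.0) -> float:
    """Speedup of RISC over CISC at equal clock rates.

    extra_instructions: fraction of additional instructions RISC executes
    (0.25 means 25% more). Instruction counts are relative to CISC = 1.0.
    """
    cisc_time = 1.0 * cisc_cpi
    risc_time = (1.0 + extra_instructions) * risc_cpi
    return cisc_time / risc_time


for extra in (0.25, 0.50):
    print(f"RISC with {extra:.0%} more instructions: {risc_speedup(extra):.2f}x faster")
```

With these assumptions the factor-of-five CPI advantage still leaves RISC roughly 3.3x to 4x faster, even after executing 25 to 50 percent more instructions.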
And that was what settled this debate. Once we made that argument, people said, oh, I get it. And so we went from it being outrageously controversial in 1982 to, probably by 1984 or so, people saying, oh yeah, technically they've got a good argument.
What are the instructions in the RISC instruction set, just to get an intuition?
OK. In 1995, I was asked to predict the future of microprocessors. I'd seen these predictions, and usually people predict something outrageous just to be entertaining. And so my prediction for 2020 was that things were going to look very familiar to what they were then, and they do. If you were to read the article, the things I said are pretty much true.
The instructions that have been around forever are kind of the same.
And that's the outrageous prediction actually, given how fast computers are going.
Well, and, you know, we thought Moore's Law would go on for twenty-five more years, who knows. But kind of the surprising thing: in fact, Hennessy and I won the ACM A.M. Turing Award both for the RISC instruction set contributions and for that textbook I mentioned.
But, you know, we're surprised that here we are, 35 or 40 years after we did our work, and the conventional wisdom about the best way to do instruction sets is still those RISC instruction sets that look very similar to what we did in the 1980s.
So, surprisingly, there hasn't been some radical new idea, even though we have, you know, a million times as many transistors as we had back then.
But what are the basic instructions, and how did they change over the years? So we're talking about addition, subtraction...
These are the specifics. The things that are in a calculator are in a computer, so any of the buttons on the calculator are in the computer. If there's a memory function key, like I said, putting something into memory is called a store, and bringing it back is called a load.
Just a quick tangent.
When you say memory, what does memory mean?
Well, I told you there were five pieces of a computer. And if you remember, in a calculator there's a memory key. You want to save an intermediate calculation and bring it back later, so you'd hit the memory-plus key, M+, and it would put that into memory. And then you'd hit MR, memory recall, and it would bring it back onto the display, so you don't have to retype it or write it down.
So that's exactly what memory is: you can put things into it as temporary storage and bring them back when you need them later. So that's memory, and loads and stores. But the big difference between a computer and a calculator is that the computer can make decisions. And amazingly, the decisions are as simple as: is this value less than zero, or is this value bigger than that value? Those instructions, which are called conditional branch instructions, are what give computers all their power.
If you go back to the early days of computing, before what's called the general-purpose microprocessor, people would build these instructions kind of in hardware, but it couldn't make decisions.
It would just do the same thing over and over again. With the power of branch instructions, it can look at things and make decisions automatically, and it can make these decisions, you know, billions of times per second. And amazingly enough, thanks to advances in machine learning, we can create programs that do some things smarter than human beings can. But if you go down to that very basic level, the instructions are the keys on the calculator, plus the ability to make decisions.
These conditional branch instructions. And all decisions can fundamentally be reduced down to these basic functions.
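To make the calculator analogy concrete, here is a toy interpreter with a few calculator-like instructions, a load and a store for memory, and one conditional branch. The instruction names (li, add, load, store, blt) and the tuple encoding are hypothetical, loosely styled after RISC assembly; this is a sketch of the idea, not any real instruction set.

```python
from typing import List, Tuple

# Hypothetical encoding: each instruction is a tuple (opcode, operands...).
Instr = Tuple


def run(program: List[Instr], num_regs: int = 4, mem_size: int = 8) -> List[int]:
    """Execute the program and return the final contents of memory."""
    regs = [0] * num_regs
    mem = [0] * mem_size
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "li":  # load immediate, like typing a number
            rd, imm = args
            regs[rd] = imm
        elif op == "add":  # the calculator's + key
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "store":  # like M+: save a register in memory
            rs, addr = args
            mem[addr] = regs[rs]
        elif op == "load":  # like MR: bring a saved value back
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "blt":  # conditional branch: the decision-making step
            rs1, rs2, target = args
            if regs[rs1] < regs[rs2]:
                pc = target
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


# Sum 1 + 2 + ... + 5 with a loop, store it, then reload it and double it.
PROGRAM = [
    ("li", 0, 0),      # 0: r0 = running sum
    ("li", 1, 1),      # 1: r1 = i
    ("li", 2, 6),      # 2: r2 = loop limit
    ("li", 3, 1),      # 3: r3 = constant 1
    ("add", 0, 0, 1),  # 4: sum += i
    ("add", 1, 1, 3),  # 5: i += 1
    ("blt", 1, 2, 4),  # 6: if i < 6, go back to instruction 4
    ("store", 0, 0),   # 7: mem[0] = sum
    ("load", 1, 0),    # 8: r1 = mem[0]
    ("add", 1, 1, 1),  # 9: r1 = r1 + r1
    ("store", 1, 1),   # 10: mem[1] = r1
]

print(run(PROGRAM)[:2])  # [15, 30]
```

Without `blt`, the two-instruction loop body would have to be written out once per iteration; the branch is what lets the program decide for itself when to stop.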
So, in fact, going way back in the stack: we did four research projects at Berkeley in the 1980s, and they did a couple at Stanford in the 1980s. In 2010, we decided we wanted to do a new instruction set, learning from the mistakes of those RISC architectures of the 1980s. And that was done here at Berkeley almost exactly 10 years ago. I participated, but Krste Asanović and others drove it.
They called it RISC-V, "five," to honor those four projects of the 1980s.
So what does RISC-V involve?
So RISC-V is another instruction set vocabulary. It's learned from the mistakes of the past, but if you look at it, there's a core set of instructions that's very similar to the simplest architectures from the 1980s. And the big difference about RISC-V is that it's open. I talked earlier about proprietary versus open source software. This is an instruction set, so it's a vocabulary; it's not hardware. But by having an open instruction set, we can have open source implementations, open source processors that people can use.
Where do you see that going? It's a really exciting possibility. But if you think back to the Scientific American article: if you were to predict 10, 20, 30 years from now, what kind of possibilities might the ability to use open source instruction set architectures like RISC-V unlock?
Yeah, and just to make it clear, because this is confusing: the specification of RISC-V is something like what's in a textbook. There are books about it. So that's defining an interface. Then there's the way you build hardware: you write it in languages that are kind of like C but specialized for hardware, and that gets translated into hardware. And it's these implementations of the specification that are open source. They're written in something called Verilog or VHDL, but they're put up on the web, just like you can see the C code for Linux on the web.
So the open instruction set enables open source implementations of RISC-V.
So you can literally build a processor using this instruction set?
People are. So what happened to us, the story was: this was developed here for our own use, to do our research, and we licensed it under the Berkeley Software Distribution license, like a lot of things get licensed here, so that other academics could use it and wouldn't be afraid to use it. And then, about 2014, we started getting complaints. We were using it in our research and in our courses, and we got complaints from people in industry:
"Why did you change your instruction set between the fall and the spring semester?" Well, we'd get these complaints from industrial people, and we'd think:
Why the hell do you care what we do with our instruction set? And then when we talked to them, we found out there was this thirst for the idea of an open instruction set architecture, and they had been looking for one. They stumbled upon ours at Berkeley and thought, boy, this looks great, we should use this one. And so once we realized there was this need for an open instruction set architecture, we thought that's a great idea.
And then we started supporting it and trying to make it happen. So we accidentally stumbled into this need, and our timing was good. And so it's really taking off.
You know, universities are good at starting things, but they're not good at sustaining things. So just like Linux has the Linux Foundation, there's a RISC-V Foundation that we started. There's an annual conference. The first one was held, I think, in January 2015, and it had 50 people at it. The last one, last December, had seventeen hundred people at it, and companies all over the world are excited.
So predicting into the future, if we were looking twenty-five years out, I would predict that RISC-V will possibly be the most popular instruction set architecture out there, because it's a pretty good instruction set architecture, it's open and free, and there's no reason lots of people shouldn't use it. And there are benefits, just like Linux is so popular today compared to 20 years ago.
And the fact that you can get access to it for free, you can modify it, you can improve it: all those same arguments apply. And so people collaborate to make it a better system for everybody to use. That works in software, and I expect the same thing will happen in hardware.
So if you look at ARM, Intel, MIPS, if you look at just the lay of the land, what do you think? Just for me, because I'm not familiar: how difficult would this kind of transition be? How much challenge would it entail?
Do you see... let me ask my dumb question.
No, that's... I know where you're headed, but there's a bunch.
I think the thing you point out is that there are these very popular proprietary instruction sets, like x86.
And so how do we move to RISC-V, potentially, in the span of five, ten, 20 years, a kind of unification? Especially given that the devices, the way we use devices, IoT, mobile devices, and the cloud, keep changing?
Well, a big piece of it is the software stack.
And right now, looking forward, there seem to be three important markets.
There's the cloud, and the cloud is simply companies like Alibaba, Amazon, Google, and Microsoft having these giant data centers with tens of thousands of servers, in maybe a hundred of these data centers all over the world. That's what the cloud is. The computer that dominates the cloud is x86: almost one hundred percent of the instructions executed in the cloud today are x86.
The other big things are cell phones and laptops. Those are the big things today. I mean, the PC is also dominated by the x86 instruction set, but those sales are dwindling. There are maybe two hundred million PCs a year and one and a half billion phones a year, numbers like that. The phones are dominated by ARM, for reasons that come back to the software stacks. And the third category is the Internet of Things, which is basically embedded devices: things in your cars and your microwaves, everywhere.
So what's different about those three categories? For the cloud, the software that runs there is determined by these companies, Alibaba, Amazon, Google, Microsoft. They control that software stack. For the cell phones, there's both Android and Apple, and they supply the software, but both of them have marketplaces where anybody in the world can build software, and that software is compiled down and shipped in the vocabulary of ARM.
That's what's referred to as binary compatibility, because the instructions are turned into numbers, binary numbers, and shipped around the world.
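To illustrate what "instructions turned into binary numbers" means, consider RISC-V: in the RV32I base instruction set, a register-to-register instruction such as `add` uses the 32-bit R-type layout (funct7, rs2, rs1, funct3, rd, opcode). The helper below is an illustrative sketch, not part of any toolchain; the field values for `add` and `sub` follow the RISC-V specification.

```python
def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack RV32I R-type fields into one 32-bit instruction word."""
    fields = ((funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP = 0b0110011  # major opcode shared by register-register integer ALU instructions
ADD = dict(funct7=0b0000000, funct3=0b000, opcode=OP)
SUB = dict(funct7=0b0100000, funct3=0b000, opcode=OP)

# add x1, x2, x3   (x1 = x2 + x3)
print(f"{encode_r_type(rs2=3, rs1=2, rd=1, **ADD):#010x}")  # 0x003100b3
# sub x1, x2, x3   (x1 = x2 - x3)
print(f"{encode_r_type(rs2=3, rs1=2, rd=1, **SUB):#010x}")  # 0x403100b3
```

Any core that implements the base instruction set decodes that same number the same way, which is what allows a compiled binary to be shipped to many different implementations.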
And so just a quick interruption.
So what is ARM? Is ARM an instruction set, a RISC-based one?
Yeah, it's a RISC-based instruction set, a proprietary one. ARM stands for Advanced RISC Machines, which is also the name of the company. So it's a proprietary RISC architecture, and it's been around for a while. It's surely the most popular instruction set in the world right now: every year, billions of chips use the ARM design in this post-PC era.
It was one of the early adopters of the RISC idea. The first ARM goes back to, I don't know, '86 or so. Berkeley and Stanford did their work in the early '80s. The ARM guys needed an instruction set, and they read our papers, and it heavily influenced them. So, getting back to my story, what about the Internet of Things?
Well, software is not shipped in the Internet of Things. The embedded device people control that software stack. So the opportunity for RISC-V, everybody thinks, is in the Internet of Things, embedded things, because there's no dominant player like there is in the cloud or in smartphones. And it doesn't have a lot of licenses associated with it, and you can enhance the instruction set if you want.
People have looked at the instruction set and think it's a very good instruction set.
So it appears to be very popular there.
It's possible in the cloud, too: those companies control their software stacks, so it's possible that they would decide to use RISC-V.
If we're talking about 10 or 20 years in the future, the harder one would be the cell phone, since people ship software in the ARM instruction set. You'd think that would be the more difficult one. But if RISC-V really catches on, you can imagine that changing over, too, in a period of a decade.
Do you have a sense of why RISC, or ARM, has dominated? You mentioned these three categories. Why did ARM dominate? Why does it dominate the mobile device space? Maybe my naive intuition is that there are some aspects of power efficiency that are important and that somehow come along with RISC.
Well, part of it is that for these old CISC instruction sets, like x86, it was more expensive. They're older, so they have disadvantages built into them because they were designed forty years ago. But they also have to translate, in hardware, from CISC instructions to RISC instructions on the fly, and that costs both silicon area, since the chips have to be bigger to do that, and power. So ARM, which has followed the RISC philosophy, is seen to be much more energy efficient.
And in today's computer world, both in the cloud and in cell phones and other things, the limiting resource isn't the number of transistors you can fit on the chip. It's how much power you can dissipate for your applications. By having a reduced instruction set, it's possible to have simpler hardware, which is more energy efficient, and energy efficiency is incredibly important in the cloud. When you have tens of thousands of computers in a data center, you want them to be the most energy-efficient ones.
And of course, for embedded things running off batteries, you want those to be energy efficient, and in cell phones, too. So I think it's believed that there's an energy disadvantage to using these more complex instruction set architectures.
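The on-the-fly CISC-to-RISC translation mentioned above is commonly described as "cracking" a complex instruction into simpler internal micro-operations. The software sketch below only illustrates that idea: the instruction format, register names, and the scratch register `t0` are hypothetical, and a real x86 decoder is hardware and far more involved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CiscInstr:
    op: str                      # e.g. "add"
    dst: str                     # register name, or a memory label if dst_in_memory
    src: str                     # source register name
    dst_in_memory: bool = False  # True for a CISC-style memory destination operand


@dataclass(frozen=True)
class MicroOp:
    op: str
    dst: str
    srcs: Tuple[str, ...]


def crack(instr: CiscInstr) -> List[MicroOp]:
    """Split one CISC-style instruction into RISC-like micro-ops."""
    if not instr.dst_in_memory:
        # Already RISC-like: a single register-to-register operation.
        return [MicroOp(instr.op, instr.dst, (instr.dst, instr.src))]
    tmp = "t0"  # hypothetical internal scratch register
    return [
        MicroOp("load", tmp, (instr.dst,)),        # read the memory operand
        MicroOp(instr.op, tmp, (tmp, instr.src)),  # do the arithmetic
        MicroOp("store", instr.dst, (tmp,)),       # write the result back
    ]


for uop in crack(CiscInstr("add", "counter", "r1", dst_in_memory=True)):
    print(uop)
```

Every one of these translations is extra decode logic that runs on every instruction, which is the silicon-area and power cost being described; a load/store RISC instruction set exposes the simple operations directly.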
So the other aspect of this: if we look at Apple, Qualcomm, and Samsung, they all use the ARM architecture, and yet the performance of the systems varies.
I mean, I don't know whose opinion you take on this, but Apple, for some reason, seems to perform better in its implementations of the architecture. So where's the magic? How does that happen?
So what ARM pioneered was a new business model. They said, well, here's our proprietary instruction set, and we'll give you two ways to use it. Either we'll give you one of our implementations, written in a C-like language called Verilog, and you can just use ours, which you have to pay money for since we'll license it to you, or you can design your own. And we're talking about numbers like tens of millions of dollars for the right to design your own, since the instruction set belongs to them.
So Apple got one of those, the right to build their own. Most of the other people who build, say, Android phones just get one of the designs from ARM. So Apple developed a really good microprocessor design team. They acquired a very good team that was building other microprocessors and brought them into the company to build their designs. So the instruction sets are the same, the specifications are the same, but their hardware design is, I think, much more efficient than everybody else's.
And that's given Apple an advantage in the marketplace, in that iPhones tend to be faster than most everybody else's phones.
It'd be nice to be able to jump around and explore different little sides of this. Let me ask a sort of romanticized question: what, to you, is the most beautiful aspect or idea of the RISC instruction set, or of instruction sets in general?
Well, I think, you know, I was always attracted to the idea that small is beautiful.
The temptation in engineering is that it's kind of easy to make things more complicated.
It's more difficult, surprisingly, to come up with a simple, elegant solution. And I think there are a bunch of small features of RISC in general where you can see examples of keeping it simpler making it more elegant. Specifically in RISC-V, where I was kind of the mentor in the program but it was really driven by Krste Asanović and two grad students, Andrew Waterman and Yunsup Lee, they hit upon this idea of having a subset of instructions, a nice simple subset of something like 40 instructions, such that the whole RISC-V software stack can run on just those 40 instructions. And then they provide optional features, instructions that could accelerate performance and be very helpful if you needed them, but you don't need to have them.
And that's really a new idea. So RISC-V right now has maybe five optional subsets that you can pull in, but the software runs without them. If you just want to build the core 40 instructions, that's fine; you can do that. So this is fantastic educationally: when you explain computers, you only have to explain 40 instructions and not thousands of them. Also, if you invent some wild and crazy new technology, like, you know, biological computing, you'd like a nice simple instruction set, and you can use RISC-V.
If you implement those core instructions, you can run, you know, really interesting programs on top of that. So this idea of a core set of instructions that the software stack runs on, plus optional features that the compiler will use if you turn them on but that you don't have to have: I think it's a powerful idea. What's happened in the past with the proprietary instruction sets is that when they add new instructions, they become a required piece, so all microprocessors in the future have to support those instructions.
So it's kind of like how, for a lot of people, as they get older, they gain weight.
Weight and age are correlated. And you can see these instruction sets getting bigger and bigger as they get older. RISC-V lets you stay as slim as you were as a teenager, and you only have to add these extra features if you're really going to use them, rather than having no choice and having to keep growing with the instruction set.
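The "slim core plus optional extensions" idea can be sketched as a subset check over ISA name strings such as `rv32imc`, where `i` is the base integer instruction set and each extra letter is a standard optional extension (M, A, F, D, C). This is a simplified assumption-laden sketch: it handles only single-letter extensions, ignores the multi-letter `Z...` extensions and the `G` shorthand, and the function names are hypothetical.

```python
from typing import FrozenSet, Tuple

# A few standard single-letter RISC-V extensions layered on the base "I" set.
OPTIONAL = {
    "m": "integer multiply/divide",
    "a": "atomic memory operations",
    "f": "single-precision floating point",
    "d": "double-precision floating point",
    "c": "compressed 16-bit instructions",
}


def parse_isa(isa: str) -> Tuple[int, FrozenSet[str]]:
    """Return (register width in bits, set of optional extensions)."""
    isa = isa.lower()
    if isa[:4] not in ("rv32", "rv64"):
        raise ValueError(f"unsupported ISA string: {isa!r}")
    letters = isa[4:]
    if not letters.startswith("i"):
        raise ValueError("the base integer instruction set 'i' is required")
    unknown = set(letters[1:]) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"extensions not handled by this sketch: {sorted(unknown)}")
    return int(isa[2:4]), frozenset(letters[1:])


def can_run(binary_isa: str, core_isa: str) -> bool:
    """A binary runs if widths match and the core has every extension it uses."""
    b_xlen, b_ext = parse_isa(binary_isa)
    c_xlen, c_ext = parse_isa(core_isa)
    return b_xlen == c_xlen and b_ext <= c_ext


print(can_run("rv32i", "rv32imc"))    # True: base-only code runs on a bigger core
print(can_run("rv32imf", "rv32imc"))  # False: the core lacks the F extension
```

Software compiled for the base set alone runs on every core of that width, and extensions only matter when the compiler is told to use them, which mirrors the point above about not being forced to grow.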
I don't know if the analogy holds up, but that's a beautiful notion: it's almost like a nudge toward "here's the simple core, that's the essential part."
You know, I think the surprising thing is still that if we brought back the pioneers from the 1950s and showed them today's instruction set architecture, they'd understand it. They'd say, well, that doesn't look that different. Well, I'm surprised. And maybe there's something to talk about here, philosophically.
I mean, there may be something powerful about those, you know, forty or fifty instructions: all you need is these commands, the instructions we talked about, and that is sufficient to bring about, you know, artificial intelligence. And so it's remarkable, surprising to me, that as complicated as it is to build these things, microprocessors where the lines are narrower than the wavelength of light, this amazing technology, at some fundamental level...
...the commands that software executes are really pretty straightforward and haven't changed that much in decades. What a surprising outcome.
So underlying all computation, all Turing machines, all artificial intelligence systems, perhaps, might be a very simple instruction set, like RISC-V?
Yeah, I mean, that's kind of what I said. I was interested to see... I had another, more senior faculty colleague, and he had written something in Scientific American, you know, predicting twenty-five years into the future.
And his prediction came due at about the time I was a young professor, and he said, yep, I checked it. So I was interested to see how mine was going to turn out, and it pretty much held up pretty well.
But yeah, there must be something fundamental about those instructions, that we're capable of creating, you know, intelligence from pretty primitive operations and just doing them really fast.
You mentioned a different, maybe radical, computational medium, like biological computing. And there are other ideas. There's a lot of space in domain-specific hardware, and then there could be quantum computers. So we can think of all of those different mediums and types of computation. What's the connection between swapping out different hardware systems and the instruction set? Do you see those as disjoint, or are they fundamentally coupled?
Yeah. So if we go back to the history: when Moore's Law was in full effect and you were getting twice as many transistors every couple of years, the challenge for computer designers was how to take advantage of that. How can we turn those transistors into better computers, typically faster ones? And so there was an era, I guess in the '80s and '90s, when computers were doubling performance every 18 months. If you weren't around then, what would happen is you had your computer, and your friend's computer, which was like a year or a year and a half newer, was much faster than yours.
And he or she could get their work done much faster than you could. So people took their perfectly good computers and threw them away to buy newer ones, because the computer one or two years later was so much faster. That's what the world was like in the '80s and '90s.
Well, with the slowing down of Moore's Law, that's no longer true. Right now, with laptops, I only get a new laptop when it breaks: damn, the disk broke or the display broke, I've got to buy a new computer. But before, you would throw them away because they were just so sluggish compared to the latest computers.
So that's, you know, a huge change in what's gone on.
So since this lasted for decades, programmers, and maybe all of society, are used to computers getting faster regularly. Those of us in computer design, which is called computer architecture, now believe that the path forward is instead to add accelerators that only work well for certain applications. Since Moore's Law is slowing down, we don't think general-purpose computers are going to get a lot faster. The Intel processors of the world are not going to get, and haven't been getting, a lot faster.
They've barely improved, like a few percent a year. It used to be doubling every 18 months, and now it's doubling every 20 years. It's just shocking. So to deliver on what Moore's Law used to do, we think what's going to happen, and what is happening right now, is people adding accelerators to their microprocessors that only work well for subdomains.
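The gap between "doubling every 18 months" and "doubling every 20 years" is easier to feel as an annual rate. This short sketch converts a doubling time into the compound yearly improvement it implies; the function name is illustrative.

```python
def annual_improvement(doubling_time_years: float) -> float:
    """Compound annual growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_time_years) - 1


for label, years in (("doubling every 18 months", 1.5), ("doubling every 20 years", 20)):
    print(f"{label}: {annual_improvement(years):.1%} per year")
```

This gives roughly 58.7% per year versus about 3.5% per year, consistent with the "few percent a year" figure above.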
And by sheer coincidence, at the same time that this is happening, there has been this revolution in artificial intelligence called machine learning.
As I'm sure your other guests have said, there were two competing schools of thought: that we could figure out artificial intelligence by just writing the rules top-down, or that that was wrong and you had to look at data and infer what the rules are, which is machine learning. And what's happened in the last decade or eight years is that machine learning has won. And it turns out that the hardware you build for machine learning is pretty much matrix multiply; matrix multiply is a key feature of the way machine learning is done.
So that's a godsend for computer designers: we know how to make matrix multiply run really fast. General-purpose microprocessors are slowing down, and we're adding accelerators for machine learning that fundamentally do matrix multiplies much more efficiently than general-purpose computers have. We had to come up with a new way to accelerate things, and the danger of accelerating only one application is: how important is that application? It turns out machine learning gets used for all kinds of things.
So serendipitously, we found something to accelerate that's widely applicable. And we're in the middle of this revolution in machine learning; we're not sure what its limits are. So this has been kind of a godsend if you want to deliver improved performance: as long as people are moving their programs to embrace more machine learning, we know how to give them more performance, even as Moore's Law is slowing down.
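To see why matrix multiply is such a good target, here is a naive pure-Python version alongside one fully connected neural-network layer built on it. Every output element is a chain of multiply-accumulate (MAC) operations, which is exactly what accelerators such as Google's TPU build large arrays of hardware units to perform. The function names are illustrative; real frameworks call heavily optimized kernels instead.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix multiply: each output element is a sum of products."""
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
            out[i][j] = acc
    return out


def dense_layer(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """One fully connected layer with ReLU: relu(x @ w + bias)."""
    y = matmul(x, w)
    return [[max(0.0, v + b) for v, b in zip(row, bias)] for row in y]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Almost all of the arithmetic in the layer happens inside the triple loop, so hardware that speeds up that one pattern speeds up the whole layer.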
And counterintuitively, you could say the machine learning mechanism is domain specific, but because it's leveraging data, it could actually be very broad in terms of the domains it can be applied in.
Yeah, that's exactly right.
People sometimes talk about the idea of Software 2.0. We're almost taking another step up the abstraction layers in designing machine learning systems, because now you're programming in the space of data, in the space of hyperparameters. It's fundamentally changing the nature of programming.
And so the specialized devices that accelerate performance, especially of neural-network-based machine learning systems, might become the new general-purpose computers.
Yeah. The interesting thing to point out is that these are not tied together. The enthusiasm about machine learning, about creating programs driven by data, figuring out the answers from data rather than top-down, which is classically the way most programming is done and the way artificial intelligence used to be done: that's a movement that's going on at the same time, coincidentally. And the first word in "machine learning" is "machine," right?
So that's going to increase the demand for computing, because instead of programmers being smart and writing those things down, we're going to use computers to examine a lot of data to kind of create the programs.
That's the idea. And remarkably, this gets used for all kinds of things very successfully.
Image recognition, language translation, game playing, and it gets into pieces of the software stack like databases and stuff like that. We're not quite sure how general-purpose it is, but that's going on independently of the hardware. What's happening on the hardware side is that Moore's Law is slowing down right when we need a lot more cycles. It's failing us right when we need it, because there's going to be a greater and greater demand for computing.
And then there's this idea that we're going to do so-called domain-specific architectures. You pick a domain, and your greatest fear is that you'll make this one thing work and it'll help, you know, five percent of the people in the world. Well, this looks like it's a very general-purpose thing.
So the timing is fortuitous: if we can keep building hardware that accelerates machine learning, the neural networks, the timing will be right for that neural network revolution to transform the software, the so-called Software 2.0. The software of the future will be very different from the software of the past. And in our microprocessors, even though we're still going to have the same basic RISC instructions to run big pieces of the software stack, like user interfaces and stuff like that, we can accelerate the small piece that's computationally intensive.
It's not lots of lines of code, but it takes a lot of cycles to run that code, and that's going to be the accelerator piece. So that's what makes this a really interesting decade from a computer designer's perspective. The title Hennessy and I gave our Turing lecture was "A New Golden Age." We see this as a very exciting decade, much like when we were assistant professors and the RISC stuff was going on. That was a very exciting time, when we were changing what was going on.
We see this happening again: tremendous opportunities for people, because we're fundamentally changing how software is built and how we run it.
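Accelerating only the small, cycle-hungry piece of a program is governed by Amdahl's law, which the conversation does not name but which is the standard way to reason about it: if a fraction f of run time is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The fractions and speedups below are illustrative, not measurements.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Overall speedup when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    untouched = 1.0 - accelerated_fraction  # still runs on the general-purpose core
    return 1.0 / (untouched + accelerated_fraction / accelerator_speedup)


# Suppose 90% of cycles are in the compute-heavy kernel (an assumption).
for s in (10, 100, 1000):
    print(f"kernel sped up {s}x -> {amdahl_speedup(0.9, s):.2f}x overall")
```

The untouched 10% caps the overall gain at 10x no matter how fast the accelerator gets, which is one reason the rest of the software stack still matters.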
So at which layer of abstraction do you think most of the acceleration might happen, if you look at the next 10 years or so? Google is working on a lot of exciting stuff with the TPU. There's the hardware; there could be optimizations closer to the instruction set; there could be optimizations at the compiler level; it could even be at the higher levels of the software stack.
Yeah, if you think about the old RISC-CISC debate, it was both. It was software and hardware: the compilers improving as well as the architecture improving. And that's likely to be the way things are now with machine learning. They're using domain-specific languages; languages like TensorFlow and PyTorch are very popular with the machine learning people, and those are raising the level of abstraction. It's easier for people to write machine learning in these domain-specific languages like PyTorch and TensorFlow.
So where does most of the optimization happen? There'll be both the compiler piece and the hardware piece underneath it. The fatal flaw for hardware people is to create really great hardware but not bring along the compilers. And what we're seeing right now in the marketplace, because of this enthusiasm around hardware for machine learning, is probably billions of dollars invested in startup companies. We're seeing startup companies go belly-up because they focused on the hardware but didn't bring the software stack along.
We talked about Benchmark's earlier, so I participated in machine learning, didn't really have a set of benchmarks. I think just two years ago they didn't have a set of benchmarks. And we've created something called Amelle PERF, which is machine learning benchmarks, sweet. And pretty much the companies who didn't invest in software stack couldn't run MLP very well. And the ones who did invest in software Stack did. And we're seeing, you know, like kind of in computer architecture, this is what happens.
You have these arguments about risk versus this. People spend billions of dollars in the marketplace to see who wins. And it's not it's not a perfect comparison, but it kind of sorts things out. And we're seeing companies go out of business. And then companies like like there's a company in Israel called Habana. They came up with Machine Learning Accelerator's. They had good Molpus scores.
Intel had acquired a company called Nervana a couple of years ago. They didn't reveal their MLPerf scores, which was suspicious. But a month ago, Intel announced that they're cancelling the Nervana product line, and they've bought Habana for two billion dollars. And Intel is going to be shipping Habana chips, which have hardware and software and run the MLPerf programs pretty well. And that's going to be their product line of the future.
Brilliant. So maybe just to linger briefly on MLPerf. I love metrics. I love standards that everyone can gather around. What are some interesting aspects of that portfolio of metrics?
Well, one of the interesting things is, you know what we thought? I was involved in the start. Peter Mattson is leading the effort from Google. Google got it off the ground, but we had to reach out to competitors and say, there are no benchmarks here.
We think this is bad for the field. It'll be much better if we look at examples like the RISC days: the people in the RISC community, competitors building RISC microprocessors, got together to agree on a set of benchmarks that were called SPEC, and that was good for the industry. Before that, the different RISC architectures were arguing, well, you can believe my performance, but those other guys are liars, and that didn't do any good.
So we agreed on a set of benchmarks, and then we could figure out who was faster between the various RISC architectures, even if one was only a little bit faster. And that grew the market, rather than people being afraid to buy anything. So we argued the same thing would happen with MLPerf. Companies like NVIDIA were maybe worried that it was some kind of trap. But eventually we all got together to create a set of benchmarks and do the right thing.
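SPEC, the RISC-era suite mentioned above, condenses many per-program results into a single number. SPEC CPU does this with the geometric mean of each program's speed ratio against a reference machine. The sketch below illustrates only that summarizing step; the timings are hypothetical, the function name is illustrative, and MLPerf's own reporting rules are different and are not modeled here.

```python
import math


def spec_style_score(reference_times, measured_times):
    """Geometric mean of per-benchmark speed ratios (reference / measured).

    A ratio above 1 means the measured machine beat the reference machine
    on that program.
    """
    if not reference_times or len(reference_times) != len(measured_times):
        raise ValueError("need one measured time per reference time")
    ratios = [ref / t for ref, t in zip(reference_times, measured_times)]
    # Average the logs, then exponentiate: the n-th root of the product of ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical run times in seconds for a three-program suite.
reference = [100.0, 200.0, 50.0]
machine_a = [50.0, 50.0, 50.0]    # ratios 2, 4, 1
machine_b = [25.0, 100.0, 25.0]   # ratios 4, 2, 2

print(f"A: {spec_style_score(reference, machine_a):.2f}")
print(f"B: {spec_style_score(reference, machine_b):.2f}")
```

One reason for using the geometric mean is that the ratio between two machines' scores does not depend on which reference machine was chosen, so competitors cannot shift the ranking by picking a flattering baseline.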
Right. And we agree on the results. And so we can see whether TPUs or GPUs or CPUs are really faster, and how much faster. And I think from an engineer's perspective, as long as the results are fair, you can live with it. You kind of tip your hat to your colleagues at another institution: boy, they did a better job than us. What you hate is if it's false.
Right. They're making claims and it's just marketing bullshit, and that's affecting sales. So from an engineer's perspective, as long as it's a fair comparison and we don't come in first place, that's too bad, but it's fair. So we wanted to create that environment for MLPerf.
And so now there are ten universities and fifty companies involved.
So pretty much MLPerf is the way you measure machine learning performance, and it didn't exist even two years ago. One of the cool things that I enjoy about the Internet, which has a few downsides, is that people can see through the hype a little better in the presence of metrics. So it's really nice. At companies like Google and Facebook and Twitter now, the cool thing to do is to put your engineers forward and actually show off how well you do on these metrics.
There's less of a desire to do marketing. Maybe I'm being naive, but I was trying to understand what's changed from the '80s to this era, and I think it's things like social networking, Twitter, and stuff like that. If you put up bullshit that's just purposely misleading, you can get a violent reaction on social media pointing out the flaws in your arguments. Right.
And so from a marketing perspective, you have to be careful today in a way you didn't have to be before, because there will be people who point out the flaws. You can get the word out about the flaws in what you're saying much more easily today than in the past. It used to be easier to get away with it. And the other thing that's been happening in terms of showing off engineers is on the software side. People have largely embraced open source software.
Twenty years ago, it was a dirty word at Microsoft. And today, Microsoft is one of the big proponents of open source software. That's kind of the standard way most software gets built, which really shows off your engineers, because if you look at the source code, you can see who's making the commits, who's making the improvements, who the engineers are at all these companies who are really great programmers and making really solid contributions, which enhances their reputations and the reputation of the companies.
But that's, of course, not everywhere. The space I work in more is autonomous vehicles, and there the machinery of hype and marketing is still very strong, and there's less willingness to be open in this open source way and to benchmark. So MLPerf represents how much better the machine learning world is at being open and holding itself to standards; the number of benchmarks across computer vision and natural language processing is actually incredible.
You know, historically, it wasn't always that way. I had a graduate student working with me, David Martin. In some fields, benchmarking has been around forever: computer architecture, databases, maybe operating systems. Benchmarks are the way you measure progress. But he was working with me and then started working with Jitendra Malik, who is in the computer vision space.
I guess you've interviewed Jitendra. Dave Martin told me they didn't have benchmarks. Everybody had their own vision algorithm, and the way it went was: here's my image, look at how well I do. And everybody had their own image. So David Martin, back when he did his dissertation, figured out a way to do benchmarks. He had a bunch of graduate students identify images and then ran benchmarks to see which algorithms ran well. And that was, as far as I know, kind of the first time people did benchmarks in computer vision, and it predated all the things that eventually led to ImageNet and stuff like that.
But then the vision community got religion.
And then once we got as far as ImageNet, that let the guys in Toronto win the ImageNet competition, and then that changed the whole world. It's a scary step, actually, because when you enter the world of benchmarks, you actually have to be good to participate, as opposed to just believing you're the best in the world.
Yeah. I think the people weren't purposely misleading. I think if you don't have benchmarks, I mean, how do you know? Your intuition is kind of like the way we used to do computer architecture: your intuition is that this is the right instruction set to do this job; in my experience, my hunch is that's true. We had to make things more quantitative to make progress.
And so for fields that don't have benchmarks, I don't understand how they figure out whether they're making progress.
We're kind of in the vacuum tube days of quantum computing. What are your thoughts on this wholly different kind of space of architectures?
You know, the idea of quantum computing has been around for a while. And I actually thought, well, I sure hope I retire before I have to start teaching this.
I say that because when I give these talks about the slowing of Moore's Law and how we need to change by doing domain-specific accelerators, a common question is, what about quantum computing? The reason that comes up is that it's in the news all the time. So the thing to keep in mind is that quantum computing is not right around the corner. There have been two national reports, one by the National Academy of Engineering and another by the Computing Community Consortium, where they did a frank assessment of quantum computing.
And both of those reports said, as far as we can tell, error-corrected quantum computing is a decade away. So I think of it like nuclear fusion.
People have been excited about nuclear fusion for a long time. If we ever get nuclear fusion, it's going to be fantastic for the world, and I'm glad people are working on it. But it's not right around the corner. Those two reports say to me that it'll probably be 2030 before quantum computing is something that could happen. And when it does happen, this is going to be big-science stuff: microkelvin, almost absolute zero, things where if they vibrate, if a truck goes by, it won't work.
Right. So this will be data center stuff. We're not going to have a quantum cell phone, and it's probably a 2030 kind of thing. So I'm happy that people are working on it. But with all the news about it, it's hard not to think that it's right around the corner. And that's why, as Moore's Law is slowing down, we need to do something to keep computing getting better for this next decade.
And we shouldn't be betting on quantum computing or expecting it to deliver in the next few years; it's probably further off. I'd be happy to be wrong. It'd be great if quantum computing becomes commercially viable, but it will be for a set of applications, not general-purpose computation. So it's going to do some amazing things, but there'll be a lot of things that the old-fashioned computers are probably going to keep doing better for quite a while.
And there will be a teenager 50 years from now watching this video saying, look how silly David Patterson was.
I said 2030. I didn't say never. I did say we're not going to have quantum cell phones, so that's what he's going to be watching for.
Well, given that we've had Moore's Law, I feel comfortable doing projects that think about the next decade. I admire people who are trying to do things that are 30 years out, but it's such a fast-moving field.
I'm just not good enough to figure out what the problems are going to be in 30 years. Ten years is hard enough for me.
So maybe, if it's possible, let's untangle your intuition a little bit. I spoke with Jim Keller; I don't know if you're familiar with Jim. He is trying to be a little bit rebellious, and he quotes you as being wrong. Yeah.
So, for the record, Jim talks about that. He has an intuition that Moore's Law is not, in fact, dead yet, and that it may continue for some time to come.
What are your thoughts about Jim's ideas in this space?
Yeah, this is just marketing.
What Gordon Moore said is a quantitative prediction, so we can check the facts, right? The prediction is doubling the number of transistors every two years. So let's look at DRAM chips from six years ago. That would be three two-year periods, so do our DRAM chips have eight times as many transistors as they did six years ago? We can also look at Intel microprocessors from six years ago.
If Moore's Law is continuing, they should have eight times as many transistors as six years ago. The answer in both of these cases is no. The problem is that, because Moore's Law was genuinely embraced by the semiconductor industry, they would make investments in semiconductor equipment to make Moore's Law come true, so semiconductors improving and Moore's Law are the same thing in many people's minds. So when I say, and I'm factually correct, that Moore's Law no longer holds, we are not doubling the transistors every two years,
the downside for a company like Intel is that people think it means technology has stopped improving. And so Jim is trying to counteract the impression that semiconductors are frozen in 2019 and are never going to get better. I never said that.
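The eight-times test Patterson describes is three doublings: a growth factor of 2^(years / 2) under a two-year doubling period. The sketch below makes that check explicit. The function names are illustrative, and the transistor counts are hypothetical placeholders, not real Intel or DRAM figures.

```python
def predicted_growth(years, doubling_period_years=2.0):
    """Transistor-count multiplier predicted by doubling every
    `doubling_period_years` years, as in Moore's Law."""
    return 2 ** (years / doubling_period_years)


def keeps_pace(count_then, count_now, years, doubling_period_years=2.0):
    """True if the observed growth matches or beats the prediction."""
    return count_now / count_then >= predicted_growth(years, doubling_period_years)


# Six years is three two-year periods, so the prediction is 2 ** 3 = 8x.
print(predicted_growth(6))

# Hypothetical counts: a 4x increase over six years falls short of 8x.
print(keeps_pace(1.0e9, 4.0e9, 6))
```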
All I said was Moore's Law is no more. And I'm strictly looking at the number of transistors, because that's what Moore's Law is. There's been this aura associated with Moore's Law that they've enjoyed for 50 years: look at the field we're in, we're doubling transistors every two years, what an amazing field. It is an amazing thing that they were able to pull off. But as Gordon Moore himself said, no exponential can last forever.
It lasted for 50 years, which is amazing. And it has a huge impact on the industry because of these changes we've been talking about. So he claims, because he's trying to counteract it: Patterson says Moore's Law is no more, and look at it, it's still going. And TSMC says the same. But there's plenty of evidence that Moore's Law is not continuing. I understand the perception problem when I say Moore's Law stopped, so now I say Moore's Law is slowing down.
If Moore's Law predicts doubling every two years and I say it's slowing down, then that's another way of saying it doesn't hold anymore. And I think Jim wouldn't disagree that it's slowing down, because that sounds like things are still getting better, just not as fast, which is another way of saying Moore's Law isn't working anymore. It's still good for marketing.
But you don't like expanding the definition of Moore's Law, sort of?
Well, naturally, as an educator, it's like in politics: does everybody get their own facts, or do we have shared facts? Moore's Law was crisp. It was Carver Mead who looked at Moore's observations plotted on a log scale as a straight line, and that's what the definition of Moore's Law is. Then there's this other thing Intel did for a while.
Interestingly, before Jim joined them, they said, oh no, Moore's Law isn't really doubling transistors every two years; Moore's Law is the cost of the individual transistor going down, cutting in half every two years. Now, that's not what he said, but they reinterpreted it because they believed the cost of transistors was continuing to drop even if they couldn't get twice as many transistors on a chip. But many people in industry have told me that's not true anymore: in more recent technologies, which got more complicated, the actual cost per transistor went up.
So even the corollary might not be true. But that was the beauty of Moore's Law: it was very simple. It's like Einstein's E = mc², right? It was like, wow, what an amazing prediction. It's so easy to understand, the implications are amazing, and that's why it was so famous as a prediction. And this reinterpretation of what it meant, changing it, is revisionist history.
And they're not claiming there's a new Moore's Law. They're not saying, by the way, instead of every two years, it's every three years. I don't think they want to say that. I think what's going to happen is that each new technology innovation will come a little bit slower. So it is slowing down, and the improvements won't be as great. That's why we need to do new things.
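Restating Moore's Law with a longer period, say every three years, amounts to solving growth = 2^(years / period) for the period, which takes one logarithm. The sketch below assumes steady exponential growth between two measurements; the function name and counts are hypothetical.

```python
import math


def implied_doubling_period(count_then, count_now, years):
    """Years per doubling implied by growing from count_then to count_now."""
    growth = count_now / count_then
    if growth <= 1:
        raise ValueError("no growth, so no finite doubling period")
    # growth = 2 ** (years / period)  =>  period = years / log2(growth)
    return years / math.log2(growth)


# Hypothetical: 4x more transistors after six years means doubling every three years.
print(implied_doubling_period(1.0e9, 4.0e9, 6))
```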
Yeah, I don't like that the idea of Moore's Law is tied up with marketing.
Whether it's marketing, it could be affecting business, but it could also be affecting the imagination of engineers. If Intel employees actually believe that we're frozen in 2019, well, that would be bad for Intel, and not just Intel but everybody, since Moore's Law is inspiring to everybody.
But what's happening right now, talking to people who are working in national offices and places like that, is that a lot of the computer science community is unaware this is going on: we are in an era that's going to need radical change at lower levels, which could affect the whole software stack. If you're using the cloud and the servers you get next year are only a little bit faster than the servers you got this year, you need to know that.
And we need to start innovating to start delivering on it. If you're counting on adding a lot more features to your software, assuming the computers will get faster, that's not true anymore. So are you going to have to start making your software stack more efficient? Are you going to have to start learning about machine learning? So it's kind of a warning, or a call to arms, that the world is changing right now.
And a lot of people in computer science are unaware of that.
So a way to try and get their attention is to say that Moore's Law is slowing down and that's going to affect your assumptions, and we're trying to get the word out. And when companies like TSMC and Intel say, oh, no, no, no, Moore's Law is fine, people think, OK, I don't have to change my behavior, I'll just get the next servers. If they start doing measurements, they'll realize what's going on.
It'd be nice to have some transparency and metrics so the layperson can know whether computers are getting faster.
There are. Yeah, there are.
A bunch of people use clock rate as a measure of performance. It's not a perfect one, but if you've noticed, clock rates are more or less the same as they were five years ago. Computers are a little better than they were, aren't they? They haven't made zero progress, but they've made small progress. So there are some indications out there, and in our behavior, right? Nobody buys the next laptop because it's so much faster than their old laptop.
As for cell phones, I don't know why people buy new ones. I think it's because the new ones are announced and the cameras are better.
But that's kind of domain-specific, right? They're putting in special-purpose hardware to make the processing of images much better. That's the way they're doing it. It's not so much that the Arm processor in there is twice as fast as that they've added accelerators to help the experience of the phone.
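The point that clock rate is an imperfect measure of performance follows from the classic CPU performance equation taught in Hennessy and Patterson's textbooks: CPU time = instruction count × cycles per instruction (CPI) × clock cycle time. Two processors at the same clock rate can therefore run the same program at different speeds. The sketch below uses hypothetical figures and an illustrative function name.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time,
    where clock cycle time = 1 / clock rate."""
    return instruction_count * cycles_per_instruction / clock_rate_hz


# Hypothetical: the same 1e9-instruction program on two 3 GHz processors.
older = cpu_time_seconds(1e9, 1.5, 3e9)   # 0.5 s
newer = cpu_time_seconds(1e9, 1.2, 3e9)   # about 0.4 s
print(f"speedup at an unchanged clock rate: {older / newer:.2f}x")
```

Here the newer chip is 1.25 times faster without any change in clock rate, purely through a lower CPI. The same reasoning applies when a chip adds an accelerator: the clock rate stays put while some workloads get much faster.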
Can we talk a little bit about one other exciting space, arguably with the same level of impact as your work on RISC, which is RAID? In 1988, you co-authored a paper, "A Case for Redundant Arrays of Inexpensive Disks," hence RAID. That's where you introduced the idea of RAID. It's incredible that that little paper had this ripple effect and a really revolutionary impact. So first, what is RAID?
So this is work I did with my colleague Randy Katz and a star graduate student, Garth Gibson. We had just done the fourth-generation RISC project.
And Randy Katz had an early Apple Macintosh computer. At this time, everything was done with floppy disks, which are an old technology that could store things but didn't have much capacity, so to get any work done you were always sticking your little floppy disks in and out. Then they started building what are called hard disk drives, which use magnetic material that can remember information, as storage for the Mac. And Randy asked the question when he saw this disk next to his Mac:
Gee, these are brand new, small things. Before that, for the big computers, the disks were the size of washing machines. And here's something about the size of a book. I wonder what we could do with that.
Well, Randy was involved in the fourth-generation RISC project here at Berkeley in the '80s. We had figured out how to make the computation part, the processor part, a lot faster. But what about the storage part? Could we do something to make it faster?
So we hit upon the idea of taking a lot of these disks developed for personal computers and Macintoshes and putting many of them together instead of one of these washing-machine-sized things. We wrote the first draft of the paper, and we'd have 40 of these little disks instead of one of these washing-machine-sized things. They would be much cheaper because they're made for PCs, and they could actually be faster because there are 40 of them rather than one.
And so we wrote a paper like that and sent it to one of our former Berkeley students at IBM. And he said, well, this is all great and good, but what about the reliability of these things? Now you have 40 of these devices, each of which is only PC quality, so they're not as good as these IBM washing machines.
IBM dominated the storage business. So the reliability could be awful. And when we calculated it out, instead of breaking on average once a year, it would break every two weeks.
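The jump from "once a year" to "every two weeks" follows from a standard rule of thumb: if disk failures are independent and exponentially distributed, an array with no redundancy fails when its first disk fails, so its mean time to failure (MTTF) is the single-disk MTTF divided by the number of disks. The only inputs taken from the conversation are 40 disks and a two-week array failure interval; the helper names are illustrative.

```python
def array_mttf(disk_mttf, num_disks):
    """MTTF of a non-redundant array: time until the first of num_disks fails,
    assuming independent, exponentially distributed disk lifetimes."""
    return disk_mttf / num_disks


def implied_disk_mttf(array_mttf_value, num_disks):
    """Invert array_mttf: the per-disk MTTF that yields the given array MTTF."""
    return array_mttf_value * num_disks


# From the conversation: 40 PC-class disks, one array failure every two weeks.
per_disk_weeks = implied_disk_mttf(2, 40)   # 80 weeks, roughly 1.5 years per disk
print(per_disk_weeks, array_mttf(per_disk_weeks, 40))
```

In other words, disks that individually last about a year and a half on average still produce a failure somewhere in a 40-disk array every couple of weeks, which is why redundancy had to become part of the design.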
So we thought about it and said, well, we've got to address the reliability. We did it originally for performance, but we had to address reliability. Hence the name, redundant array of inexpensive disks: an array of these disks, inexpensive like the ones for PCs, but with extra copies. So if one breaks, we won't lose the information; we'll have enough redundancy that we can let some break and still preserve the information. The "array of inexpensive disks" is the collection of these pieces, and the R part of the name was the redundancy, so they'd be reliable. And it turns out that if you put a modest number of extra disks in one of these arrays, it could not only be faster and cheaper than one of these washing-machine disks, it could actually be more reliable, because you could have a couple of breaks even with these cheap disks, whereas one failure in the washing-machine thing would knock it out.
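Keeping full extra copies (mirroring) is one way to get the redundancy described here. Another, used by the parity-based RAID levels, is to store the bitwise XOR of a group of data blocks on one extra disk: if any single data disk is lost, XORing the parity with the surviving blocks rebuilds it. This sketch uses in-memory byte strings as stand-ins for disks; the function names are illustrative, it is not how any particular controller is implemented, and it tolerates only one failure per parity group.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_lost_block(surviving_blocks, parity):
    """Recover the one missing data block from the parity and the survivors."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical "disks" holding 4-byte blocks, plus one parity disk.
disks = [b"RISC", b"RAID", b"1988"]
parity = xor_blocks(disks)

lost = disks.pop(1)                        # simulate the second disk failing
recovered = rebuild_lost_block(disks, parity)
print(recovered == lost, recovered)
```

The design choice is the one Patterson describes: a modest amount of extra storage (one parity block per group instead of a full copy of everything) buys the ability to survive a failure.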
Did you have a sense, just like with RISC, that in the 30 years that followed, RAID would take over?
I think I'm naturally an optimist, but I thought our ideas were right.
It was kind of like Moore's Law. It seemed to me that if you looked at the history of disk drives, they went from washing-machine-sized things and were getting smaller and smaller. And the volumes were in the smaller disk drives, because that's where the PCs were. So we thought that was a technological trend: the volume of disk drives was going to be in smaller and smaller devices, which was true. They went from, I don't know, eight inches in diameter to five inches to three inches in diameter.
And so it made sense to figure out how to deal with things as an array of disks.
So I think it was one of those things where, logically, we thought the technological forces were on our side and that it made sense, so we expected it to catch on. But there was that same kind of business question. IBM was the big pusher of these disk drives, so in the real world, would the technical advantage get turned into a business advantage or not? It proved to be true.
And so we thought we were technically right, and it was unclear on the business side, but as academics, we believe that technology should win, and it did. And if you look at those 30 years, just from your perspective, are there interesting developments in the space of storage that have happened in that time? Yeah, a couple of big things happened. What we did had a modest amount of redundancy.
As people built bigger and bigger storage systems, they've added more redundancy so they can tolerate more failures. And the biggest thing that happened in storage is that, for decades, it was based on things physically spinning, called hard disk drives.
You used to turn on your computer and it would make a noise. That noise was the disk drives spinning, and they were rotating at something like 60 revolutions per second. If you remember vinyl records, if you've ever seen those, that's what it looked like, and there was something like the needle on a vinyl record that was reading it. So the big change is switching that over to a semiconductor technology called flash.
So within about the last decade, I'd say, an increasing fraction of all the computers in the world are using semiconductors for storage. Flash drives, instead of being magnetic or optical, are semiconductor, writing information very densely.
And that's been a huge difference.
So all the cell phones in the world use flash, most of the laptops use flash, and all the embedded devices use flash for storage. In the cloud, magnetic disks are still more economical than flash, but they use both. So it's been a huge change in the storage industry, switching from primarily disk to primarily semiconductor for the individual storage device.
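The 60 revolutions per second mentioned above (3,600 RPM) translates directly into how long a spinning disk makes you wait: one revolution takes 1/60 of a second, and on average the data you want is half a revolution away from the head. Flash has no rotating platter, so this component of access time disappears. The function name below is illustrative.

```python
def rotational_latency_ms(revolutions_per_second):
    """Return (time per revolution, average rotational latency) in milliseconds.

    On average the requested sector is half a revolution from the head.
    """
    per_revolution_ms = 1000.0 / revolutions_per_second
    return per_revolution_ms, per_revolution_ms / 2


per_rev, average = rotational_latency_ms(60)   # 60 rev/s = 3,600 RPM
print(f"{per_rev:.1f} ms per revolution, {average:.1f} ms average wait")
```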
But still, the RAID mechanism applies to those different kinds of storage.
Yes, people still use RAID ideas, and why is kind of interesting, kind of psychological. If you think about it, people have always worried about the reliability of computing, since the earliest days. But if we're talking about computation, and your computer makes a mistake and has ways to check and say, we screwed up, we made a mistake, then the program that was running has to be redone, which is a hassle. For storage, if you've sent important information away and it loses that information, you go nuts.
This is the worst. Oh, my God.
So if you have a laptop and you're not backing it up to the cloud or something like that, and your disk drive breaks, which it can, you'll lose all that information and you just go crazy, right? So reliability matters tremendously more for storage than for computation, because of the consequences. So yes, RAID ideas are still very popular even with the switch in technology, although flash drives are more reliable.
If you're not doing anything like backing up to get some redundancy, you're taking great risks. You said that for you, and possibly for many others, teaching and research don't conflict with each other, as one might suspect, and in fact they complement each other. So a question I have is, how has teaching helped you in your research, or just in your entirety as a person who both teaches and does research and thinks and creates new ideas in this world?
Yes, I think what happens is, when you're a college professor, there's this tenure system and doing research. It's a model that's popular in America.
I think America really made it happen: we can attract really great faculty to research universities because they get to do research as well as teach. And especially in fast-moving fields, this means people are up to date in their teaching. But when you run into a really bad professor, a really bad teacher, I think the students think, well, this guy must be a great researcher, because why else would he be here? After 40 years at Berkeley, we had a retirement party, and I got a chance to reflect and look back at some things.
That is not my experience.
I saw a photograph of five of us in the department who won the Distinguished Teaching Award from campus, one of the highest honors; I've got one of those. So there are five of us in that picture.
There's Manuel Blum, Richard Karp, me, Randy Katz, and John Ousterhout, contemporaries of mine. I mentioned Randy already. All of us are in the National Academy of Engineering, and we've all won the Distinguished Teaching Award. Blum, Karp, and I all have Turing Awards, the highest award in computing. So it's the opposite, right? They're highly correlated. So probably the way to think of it is that very successful people are maybe successful in everything they do.
It's not an either or.
And that's probably true, but it's an interesting question whether, specifically for teaching, there's something more to it, as with Richard Feynman. Is there something about teaching that actually makes your research better, that makes you think deeper and more outside the box?
Yeah, absolutely. I was going to bring up Feynman. He criticized the Institute for Advanced Study, which was created near Princeton and where Einstein and all these smart people went. When he was invited, he thought it was a terrible idea. It was supposed to be heaven, right? A university without any teaching. But he thought it was a mistake, because getting up in the classroom, having to explain things to students, and having them ask questions like, well, why is that true?, makes you stop and think. He thought that, and I agree.
I think that interaction in a research university, having students with bright young minds asking hard questions the whole time, is synergistic. A university without teaching wouldn't be as vital and exciting a place.
And I think it helps stimulate the research.
Another romanticized question: what's your favorite concept or idea to teach? What inspires you, or what do you see inspire the students? Is there something that excites them, or puts the fear of God in them? I don't know, whichever is most effective.
In general, I think people are surprised. I've seen a lot of people who don't think they like teaching come give guest lectures or teach a course and get hooked on seeing the lights turn on. You explain something to people that they don't understand, and suddenly they get something that's important and difficult. Just seeing the lights turn on is a real satisfaction.
I don't think there's any specific example of that. It's just the general joy of seeing them understand.
I have to talk about this because I've wrestled. Oh, yeah. Yeah. Oh, yeah. I love wrestling. I'm Russian, so of course. I have to talk to Dan Gable.
Oh yeah. I guess so.
Yeah. Gable's my era kind of guy. So you wrestled at UCLA, among many other things you've done competitively in your life, in sports and in science.
You've wrestled. Maybe, to continue the romantic questions, what have you learned about life, and maybe even science, from wrestling?
Yeah, in fact, I wrestled at UCLA, but also at El Camino Community College. In the state of California, we were state champions at El Camino. I was actually talking to my mom about this: I got into UCLA, but I decided to go to the community college, and it's much harder to get into UCLA than the community college. I asked why I made that decision; I thought it was because of my girlfriend.
She said, well, it was the girlfriend, and you thought the wrestling team was really good. And we were right; we had a great wrestling team. We actually wrestled against UCLA at a tournament, and we beat UCLA as a community college, which is just freshmen and sophomores. The reason I brought this up is that they've invited me back to El Camino to give a lecture next month. And my friend from the wrestling team and I are still in touch.
Right now we're reaching out to other members of the wrestling team so we can all get together. But in terms of me, it made a huge difference.
I was right at the age cutoff, which was December 1st, so I was almost always the youngest person in my class. And I matured later; our family tends to develop late. So I was almost always the smallest guy.
So, you know, I took kind of nerdy courses, but I was wrestling. So wrestling was huge for my self-confidence in high school. And then I kind of got bigger at El Camino and then in college. So I had this kind of physical self-confidence, and it translated into research self-confidence. And I've had this feeling even today, in my seventies: if something is going on in the streets that is bad physically, I'm not going to ignore it.
I'm going to stand up and try and straighten that out.
And that kind of confidence just carries through the entirety of your life. Yeah. And the same thing happens intellectually. If there's something going on where people are saying something that's not true, I feel it's my job to stand up, just like I would in the street. If somebody is attacking some woman or something, I'm not standing by and letting that get away. So I feel it's my job to stand up. So, kind of ironically, it translated to other things. It turned out to help with both.
I had really great college and high school coaches, and they believed that even though wrestling is an individual sport, we would be more successful as a team if we bonded together and did things to support each other. In wrestling, it's one-on-one, and everybody could be on their own. But he felt that if we bonded as a team, we'd succeed. So I picked up those skills of how to form successful teams from wrestling.
And so I think most people would say one of my strengths is that I can create teams of faculty, large teams of faculty and grad students, pull them all together for a common goal, and often be successful at it.
I got both of those things from wrestling. Also, I've heard this line that people in collision sports, sports with physical contact like wrestling or football, are a little bit more assertive.
And I think that also comes through. I didn't shy away from debates. I enjoyed taking on arguments and stuff like that. So I'm really glad I did wrestling. I think it was really good for my self-image, and I learned a lot from it. So I think sports done well have lots of positives you can take from them:
leadership, how to form teams, and how to be successful.
So we've talked about metrics a lot. There's a really cool metric you developed for bench press and weightlifting that we don't have to talk about, but it's a really cool thing that people should look into. It rethinks the way we think about metrics in weightlifting. But let me talk about metrics more broadly, since they appeal to you in all forms. Let's look at the most ridiculous, the biggest question: the meaning of life. If you were to try to put metrics on a life well lived, what would those metrics be?
You know, a friend of mine, Randy Katz, said this. He said, when it's time to sign off, the measure isn't the number of zeros in your bank account. It's the number of inches in your obituary in The New York Times. That's what he said. I think, you know, having...
And, you know, the cliché is that people don't die wishing they'd spent more time in the office, right? As I reflect upon my career, there have been a half a dozen or a dozen things I've been proud of. A lot of them aren't papers or scientific results. Certainly my family: my wife, we've been married more than 50 years, our kids and grandkids. That's really precious. The education things I've done I'm very proud of, you know, books and courses.
I did some work helping underrepresented groups that was effective. So it was interesting to see which things I reflected on.
You know, I've written hundreds of papers, and some of them, like the RISC and RAID work, I'm proud of.
But a lot of them weren't those things. So for people who spend their lives going after the dollars or going after all the papers in the world, that's probably not what they're going to care about afterwards.
When I got the offer from Berkeley, before I showed up, I read a book where they interviewed a lot of people from all walks of life. What I got out of that book was that the people who felt good about what they did were the people who affected people, as opposed to things that were more transitory. So I came into this job assuming that it wasn't going to be the papers; it was going to be the relationships with people over time that I would value.
And that was a correct assessment, right? It's the people you work with, the people you can influence, the people you can help. Those are the things you feel good about toward the end of your career, not the stuff that's more transitory. I don't think there's a better way to end it than talking about your family: over 50 years of being married to your childhood sweetheart.
What I think I can add is that when you tell people you've been married 50 years, they want to know how and why.
I can tell you the nine magic words you need to say to your partner to keep a good relationship. The nine magic words are: I was wrong.
You were right. I love you. OK? And you've got to say all nine. You can't say, "I was wrong. You were right. You're a jerk." You know, you can't say that. So freely acknowledging that you made a mistake,
that the other person was right, and that you love them really gets you over a lot of bumps in the road. So that's what I pass along. Beautifully put.
David, it's a huge honor. Thank you so much for the books you've written, for the research you've done, for changing the world. Thank you for talking today.
Oh, thanks for the interview. Thanks for listening to this conversation with David Patterson, and thank you to our sponsors, the Jordan Harbinger Show and Cash App. Please consider supporting this podcast by going to jordanharbinger.com/lex and by downloading Cash App and using code LEXPODCAST. Click the links, buy the stuff. It's the best way to support this podcast and the journey I'm on. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter @lexfridman, spelled without the E, just F-R-I-D-M-A-N.
And now let me leave you with some words from Henry David Thoreau: "Our life is frittered away by detail. Simplify, simplify." Thank you for listening, and hope to see you next time.