Sept. 13, 2023

S4E7: Google DeepMind’s Clément Farabet on AI Reasoning

Our guest for Episode 7 is Dr. Clément Farabet, VP of Research at Google DeepMind. For the past 15 years, Dr. Farabet’s work has been guided by a central mission: figuring out how to build AI systems that can learn on their own — and ultimately redefine how we write software. We discuss the conundrum posed by the Chinese Room Argument to explore whether computers can achieve artificial general intelligence.

In this season of Theory and Practice, we explore newly emerging human-like artificial intelligence and robots — and how we can learn as much about ourselves, as humans, as we do about the machines we use. As we near the end of Season 4, we explore whether decision-making and judgment are still the final preserve of humans.

 

Dr. Farabet outlines four modules required for computers to demonstrate understanding: a predictive model of the environment that can build a representation of the world, an ability to store memories, the ability to reason about possible futures from that representation and those memories, and, finally, the ability to act in the world, which he explains is key to demonstrating understanding.

 

Dr. Farabet believes that we can build computers to become more human-like than most people may realize, but the overarching goal should be to build systems that improve human life.

 

 

Transcript

Anthony  00:00

Welcome to GV Theory and Practice. This series is exploring what it means to be human in the age of human-like AI. 

 

I'm Anthony Philippakis.

 

Alex  00:09

And I'm Alex Wiltschko.

 

Anthony  00:11

This is our seventh episode. And today we're going to be talking about computers using judgment and making decisions. These really are tasks that until now have been the preserve of humans. So for example, which young person should we operate on if we suspect appendicitis? And who should we watch carefully, hoping the appendix doesn't burst and that we have saved them from an unnecessary operation? Similarly, what cancer treatment regimen is optimal for each person, given the prognosis of their cancer and their own circumstances? These are human-to-human decisions today. But is it possible that computers could one day do this better than us?

 

Alex  00:49

Judgment and decision-making are really key properties, for me, in determining if we've created machines that have intelligence. I don't know about general intelligence; that's a definition that keeps slipping away into the horizon. But to use judgment, you have to have understanding, and, like you said there, that judgment is mixed with empathy and wanting the best for each particular patient. I'm not sure that AI today really has all of this.

 

Anthony  01:19

This is such a fascinating topic, Alex. Why don't we do our Hammer and Nails discussion first and look in depth at the issue of computers understanding? Before we dive into the details with our guest, a reminder for new listeners: this is the section of the podcast, in honor of our in-person meetups in Boston many moons ago, where we discuss either a solution, i.e. a hammer, or a problem, a nail, looking for a solution.

 

Alex  01:46

Okay, well, let's go for it. So do you have a nail or a hammer today, Anthony?

 

Anthony  01:51

I have what used to be a very good hammer. It goes back to a philosopher named John Searle in the 1980s, who called it the Chinese Room Argument. And it was a thought experiment about whether or not we could ever have a computer with human-like understanding and general intelligence. And one thing I will say here is that the term "Chinese room" is a little bit archaic; we wouldn't necessarily use that metaphor today. But at the time, it was just meant to represent Chinese as a language that most people in the English-speaking world didn't understand.

 

And so here's kind of roughly how it works. Let's imagine we have a person locked inside a room with no windows and no sound. And that person does not speak Chinese. Outside the room, you have a person who is fluent in Chinese and can slip cards with Chinese characters under the door to the person inside the room. Could we imagine an experiment, or a way of setting this up, where the person outside the room could be tricked into thinking that the person inside the room knows Chinese, even though they don't?

 

Alex  03:07

Oh, interesting. Okay, so we've got one person who's fluent in Chinese and is trying to have a conversation with somebody on the other side of the door. And all they can do is put characters underneath the door, and they get characters back on a new card. And the thought experiment is, is there a way for the person who's fluent in Chinese to think that somebody or something on the other side, actually is thinking and understanding and reasoning and responding in Chinese even though that's not true? Is that the idea? 

 

Anthony  03:35

Exactly right. And here's the way you might set it up: give the person in the room a really elaborate set of rules on how to string the characters together when they respond, based on the input. It might be deterministic rules, like "if this string of characters, then this one," or, honestly, it might be something a little bit more like ChatGPT, where you have some kind of probabilistic predict-the-next-character response. And so you could imagine a situation where the person is able to string together responses of characters and trick the person on the other side of the door into thinking that they actually understand.
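To make the two flavors of rulebook concrete, here is a minimal Python sketch; the rules, characters, and probabilities are invented purely for illustration, not drawn from any real system.

```python
import random

# Hypothetical deterministic rulebook: exact input card -> canned response.
DETERMINISTIC_RULES = {
    "你好吗": "我很好",          # "How are you?" -> "I am fine"
    "你叫什么名字": "我叫小明",   # "What is your name?" -> "My name is Xiao Ming"
}

def deterministic_reply(card: str) -> str:
    """If-this-string-then-that: look the card up, or fall back to a stock answer."""
    return DETERMINISTIC_RULES.get(card, "我不明白")  # "I don't understand"

# Hypothetical probabilistic rulebook: given the last character written,
# sample the next one from a (toy) learned distribution, ChatGPT-style.
NEXT_CHAR_PROBS = {
    "我": {"很": 0.6, "叫": 0.4},
    "很": {"好": 0.9, "忙": 0.1},
}

def probabilistic_reply(card: str, length: int = 3) -> str:
    """Predict the next character one symbol at a time, with no grasp of meaning."""
    out = card[-1] if card and card[-1] in NEXT_CHAR_PROBS else "我"
    for _ in range(length):
        dist = NEXT_CHAR_PROBS.get(out[-1])
        if dist is None:
            break
        chars, weights = zip(*dist.items())
        out += random.choices(chars, weights=weights)[0]
    return out

print(deterministic_reply("你好吗"))   # canned lookup
print(probabilistic_reply("你好吗"))   # sampled continuation
```

Either way, the person applying the rulebook never needs to know what any of the characters mean.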

 

Alex  04:20

Interesting. So I remember this: it was the argument that, I guess, slayed the Turing test that we discussed in our first episode, the idea that if a computer can pass as a human, then we can regard it as intelligent. Searle's thought experiment shows that creating perfect sentences that follow grammar, that have great structured syntax, that contain what we believe to be meaning or purpose, doesn't imply understanding. But there were a number of refutations of the Chinese Room Argument, so maybe we should go over the most salient ones for today's discussion. What do you think?

 

Anthony  04:57

Yeah, I think that'd be perfect. You know, Alex, you thought about this a lot, so I'll let you dive in.

 

Alex  05:01

So there were two key arguments against Searle's thought experiment. The first is the virtual mind reply: not all parts of the system have to have understanding, but as a whole, meaning is created. So the person in the room, the person that's actually looking through the manual and kind of being a manual computer, might not have understood what they wrote, but the person outside understood it, so there was meaning created. For example, a virtual agent, such as a character in a computer game, may show meaningful, goal-directed behavior and language, even though when you dig into the code that supports that, the individual parts of the computer programme might not have understanding. Another way to say that is the person that follows the rulebook might not have understanding, but in the combination of the person and the rulebook together, there is understanding and meaning. Does that make sense?

 

Anthony  05:55

I think so. So this is the level of description fallacy.

 

Alex  05:58

Exactly. The other refutation adds another thought experiment, another layer of learning and meaning, that I think is an interesting one to consider, which is: what happens if the person in the room internalizes this Chinese syntax programme and memorizes all these books, because they've been using them so often? But again, they're locked in a room, right? So they might have been having conversations about apples and bananas and all kinds of things that are real in the world. But then the door unlocks, and they go out into the world. And they have eyes and ears, they have other sensors. And they're able to have the same kinds of robotic conversations in Chinese, but all of a sudden, they can now associate what they see and what they hear and what they feel and what they smell with the language that they have robotically memorized. And maybe at that point in time, associating the symbol in Chinese for apple with the visual and auditory and sensory impressions of an apple starts to create a model of the world: a true understanding, what we would call understanding, of the Chinese language. However, Searle doesn't think that either of these refutations, either of these arguments against the Chinese Room Argument, is any stronger than the other. So for instance, if you let a person who's memorized the rulebook out into the world with eyes and ears, all the sensors can do is provide additional input to the computer, and it will just be syntactic input. It's as if there's just more information on the cards being slipped under the door.

 

Anthony  07:35

That's fascinating. I'll be honest, I find the first refutation a little bit more compelling. And let me kind of riff on it with you for a second. Going back to this rulebook, let me go meta for a moment. As I said, the rulebook could be deterministic, if-then, but it could also be that the person inside the room has access to a probabilistic rulebook that predicts the next characters. So in some sense, they have ChatGPT. They give ChatGPT a string of characters, and then it gives a response. Sure, the person inside the room doesn't know Chinese. But how do I know that the rulebook that gives them the answer doesn't know Chinese itself? And in fact, how do I know that it's ChatGPT, rather than another human being, you know, putting stuff into the computer? And so it almost ends up being a kind of self-loop, you know, layers upon layers of interpretation.

 

Alex  08:34

Completely. This is, I think, an interesting set of, I guess, branching trees to go down: where is meaning actually stored? We've touched on this a couple of times before, but the grandfather of the computer age, Claude Shannon, wrote a very famous but small book about the structure of information; he kind of created and then closed a field of mathematics all in one go. And very famously, he describes the flow of bits and bytes through communication channels and says that the role of meaning is going to be left out completely of this discussion. And we've kicked the can down the road. And now here we are, where we have these systems that can manipulate and store and transmit information with such incredible fidelity that we're running up against this question of, well, it looks like there's meaning here, so what is it really? We have to actually revisit that thing that Claude Shannon put off so, so long ago. Now, I think it's time that we bring in a practitioner, bring in an expert. What do you think?

 

Anthony  09:36

I think, perfect. And before we do: the book you referenced, the original paper, as you know, was called A Mathematical Theory of Communication. I have a funny story. Early on in my relationship with my wife, we had one of our first fights, and it was basically about how she felt like I wasn't expressing my feelings enough and opening up enough to her. And so as part of the kind of reconciliation process, I was like, honey, I totally understand you, and I love you, and you're right, and I bought a book to, like, help me think through how to better communicate with you. And then I showed her that book. And I could just see it in her eyes, like, oh, he really understood me, and he bought a book to, you know, self-reflect and, you know, improve himself. At that moment, I knew she was either going to laugh and love me, or dump me on the spot. And thankfully, she laughed and loved me.

 

Alex  10:31

That's lovely. You found a wonderful partner if she gets the joke and gets you. That's awesome.

 

Anthony  10:41

Let's welcome our guest, Clément Farabet, the VP of Research for Google DeepMind. On his website, he describes himself as part engineer, part entrepreneur, part mad scientist, building AI at Google DeepMind. In his words: "In the past 15 years, I've been driven by an obsession: figuring out how to build AI systems that can learn on their own, and ultimately redefine how we write software." So he is someone who has thought deeply about making computers understand and make their own judgments. Clément, welcome to GV Theory and Practice.

 

Clément  11:15

It's great to be here. Thank you so much for having me today.

 

Alex  11:19

Well, thanks so much for joining us. I think we should just dive right in. Anthony and I have just been talking about the Chinese Room Argument as a way to start thinking about machines understanding and making their own decisions. And we got to the point where you can argue that if the person in the room is a robot loaded with the Chinese symbol-writing programme, and it has sensors to see and hear, you can start to attach meaning to those symbols, such as the ones for an apple or a banana or a cloud. Can a computer understand what those things are? What an apple is? And if so, where is that understanding taking place?

 

Clément  11:55

I think so. So I'm sort of like a firm believer that there will eventually be no difference between a machine performing all sorts of cognitive tasks and a biological system. They will be radically different in terms of how they actually implement those tasks. But I do believe that the machine will essentially be able to convert the symbols into an internal representation that will, you know, let it reason about what a banana is, be able to write about a banana using symbolic language, being able to understand what a banana looks like and project it in the physical 3D, four dimensional world we live in. 

 

And we already see this sort of emerging phenomenon in models like ChatGPT and the automata we have on the market now: these models, though they were trained exclusively on language, actually emerge a pretty deep understanding and a sort of almost grounded view of what these underlying concepts are. And so we're just at the beginning, these models still being trained only on data, which has limitations. My belief is that as those models get trained increasingly on more and more modalities, like visual stimuli, text, and other modalities, they will actually end up with fairly similar internal models of the world to the ones we have as humans.

 

Anthony  13:17

So one of the things you've thought about deeply, Clément, is how computers can have judgment and make decisions. So as we think about that, can you give us an overview of what components are needed to attack that problem?

 

Clément  13:31

Yeah, so I think there are probably three or four modules that are required. Number one is some kind of basic ability for the system to have a predictive model of its environment, so that the system can have a representation of the world it's evolving in. On top of that, it requires something like memory, and the ability for the system to actually store what's happening in its world, as well as what it wants to be able to perform, in this memory system, so that it can actually reason about things not only in the short term, but in the medium term and long term. On top of that, it needs the ability to essentially perform reasoning over all those internal symbols that emerged: essentially the act of projecting different futures, and then reasoning about those and picking whichever is best. And then finally, it requires the ability to act in the world, which is, based on this internal model, memory, and reasoning, the ability to actually decide, you know, what to do next. And that, I think, is sort of the most basic set of competences that you need. There's probably more than that, but let's start with that.
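As a rough illustration of how these four modules might fit together, here is a minimal Python sketch of an agent loop; the class, the toy world model, and the value function are all hypothetical stand-ins, not a description of any actual DeepMind system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Agent:
    """Toy wiring of the four modules: world model, memory, reasoning, action."""
    world_model: Callable[[Any, Any], Any]     # 1. predicts the next state given (state, action)
    value_fn: Callable[[Any], float]           # scores how good an imagined future is
    actions: tuple = ("left", "right", "stay")
    memory: list = field(default_factory=list) # 2. stores what has happened so far

    def reason(self, state: Any) -> Any:
        """3. Reasoning: project each candidate action into an imagined future and score it."""
        futures = {a: self.world_model(state, a) for a in self.actions}
        return max(futures, key=lambda a: self.value_fn(futures[a]))

    def step(self, observation: Any) -> Any:
        """One perception-to-action cycle."""
        self.memory.append(observation)   # remember the observation
        return self.reason(observation)   # 4. Action: commit to the best imagined future

# Toy usage: a 1-D world where larger positions are better.
agent = Agent(
    world_model=lambda s, a: s + {"left": -1, "right": 1, "stay": 0}[a],
    value_fn=lambda s: float(s),
)
print(agent.step(0))  # -> "right"
```

In a real system each of these stand-ins would be a learned model, but the division of labor described above is the same.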

 

Anthony  14:37

Perfect. So let's walk these through a little bit, one at a time. So first, when you talk about the first and the last, the ability to predict the future, and then the ability to kind of make a decision and act on it. What's the difference between those two? Is it about causality?

 

Clément  14:53

I think the difference between the two is that one is fundamentally passive. You're observing what's happening in the world, by the way, including your actions, but you're treating yourself as an external factor, right? You're observing what's happening in the world and you're building a predictive model of that. So you know, each time the light in the street goes red, it seems that vehicles stop, that's already a very high level statement. By the way, we're already assuming a level of perception here. But that's a very passive, sort of like, system, right? Whereas the second one is basically fundamentally trying to decide what the agent should actually do in this world. Where should it go next? You know, if it's equipped with arms and body, how should it move in the world? If it's equipped with the ability to type text? What should it type next? These are two fundamentally different behaviors, but they sort of like, coexist.

 

Alex  15:43

So if I can jump in and take Anthony's framing of the question and apply it to two different things. One is prediction, which you mentioned. And the other was reasoning about the world. And I'd like to understand a bit more about reasoning, because I hear a couple of different concepts in there. One is simulating possible futures. The other is passing judgments, or ascribing values, to possible actions or possible futures. So how do you take a scalpel and differentiate the ability to predict the next moment in the world from, I guess, the forecasting or reasoning piece? Help me understand how those are different.

 

Clément  16:18

Yes, I think there are different segmentations that are possible here. But to stick with what I said around the predictive model: if you assume that the predictive model is completely passive, and all it's doing is observing the world and trying to assign a probability distribution to outcomes, then the reasoning layer is going to do a little more than that. What it's going to do is explore these different paths in the tree of possible futures and, for each of those, essentially assign value to specific outcomes. So for instance: if I observe X, and then I act and do Y, then Z is going to happen. You have the ability to hold these different futures in your mind, or in the mind of the machine, and then, based on those, finally decide to act. And so these are sort of three complementary systems. And then finally, memory is the fourth one, which is very useful to retain, sort of, longer examples of those things and bring them back in the future. But there's also an interesting relationship between memory and those other components, because in some sense, even the predictive model is actually memorizing a lot of things in its internal representation. But yeah, I do think that those three things are separate in how they operate.
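One way to picture this tree of possible futures is a short recursive lookahead; the Python sketch below is purely illustrative, with a toy transition function and a toy value function standing in for a learned world model.

```python
def best_plan(state, transition, value, actions, depth=3):
    """Roll every action forward `depth` steps through the imagined tree of
    futures and return (best first action, value of the best path)."""
    if depth == 0:
        return None, value(state)
    best_action, best_score = None, float("-inf")
    for action in actions:
        imagined = transition(state, action)  # predictive model: imagine the outcome
        _, score = best_plan(imagined, transition, value, actions, depth - 1)
        if score > best_score:
            best_action, best_score = action, score
    return best_action, best_score

# Toy world: a position on a line, with a goal at +5.
action, score = best_plan(
    state=0,
    transition=lambda s, a: s + a,   # "if I do a, the world moves by a"
    value=lambda s: -abs(s - 5),     # futures closer to the goal are better
    actions=(-1, 0, 1),
)
print(action, score)  # -> 1, i.e. step toward the goal
```

Memory would sit alongside this: once the imagined rollouts actually happen, they become stored experience the model can reuse later.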

 

Anthony  17:29

So let me ask one question about memory: is that a solved problem? I mean, we're pretty good at being able to record things.

 

Clément 17:37

So it is not solved, in the sense that it is difficult. We've solved memory for traditional computer systems, right? If you look at a more traditional computer programme, you have this concept where, you know, you can store something in memory and retrieve it explicitly, and the programme is basically doing that in a very deterministic way. Once you move into a paradigm where you have a sort of reasoning machine, and the predictive model you have underlying it all is, you know, neural networks dealing with sort of continuous representations of the world, then the concept of memory becomes very fuzzy as well. And so we've demonstrated ways to equip a model with memory of different types, but I wouldn't treat that as a solved problem yet. I think we have a lot more research to do, and we need to explore this space more deeply to actually really get to a good place.

 

Alex  18:27

So let's go one step back to the predictive model of the world. One thing that has always kind of shocked me about just being a human being is that we exist on a tape delay: we are constantly making predictions about what the world should look like, because there's no way that we can have access to the information that is right now and actually be acting on it. So something our brains are doing must be continually maintaining a model of the world that's at least 30 milliseconds out of date, which just always blows my mind: whatever you're experiencing is happening in the past; there is no now. How does this work for a computer? How do you build that kind of predictive model that can understand what the right thing, or the most likely thing, is that should come next?

 

Clément  19:10

That is actually such an exciting topic, by the way, the fact that we do live in the past and it's something that you know, once you realize that you realize the power of the brain to maintain this illusion of the now and the state and how to act upon it. Computers are fundamentally wired the same way, right? Because you do have the same types of latencies in processing any input stream of information. So if you look at a self-driving car, for instance, it's going to be equipped with, you know, eight cameras, 12 cameras surrounding the vehicle, it could be equipped with lidars and all sorts of sensors to actually help it perceive what's happening around the vehicle. The latency from what happens in the physical world to a bit you know, that sort of like moves from zero to one inside the central processing unit is anywhere from, you know, 30/40 to 200/300 milliseconds. So it is not negligible, right. 

 

And so in some sense, the computer itself also lives in the past. Once it starts receiving that input stream of information from the perceived world, it's going to, sort of, push that through all sorts of internal modules, you know, neural networks or an explicit decision tree at some point. And at some point it's going to decide, oh, now I need to steer the wheel left or right by N degrees, and make that decision probably 300, 400, 500 milliseconds behind the last observation of the world. And then once the action gets performed, there's another crazy latency in the vehicle actually changing its behavior. And so the sort of end-to-end loop of perception to action to actually re-perceiving the consequences of your actions, I think, is on the order of seconds very easily. And I think every agent, every dynamic actor in some kind of environment, is going to have to deal with this. And the way I think we deal with it is that, to your point, we maintain some kind of very strong, smoothed-out representation of the world.

 

And you can feel that for yourself: when you're looking at a street with cars moving around, you close your eyes for five seconds, and you have a sort of movie flashing within your head of what's happening in this world. And then when you reopen your eyes, most of the time the movie is pretty spot on. And then sometimes it's not. And that "not" is basically the learning signal. It's basically the thing that we call surprise. That sort of difference between what your internal model predicted and what you end up observing is really the thing that I think drives learning. And, you know, hopefully, the more you grow, the better your system gets at predicting the world, and the fewer such signals there are. And that's why, I guess, we learn less as we grow up.
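The surprise signal described here maps naturally onto prediction error in a simple learning loop; this is a hypothetical, minimal sketch (a plain delta-rule update), not how any production world model is trained.

```python
def surprise_driven_learning(observations, learning_rate=0.5):
    """Maintain a running prediction of the next observation and update it only
    in proportion to surprise (observed minus predicted), so a model that
    predicts perfectly stops changing."""
    prediction = observations[0]
    for observed in observations[1:]:
        surprise = observed - prediction          # what happened vs. what was expected
        prediction += learning_rate * surprise    # learn only from the mismatch
        yield prediction, surprise

# Toy stream: a world that is steady, then suddenly changes.
stream = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
for prediction, surprise in surprise_driven_learning(stream):
    print(f"prediction={prediction:.2f}  surprise={surprise:+.2f}")
```

While the world matches the prediction, the surprise is zero and nothing changes; the jump to 4.0 is exactly the moment the model learns.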

 

Anthony  21:45

One of the things that we've seen with a lot of the modern machine learning algorithms is that self-learning is very important. Can you say a little bit about how the machine approach and the human approach are the same or different?

 

Clément  21:57

I do think that, in the case of biological creatures, like Alex said, there's a deep sense of the learning machine having co-evolved with the body, right? It's completely embodied; it's never been able to experience another body. So if you're 40 years old, that learning machine has had to deal with the same casing for 40 years, and it's been this continuous process all along. So I think there's something fundamental about that, in terms of how the algorithm is forced to have an internal consistency. But I think the machine is able to achieve the same thing if you take everything we're doing today and fast forward a few years, so that we have even more compute capability and more ability to really absorb and represent complex, high-dimensional data, and you force that internal algorithmic stack to be contained within one specific form factor, and then live multiple years through that and have the ability to learn about time. I think at some point you end up with the same type of thing, or at least from the outside: if you queried that thing, I think it'd be able to tell you things that are extremely human-like, somehow.

 

Alex  23:03

We're making really beautiful circles back to the beginning of the conversation with the Chinese Room hypothesis. But I want to summarize where we are so far, because I think you've done a really great job parsing out the big modules that have to be built in order to replicate intelligence. To which I would add perception, right, some way to get data into the system. And then a prediction module, something that allows us to take the current state and roll it out into the future, at least a little bit. And then something which takes that module and rolls it out even further into alternate possible futures that have values on them, where the choices on those different left and right branches actually might impinge upon our survival or on something we're trying to make happen. And then the ability to remember, to incorporate past choices, past tree branches. And then finally, the ability to actually take an action, to try to force one of those branches to come into reality. So: perception, prediction, reasoning or forecasting, whatever you want to call that, memory, and then action. Are those kind of the modules that you're talking about?

 

Clément  24:11

I think so, I think that's exactly spot on. Yeah. I was sort of absorbing perception into prediction somehow, but I think you're right to call it out. It's slightly unclear to me whether all these things can be decoupled. I used to think that we could have some kind of feed-forward perception system and then start doing prediction on top. But I think these are very intertwined, actually, because you want to be able to predict at all sorts of levels of the stack, and therefore I think these things are very coupled. But yeah, that's a good summary.

 

Alex  24:39

I wish we had two hours to nerd out on predictive coding, which exists really at every step (even your eyes and your nose are predicting the stimulus, things that don't even exist), but let's drive the conversation forward for our listeners. So we've talked about the different modules that constitute building an intelligent system. I suppose one area it'd be good to get your take on briefly is that what you've described are abstract modules. And I completely buy it; I think a lot of folks buy it. We learned about those modules because we've introspected, and we've learned about them even more because we've tried to replicate them in computers. But the implementations are going to be radically different. So I guess, tell me: what is necessarily different about the implementation in a computer than in a person?

 

Clément  25:26

What computers are really great at is, once you have a sort of state space that's summarized neatly as a set of discrete symbols, their ability to essentially reason about that set of symbols in extremely complex ways is unparalleled. That's sort of how we solved chess in the 90s. That's how we've been able to implement extremely impressive programmes. For instance, you know, you show up on Google Maps and ask for arbitrary point-to-point directions, and you're able to get the optimal, fastest path going through lots of intersections, multiple cities, and so on. These are very well-formulated algorithms, and the computer is able to crunch through them increasingly fast. So that's really amazing, right?

 

And as humans, I would argue that we're actually much less good at that. We do that in ways that are fairly fuzzy and approximate, and we've tried to create the frameworks and systems, the mathematics, for us to hold these symbols and be able to return to them, but it's a fairly laborious task. The converse is that biological machines are insanely good at essentially converting the external environment. And to your point, we're never able to actually absorb it; we don't know what the world that surrounds us is. The only thing we know is that our brain has mapped that into some internal representation that makes it super easy for us to actually manipulate. Computers, up until fairly recently, were very bad at that, and our field has made tremendous progress in the past two decades, roughly, on all the different ways of perceiving the world, whether it's visual data, lidar data, or audio data. And we're really getting to a point now where we can project these things into an internal representation. It's not super clean still, right, but it's clean enough that we can then feed it into these more traditional methods. And so I do think that, because we have these two strengths, we're going to end up building very hybrid systems, where you're using these sort of neural-like systems and algorithms to interface with the real world (both ways, by the way, for action and perception), but then within that internal box, you'd love to have some kind of really clean state space where you can run all these traditional algorithms. And I think that's how we're going to end up building self-driving cars; it's how we're going to end up building a lot of the systems that mimic human intelligence, except that then you can tap into this world of perfect algorithms, if you will. And that's how you'll be able to build machines that, I think, exceed human capabilities on multiple dimensions, basically.
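As a cartoon of the hybrid architecture described here, the sketch below pairs a stand-in "neural" perception step with a classical algorithm running on the clean state it produces; the perception function is a placeholder, and the planner is ordinary breadth-first search rather than anything used in a real self-driving stack.

```python
from collections import deque

def perceive(sensor_frame):
    """Stand-in for a learned perception model: map messy sensor input to a
    clean, discrete state space (here, a tiny occupancy grid plus start/goal)."""
    grid = [
        [0, 0, 0],
        [1, 1, 0],   # 1 = obstacle the "neural" front end detected
        [0, 0, 0],
    ]
    return grid, (0, 0), (2, 0)

def shortest_path(grid, start, goal):
    """Classical algorithm on the clean state space: breadth-first search."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None

grid, start, goal = perceive(sensor_frame=None)
print(shortest_path(grid, start, goal))  # route around the detected obstacle
```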

 

Anthony  27:51

You talk about one day building systems that can exceed humans, but it seems like there are some examples where it's already happening. To use an example in medicine, which is obviously one that I care a lot about: a classically hard problem in clinical medicine is diagnosing appendicitis. It's something that physicians often get wrong, and don't realize they got wrong until they have already done the surgery and taken out the appendix. And there was a study I read where a computer got a true positive rate of 81%, whereas clinicians got 58%. And then even more disturbingly, when you put the two together, it was actually 42%, worse than either. Is it the case that computers can often make better decisions than humans? And what are the types of problems they do better on?

 

Clément  28:38

The example you picked here is interesting, and we're starting to see that across many lower-level tasks. The problem you described is almost more of a perception-plus-classification problem, where you're looking at some data and trying to make up your mind about whether the data contains the illness or not. I think computers are starting to get better than humans on several of these types of tasks, because they're following similar learning algorithms, but they're exposed to a much larger set of data. They're essentially able to go train themselves on everything you could possibly find on the web, and, not being limited to a specific resolution of perceiving the data, they can really go find more advanced correlations within the data. And so I think you're going to see these sort of expert systems get all the way to the edge of what's possible from the data. That doesn't mean that they will be able to connect the dots with other tasks on their own. The task you picked is a fairly well-defined expert problem where, by looking at more data and having a system that can look at the data in ways that maybe a human can't, it can actually go further.

 

Alex  29:46

The other piece I want to pick up here: it's not just the amount of data measured in bits; there's some smaller number, some unit of learning, that's possible. And I don't even know how we define that. For energy, we use kilojoules, but for bits, I don't think we've got a unit that converts resident gigabytes on a hard disk into, like, learnable units. But we're not always very good, even placed in the real world as people, at learning from our errors. Like, I repeatedly, repeatedly order dessert out at a restaurant, I repeatedly eat the whole thing, and I repeatedly feel bad afterwards, right? It's a thing that I am unable to learn from, or there's some other ancient system that's being hijacked by the world of restaurants and food.

 

Anthony  30:36

Alex, before Clément answers, let me give you an example. Darwin noted that a monkey will drink alcohol and act stupid, and drink too much of it and the next day wake up with a hangover. And after that, the monkey will never again drink.

 

Alex  30:51

Okay, well, that's great for that species.

 

Clément  30:55

Turns out that monkeys are better tuned than humans, then. 

 

Alex  31:00

That's the right answer, right? So, you know, there are all kinds of challenges in learning, both in terms of things that work against us absorbing a lesson and in terms of timescale. One thing I just want to put into the conversation here is that we place humans on a pedestal for our ability to extract meaning and predictions from patterns, but we make systematic errors, not just as a species, but also as a society, in terms of how we set up our ability to learn. So how can computers be better or worse here? Like, just help me with a contrast between humans and computers in this ability to learn.

 

Clément  31:34

Yeah, that's such a great topic. So my thinking here is that humans have a reward signal, something that we're optimizing for, that's largely defined by biology. And to your point on the monkeys versus the humans, that sort of reward function might be tuned differently. For humans, it might be that it's extremely important for us to enjoy the moment, as long as we don't die, and we actually struggle to deal with long-term survival. You know, as a species, we're pretty bad: we drink, we smoke, we do all these things that give us a lot of pleasure in the next few hours but in the long run are actually really bad for us. So it must be that the biological reward function we have weights the next hours of pleasure more than long-term survival. I think the opportunity with machines is that we can completely specify that reward function. A machine doesn't have to reproduce, so you can cut all the things that go into that bucket. A machine doesn't really have to enjoy taste or anything like that, at least it doesn't have to. I mean, Alex, you may change that!

 

Alex  32:32

Working on it! Yeah, completely specifiable, but they need to be able to at least perceive the world of taste and of scent. I think that's an important part of building embodied systems: that they have access to the world that we act in.

 

Clément  32:44

Hugely important. But I think the point is that as we design these machines, we can decide to weigh what matters into that sort of very top-level reward function. If you want to design a machine to help solve math problems, and maybe do innovative research in mathematics, I don't think it will necessarily need to perceive certain things, but I think you'd want it to be very good at that sort of long-term reasoning, and at the ability to absorb the entire body of knowledge that humans have created, to start from there and come up with new, next-generation solutions to problems.

 

If you want to create a machine that's going to help you with daily tasks, like cleaning your house or driving you to work, I think you'd constrain it differently. And so, you know, I don't think there's such a thing as one single machine that can optimize for everything: you have finite time, you have finite resources, and so you get to decide how you tune these things. And I think that's where we're going to get better and better as a field. As we build these very advanced AI systems, we're going to want to define and specify fairly clearly what we want them to optimize for, what their reward is. And yes, I think there's going to be a lot more control there.
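To illustrate what deciding "what to weigh in the top-level reward function" could look like, here is a hypothetical Python sketch in which two machines share the same reward machinery but weight its terms differently; the terms and weights are invented for the example.

```python
def make_reward(weights):
    """Build a reward function that scores an outcome as a weighted sum of terms.
    Which terms appear, and with what weight, is a design decision."""
    def reward(outcome: dict) -> float:
        return sum(weights.get(term, 0.0) * value for term, value in outcome.items())
    return reward

# Hypothetical machine A: a long-horizon research assistant.
research_reward = make_reward({"problems_solved": 1.0, "immediate_comfort": 0.0})

# Hypothetical machine B: a household robot that should also be pleasant to live with.
household_reward = make_reward({"chores_done": 0.7, "immediate_comfort": 0.3})

outcome = {"problems_solved": 2, "chores_done": 1, "immediate_comfort": 5}
print(research_reward(outcome))   # 2.0 -> only credits problems solved
print(household_reward(outcome))  # 2.2 -> trades off chores against comfort
```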

 

Anthony  33:49

So I mean, when we get to the heart of human decision making, and not just when we get it right, but when we get it wrong, is it fair to say that a lot of what we're limited by is our ability to hold many different perspectives, and to hold a lot of data in our head. And to combine these two things together?

 

Clément  34:08

I don't know. Because, on one hand, I feel like we are able to actually hold a lot of these things, these parallel scenarios, in our heads. It sort of depends on what you're optimizing for. You know, if a machine has essentially infinite memory, and the ability to retain every single fact that ever got created by humanity and retrieve any of those at will, it doesn't necessarily mean that it's better at actually making use of them. And so I think there's some kind of tension between what you can maintain in your sort of real-time cache, and what the set of constraints is that you have on your reasoning machine, versus constrained creativity, right? The notion of constraint, I think, becomes important at some point. I'm not sure that we're limited by that as humans; I think we're limited by our ability to communicate. You know, the bandwidth we have between humans, between individuals, is actually terrible. The way the three of us are exchanging right now is extremely lossy, you know, there are French accents involved, there are all sorts of things that get in the way, and it moves pretty slowly, right? Maybe after an hour of conversation, we've identified three or four areas where the three of us have learned and added a few bits in our heads. Computers don't have that. It's this crazy thing where you could mesh 1,000, 10,000, a million machines, you know, and have them all talk to each other, and exchange and re-sync their internal states pretty much instantly compared to us. And so I think that opens a completely different world for, sort of, collaborative problem solving, and so on. But the individual machine, the things it has to solve, I think it's always going to be constrained by IO, like how fast you can actually move things in the world, if you're trying to understand things and properties of the world. If you're trying to have a machine solve nuclear fusion, it's going to have to create experimental settings and do things in the real world. It can push things very far in its own internal model, but then I think it's going to have to go back and test hypotheses, most likely.

 

Anthony  36:05

Amazing. Let me switch gears a little bit. And let's start looking towards the future. And it's a moment in time where many speculate that artificial general intelligence is just right around the corner. And in many ways, what you are focusing on and what we're talking about today, decision making, judgment. That's almost the core problem to solve as we think about AGI. And you're actually also at Google DeepMind, which is really one of the leading organizations driving that future. So tell me what you think the next five years looks like? What are the problems that you're most excited about solving in this world of decision making and judgment? What's going to be easy? What's gonna be hard? What's gonna surprise us?

 

Clément  36:51

So I think the thing that we've sort of cracked as a field in the past few years is this idea that, by training extremely large-capacity neural networks on a large body of data (which, you know, so far has largely been text data; I know we've started adding things like images and videos, but it's been very text-heavy), we've shown that these systems can emulate some kind of internal representation of the world that's grounded enough that you can then actually use it to do things that look like reasoning, that look like pretty advanced behavior. And yet you're exchanging with these systems purely through natural language, with all the messiness of natural language: it can be faulty, you can have typos, you can express things in 20 different ways, and yet the same sort of combination is expressed and understood by the machine. So that's a huge stepping stone, because we have this sort of glue now that looks a lot like our own IO and has a lot of the seminal properties of fuzzy thinking, like the ability to get to the same action based on many different input paths. And yet we didn't really code any of that: we just trained the system on the web, and it emerged those properties. So that's amazing.

 

What comes next, I think, in the next five to ten years, is this whole effort of trying to figure out these other modules: the memory part, the sort of more end-to-end behavior connecting this predictive model to action. And then there's the embodiment part, which I'm super excited about: putting these things into an actual actor, whether it's in a video game or in the physical world, like a robot, and embodying these systems so that they can actually perform actions and interact with the world and create even more grounding in their understanding of the world. I think that's just super exciting, because I think that's what's going to bring these machines closer and closer to us as humans in terms of our own interests, the problems we want to solve. Everything we care about as humans is mostly things that we as humans are constrained by, you know, through our senses. And so I think we're very interested in building machines that have a lot of anthropomorphic properties, because we want them to solve our problems, right? Our own health issues, our own daily problems, like self-driving cars, and so on. So that I'm very, very excited about. I'm generally not commenting on timelines, but I do think we are going to build machines that have equivalent cognitive abilities to humans at some point, on many, many dimensions. But the list of cognitive properties is long, the environments are very numerous, and I'm not really sure we want to create a machine that's actually completely human-like. I think we want certain dimensions where they get much better than us. For instance, if a machine could really, really connect the dots between genomics and biology and the different fields, and actually help us get much more creative in finding cures for diseases like cancer, that would be amazing. And I think for that you need to create something that's human-like in some aspects, but not all of them. So I think we're not trying to reproduce humans, and I sort of hate this whole, like, doomerism movement. But I do think that we have to have similar properties, and we have to have these machines perceive a lot of what we perceive, the same way.

 

Alex  39:55

Tell me more about one of the things you just mentioned, which brings us to a topic that we think about a lot on the show, which is healthcare. I mean, the whole premise of what we're talking about is: how exactly are we going to be healthier as a result of all of these changes in our ability to automate cognition? And you mentioned, you know, the ability to create maybe a virtual agent which helps with healthcare or biology. Tell me more; I know you've thought a bit about this. But there's an analogy I'd also like you to draw: as we think about the Industrial Revolution, one person could all of a sudden do the work of 1,000 through the use of mechanical leverage. And I think an idea that you've been playing with and talking about is one person doing the work of 1,000 with cognitive leverage. So tell me more.

 

Clément  40:39

There's two axes here, right? Like one is around the idea that for tasks that we do understand well, and do well as humans, doing a lot more of those. So for instance, an assembly line in a factory or cleaning up your house, those are fairly well understood tasks. And if we had human-like robots on those specific dimensions, we could automate more of those and actually scale, what we can do there, which is good. And that's one axis.

 

The one that I'm much more interested in is the creativity one, which is the idea that, if you created a human-like system that has the same reasoning abilities, the same ability to essentially process all the digital information we have access to and exchange with other humans, can you create an Einstein, and maybe an Einstein times 10, times 100? Because what happens if you've created the basics of that, and that system also has these properties of computers, like high bandwidth to other computers, so they could collaborate at very high bandwidth? It's this idea that if we start creating something that looks like the most creative humans we know, and then we can actually create lots of them, and they can collaborate very efficiently and without ego, can you come up with something new? Because I don't believe that we have a human today who's able to completely grasp all the different scientific fields we have at our disposal, connecting the dots between genomics and physics. And if you had an agent that could do that, I think it would come up with new ideas. And that, I think, is just the thing that we're all extremely excited about in this field.

 

Alex  42:11

So let's return to where we started the conversation, with the idea of the Chinese Room Argument, where the philosopher Searle argues that human brains are biological entities, and so we can't compare them to machines, even though arguably they're both really information-processing machines. So as we're closing out this discussion, I'm curious to ask you plainly: do you agree, or do you think that computers will develop to show understanding?

 

Clément  42:42

I think so. I think, in the limit, for this whole field of AI, we will create entities that could look exactly like humans. And though they might have extremely different underlying hardware and low-level processing, at the cognitive level they could look exactly like us. Are we going to build that? I'm not sure. It may be that we keep building them in different ways, again because of the reward function and how we're going to structure all these systems. But I don't think there's something fundamentally different about these two entities, you know. And maybe that's us computer scientists being extremely naive about what a biological system is; there's a lot of complexity in biological systems. But I'm hopeful that that argument is wrong.

 

Anthony  43:28

You know, I hope you're right, Clément. This was a wonderful conversation. And I think that's the perfect place to end. Thank you so much for joining us today.

 

Clément  43:35

No, thank you so much. It was amazing. 

 

Alex  43:38

Thank you.

 

Anthony  43:39

Next week is the last in our series, where we'll be pulling together what we have explored in recent weeks, thinking about the future and talking to Dave Munichiello, GV General Partner and leader of the Digital Investing Team, about what human-like AI means for digital investing. 

 

And we'd love to know what you think about this series. Write to us at theoryandpractice@gv.com or tweet @GVteam. 

 

This is a GV podcast and a Blanchard house production. Our science producer is Hilary Guite. Executive producer Duncan Barber with music by DALO. 

I'm Anthony Philippakis.

 

Alex  44:21

I'm Alex Wiltschko.

 

Anthony  44:23

And this is Theory and Practice.