Aug. 9, 2023

Google DeepMind’s Dr. Claire Cui on The Next Frontier for Large Language Models

How do we empower the next generation of LLMs with greater deduction skills and efficiency? We explore a future where introspection is added to LLMs, and Dr. Cui gives broader context to our current thinking about AI’s vast potential.

On Season 4 of the Theory and Practice podcast, GV’s Anthony Philippakis and Osmo’s Alex Wiltschko explore being human in the age of AI. Guests this season dive into areas including AI communication, robotic surgery, and decision-making.

 

Episode 2 explores how machine learning evolved to where it is today. Anthony and Alex’s guest is Dr. Claire Cui, a computer scientist from Google DeepMind. They discuss the underlying architecture of LLMs, how self-supervising algorithms work, and the technological developments that have driven innovation. 

 

Transcript

Anthony  00:05

Hello, welcome to GV Theory and Practice. This series is exploring what it means to be human in the age of human-like AI.

 I'm Anthony Philippakis.

 

Alex  00:15

And I'm Alex Wiltschko.

 

Anthony  00:21

This is our second episode. Today we'll be exploring AI and communication. Yes, I mean AI-driven chatbots like Google BARD, ChatGPT, and others like them that everyone is talking about. We'll be looking specifically at what developments have led to our conversations with chatbots feeling more human-like and more empathic.

 

Alex  00:41

The impact of these chatbots is, and will continue to be, enormous. Here's what Greg Corrado, our guest from the first episode in this series, said:

 

Greg  00:51

Ultimately, I believe that the direction of travel that we're seeing in generative AI and large language models is one where conversational interfaces are going to be so useful and so powerful, that I expect that they'll become dominant.

 

Anthony  01:07

We had a great overview from Greg about safety of AI and guardrails for implementing AI in healthcare.

 

Alex  01:13

Yeah. And today, we are so lucky to be able to speak to one of the software developers, research leaders, and engineering managers behind Google's chatbot, BARD.

 

Anthony  01:24

But before we meet Dr. Claire Cui, Alex, can you bring everyone up to speed on the differences between these different chatbots?

 

Alex  01:31

Okay, so we've had the word chatbot for a while, and we've had the phrase language model for a while. The difference between what we're seeing today and what came before is really something we can trace along a straight line: improvements in algorithms, improvements in dataset size, and improvements in compute. But the fundamentals are really almost the same as they were before ChatGPT and BARD came out. So everything that's being released today falls, I think, under the blanket term large language model, which is a type of artificial intelligence approach that uses, inside of it, a deep learning technique called a transformer, trained on extremely large datasets in a semi-supervised fashion. And we're going to get into that in this episode. But all that basically means is that these things are trying to learn how to predict the next word. That seems really simple, but it's a very powerful insight, because it means that everything on the Internet, every word, every sentence, becomes really useful training data. You don't have to pair a document with a human label; you just try to predict the next word. And then we can adapt these models, trained in a semi-supervised fashion, to specific task areas by doing something called fine-tuning: we can change how they predict the next word, and we can change the personalities that we perceive them to have, tilting their behavior by rewarding some outputs and penalizing others. So what's really different about these new models, these large language models with transformers inside, is that they're trained on many orders of magnitude more data than anything we've had before, and that takes many orders of magnitude more compute than we've ever spent before.

So everything is trained basically on the Internet; the Internet is teaching these models how to speak. The model behind Google's BARD is called LaMDA, and just to be very concrete, it has 137 billion parameters and was trained on a huge amount of unlabeled text from the Internet. GPT-3 has 175 billion parameters, and GPT-4, we don't know for certain, but it might be in the trillions at this point. Scale isn't everything, though. It's hard to pin down, but the personality, how human-like the responses are, is also really important, and this is something that Claire's gonna get into with us in this episode. So there's a lot going on here. The story is very much unwritten, and we are very privileged to talk to one of the people who is right in the middle of making these things.
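To make the "predict the next word" objective Alex describes concrete, here is a minimal, illustrative next-token training step in PyTorch. It is not the actual LaMDA or GPT training code; the vocabulary size, model, and data are invented for illustration, and a real LLM would use a deep transformer stack instead of a single embedding layer.

```python
# Illustrative sketch only: a toy "predict the next token" objective,
# the core of self-supervised language-model pre-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 64

class TinyNextTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # A real LLM applies a stack of transformer layers here;
        # this sketch just embeds each token and predicts the next one.
        return self.proj(self.embed(token_ids))

model = TinyNextTokenModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Any running text becomes training data: inputs are tokens 0..n-1,
# targets are tokens 1..n (the "next word"). No human labels needed.
tokens = torch.randint(0, vocab_size, (8, 33))   # fake tokenized sentences
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                           # (batch, seq, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```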

 

Anthony  04:08

Totally agree, Alex, it's gonna be so great to talk to Claire Cui. Today, we'll be concentrating on Google's BARD. But what we'll really be focusing on is what it means to be human in this age of human-like chatbots. 

So let's welcome our guest, Dr. Claire Cui, Google fellow in the Google DeepMind team. Claire, welcome to Theory and Practice.

 

Claire  04:32

Thank you, Anthony. Pleasure to be here.

 

Alex  04:34

Glad to have you on the show. Can I start by asking you a question that has multiple parts? It's going to be complex, but I just want to dive right in. Can you summarize the key components of your role in developing Google BARD?

 

Claire  04:48

Yeah, so over the past few years I've been working with teams, many folks in the previous Google Brain group, on base models like PaLM 2, and I also worked with a lot of folks on LaMDA, of course, not just me. And also on the tuning part, making the base model adapt to conversational-level quality and accomplish different tasks. So the tuning part, as well as the efficiency of both the base model and the serving of the overall large language model behind BARD.

 

Anthony  05:25

Alright, so maybe you can just unpack some of those things for us. So for example, what's BARD, and then maybe just a little bit on what fine tuning is, as well.

 

Claire  05:33

So BARD is a large language model that can take users' questions or instructions and respond with answers, or follow the instructions to give users what they asked for. This model is built on a foundational base model, which is a self-supervised language model trained to predict the next token at each step, at a large scale. That base model usually has the generalized ability to do different downstream tasks with a relatively small amount of supervised labels. By supervised label I mean you give an input and then say, okay, this is the output, so try to learn this input-output pair; self-supervised learning is just predicting the next token. And the benefit of the base model is that it becomes a very good pattern recognition engine, very good at general pattern recognition and generation, and then it is very easy to tune with limited supervised data to accomplish different tasks.

 

Alex  06:43

Can we double-click on that? You mentioned it quickly, and it's simple to state, but I think there's a lot there. The essence of training these models using self-supervision is that, in a sentence, you're just trying to predict the next piece of a word. Sometimes a token is a word, sometimes a token is a piece of a word or multiple words. That's really easy to say, but in training models with self-supervision, where all they're doing is predicting the next word, they can do some surprising things.

 

Claire 07:15

Yes. 

 

Alex 07:16

So how does that work?

 

Claire  07:17

So I can first describe why self-supervised learning is more powerful. I think that is one of the breakthroughs behind why large language models today are working astonishingly well compared to traditional deep learning. Traditional deep learning requires data pairs, this is the input, this is the output, and is usually task specific. For each task you need to give millions, or hundreds of millions, or even billions of inputs, and the output is supposed to be, like, this is an apple, this is a pear. So it needs a lot of task-specific data. But self-supervised learning mainly uses the previous data to predict the next state, or the surrounding data. For every prefix you have the next token, and that is really what feeds the deep learning models to learn the connections and associations between different patterns: when you see this, the next thing is this one. The architecture for the self-supervised learning model is also important; the transformer is a breakthrough architecture that makes it really effective. To use an example, when human babies are born, they don't know what an apple is or what a pear is. They just look around their world for the first two years: oh, this happens, and then this next thing happens, and it looks similar. So they look at the data without supervision. And then after some time, as soon as the parents say, oh, this is an apple and this is a pear, the parent does not need to say a thousand times, this is an apple, this is an apple, this is an apple too; the kids just automatically pick it up. So that's the power of self-supervised learning: learning the patterns, the deep associations, and things like that.

 

Anthony  09:04

So, you know, I think this year we've all been blown away by the power of large language models. And at least to an outsider, it feels like it came out of nowhere. What do you feel are the two, three, or four insights that brought us to where we are today? And when did they happen?

 

Claire  09:23

Yeah, I think there are a few factors. One is architecture and algorithms. Another is compute. And the third is tuning and data techniques.

 

Anthony  09:34

That's a great list. Let's take them one by one. So architecture.

 

Claire  09:39

So on the architecture side, the transformer, as I mentioned, is a new architecture that really advanced machine learning models.

 

Anthony  09:46

So let's just pause for a second and explain to listeners what a transformer is, how it works, and the importance of that first step. The specific architecture that you mentioned was the transformer, and it was a big leap forward in natural language processing in two ways. First, it broke tasks down into pieces that could be worked on in parallel, and this made everything run a lot faster. Second, it also had this notion of attention, which is the idea that in trying to predict the next word, you can look at the whole text, or a big chunk of it, rather than just the one or two or three, or however many, words before it. And that made it much more accurate in its understanding. Just to choose an example, when we read, we don't just pay attention to the few words before; we understand the whole chapter and the bigger context. Attention kind of enabled that. All right, I realize that was really simple, Claire, but could you go on and explain how self-supervised learning algorithms work?
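Anthony's description of attention can be made concrete in a few lines of NumPy. This is a minimal sketch of scaled dot-product self-attention, not production transformer code; the shapes are arbitrary, and real models add multiple heads, masking, positional information, and feed-forward layers.

```python
# Illustrative sketch of the "attention" idea: every position can weigh
# every other position in the text when predicting the next word.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d) arrays of queries, keys, and values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # relevance of each token to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole context
    return weights @ V                              # context-aware mixture of values

seq_len, d = 5, 8
x = np.random.randn(seq_len, d)                     # stand-in for token embeddings
out = scaled_dot_product_attention(x, x, x)         # self-attention over the sequence
print(out.shape)                                    # (5, 8)
```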

 

Claire  10:49

Self-supervised learning is the other piece of the algorithm. It is, in a way, a training strategy, and a breakthrough from previous deep supervised learning. I call this deep generalist learning: you learn from data to generate other data, in a generative way. This kind of learning has very strong transfer-learning power, from one task to another task, and the combination of the transformer and this self-supervised learning, I think, really advanced things.

 

Anthony  11:21

Okay. So what the transformer architecture really enabled was self supervised learning. Instead of having to label text, we could just try and predict missing words that we blanked out in the text. And so that really reduced the need for human input. Correct?

 

Claire  11:38

Exactly. And also compute, right? With the latest advances in computing, like really powerful machine learning chips, and scalable parallel training using many, many chips, I think that helps as well. And the scalability piece also brought this to the next level: the model scales, with the number of parameters increasing from millions to hundreds of millions, to billions, to hundreds of billions. And that really increases the power.

 

Anthony  12:13

Yeah. So can you say a little bit about why we need so many parameters? Correct me if I'm wrong, but you probably have more parameters than training observations.

 

Claire  12:21

So basically, with every data example in training, those parameters get adjusted. And usually, when the number of parameters is larger, the model can easily overfit when the data is small. That's why self-supervised learning is important: with self-supervised learning, you have an almost infinite amount of data, trillions of tokens, to churn through, and that helps train the model better. And usually, larger models find it easier to fit the actual patterns. With smaller models, when new data comes in, it has to crowd out other information, so there's a trade-off. But with more parameters, you can fit all kinds of patterns inside.

 

Alex  13:08

So the ultimate test of whether or not it's working is when it's out there in the world, people are using it and they like it. 

 

Claire 13:14

Exactly. 

 

Alex 13:16

So as some context, in our last episode, Anthony was talking about the Turing test, and we decided that we both might get the prize for being the human most often mistaken for a computer. But seriously, what is it that makes a conversation with a chatbot feel more human-like than not?

 

Claire  13:35

I think that's because the model is trained over a lot of data; it has seen tons of data on the web. Much of that data is actually more formal, official language, often, if you look at it, and the model is trained to predict the next token of that language, so it's very, very fluent, with nice language based on the training data. And if it's trained with more dialogue data, more conversational data, then it will be even more conversational. But I would say that the Turing test may not be hard enough, because just sounding like a human does not mean it actually is human.

 

Anthony  14:16

You know, you made reference in passing that we're training it on text, which is often more sort of formal and rigid. What if we took all of YouTube and turned it into text? That would be an enormous corpus of conversational data? Do you think that's what will happen with the next generation of chatbots?

 

Claire  14:34

I think that will definitely help. Yeah, and I think the more data the model is trained on, the more capable it can be. But as for the details of what data to use, there are a lot of data governance considerations.

 

Alex  14:51

So, not to belabor it too much, but I'd love to dig into this notion of what a chatbot sounds like, or what it feels like. How did you work out what human-like responses... this is gonna sound weird... what human-like responses humans like? Or what kind of personality, what style of responses people like across different cultures? I'm sure there are different ways that people want to interact with a chatbot. How did you figure that out? How did you create something that felt natural, that people prefer to interact with?

 

Claire  15:25

So I think there's data quality and annotation and safety and a lot of things. The idea is that the model is first self-supervised trained, and then at a later stage there can be slight guidance, like, I want this sort of style. Those are done by downstream tunings, say, some supervised and annotated tunings. Safety and a lot of other things are also guided after the self-supervised pre-training. So there is a pre-training stage, and then there's a downstream tuning stage.
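As a rough illustration of the downstream tuning stage Claire describes, here is a sketch of supervised fine-tuning on a small set of prompt and response pairs. It assumes a toy next-token model like the earlier sketch (anything that maps token IDs to next-token logits); the prompt masking, the `tokenize` helper, and the data are illustrative assumptions, not BARD's actual tuning recipe.

```python
# Sketch only: supervised fine-tuning after self-supervised pre-training.
import torch
import torch.nn.functional as F

def finetune_on_pairs(model, optimizer, pairs, tokenize):
    """pairs: small list of (prompt, desired_response) strings.
    tokenize: hypothetical function mapping a string to a list of token IDs.

    The objective is still next-token prediction, but the loss is computed
    only on the desired response, so the model is nudged toward a style or
    task without re-learning language from scratch.
    """
    for prompt, response in pairs:
        prompt_ids = tokenize(prompt)
        response_ids = tokenize(response)
        ids = torch.tensor([prompt_ids + response_ids])
        logits = model(ids[:, :-1])                # predict each next token
        targets = ids[:, 1:].clone()
        targets[:, :len(prompt_ids) - 1] = -100    # ignore loss on the prompt tokens
        loss = F.cross_entropy(
            logits.reshape(-1, logits.shape[-1]),
            targets.reshape(-1),
            ignore_index=-100,
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```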

 

Anthony  15:57

So, you know, let me switch gears for a second and talk a little bit about groundedness. One of our previous guests was Greg Corrado, and he talked about groundedness, especially as it relates to safety. Basically, what he said was that it's really important to make sure that AI relates to things we know about in the real world. I know that you've had a recent paper that you called The Mind's Eye that kind of relates to this; maybe you can say a little bit about that.

 

Claire  16:21

Yeah. So I think grounding is really important, because LLMs today are very knowledgeable, and they can also be very creative, but they don't always provide grounded answers; they can create and add things. It's actually interesting, because sometimes we humans can be creative as well, but we know when we can be creative and when we should stay grounded and factual, and I think LLMs today still don't quite get there yet. The goal of the Mind's Eye paper is to say that for things like physical questions, where we cannot really trust LLMs to be grounded and know all the facts, we should ground them in physical simulations, so the LLM can connect to real-world information through simulation and use that real-world information to help. Another grounding example is to use tools to connect to fresh data, or the latest updated data in the world, and give that to the LLM as context or grounding assistance.

 

Anthony  17:39

Okay, well, let me pull on another thread. One of the theses of the show is that by understanding AI better, we can understand ourselves better. When we think about humans, we come to learn our physical environment, and we start to intuit rules about the world from our experience. So, to go back to your Mind's Eye paper and physics: as we learn how to drive a car, we gain a physical intuition for the amount of time it takes to brake, based on a feel for the momentum of the car. In some sense, we're intuitively learning classical mechanics as we go. So somehow we have that knowledge embedded inside our brain. Is that an example of grounding? And how would you transport that to a computer?

 

Claire  18:30

I think, in a way, that's common sense that human beings experience, but not everything human beings experience is reflected in the web data that is used to train large language models. That's why I think it's important to sometimes use simulations to fill that gap. For example, for self-driving cars and other things, the LLM does not always have that physical-world information in its pre-training data. So that's why it's important to use that.

 

Anthony  19:07

Understood. And let me ask you another question that I just find fascinating, as someone who originally trained in mathematics. The LLMs are kind of naively doing curve fitting, and they do it amazingly well. And then you give them even relatively simple math problems, you know, arithmetic with lots of parentheses,

 

Claire 19:27

Yeah. 

 

Anthony 19:28

And they, they kind of fail miserably. 

 

Claire 19:31

Exactly. 

 

Anthony 19:32

And so there's a way in which they're able to do induction at the highest of high levels, and yet even the most rudimentary aspects of deduction are lost. How do we empower the next generation with deductive capabilities? I mean, we as humans can do both. Is this fair? You're smiling, so I hope you like it.

 

Claire  19:51

Because this one I've actually thought about a lot. I don't have the answer, though. But I do think that LLMs are not at human intelligence yet, not matching human intelligence, and exactly as you said, no matter how much data you feed the LLM, it cannot do accurate mathematics, like deduction, right? It cannot actually follow a program exactly the way a computer today just runs the program, or follow my instruction literally to infinity, because it doesn't have that infinite generalization. Basically, it can do patterns to some level, to some context length, but it doesn't do the deduction accurately yet, for example arithmetic, addition or subtraction. Most of what LLMs do today is, in a way, induction, and then reproduction, the generation piece, to some level. So there are two things: one is the infinite piece, the recursion piece, and the other is the accuracy piece, the verification piece. LLMs right now are all probability based; they pull pieces together in some way that has the highest probability, but that doesn't give you accuracy. The answer to that is still missing, and that's actually the reason I think it's still exciting to continue to do research. We're not done yet; we're just at the beginning. The current instruction-tuned LLMs are beyond a lot of people's expectations, but they're still not there yet, and there is a lot further to go.

 

Alex  21:31

I want to pick up on one thing that you said there, which is a key term: probability. I worked with some really brilliant people when I was at what was then Google Brain, now Google DeepMind, who thought really deeply about uncertainty quantification. And one area of concern about chatbots is that they'll give you a coherent sentence in English, or any language that you ask of it that it trained on, but they won't give you a confidence score. In some cases, the sentence appears very confident but is in fact a hallucination, which ties into this idea of groundedness, which is a really important technology that doesn't exist yet, but people are working hard on it. The flip side is, if the model could tell us when its output is uncertain, then we as humans, as we interact with it, can know how to take that output and maybe apply a different layer of judgment to it. So how do we think about this? What's the state of the art here?

 

Claire  22:27

Yeah, that's absolutely a great point, and that's exactly what I think is missing nowadays for LLMs. I think one of the biggest gaps is what I call introspection of the model, which is basically to say: I know what I don't know. And I know when I'm not confident; human beings know that. Right now these models always spit out something, especially generative models, they always give you something, without telling you, hmm, am I sort of making this up, am I confident? I think this is going to be a really important piece of research that is critical to the grounding, factuality, and trustworthiness of LLMs, to be honest. Because if I ask it to create something, it needs to know whether it is being factual; it needs to have this introspection, and also confidence: do I have that answer, or do I not have an answer? These are really important research directions for LLMs.
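One crude signal researchers look at when studying the introspection gap Claire describes is the probability the model itself assigns to the tokens it generates. The sketch below computes per-token log-probability and entropy from raw logits; this is an illustrative proxy under invented shapes, not a solved confidence measure and not a method from the episode.

```python
# Sketch: turning a model's own output distribution into a rough
# "how sure was it" signal. Low average log-probability or high entropy
# is not a reliable factuality check; it is just the kind of raw signal
# this line of research starts from.
import numpy as np

def token_confidence(logits, chosen_ids):
    """logits: (seq_len, vocab) scores for each generated position.
    chosen_ids: the token ID actually emitted at each position."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    chosen_p = probs[np.arange(len(chosen_ids)), chosen_ids]
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return {
        "mean_logprob": float(np.log(chosen_p + 1e-12).mean()),
        "mean_entropy": float(entropy.mean()),
    }

# Usage sketch with fake numbers:
scores = np.random.randn(4, 10)       # 4 generated tokens, vocab of 10
picked = np.array([3, 1, 7, 2])
print(token_confidence(scores, picked))
```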

 

Anthony  23:30

Okay, so we've talked about a few different ideas for where LLMs should go next. We've talked about the notion of grounding. We've talked about the notion of empowering them with more deductive capabilities. And then you just talked about being able to have introspection. What else do we need to do? What do you think is the next frontier of LLM research?

 

Claire  23:54

I think there are a few gaps, in factuality, grounding, and introspection; these are related to one big area of gaps that we have right now in LLMs. I think another gap, or opportunity, is efficiency. Right now, if we want to be capable and want to have, say, high quality, then we require hundreds of billions of weights to be fully connected, right? And that's just not sustainable for the longer term. I think sparse modeling, and this dynamic decision of which pathway to trigger or activate, is going to be an important area if we want to reach really human-level, efficient LLM models that don't burn too much compute. And then I think another big opportunity, not completely solved, is reasoning and planning and tool use. I think tool use is a big one that has opened up the horizon, but how do we use it effectively and intelligently?
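To illustrate the sparse, "which pathway to activate" idea, here is a toy mixture-of-experts style layer: a small router picks the top-k experts for each input, so only a fraction of the weights are ever used for any one token. The sizes, the top-2 choice, and the gating are arbitrary assumptions for illustration, not how any production system is built.

```python
# Sketch: sparse routing so compute grows with k experts, not total model size.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]   # expert weight matrices
router = rng.standard_normal((d, n_experts))                        # routing weights

def sparse_layer(x):
    scores = x @ router
    top = np.argsort(scores)[-k:]              # pick the k most relevant experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
    # Only k of the n_experts matrices are multiplied for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d)
print(sparse_layer(token).shape)               # (16,)
```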

 

Alex  24:55

Tool use is a really interesting one; I'd love you to dig in there. That, I guess, touches on two pieces of what Anthony mentioned with induction and deduction. There are other programs which you can call which will just do exactly what you want them to do, right? If you want to add two huge numbers, you can call a calculator program and have it done. And I think the idea behind tool use is you can just have the LLM have access to a calculator program, and have some way of invoking it. And you can imagine there are many other tools. Could you maybe say more there? What does the frontier look like? How does this even work?

 

Claire  25:29

Yeah, so I think this is a really exciting area. In a way, that's why LLMs, or machine models, can be more capable than human beings. For tool use, there are a few stages. One: you have a lot of tools at hand, like calculators, simulators, look-up tools, whatever. The first question the model needs to answer is which tool to use. That's like human beings: imagine I ask you a question, what would you do? You'd ask yourself, should I do a search? Should I use a calculator? Right?

 

Alex  26:04

Like, do I pull a book off my bookshelf?

 

Claire 26:07

Exactly 

 

Alex 26:08

Or do I go grab a screwdriver? Exactly. I mean, these are things I know intuitively, but had to learn, right?

 

Claire  26:11

Or, do I even need a tool? Maybe I can just answer your question directly. So that's the triggering and tool-selection piece; that's the first step. The second step is, once I select a tool, or a few tools, how do I use them? What is the language, what is the input format, the API, the interface to that tool? How should I frame my question or query to that tool? That query-generation piece is basically, how do I use this tool? Then you use the tool, you get the response, and then how do you handle that response, and how do you use it to generate your final response to me or to other tool users? And sometimes this is not just one path; there are multiple paths. Sometimes you may need to decompose the question: oh, I cannot directly get the answer from one tool, I need to first ask a first question, use some tool to answer it, use that intermediate result to ask my next question, and then send that question, with that query, to a different tool. And then I need to integrate all of the information I got, right? That's a little bit like a research process. And then you generate the response. And even when you generate the response, you may need to use some tool, like to generate charts or something like that.
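The stages Claire lists map naturally onto a simple control loop: decide whether a tool is needed, pick one, form its query, call it, and fold the result back into the answer. The sketch below is purely schematic; `llm`, the tool functions, and the prompt wording are hypothetical stand-ins, not a real API or the approach any particular product uses.

```python
# Schematic tool-use loop: trigger/select, query, call, integrate.
def answer_with_tools(question, llm, tools):
    # Step 1: triggering and tool selection.
    choice = llm(f"Question: {question}\n"
                 f"Available tools: {list(tools)}\n"
                 "Reply with one tool name, or 'none' to answer directly.").strip()
    if choice == "none":
        return llm(question)

    # Step 2: generate the query in the tool's expected input format.
    tool_query = llm(f"Write the input for the '{choice}' tool "
                     f"that helps answer: {question}")

    # Step 3: call the tool and capture its response.
    tool_result = tools[choice](tool_query)

    # Step 4: integrate the intermediate result into the final answer.
    return llm(f"Question: {question}\n"
               f"Result from {choice}: {tool_result}\n"
               "Write the final answer using this result.")

# Usage sketch (all hypothetical): tools = {"calculator": calc, "search": web_search}
```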

 

Anthony  27:39

You know, it's interesting. As you're talking, I can't help but think a little bit about the human brain, in the sense that the brain has pocket calculators: you have the cerebellum, which you kind of outsource a lot of coordination tasks to, or the basal ganglia for initiation of movements. Is the future of AI assembling different architectures? So transformers for language, CNNs for vision, maybe something else for coordination? Again, you're nodding, so tell me more.

 

Claire  28:11

Yeah, I think modular is going to be the direction, right? Modular neural nets are going to be the direction, and it's an important direction to go, like the human brain, where there are different modules that are specialized in certain tasks, or certain types of tasks. However, I'm not sure what exactly the best architecture is for each module yet; I just think they may be different. The reason I say this is that it's not completely clear that the CNN is the best for vision. CNNs, convolutional neural nets, used to be amazing for vision, and in a way they mimic how the human vision system works, but transformers have since been shown to work even better on some of those tasks. So as for the detailed architecture, I'm not sure, but I think there could be different kinds of structures, or types of connection patterns, for different modules to make each one most efficient for its purpose.

 

Anthony  29:14

Yeah. And then related to that, or maybe not, you'll tell me, is multi-modality. We've talked about text, but you can imagine building models that also incorporate images. You gave the example of a parent who says "This is an apple"; well, that usually comes with pointing to something visual. So say a little bit more about multimodal learning.

 

Claire  29:35

Yeah, I think multimodal is definitely really huge, not only because we can do different tasks but also because it's good for grounding. As human beings, we ground our language with vision, and sometimes with audio, and with each other. To do multimodal learning, we ideally need different modules for different modalities of data, and each module is responsible for encoding, or understanding, one modality and then feeding it into a centralized language or reasoning network. And then when we output, we can output different things, through language or through different control systems. That's how the human brain works, and similarly, for machine learning models, you should have modules where each module is specialized to handle one modality, ideally.
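As a sketch of the modular, multimodal architecture Claire describes, here is a toy model where a text encoder and an image encoder each handle their own modality and feed one shared transformer trunk. All dimensions, layer choices, and the single-vector image representation are arbitrary assumptions for illustration, not any real multimodal system.

```python
# Sketch: per-modality encoders feeding a shared "reasoning" network.
import torch
import torch.nn as nn

class TinyMultimodalModel(nn.Module):
    def __init__(self, d=64, vocab_size=1000, n_image_features=512):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, d)        # text modality
        self.image_encoder = nn.Linear(n_image_features, d)    # vision modality
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
            num_layers=2,
        )                                                       # shared central network
        self.head = nn.Linear(d, vocab_size)                    # language output

    def forward(self, token_ids, image_features):
        text = self.text_encoder(token_ids)                     # (batch, seq, d)
        image = self.image_encoder(image_features).unsqueeze(1) # (batch, 1, d)
        fused = torch.cat([image, text], dim=1)                 # one sequence, two modalities
        return self.head(self.trunk(fused))

model = TinyMultimodalModel()
out = model(torch.randint(0, 1000, (2, 10)), torch.randn(2, 512))
print(out.shape)   # (2, 11, 1000)
```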

 

Alex  30:28

We've covered an amazing assortment of topics, and it's clear that we're only really scratching the surface. It sounds like there are so many different promising research directions, so many different promising technologies that you're working on to bring to the world. At the risk of opening up a can of worms, is there anything else that you wanted to talk about, or that we should have asked you?

 

Claire  30:49

I just think that, overall, this area of AI, I call it deep generalist learning, some people call it foundation modeling, some people call it LLMs, but it's actually multimodal, is really fascinating. It really went beyond a lot of people's expectations. Some folks feel that it's almost like it could terminate the world or something, and some folks feel, as I do, that it's still just the beginning of showing its capability. On the bold-and-safe front, the being-responsible front, I hope that we are not at either extreme. One extreme is, okay, let's just be bold and do whatever, without guardrailing anything. The other is, oh, this is going to be the end of the world, let's stop. I hope there is some middle ground where we can keep developing the capability, but still think very carefully about how to put it to good use. I like to use the analogy of fire: if we hadn't invented fire, we wouldn't be the human beings we are today, but fire can be used in a lot of ways that do damage and hurt people. So how do we enable innovations like this, but still make them safe and have good policies to prevent the damage?

 

Anthony  32:18

You know, Claire, this was just such a fascinating conversation. Thank you so much for joining us today.

 

Claire  32:23

Thank you for the opportunity to talk to you folks.

 

Alex  32:26

Thank you, it's a pleasure.

 

Anthony  32:36

Let's move on to the hammer and nail part of our podcast, where you and I talk about a nail, a problem, or a hammer, a solution, in honor of our in-person meetups in Boston many moons ago. So Alex, this week I have a hammer. I've been thinking about this hammer a lot, and I'm trying to figure out what's the right nail for it.

 

Alex  32:56

Okay. This is my favorite kind of brainstorm; it's just how I think our minds are built, a little bit: look at this cool thing, what's it useful for? But of course, we've been hardened by the real world and know that we must find a purpose for the cool shiny toys that we stumble across. So I love this kind of conversation. Tell me more.

 

Anthony  33:15

All right, so, you're gonna be shocked to hear that I've been thinking about LLMs and chatbots. I mean, nobody in the world is talking about LLMs and chatbots right now, right?

 

Alex  33:26

You're a hermit, you know, contemplating mysteries that nobody else can even comprehend, much less have pop into their mind. But go on. Of course, everyone has been taken by storm by these things. But I'm curious what your angle is here. What are you thinking about?

 

Anthony  33:39

Well, specifically, I'm looking for nails in the realm of clinical medicine: what is their role in actually helping in the care of patients? There are so many aspects where it's interesting to think about applying these. The first thing I'll call out is that one of the things I find very sad about current physician-patient encounters is that when you go to your doctor, your doctor is sitting there staring at a computer screen with their back to the patient, typing feverishly to try to get their note written while they see the patient. Contrast that with how my grandfather practiced medicine: the doctor would sit down and actually listen to the patient and talk to them, maybe take a few notes, and talk to them for hours on end. So one area right away is, could we improve clinical documentation? The answer might be to actually take most of the history and physical with a chatbot before the patient comes to the doctor. Another could be summarizing the patient-physician encounter and turning it into a note, so all the doctor does is talk to the patient.

 

Alex  34:47

Using speech-to-text technology as well?

 

Anthony  34:51

Well, you don't just want a dictation of the patient-physician encounter. You want the patient-physician encounter first turned into text and then summarized into a clinical note.

 

Alex  35:02

Okay, so first digitize the interaction word for word, and then apply an intelligent transformation that summarizes it or extracts relevant information for future encounters for future decision making.

 

Anthony  35:14

Exactly. And then, even beyond that, there are a lot of other interesting things where you could imagine a role for chatbots. Let me give one example. A part of the patient-physician encounter that is notoriously poorly documented is the family history, and the reason why is that taking a proper, minimum three-generation family history takes a long time, and very few physicians have the time in the office to do it. But you often learn really surprising things. So let me tell you one story, one that really illustrated to me where we could do better. I remember a patient, when I was a cardiology fellow, who had had a sudden cardiac arrest at the gym, which is to say he died. Somebody at the gym knew CPR and brought him back to life.

 

Alex 36:08

Wow.

 

Anthony 36:09

Yeah. And so that night, I was on call and took care of him. Over the coming weeks, we did a lot of elaborate tests, like an angiogram, where you shoot dye into the coronaries of the heart, and a cardiac MRI. But the single most important thing we did was actually call up his brother. So I called his brother and said, "Does anybody in your family have a history of heart disease?" And he was like, no, nobody, we've all had hearts that were strong as an ox. Okay, was it just the two of you? Do you have any other siblings? He's like, well, we had a sister. That was really strange: she was a champion swimmer, and she died when she was 35 in a bizarre drowning accident. Okay, hmm. Okay, well, what about your parents? Are they still alive? Well, mom's still alive, but you know, dad, it was really weird, when he was 40 he just fell down the stairs and cracked his head open and died terribly. Well, did dad have any brothers or sisters? Yeah, you know, dad actually had a sister as well, who was driving a car and just drove off the road when she was about 35 years old, and we always just assumed she'd fallen asleep. So you start going through the family history in this family that supposedly had no history of heart disease, and there's a pattern of autosomal dominant traumatic death around the age of 40 that they all seem to have had. So very clearly, it was not that dad fell down the stairs and died, but rather that he died and then fell down the stairs. And so that's an example where physicians almost never take a good family history, and so you miss a lot of important things. You can imagine a really rewarding chatbot experience, some version of "Tell me about your family."

 

Alex  37:47

Yeah, you're describing an Agatha Christie story where the murderer is genes being passed through the family, right? That's a detective story. Wouldn't you want to know if there's some kind of drama or danger lurking inside your family? I mean, I want to know that that information is in the hands of the people taking care of my health.

 

Anthony  38:09

But again, it's the kind of thing that very few doctors have the time to take properly in the office. So again, the first place my brain goes, though there could be a lot of nails here, is really the physician-patient encounter and how we make that better. And I'll tell you, there's another thing that really has me thinking about this, which is a paper that came out in late April in JAMA Internal Medicine, which is definitely a very respectable medical journal, where they took 195 patient questions from a social media forum, had either licensed healthcare professionals or chatbots respond to them, and basically asked the patients, what was your experience?

 

Alex 38:54

Oh, boy. 

 

Anthony 38:56

And it was kind of amazing that, in general, the patients found the chatbots more empathetic, which is very concerning. Ironically, a lot of it was that the chatbots spent longer talking to the patients. The doctors would kind of quickly ask questions and want to get on to the next patient, whereas the chatbot doesn't get bored and isn't pressured for time, and so it just talks to the patients longer. So what the patients experienced was greater empathy.

 

Alex  39:21

I find this really fascinating. The question that appears in my mind: are chatbots looking so great because the bar is already so low?

 

Anthony  39:31

Yeah. Okay, so to be fair, I probably should be a little kinder to the doctors in my description of the study. Just to go one level deeper: this was on Reddit, and there's an "Ask Docs" forum on Reddit. So there were times when doctors were not viewing this as a traditional patient-physician encounter. You know, a patient would say, I swallowed a toothpick, am I okay? And the doctor would say, if you're not dead already, you're fine.

 

Alex 39:58

Okay. 

 

Anthony 40:00

So then they would ask the chatbot the same questions. So to be fair, it was not a perfectly done study, but there's a kernel of truth to this; there's actually something there.

 

Alex 40:10

Yeah, there's something there 

 

Anthony 40:14

It's that the current generation of doctors is busy, kind of interacting in a way that's often a little bit curt, for reasons, honestly, of just the scheduling rate of patients in a clinic. And I do think there's an opening to improve the patient-physician encounter through this technology.

 

Alex  40:30

Yeah. I don't know if this is going too far afield, but the kernel of the problem seems to be the limits of the human mind. Here's a weird thought experiment: if we experienced time at twice the rate, or if our days were twice the length but the workload, for whatever reason, hadn't adapted to that, so that we had twice the density or volume of moments in which to interact with each other and to do things, we might not have this problem. It might be back in the world of your grandfather practicing medicine, where things were simpler and times were slower. But things have sped up enormously. And a way I think about what's happening is that, in many domains, we're reaching the limits of what our minds can do in this current era.

 

Anthony 41:15

That's right. 

 

Alex 41:17

And we're building these systems which do not currently have the full capacity of the human mind, but in very narrow domains they can do things we've previously called intelligent, like play chess, or play Go, or have a short but meaningful and mostly accurate patient-physician interaction. The difference is, just like machines can scale mechanical work to a degree simply not possible with muscles, these kinds of systems can scale some narrow kinds of cognitive work well beyond the range of what our minds can support. And in some domains, it sounds like we need that.

 

Anthony  41:55

Yeah, I totally agree; this is a great point. And let me bring to you another use case for the role of chatbots in medicine, which is the challenge of remembering everything that you're supposed to know. Every medical student goes through this period when they first start; the phrase that's often used is that you're trying to sip from a firehose, and there's just this deluge of information that starts on day one with anatomy. But in the modern world, you not only have to know anatomy and physiology, you have to know innumerable lists of proteins that become drug targets, and cellular pathways, and mechanisms of action of drugs, and drug toxicities.

 

Alex  42:33

Yeah, it seems crazy. I mean, just the amount of information that's out there: there's a name for every eyelash, and not only that, there are names for different parts of the eyelash.

 

Anthony  42:43

And, you know, it's very clear to any practitioner that we're well past the point where there is more information than any one person can know. And the sad thing is that mistakes are often made because people forgot something or missed something. So again, as a bicycle for the mind, or a bicycle for the clinician's diagnostic and management capabilities, I think it's a very intriguing possibility to have, to use the current parlance, a copilot for the physician that helps them remember the guidelines, and remember toxicities and drug-drug interactions, or diagnoses to consider that they might have forgotten about, things like that.

 

Alex  43:28

I think that's incredibly important, and it's part of the trajectory of our species, frankly. I mean, scissors are sharper than any one piece of our body and can help us do things that we can't do. And the systems that we've used since the 1960s to try to predict whether it's going to rain today or tomorrow have augmented us well past our ability to sense the world and forecast what's going to happen next. We're taking this even further, and maybe in domains that are moving faster than we anticipated. There's a concept that wraps a lot of this together for me, from one of my favorite science fiction authors. He wrote a piece of nonfiction that I think is brilliant, called Summa Technologiae. It's kind of a book of futurism, about what will happen in the future; he predicts the metaverse and augmented reality in the 1960s, which is really lovely. He has this concept called the information barrier, which is kind of a catch-all phrase for the moment at which the human mind can't do the tasks that we're asking it to do: keeping everything we need to know in mind, sensing everything we need to sense, and forecasting the future, or predicting outcomes from the present state. Using all this information is just ridiculously hard. And we're in this amazing era where we actually can push the information barrier back using a totally new class of tool. I just think it's got to be one of the most exciting times to be alive, today.

 

Anthony  44:58

No question. I mean, that's a really uplifting and optimistic way to think about it: instead of focusing on all of the shortcomings of the current situation, focus on the opportunities for things to change rapidly for the better in our lifetime.

 

Alex  45:15

Completely. Today's the best day to be alive, and I think tomorrow is going to be even better. I believe this really deeply. Call me an optimist, or a techno-optimist, I don't know what the phrase or the philosophy is, but times are confusing, times are tough, and yet what is happening, and what we could do for our species given all that's being developed and all that is being shared among us, is just truly incredible.

 

Anthony  45:40

Totally agree with you, my friend. It's been a great episode.

 

Alex 45:45

It's always always always a delight to talk with you, Anthony. 

 

Anthony 45:48

Until next time.

 

Our thanks to Dr. Claire Cui for joining us this week.

 

Next week, we're going to talk to Dr. Reardon about how he is simplifying human control over computers. And finally, we'd love to know what you think of this series. You can write to us at theoryandpractice@gmail.com or tweet @GVTteam. 

This is a GV podcast and a Blanchard House production. Our series producer was Hilary Guite, our executive producer Duncan Barber, with music by DALO.

I'm Anthony Philippakis.

 

Alex  46:21

And I'm Alex Wiltschko. 

 

Anthony 46:23

And this is Theory and Practice.