
Can AI Advance Science? DeepMind's VP of Science Weighs In

2024/5/1

a16z Podcast

People

Pushmeet Kohli

Vijay Pande

Narrator

Narrator of the opening cinematic trailer for the well-known game Civilization VII.

Topics

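Pushmeet Kohli: AI has become a necessary tool for scientific research, used to understand and solve complex problems; deep learning has fundamentally changed machine learning and its applications; DeepMind's goal is to apply AI to the world's major challenges; DeepMind's science program has grown from a handful of people and two projects to more than 100 people working across multiple fields; the success of AlphaFold 2 demonstrated AI's potential for solving fundamental scientific problems; the AlphaFold 2 database is now widely used around the world, with influence across scientific fields; open access to the AlphaFold 2 database has greatly advanced global collaboration in scientific research; AlphaFold 2 has been applied in multiple scientific fields, such as drug discovery and vaccine development; the rise of large language models brings new opportunities for applying AI to scientific research; DeepMind will continue research in areas such as protein structure prediction, genomics, and materials science.

Vijay Pande: Biology is undergoing an industrial revolution, with AI and massive data as its key drivers; AI is driving the industrialization of healthcare and the life sciences; AI has advanced industrialization in the life sciences, turning manual work into engineered, industrialized processes; AlphaFold 2 can predict protein structures quickly, greatly shortening research time; a protein's structure determines its function, which is crucial to biology and drug discovery; AlphaFold 2 turns protein structure prediction into a database query, greatly improving research efficiency; AI has changed the economics of research, letting small teams work more efficiently; predicting clinical trial outcomes is a major challenge for AI in biomedicine; AI models' ability to predict drug effects in humans needs to improve in order to reduce reliance on animal experiments.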

Deep Dive

Chapters

The episode introduces DeepMind's VP of Science, Pushmeet Kohli, and a16z General Partner Vijay Pande, discussing the transformative potential of AI in scientific exploration. They explore whether AI can lead to fundamentally new discoveries in science, highlighting the groundbreaking impact of AlphaFold 2 in predicting protein structures.

AI is becoming a necessity for making sense of complex scientific problems.
AlphaFold 2 has been utilized by over 1.7 million scientists worldwide.
The conversation sets the stage for deeper discussions on AI's role in advancing science.

AI is not sort of a nice-to-have. It's basically almost a necessity for us to make sense of and reason about any problem that we are now looking at.

I think there's this really fun cultural shift, where ten years ago, people would say how ridiculous that a computer could try to do these things. I think ten years from now, people will be like, how ridiculous that a human being would do that. You can't, like, multiply those numbers in your head.

Essentially, what we have entered is basically an age where a single human mind cannot comprehend the data that we are capturing about the universe.

One of these structures may have taken the length of a PhD, right, to solve a single structure. And now we're talking about true scale.

There have been 1.6 or 1.7 million users of the AlphaFold database. Now if that is not a positive statement about the planet, then I don't know what is. There are 1.7 million people interested in protein structure prediction. I'm really happy about that.

The last few years have been peppered with AI announcements. Let's recap a few. In April 2022, DALL-E 2 is released. Midjourney and Stable Diffusion follow that summer. Then in November, ChatGPT arrives.

Then 2023 features the release of Claude, Llama, and Mistral, to name just a few models, and we're only a quarter or so into 2024. And we're already seeing the expansion into AI music and video models faster than almost anyone expected. And while much of the attention circles around creative tools, there was an AI unlock in biology that caught much attention in 2021. That was AlphaFold 2, a breakthrough in predicting the 3D structures of proteins, released and open sourced by the DeepMind team in July of that year. Since then, over 1.7 million scientists across 190 countries have been leveraging it. In the meantime, the DeepMind team has been hard at work seeing how else machine learning can expand the frontier of science.

Across many areas of biology, from structural biology to genomics to protein design to [unclear] genomics, to quantum chemistry, to meteorology, to fusion, to pure mathematics, to computer science.

They released papers like the high-accuracy weather model GraphCast in November and AlphaGeometry in January, which approached the level of a human Olympiad gold medalist, and other papers across materials, mathematical functions and more, including, of course, continuing to push forward AlphaFold. And today we have the pleasure of hearing directly from DeepMind's VP of Research focused on science, Pushmeet Kohli. Pushmeet sits down with myself and a16z General Partner Vijay Pande, who has long been part of this intersection himself as a longtime professor at Stanford, spanning several departments from computer science to structural biology to biophysics, and who was also the founder of the Folding@home project, launched in the year 2000. Together, we reflect on the journey to AlphaFold.

But more importantly, where are we in the trajectory of AI meaningfully impacting the way we perform and unlock new science, from new lab economics to clinical trials to drug discovery and more? So the question becomes, can artificial intelligence help us uncover fundamentally new science? Has it already done that? Let's find out. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

So AI has been the talk of the town. A lot of people are familiar with the consumer LLMs, think ChatGPT, maybe Midjourney. But AI has been around for quite some time and is also impacting the scientific sphere, which I think is so exciting, and I think both of you do too. So Pushmeet, maybe we could just start there and talk a little bit about your background, how you got into this intersection of science and AI, and also why you work for DeepMind, which I feel like is one of the flagship AI companies. Why have you chosen to focus more there than perhaps some of the others?

Yeah, so I took a roundabout journey into what I do today at DeepMind. I'm a computer scientist by background. I was hired at Microsoft Research and worked there for a decade, mostly working on applied mathematics, solving difficult math problems. And most of them were encountered in machine learning.

So I started with computer vision, computer graphics, information retrieval. And after having gone through many of these applications, I was really excited about deep learning when it finally emerged. I really thought that this was a game changer in terms of how machine learning is going to impact applications. Demis Hassabis, the CEO and founder of DeepMind, which at that time was still a young startup, reached out and said, well, we know you from some conferences, why don't you join us? And I said, no, you are working on games at the moment, and I want to work on products and applications.

And he said, well, the whole games thing is just a starting point. The idea is to eventually impact science and impact applications, which are the biggest challenges in the world. And given the level of conviction with which he made this case, I was convinced: this guy gets it. And so I moved to DeepMind in 2017.

And I told him, if you are very serious about real-world applications, we need to make sure that machine learning systems are reliable. So in fact, when I joined DeepMind, I founded the reliability and safety team at DeepMind. And around a year into it, Demis said, Pushmeet, you are really interested in multidisciplinary research, where you want to apply machine learning to impactful problems. And I think the most impactful area that you could work on is science. And that was a completely left-field sort of suggestion. The last class I had taken was in school, so I was quite skeptical, to be honest.

And I told him, you've got the wrong guy, I have no background in biology or physics or chemistry. But he said, no, I mean, the way you are approaching these things, it's good. Let's give it a try and see where it goes. And so we started the science program with six or seven people working on two projects, and now it is around a hundred people, and the team has initiatives spanning many areas: structural biology, genomics, protein design, [unclear] genomics, quantum chemistry, meteorology, fusion, pure mathematics, computer science. So it's been a long journey, but it started with sort of an accident.

Yeah, and also a very scientific, iterative approach. I love that. Vijay, before we jump into more of those projects that Pushmeet alluded to there, I'd love to hear your background and how you got into this intersection of science and AI, because you also have quite the storied history there.

Sure, yes. So from 1999 to 2015, I was a professor at Stanford, and actually in a variety of departments. My home department was chemistry, but I also had appointments in computer science and structural biology, and was also chair of biophysics. And at that intersection, it was clear that machine learning was a very exciting tool to use.

I think what really was happening early, with genomics in the nineties and then all the way through, was the rise of data in biology, and biology becoming very quantitative. And once it starts becoming quantitative, machine learning is very natural. As Pushmeet talked about, I think where a lot of us, myself included, got particularly excited was maybe 2013, 2014, 2015, as deep learning was emerging. Machine learning before deep learning meant human beings had to come up with the features, and it was like a little tool. With deep learning, it could be something that replaces more and more of the human part of the thinking.

And actually, a lot of the interesting results are emergent after that. And those emergent properties got very exciting. It was clear at the time that we needed a lot of compute. And so actually early on, in 2000, I founded the Folding@home distributed computing project. And actually we were some of the first to program GPUs. And so all of that comes together: the data, the compute, and then finally, the algorithms. Once those three pieces were together, I think many of us could see that this was taking off. And it was time.

Absolutely. I think that brings us to this question of the why now. You kind of already addressed it, but Vijay, what has you so excited about this intersection? We're recording this in 2024. AI has really been around since maybe the fifties. Is it just that we have the right amount of compute? Is it that we have these unlocks when it comes to the modeling? Give us a little bit of a picture of what gets you so excited about what's to come before we dive into some of the specific examples.

Yes, if you step back, I think what we're really seeing in biology is this industrial revolution. If you look at a biology lab, maybe even today to some extent, versus ten years ago, or even fifty years ago, there will be benches and people in white coats, pipetting and so on. Maybe the boxes on the benches are a little different, but it's very, very similar, and very bespoke and artisanal.

What is shifting is that it's becoming industrialized. We're seeing the rise of robotics, and with that industrialization, this immense amount of data. And so AI needs data and data needs AI. As biology gets all that data, we can certainly lean into this. And what's most intriguing is that life sciences and healthcare largely have not been transformed by technology, not by IT, to a great degree. And healthcare and life sciences collectively are almost becoming 25 percent of U.S. GDP. That's trillions and trillions of dollars going through this, and none of it, or very little of it, is being revolutionized by tech.

So this revolution, I think, is happening because AI is allowing this industrialization to happen, and especially turning these bespoke, artisanal processes into something that is engineered and industrialized. That's one aspect of it, and there are many others, like robotics, and that's the arc that I think is exciting.

And it's something where I think we saw a bit in 2015. It's probably a 25-year arc, maybe a 30-year arc, that we're ten years into. Industrial revolutions don't happen overnight. But when you look back, the whole world is going to be changed. And so we're living in the middle of it. I was actually always jealous of people living in the 1820s, people going from nothing to steam trains and all that stuff. And actually now we're the ones that I think are in the center of it.

It's such an exciting time, right? You see that picture of, I think it's somewhere in New York, where you have all of these horses lined up, right? And back then that just felt like the norm. And then you see, like a decade later, it's all replaced by the equivalent of cars.

And so Pushmeet, maybe we could use AlphaFold as an example here, because a lot of people listening to the podcast are maybe most familiar with that paper and that breakthrough, but it's maybe also another great example of how that didn't happen overnight. I think most people noticed it in 2020, but it didn't start in 2020. And so maybe you could talk about that arc. What is AlphaFold? How did it come to be? And then also, where are we today in terms of its impact?

Yeah, so I was telling you how I started my journey with the science program at DeepMind, and at that time we had these two small projects. One was protein structure prediction and the other one was quantum chemistry. And AlphaFold arose from that protein structure prediction project. In its simplest form, the statement of the problem is: you are given an amino acid sequence, which is going to produce a protein, and we want to understand the 3D coordinates of those amino acids.

And that's really important because if you understand the 3D structure of the protein, that informs and gives you an idea about what the function of that protein would be. And that has implications for drug discovery, for understanding basic biology, and so on.

So we started working on this problem because we thought that it satisfied one of our key requirements when we look at problems, that is, it's a real foundational root-node problem. Once you solve it, it has so many different sorts of implications, in disease understanding, in biology, in synthetic biology.

And not only that, it is a classic machine learning problem. You require learning in this problem because you are working with an expansive solution space, and you have access to the raw material, which is data. And the structural biology community had done an amazing job in accumulating a very good data set in the form of the PDB, the Protein Data Bank. Scientists all across the world, whenever they found the structure of a protein, which sometimes took almost five years, or even a decade in some cases, diligently deposited that new structure in this database.

And so at that time, when we started, the PDB held structures obtained from X-ray crystallography and cryo-EM, and that was an amazing dataset to start with. And not only that, the other big problem in machine learning is how you evaluate the machine learning model, because in machine learning, one of the easiest things that you can do is basically fool yourself. These models are extremely good at cheating.

If you give them any way to cheat, they will cheat. So the protein folding community and the structure prediction community had this biennial competition called CASP, the Critical Assessment of Structure Prediction. And they would run a blind assessment, like an Olympics of protein structure prediction, where people would be given protein sequences whose structure was not known by anyone, only by the one experimentalist who had deposited it, and then they would be tested, and the true generalization ability of the model would be exhibited.

So we thought this problem really checked a number of key criteria which we use for taking up a problem for the very long term. So we started with a team and investigated how much progress we could make on this. We were hopeful, optimistic that machine learning could play an important role, but we didn't know. This was a new problem for us, and we were approaching it with a lot of respect.

And Pushmeet, what year was this when you started?

So we started around 2017, and we took part in the next Critical Assessment. And when we entered AlphaFold 1 in 2018, we were not really sure where we would land, maybe in the top three, but it actually performed really, really well.

Not only was it the state of the art, it outperformed the state of the art by a wide margin. And that validated our hypothesis, the basic research philosophy that DeepMind has, the multidisciplinary nature of the teams.

We had brought in some really good structural biologists and biophysicists. John Jumper, being the lead, was part of the team at that time, and that gave us a lot of confidence. Now we were the best in the world, but the model was still not useful, right? It was producing good results, but it was nowhere close to solving the problem. And then we had to make a bet: can we really go after it and solve it once and for all? And so the first thing we had to do was start from scratch, take AlphaFold 1 off the table, and say, this approach that we had started is not going to work.

What gave you the indication that AlphaFold 1 couldn't take you to the next level? Because I think even in the AI space outside of science, there are a lot of questions around this.

Can we just depend on the scaling laws? Do we need some sort of new unlock to get to, insert problem here? It could be AGI, could be something else. What gave you the indication that, this is great, we are so happy with our results, but we actually need to throw this out and start anew?

So AlphaFold 1 had adopted a classical approach, this classical two-stage approach. The machine learning model's job was this: given an amino acid sequence, it does not predict the 3D coordinates of the amino acids directly. What it predicts is basically the distances between amino acids.

And then there's a second stage, which would take that distance matrix and recover the 3D coordinates. So the neural network's job was restricted to finding the distances between the amino acid residues, and this two-stage model was very effective. But it was not very elegant, in the sense that if you made certain mistakes, you were not able to backpropagate to the neural network, because you found the results after the second stage, and the neural network would not get that supervision.

So we believed that in order to be able to properly train the model, we needed to go end to end. We needed a model which could go directly from the sequence to the structure. And that was one critical element, the change that needed to be made. But it was a difficult change to make, because you are starting from a much lower baseline when you are building up that end-to-end network.
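
[Editor's illustration] The second stage described above, going from a matrix of pairwise distances to 3D coordinates, can be sketched with a small textbook construction. This is not AlphaFold 1's actual second stage, which the conversation does not describe. It is a simplified stand-in that assumes exact, noise-free distances, at least four points, and four leading points that are not coplanar. The function names `coords_from_distances` and `distance_matrix` are hypothetical.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in points] for p in points]


def coords_from_distances(d):
    """Rebuild 3D points from an exact distance matrix (list of lists).

    The result matches the original only up to rotation, translation and
    mirror reflection, because distances alone cannot fix those.
    Assumes noise-free input and non-coplanar first four points.
    """
    pts = [(0.0, 0.0, 0.0), (d[0][1], 0.0, 0.0)]  # point 0 at origin, point 1 on x-axis
    for k in range(2, len(d)):
        # x follows from the distances to points 0 and 1.
        x = (d[0][k] ** 2 + d[0][1] ** 2 - d[1][k] ** 2) / (2 * d[0][1])
        if k == 2:
            # Point 2 is placed in the xy-plane with y >= 0.
            pts.append((x, math.sqrt(max(d[0][2] ** 2 - x ** 2, 0.0)), 0.0))
            continue
        x2, y2, _ = pts[2]
        # y follows from the distance to point 2.
        y = (d[0][k] ** 2 - d[2][k] ** 2 + x2 ** 2 + y2 ** 2 - 2 * x * x2) / (2 * y2)
        z = math.sqrt(max(d[0][k] ** 2 - x ** 2 - y ** 2, 0.0))
        if k > 3:
            # Choose the sign of z that agrees with the distance to point 3.
            err_up = abs(math.dist((x, y, z), pts[3]) - d[3][k])
            err_down = abs(math.dist((x, y, -z), pts[3]) - d[3][k])
            if err_down < err_up:
                z = -z
        pts.append((x, y, z))
    return pts


if __name__ == "__main__":
    original = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.1), (0.3, 2.0, -0.4),
                (1.0, 1.0, 1.7), (-0.8, 0.5, 0.9), (2.2, -1.1, -0.6)]
    target = distance_matrix(original)
    rebuilt = distance_matrix(coords_from_distances(target))
    worst = max(abs(a - b) for ra, rb in zip(target, rebuilt) for a, b in zip(ra, rb))
    print("largest distance mismatch:", worst)
```

The sketch also shows why a separate reconstruction step is awkward to train through: the coordinates only appear after a geometric procedure that sits outside the network, which is the limitation the end-to-end redesign was meant to remove.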

So let's fast forward. So you did throw out AlphaFold 1. And then what happens after that?

So with AlphaFold 2, we start this long journey. We start making progress on AlphaFold 2 from a much lower level of performance than AlphaFold 1. We have this central leaderboard.

Everyone in the team can propose ideas and try out their ideas on the central leaderboard to see how much of a delta each idea or each change makes. And we were making steady progress. And then there were times when progress would stagnate, and sometimes, even for months, it would stagnate.

And people would ask the question, have we reached the limit? But over time, and I think around when the pandemic started, we got some really big deltas, where we thought we were making real progress. And if you look at the metric for how you quantify protein structure prediction accuracy, it's GDT.

And we had crossed that 80 GDT threshold, and that was unprecedented. Of course, that also incentivized us to push it even further, on to 90 GDT and beyond, right? Which is what we needed to do.
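
[Editor's illustration] The GDT score mentioned here is usually reported as GDT_TS: the average, over distance cutoffs of 1, 2, 4 and 8 angstroms, of the percentage of residues whose predicted position falls within that cutoff of the experimental one. The sketch below is a simplification under a stated assumption: the two structures are already superimposed, whereas the real assessment searches over superpositions to maximize each percentage. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0 to 100) for already-superimposed coordinates.

    predicted, reference: equal-length lists of (x, y, z) positions in
    angstroms, one per residue (typically the C-alpha atom).
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the mean of those fractions.
    fractions = [sum(dev <= c for dev in deviations) / len(deviations)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0)] * 4
    # Toy prediction with per-residue errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
    predicted = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(gdt_ts(predicted, reference))
```

On this scale a perfect prediction scores 100, so the thresholds discussed in the conversation correspond to most residues landing within a few angstroms of their experimental positions.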

And so the pandemic happened, and it really brought home to the whole team the actual importance of the problem, because we were all sitting in our homes, sheltering. And there were scientists out there who said, if you have the structures of the different SARS-CoV-2 proteins, it would be really helpful. Now, the community very quickly found the structure of the spike protein, because it was also sort of similar to SARS-CoV-1.

But for the accessory proteins of the virus, the structures were not known. And so the fact that we could compute these predictions and share them with experts who were trying to deal with the pandemic and think about designing [unclear] and so on, it really brought home to the team the real-world impact and relevance that this fundamental problem has. And around September of 2020, when the second CASP competition ended, we got this email from the organizers: we want to chat.

And that was unprecedented. People were surprised, like, why would the organizers want to chat? So we got on the call, and they were super surprised at how good the predictions were. In fact, some of them speculated that maybe the team had cheated in some way, it's so good.

But apparently there was one particular scientist who had submitted a protein but did not know the structure. They had hoped that the structure would be obtained by the time the competition ended. So this structure was not known to anyone, and AlphaFold could give them an initial starting point, with which they could solve the structure for that particular protein.

So they were totally amazed that such a system now existed in the CASP competition. And we later on released AlphaFold, and not only was it accurate, it was also very efficient. So we decided to, in fact, find the structures for almost all the proteins that are known to science, around 215 million of them, and put them in a database with our partners at the European Molecular Biology Laboratory, EMBL-EBI, and maintain that as a resource that anyone can access.

Yeah, that's amazing. And I'd love to turn it to you, Vijay.

You obviously have run a lab for a long time, and you've been on the other side of this, right? All these researchers now have access to this database, and, by the way, for the audience, one of these structures may have taken the length of a PhD, right, to solve a single structure.

And now we're talking about true scale. And also, again, this is being deployed to all the researchers, who can access it. So Vijay, maybe you can just speak to what that really means, and also whether we can apply this to other areas of science as well.

The impact of this is manifold. And I can speak to it both from the academic lens and from the last ten years of investing in startups. Startups use this as well.

First off, I think maybe it's worth really emphasizing the significance of structure itself. The reason universities like Stanford have whole departments for structural biology is that the structure is typically pretty evocative of function and other biological aspects. Perhaps the best-known example is the DNA structure.

Watson and Crick came up with this structure, and by looking at just the structure, you could infer how DNA is replicated and essentially how genetics works, to some degree, at a very basic level.

And so maybe that's one of the most dramatic examples, but there are numerous examples where if you have the structure, you can understand the function. And so structural biology is a fundamental part of how we understand biology from the molecular scale up. And also for drug design: often, if we understand the structure and its dynamics, we can understand how to drug proteins and come up with therapies much more in an engineering fashion.

So the significance of structural biology is huge. It's also a time when structural biology is in a renaissance because, as you mentioned, it used to take many years to come up with an experimental structure, and now new methods like cryo-EM can come up with structures in a much shorter time, even weeks. And so there's a renaissance going on there.

And I think for the structural biology field, we will see this combination of new experimental methods and computational methods. And what was most striking to me is how people are going to these databases and using them almost like you would use the human genome database. The human genome database takes genomics and turns it into a database lookup: you basically don't have to do the experiment yourself if you can just do the computational query, to some degree.

I think what AlphaFold did is it took structural biology and made it a database lookup. Not exactly a true database lookup, in the sense that this is a prediction.

But as the quality of predictions gets higher and higher, it becomes kind of the same thing. So that's huge. I think the final thing that was most striking is that there's always going to be a shift from academia to industry.

Maybe thirty years ago, academics would design computer chips and new types of microprocessors and so on, new architectures. We don't do that now in academia. That's not something that makes sense to do; it's much better done in companies, especially at the scale involved. And what was most striking about this is that, for multiple reasons, this is something that DeepMind was perfectly suited to do in a way that academic groups, I think, really weren't. And that shift suggests that it's a really interesting time, as the industrial world starts to take on some of these problems.

That's really interesting. The relationship you touched on between academia and industry is something that people talk a lot about these days: whether these different AI models can really fundamentally advance science, the way that you typically think of academics as the parties that do that. And so I'd love to hear from both of you, maybe starting with you, Vijay: what indications, whether it's through AlphaFold or other projects that you are seeing emerge, actually indicate that, yes, these models are able to help us actually push the frontier, instead of maybe just helping us be a little more efficient within the one that we're already in?

I think Pushmeet said it well, that structure prediction is a foundational problem. But take, for instance, the arc of drug design. First you have to understand the biology.

AI for biology is a very interesting area where we can maybe start to understand the nature of pathways, diseases, and human biology in ways that don't require experiments on human beings, which has always been one of the biggest limitations. I think we understand mouse biology really well because of all the experiments we can do, but we could never do that on human beings directly. But AI models for humans, as they become more predictive, and especially once they are more predictive of a human than a mouse is (the mouse is a model, in a sense), that is super interesting for unraveling biology.

And so AI for biology is a thing. Then we have to talk about AI for chemistry. And I think AlphaFold is in the chemical world, where now we're trying to understand biophysical chemistry, or trying to understand how we can quickly drug undruggable proteins, how we can come up with antibodies and design proteins. That's a whole area.

And then finally, I think AI for clinical trials is going to be where maybe the biggest impact financially will be. Clinical trials can cost hundreds of millions to billions of dollars. Even a ten percent improvement on a billion-dollar enterprise is huge.

And these are maybe some of the tougher problems to work on. But as we make an impact there, I think clinical trials will be better, will probably be more easily powered, and will hopefully be more successful because we will be picking the right ones to do. And then that turns eventually into AI for personalized medicine, which is in a sense an extension of that trial.

Right now I don't want to experiment on me as a mouse or rat, but I would love to make sure I get the best drugs for me. You and I are different and will respond differently to drugs. To be able to have that predicted would be huge. So I think that's the arc of it. And I think we're just at the beginning.

Definitely. We talked about AlphaFold, which is very exciting and maybe the most familiar to folks, but Pushmeet, your team has also published a bunch of other papers that touch this intersection of AI and science: you could say AI in math, AI in physics. And these are things like materials, GraphCast, which has to do with weather forecasting, FunSearch, AlphaGeometry. And so I'd love to hear from you again on this question of whether we are moving the frontier forward with these different models. What are you seeing from some of these other projects that your team is working on in terms of AI helping us actually uncover new science?

Essentially, what we have entered is basically an age where a single human mind cannot comprehend the data that we are capturing about the universe. And this is true in any field you encounter. It is true in biology: no biologist can reason about and analyze all the biological data that we can gather.

No physicist can look at and analyze all the high-energy physics data that is being gathered, and even mathematicians cannot look at all the large-scale mathematical simulation data that we can now compute and simulate. And I think what's happened is AI is not sort of a nice-to-have; it's basically almost a necessity for us to make sense of and reason about any problem that we are now looking at.

I have examples in pure mathematics, where in our work on topology you describe knots with two different sorts of definitions: there is an algebraic definition and there is a geometric definition. And mathematicians understood these characterizations, but never understood the connections between them, right? And what we showed in one of our works is basically this: we generated a lot of data for knots in these two characterizations.

And we asked, can a neural network make predictions about one characterization from the other? And the thinking was, well, if it could, this would already be known. But in fact, it could make predictions.

And when we drilled down, we found a very nice conjecture that nobody had encountered. And we worked with mathematicians, who showed that there actually was a very elegant, nice relationship between those two characterizations. So these are completely fundamental discoveries in mathematics that were completely unknown to mathematicians, now being uncovered by machine learning.

And we are seeing this across the board in any of the scientific areas that we are looking at. We are discovering new insights, new patterns that were not expected, just because the techniques to analyze the raw scale of data did not exist.

I think amongst biology, especially maybe ten years ago. And further back, I think there is often a belief that biologic is just so complex that is just incomprehensible that there no way you to even understand that the only thing you can do is on the experiment, see what happens. And I think we're seeing the beginning of the shift.

People are starting to think, well, there are complexities, there's a lot we don't know, a lot to learn. But AI actually can gather all that together and start to decipher this, to be a natural language for biology.

And I think there's, to me, this really fun cultural shift where ten years ago, people would say it's ridiculous that a computer could try to do these things. I think ten years from now, people will be like, it's ridiculous for a human being to do that; you can't, like, multiply those numbers in your head. And we've seen this in other places, like chess: it seemed impossible that a computer could beat a grandmaster. And then now there is .

nothing with tries tlees cks.

Yeah, yeah. And we saw it with Go, with all these other things. So I think that's just the cultural shift, but I don't think that's a bad thing.

I mean, a forklift can lift much more than the strongest weightlifter, and we view that as a positive thing. It's always going to be us and them. I think the interesting question will be, once it can do these things that we can do, what do we do together with that?

Yeah, and what can we do? I mean, one of the most amazing things, I think, is that DeepMind, for the most part, has given these models, or the results of them, to the community. And so researchers have their hands on them.

And so maybe we could talk about that. How are researchers leveraging these new breakthroughs? There are all kinds of stats around how we don't have enough cancer drugs, or there are shortages, and these are very real things we want to fix.

So Pushmeet, maybe we will start with you. What are you and your team seeing in terms of this technology being deployed and how researchers are using it?

Yeah. So this was another sort of fascinating journey. I was not from the natural sciences, so working on AlphaFold was a learning experience, but then actually releasing AlphaFold to the community was even bigger of a learning experience.

So the AlphaFold database: when we were sort of building it up, we wanted it to be available everywhere on the planet, to all sorts of scientists.

But the scale of the science was unprecedented; I was not aware of it. The AlphaFold database today has been accessed in a hundred and ninety countries, and there have been 1.6 or 1.7 million users of that AlphaFold database. Now if that is not a positive statement about the planet, then I don't know what is. There are 1.7 million people interested in protein structure prediction. I'm really happy about that.

With all of the things that are happening in the world, and in terms of the impact, it's again an amazing sort of spectrum. We saw AlphaFold being used in path-breaking fundamental biological discovery. My personal favorite in that domain is the nuclear pore complex, basically the way the nucleus controls how material gets into the nucleus. I mean, the fundamental structure of that complex was not known, and researchers used AlphaFold structures to be able to piece together the whole complex. A recent paper from the Feng Zhang lab showed how you could develop a molecular syringe.

And again, they used AlphaFold to design that. And there are so many other sorts of areas where people have been using it: for developing new vaccines, for working on new antibiotics against antibiotic resistance, and in synthetic biology. Like, one of the key partners at the early stages was the University of Portsmouth in the UK, which was using AlphaFold to develop and think about enzymes that could decompose plastics.

So you have this spectrum, from fundamental biology and drug discovery to even synthetic biology and enzyme development, that has been impacted by AlphaFold. And so it was very difficult to even predict what the uses of it would be.

I think also, just within biology, there has been a shift, in that people are sort of wrapping their heads around prediction a bit better. I think before, experiment was the gold standard, and that was all people wanted to hear about.

I mean, perhaps also it's just the zeitgeist of the time: when you deal with large language models, you're basically dealing with predictions of what comes next. People understand the pros and cons of predictions, but also that there's massive value in having them. And I think it's kind of funny that we would talk so much about the technology, but I think it's the human shift and the cultural shift that are the things we're going to need to push. And I think what gets me most excited about what Pushmeet has just been talking about is the fact that I think that's a sign we're seeing this cultural shift as well.

Maybe something else you could speak to, Vijay, that's just coming to mind as both of you are sharing more about these researchers: how does this change the economics of a lab, right?

If you think about what we talked about before, uncovering a structure could have taken a whole PhD.

Now we have new tools, and we're seeing these economics change in some of our consumer fields. And those are very obvious. How does this change the economics of research overall?

One of the sorts of fantasies that one of my former colleagues talked about is what we call a beach biotech, where you have, let's say, one person and a laptop, maybe on the beach, wherever you are. And you've got CROs, contract research organizations, to do the experiments.

You have some AWS cloud, or whatever, some GCP cloud somewhere to run your calculations. And that's one person with AI. I think we're not quite there yet, but I think that's an intriguing fantasy to think about.

And I think on the way to the one-person sort of aspiration is smaller teams doing way more with much less capital outlay, and building startups much more efficiently, where they get to results much more rapidly. The challenge, and why I mention this, is that getting to the clinical trials and speeding that up will be nice, but I think the big financial return will be on the clinical trial side. But I think the expectation is that AI for biology, understanding targets and so on based on human data, would also help on the trial side, in addition to anything else there. So put together, I think we can get to these therapeutics faster, cheaper, and hopefully better. Yeah, and

maybe, Pushmeet, we could tackle that directly. If you could give a sense for folks who aren't these researchers, who aren't already leveraging these tools: how much does it really cost if someone does want to get a protein structure prediction, or use some of the other models that we've talked about, again, GraphCast or materials, et cetera? What cost are we really looking at?

Yeah, so for AlphaFold, it's literally free. You just go to the AlphaFold database, find the protein of your interest out of the two hundred fifty million or so proteins, and take it. And it's there for free for anyone on the planet to use.

So really, it has democratized things, in the way that scientists in Latin America or India who were working on sort of neglected tropical diseases, for instance, who had no way they could get a structure of a protein that they were interested in, can now get access to these structures at the click of a button. Of course, a lot of research still has to be done to take that work towards a more focused outcome. And a lot more investment is needed if you are trying to accomplish the vision that Vijay outlined.

The AlphaFold structures are a start. But you really need to think about how it binds to the ligand, how you do the ligand design, and how you solve the co-folding problem. So there is a lot of investment that is needed to make these models and make these predictions and fine-tune them for specific applications.

And we have an affiliate of DeepMind, Isomorphic Labs, which is now investing in this area as well. At the same time, we are continuing to work on the foundational side of things, and have now released an announcement, an update, on the next generation of AlphaFold, which goes beyond proteins to other biomolecules: to nucleic acids like DNA and RNA, PTMs, small molecules, and so on.

I think it's amazing that you've opened this up to the community. And I think something I'd love to hear both of your takes on is really the relationship of these models and them being open source. I mean, it's a big debate within AI at large, but I think especially when it comes to science, there are, I think, both ends of the spectrum in a way, right?

I think there's nothing that people get more excited about than this idea of curing cancer, or solving poverty and an agriculture crisis. But at the same time, people also get very scared, right? I think that's where people's sci-fi nightmares come to be, right? Like, oh, someone can engineer a molecule that can kill us all. And I guess, starting with you, Vijay, what's your take on this relationship of AI and science and why it should be open source?

I think the beauty of open source, and we see this with open source for AI in biology, but in AI more broadly, is that people can build on top of each other. And I think what's really remarkable about the AI field over the last five, maybe possibly ten years is that it feels like an amazing result comes out like once a week, and the key part of that is that it comes out with code, a GitHub repo that you can check out immediately.

You don't even have to just believe the results. You can run it yourself. People even open source the tests for these things.

So essentially, we're building like a skyscraper, where each person builds a new floor, and we're going really fast. And that's what open source can do.

In the past, if it wasn't open source, I'd have to read the paper, I'd have to code it myself, and sometimes the paper may be a little vague on some details. So I might not bother, right? And I'll just go do my thing.

And so I think what open source allows us to do is to build on top of each other and build rapidly. Now, certain parts won't be open source. I think you unfortunately can't open source a drug compound, because then no one's going to pay for the trial, and certain things like that.

The economics just don't make sense, given these hundreds of millions or billions of dollars and so on. So certain parts will be closed source, and there are hundreds of startups in AI in biology and AI drug design that will maybe take advantage of what's been developed, develop their own methods, and build on top. And then that's where I think the drugs will come from.

You talked also about the concern that, because this is so powerful, we could maybe do certain dangerous things with it. And there, I think, there's a bit of a misconception, because actually there is a huge asymmetry: drug design for treating disease is complex, and that's a really hard problem to do.

But it actually turns out to be really easy to come up with compounds that actually are dangerous and toxic. In fact, that's why we have phase one trials, because even the things that you thought would really hopefully not be toxic at all turn out to be toxic. So it's actually very easy to make toxic things. And Google will actually teach you how to get all this other stuff, for better or worse. So I think the asymmetry is that if we get rid of AI for drug design, you lose all the good and you don't prevent the bad, which is already here.

I think that's a good point that a lot of people don't think about. Pushmeet, maybe you could just speak to why DeepMind has chosen to open source these models, which isn't necessarily the norm across different AI companies?

There was a lot of deliberation within the team and within the company on this. I think there were a few different things that went into the final decision. One was that it was so foundational: if we had kept it closed source, fully leveraging the impact of it for society, I mean, that would have been difficult.

That was because it's so fundamentally foundational, and it's very hard to even predict what the potential applications of it are. Just to give you an example: when we launched AlphaFold, a couple of days later somebody did an analysis on the uncertainty associated with AlphaFold predictions and figured out that, in fact, AlphaFold, even though it was not trained for that, was the best predictor for predicting disorder in proteins. So that is exactly something that we would not have come up with, right, if we had kept it closed. Someone interacting with the models in the community figured that out.

So when we were thinking about it, that was, of course, the first one: how to maximize the social impact and the scientific impact. The second one was responsibility. And we consulted a number of experts, from structural biology, from chemistry, from drug discovery, to figure out what is the right and responsible and safe approach here.

And even considering the malicious sorts of use cases, after we had done all the deliberations, we felt that this was safe to release, and that the impact of releasing it and open sourcing it in the widest way would outweigh the costs that we would need to sort of model, so it was decided that we should open source it. And I think the decision has been validated by the impact that AlphaFold 2 has had in the community. Now of course, that's not true for all the different models. In fact, subsequently, we have had models which we have not open sourced. But I think in the case of AlphaFold 2, the decision was very, very clear in favor of sharing it with the world in the freest way possible.

For the ones that you haven't chosen to open source, if you're willing to share, how do you make that decision?

There are a number of different factors: what will be the societal impact and the scientific impact of releasing something, versus what is the commercial cost of releasing something rather than leveraging it for commercial use, or even the safety sort of argument. So just to give an example, one of our recent models announced last year was AlphaMissense, and this is a model for predicting the effect of missense mutations. What the model does is it produces state-of-the-art accuracy in making predictions about whether missense variants are benign or could be pathogenic.

And in this particular case, we found that the predictions of the model for the human genome, for the human missense variants, like the seventy-one million of them, if we released that, that would serve most of the purposes that a clinician or a biologist would be interested in. So we just released the predictions rather than the model, because the model had many other sorts of uses. You could run it on different organisms. There were other sorts of commercial considerations. So it was found that we could release the predictions, we could share the methodology, but we would not sort of take the open source approach.

That makes sense. And I think at the very outset, you shared so many different projects or areas of scientific study that your team is working on. I'm just so curious, because it sounds like there has been success across many. Are there any areas of science or mathematics that you've tried to address with this approach of using machine learning and AI where it's not quite working, whether it be because we don't have the prior dataset, as Vijay has spoken to, that sets the foundation? I'm just so curious if there are limitations emerging in any of these fields that your team is running into.

One specific area that I would love to have impact on, right, and that I think we will eventually have, is systems biology. It is an incredibly important sort of problem, to really understand at the system level how biological systems behave. It's just that the data and evaluation are not at the place where they are for, maybe, functional genomics or structure prediction.

Before we actually start an initiative in any of these areas, there is a huge due diligence process that we need to undergo, because essentially you are making a very long-term commitment, and the careers and the impact of some of the best scientists and engineers that we have are being committed to that area. We take that responsibility very seriously. Only when we are confident of the impact of the problem, when we are confident that we have good evaluation metrics to track progress, and when we have the raw material, the data, or a system to get good data, only then do we make that long-term commitment towards a specific topic.

You highlighted the data issue. I think one of the biggest differences between AI for, let's say, language models or AI for video, and AI for biology or for healthcare, is that most of the interesting data in bio and healthcare is either dark, in that there are all these medical records and so on that you can't just get access to on the internet, which would be very useful for understanding the healthcare side, the trial side, and so on.

It's either dark or it's never been measured. And so we need to do the experiments. I think having the data could be key, and I think that's going to be different than other places.

In other places, maybe the algorithms can really drive things, because everyone has the same data, more or less. I think here people will be differentiated by their data. And so the innovations will be innovations in AI combined with innovations in data collection. And there are obviously things at that interface, like active learning and how you can use the data more efficiently, and so on. But the data game, I think, is going to be huge.

Absolutely. And Vijay, I'd love to just get your take. You've spoken to a few examples already. To what different areas do you wish more attention was being allocated? Or do you just think there's a set of grand challenges that can and will eventually be solved with some of this technology?

The first thing about CASP, this Critical Assessment of Structure Prediction, is that I think it also inspired all these other prospective trials and prospective studies. So there's a lot of stuff to do, and I think there are tests for predicting binding of small molecules.

So I think we'll see, in time, these types of methods do tremendously in those assessments. But the holy grail, in my mind, is being able to predict clinical trials, something where you need to understand how a drug works in human biology. And that's, as Pushmeet was saying, systems biology at the largest scale. And so that is the holy grail.

And I think we will probably do it in parts. You could imagine, even, models for specific organs, or models for specific parts of the body, and then we put them together. Mixture of experts is pretty common these days. And maybe that will be one approach.

But however it gets done, once that gets done to the point where these models are better than the animal models, I think that's where there's really going to be a tipping point, a point where we can just move much, much more rapidly, where we can sort of not get stuck with having to run these animal studies, which take time and are very expensive. There are even crazy things: right now, there's a monkey shortage, because monkeys are in such high demand to run these experiments. So I think there is a long road to get there, where these models of humans are more predictive than the alternative. But I think once we get there, that would be a major inflection point.

Wow, I did not know there was a monkey shortage, but I mean, that really is important to know, right? As to your point, hopefully we get to a future where some of the things that we're doing in research today seem just so incredibly outdated because we just have better options. Pushmeet, what's next up for DeepMind in terms of areas of interest? I mean, you're already working on so many things, but I would love to just get a pulse on what's exciting for you too.

I think what is fascinating about science, in any of these fields, is that there is so much more to work on. I mean, even on structure prediction, I just mentioned the latest version of AlphaFold; the work there is on extending it to general biomolecules like DNA, understanding RNA, understanding the interactions between small molecules, ligands, and proteins, like bigger complexes, antibodies. There are so many things that we can extend.

In genomics, we have worked on both the coding part of the genome, with the missense variants, and the noncoding part, right, like predicting gene expression. We have made progress, but we are not completely at the end of it, right? So there is a lot that we are doing in all these areas.

In materials science, I mentioned this model, GNoME, which was able to predict four hundred thousand novel stable compounds, which expands the number of stable compounds known by more than an order of magnitude, right? But how do you now think about synthesizing those sorts of compounds, and then reason about their specific properties that would be useful in a particular application, right? So in any of these disciplines, we are not targeting one-off results in these areas. We are saying: here is a topic, and the long-term roadmap is to think about the paradigm shift in how science is done in that area, and move towards a more rational, modeling-based approach, tackling some of the problems that are encountered. So there's a lot that needs to be done, and we are just trying to focus on some specific areas, and as new areas come up, if the raw materials are there in terms of data and evaluation constructs, we are reviewing them as well.

It's amazing.

I haven't done as much research as you, Vijay, but I did do a summer of battery research and materials research where we were trying to discover new sodium-ion transition metal materials.

And my summer was literally, I mean, this was when I was in college, so I wasn't very advanced, but I was literally finding a paper that documented how to synthesize this material, mixing it up, creating a little battery, doing it in the glove box, and running it, and just seeing how effective it was. And obviously, in many cases, it was very ineffective. But every so often we found a material. It was truly just trial and error, trial and error, trial and error. And when I see papers like this, which do things in a completely new way, at scale, way cheaper, where you don't have to have all of these university students in a glove box day and night, it is so exciting.

The end point for me is, as we talked about, we're kind of in the middle of this journey, and that is a technological journey, and a cultural journey with these cultural shifts. And it's going to feel like the big goals that I've laid out, like the clinical trial things and systems biology, are so far off, right? And it's going to take a while.

But we can get a lot done collectively in ten years, fifteen years. You think about where we were five years ago, ten years ago, fifteen years ago. Fifteen years ago, people weren't really talking that much about deep learning; it was just beginning.

So the goals that we have are lofty, but I think we're riding the wave of it. And all that, I think, is very doable; we're just now building that tower one step at a time. It would be fun to chat again in five years.

Hopefully sooner.

I think one sort of thing that has been very exciting in the last few years is the rise of, of course, there's a lot of excitement about LLMs and foundation models and so forth.

And if you look at the impact that's going to have on science: in most of the projects that I was talking to you about, we were working with structured data, data which was either collected or, in the case of some of our fusion work, data that was simulated. But the rise of foundation models opens up the possibility of now using unstructured data to feed these models. And so that really opens the door for a large-scale injection of scientific knowledge into the models. And that is a very exciting direction that will, I think, bring a number of other problems into the feasibility zone, which previously were not there.

Of course, there are challenges with understanding uncertainty and sort of hallucination, and all these sorts of technical problems need to be addressed. But once that is done, I think the impact that's going to have on models for scientific discovery would be amazing. So that's another reason to be excited for the future.

Absolutely. And all of the problems you just mentioned are also opportunities for people to go and fix and be a part of that whole ecosystem. So this has been really wonderful. Pushmeet, Vijay, thank you for, as you said, getting people excited about what's to come, because I think with these two fields interacting, what a time to be alive, here in 2024, to kind of be a part of it. Like you said, Vijay, we're in our equivalent of 1920, so people in 2120 will look back at this fondly.

Absolutely.

If you liked this episode, if you made it this far, help us grow the show: share with a friend, or if you're feeling really ambitious, you can leave us a review at ratethispodcast.com/a16z. Candidly, producing a podcast can sometimes feel like you're just talking into a void. And so if you did like this episode, if you liked any of our episodes, please let us know. I'll see you next time.
