Support for Waveform comes from AT&T. What's it like to get the new iPhone 16 Pro with AT&T Next Up Anytime? It's like when you first light up the grill and think of all the mouth-watering possibilities. Learn how to get the new iPhone 16 Pro with Apple Intelligence on us, and the latest iPhone every year with AT&T Next Up Anytime. AT&T, connecting changes everything. Apple Intelligence is coming fall 2024 with Siri and device language set to US English; some features and languages will be coming over the next year. Zero-dollar offer may not be available on future iPhones. Next Up Anytime feature may be discontinued at any time and is subject to change; additional terms, fees, and restrictions apply. See att.com/iphone for details.
Hey, it's Nilay from Decoder with The Verge. We spend a lot of time talking to some of the most important people in tech and business about what they're putting resources toward and why they think it's so critical for the future. That's why we're doing this special series diving into some of the most unique ways companies are spending money today.

For instance, what does it mean to start buying and using AI at work? How much is that costing companies? What products are they buying? And most importantly, what are they doing with it? And, of course, podcasts. Yes, the thing you're listening to right now. Well, they're increasingly being produced directly by companies, like venture capital firms, investment funds, and a new crop of creators who one day want to be investors themselves.

And what is actually going on with these acquisitions this year, especially in the AI space? Why are so many big players deciding not to acquire, and instead license technology and hire away co-founders? The answer, it turns out, is a lot more complicated than it seems. You'll hear all that and more this month on Decoder with Nilay Patel, presented by Stripe. You can listen to Decoder wherever you get your podcasts.
What's up, people of the internet? Yes, it's David. Today we've got a little bonus episode for you. Don't worry, we've got a regular episode on Friday, so stay tuned for that. But I wanted to dig a little bit deeper into what exactly AI is, right?

I think we've all been hearing about it for months now, years possibly, but nobody has actually really explained what it is or how it works, right? Like, people say that things are AI, but what does that even mean?

So I want to give you an answer for that. I called up my friend from Google, who definitely knows what that means, and we had a nice little conversation about how this all works. So I hope you enjoy it. Dany was gracious enough to come on the podcast and partake of my café classic, so yeah, we're going to debrief after this, but enjoy.
We've been talking a lot about AI and generative AI and all of the stuff that's happening in the world right now, and it's very confusing. So we thought it would actually be pretty useful if we got someone who knew what they were talking about onto the podcast, so we weren't just speculating constantly.

So today we have Daniel Bunga with us. He is Google's head of generative AI, or director of generative AI. So we're going to have a long conversation about what that means. So Daniel, if you were to explain to someone, including me, what you do at Google, what your job is, what would that be?
We try to bring generative AI solutions into production-grade applications for companies, startups and/or enterprises.

So would that include, like, a company comes to you, they say, we want to use generative AI, and then you work with them to actually integrate it
into their product? That's correct. But we do have many other teams that really focus on that long-tail work of integration. Our interest is to figure out what are the patterns that are not necessarily common at this point, the new ones, and then really turn that into 10x-scale packages. As you know, many of these technology items, especially within the AI space, are fairly new.

Yeah. So they demand new technologies, they demand new approaches to technologies. And what we do is try to figure out what are the patterns within this fairly open ecosystem at this point, and package those patterns into applications that we can either give to the teams that are more consistently working with whoever is putting that into their product, or open-source those capabilities so that folks can use them. So instead of

just throwing in a chatbot that is based on a large language model, you're actually integrating a specific solution that makes sense for that company.
Exactly. I think about the early days of programming, when people were writing code, right? So you have a bunch of folks writing programs. And then at some point, I think around the eighties and nineties, this common practice of studying design patterns emerged, where some folks would say, hey, put these things together and it's going to be called this specific pattern.

And then based on the design pattern, you essentially create a new language and a new mechanism for people to use technology in a more consistent manner. So that's what we do. We try to understand what are the design patterns of AI and generative AI, and then put that into technology and/or educational artifacts for people to use.
So I want to get into a little bit of what AI actually is, because we talk a lot about AI and generative AI and all that stuff on the podcast. It's like the only conversation happening right now, and for the last year. But I think something that confuses a lot of people is the fact that you see all these companies saying, we have AI now, we have AI now, and nobody really knows what that means.

Sometimes that means they added a large language model chatbot. Sometimes that means they added some stuff under the hood that is doing a lot of work. Sometimes that means they're just rebranding something that wasn't really AI as AI. So in your words, what is AI, in relation to what we're seeing in the industry right now?
So, to me, AI is a system, so to say. It's a collection of technical and engineering capabilities. And when we talk about the definition of AI, I'm also going to talk about it in terms of a system, because I don't think it's necessarily one single thing, and it has evolved over time.

But if you look at an AI system overall, it's a collection of tools and technologies that are really geared toward providing human cognitive capabilities to computers, and making it so that these computers can accelerate the processes through which we produce different things in technology. So you can think of AI as being a collection of planning, and scheduling, and sensing the world around us, and understanding that world into a set of cognitive containers, so to say, and then being able to do other things out of that level of understanding. So AI is essentially bringing an analog of human intelligence to computers overall. I know that is a bit of a complex definition, but that's where we are now in understanding it as a system.

When you try to break down what that really means in terms of technology, it comes in three major forms. One major form is that AI encompasses things like planning, sensing, scheduling, and then processing the data that is sensed with a certain list of tools. These tools are usually borrowed from the mathematical worlds of statistics and probability, and the collection of these tools is what we traditionally call machine learning. So AI is bigger than machine learning, and within machine learning is a set of tools that are mathematical in nature. And a subset of these tools, which is also the essence of generative AI, is a family of techniques called deep learning. Deep learning involves using neural networks, so to say, which are almost an artificial representation, or analog, or model of what the brain could possibly look like, to the full extent of our understanding of it: essentially, a data structure used to represent, and a set of techniques used to process, the data that is sensed.
Just to back up for a sec so that people understand the difference between those three: can you, in a couple of sentences, define them individually? What is machine learning, what is deep learning, and what is the third one you said, which I guess is AI? Can you define those three in, like, two sentences each? Got it.
So with AI, you want the machine to do things that seem human, so to say, right? Imagine being here and someone asks you, hey David, what is the color of the car in the garage? You would have to do a few things.

You would have to plan the way you would get out and get to the garage. You would have to look at this artifact in the garage and understand it as a car. And then you would have to understand colors, and then look at it and say, okay, the color is red, for example.

So there is this set of steps, so to say, that you have to carry out as a human, an intelligent person: I want to plan my way into the garage, I'm going to look at this object, detect that object as a car, and then eventually detect the colors, right? So there are a few things that you do. Now, if you were to break that down.

So that's AI, so to say. Imagine that a system could do that. Imagine asking a robot to do the same set of tasks. Then overall, I would consider that to be AI. Now you can break that into some deeper levels of detail.

Taking out the planning and scheduling: what are the techniques that you would possibly use for navigating this ecosystem all the way up until you get to the garage? What are the different techniques that you use in order to analyze that object and understand it as a car? That set of techniques is machine learning.

Okay. So it's like machine vision and, like, object recognition. That kind of stuff would be the machine learning techniques that get applied on top of AI, that create the machine learning mechanism?
Exactly. So machine learning could be considered just the mathematical underpinning, the set of artifacts that you would use, as a subset of AI. And deep learning is just one of those techniques. Within the context of machine learning, there are different techniques. One of them is called nearest neighbors; it's usually preceded by a K, as in k-nearest neighbors, where K is a number.

We can say: what are the four nearest neighbors to David and Dany? And that, as a matter of fact, would look at all the people within this building and figure out which people have the four closest distances; those are the four nearest neighbors. And that's just one technique out of many. There's another technique called support vector machines, and there are many other techniques for things like regression, classification, and so on. Now, deep learning is a subset of all of these techniques, one that uses neural networks as a representation of the data that you process in order to identify objects, classify objects, and so on. So you get AI as the bigger bucket that has other things in it, including planning and scheduling and sensing. You get machine learning, which is more focused on the mathematical and probabilistic techniques. And you get deep learning, which is just one application of machine learning, one that focuses on artificial neural networks.
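To make the k-nearest-neighbors idea concrete, here is a minimal sketch in Python. The names, positions, and distance metric are all invented for illustration; a real system would use whatever features and metric fit the problem.

```python
import math

# Hypothetical 2-D positions of people in a building (invented data).
people = {
    "Alice": (1.0, 2.0), "Bob": (2.5, 0.5), "Carol": (4.0, 4.0),
    "Dave": (0.5, 1.5), "Erin": (3.0, 3.5), "Frank": (5.0, 1.0),
}

def k_nearest(query, k=4):
    """Return the k people closest to the query point (Euclidean distance)."""
    ranked = sorted(people.items(), key=lambda kv: math.dist(query, kv[1]))
    return [name for name, _ in ranked[:k]]

print(k_nearest((1.0, 1.0)))  # the four nearest neighbors to this point
```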
And then deep learning, did that become something that became very popular in a vacuum? Or was it discovered that it was a very good way to do machine learning? Did people try a bunch of different machine learning techniques, and deep learning just became the most

useful one? That is correct. And bringing it back to your initial question, which was how these techniques, or these definitions, relate to the current state of affairs: yes, it's very related, because machine learning has been applied for a while, and deep learning also, for the last ten, twenty years or so.

Yeah, so the techniques have been around. But then the techniques were boosted by the advent of a couple of capabilities. And we started observing that deep learning was really doing two things.

One, it was able to process a large amount of data. The traditional machine learning techniques, supervised learning and so on, would tend to plateau: when you give them too much data, they give you some performance, and at some point they don't really give you more.

So it doesn't scale. You start having diminishing returns: you expend a lot of compute capability, but you're not really getting better results. But with deep learning, it was seen that, one, you can parallelize it aggressively if you have a lot of compute capability, GPUs and TPUs, and two, it wouldn't necessarily plateau. That means you can give it

a lot of data and performance goes up.

And so what we saw was that the techniques had been applied for many years, but the techniques kept getting better and better given the advent of additional capabilities that supported that increasing performance.
What is that additional capability that has really tipped the scale, especially in the last year?
Uh, this goes back over the last six years, so to say, with the invention of the transformer architecture.

Do you want to explain what that is?
So the transformer architecture was created in 2017. Before that, there were many other architectures within the deep learning ecosystem that were used to process data at scale. The ability for these systems to process text, for example, or sequences of data, things like music, things like video, things that have to deal with frames, had been studied for many years, right? So we had sequence-to-sequence models.

We had things that we call LSTMs, long short-term memory models, that essentially made it possible for someone to process sequence data and even possibly generate sequence data. But the problem with those architectures was that if you have a text, if you have an entire page, and you want to, say, summarize it or analyze it, then you have to put the entire thing into the model. And so we started having limitations with the capacity the machines themselves would need to host that amount of text in order for you to ask a specific question of that text, for example, what is this text talking about, or generate a summary of this text. So there were some scaling issues, because
if you were to synthesize an entire page of text, it's hard. It's more computationally expensive, to the n-plus-one degree, to generate or to synthesize more and

more text as you add words. Exactly. And the other thing is that, to improve the quality, specifically when you have to analyze things like text, you want to maintain a certain, say, grammatical structure.

If you're being asked a question about a sentence, sometimes the answer is really toward the end of the sentence, but you have to maintain the context from way back at the beginning of the sentence. So there was this idea of essentially maintaining the structure of the content that you are analyzing by applying different mechanisms. And one of the mechanisms that was invented with the transformer architecture is what we call the attention mechanism.

So the attention mechanism is the mechanism through which, within the neural network, it's possible for you to maintain the structure, to keep information about how specific words are related within the text that you're analyzing. So essentially, you're coming up with a mechanism through which you can analyze a large amount of text while still maintaining the information about how these specific tokens and/or words (tokens are just one representation of words) are related within that context. Now, it gets very expensive, computationally and in memory and storage, to get that done.

And that was the challenge before 2017. What the transformer architecture brought about was the ability to process this large amount of data, maintain the structure that it has, and not be extremely expensive on the hardware, the storage, and the compute. It then became possible, by basically parallelizing some of these architectures, for you to process a very large amount of data and build extremely scalable, very, very internet-scale models, if you had the hardware for it.
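For the curious, here is a minimal sketch of the scaled dot-product attention at the core of the transformer. The token count and embedding size are toy values; real models stack many attention layers with learned projections.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every position gets a new vector that
    blends information from all positions, weighted by query/key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relatedness
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # weighted blend of values

# Toy example: 4 tokens, 8-dimensional embeddings (random, for illustration).
x = np.random.default_rng(0).normal(size=(4, 8))
out = attention(x, x, x)  # self-attention: Q, K, V all come from the input
print(out.shape)          # (4, 8): one context-aware vector per token
```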
And then, eventually, being able to get some intelligence out of it. And then a couple of things started happening. One is, remember when I said that deep learning basically breaks through that pattern of diminishing returns? You start seeing things going the other way, where you get more and more performance, and eventually you get new abilities. You get emergent abilities out of models.

When you say emergent abilities, do you mean things that we didn't

expect? Exactly. Traditionally, we had what we call supervised models, based on tasks. Essentially, you would go to the model and say, what's the color of this object? So that's a model that is trained toward understanding, given an object, what is the color. And the way you do that is that you give it a lot of examples that are labeled, and you say, this is a mug, the mug is red and black; this is another mug, this mug is white; and so on and so forth. You tag it manually, and the next time you show it some data, it recognizes, you know, a mug. But it's expensive to train one model that can recognize mugs, recognize people, also answer questions, and so on and so forth. So being able to give multiple tasks, so to say, to a single model that you train once was a challenge.
But we can make it multimodal.

Yes. Multimodal has a couple of, yeah, you can make it multitask and multimodal. Multimodal, at its simplest, really means that you're able to get the model to analyze images and text and audio and video at the same time, right? And then it could be multimodal input, single output.

That is, you train the model to see images, text, audio, and video, but you only ask it questions in the text format. That part, I mean, a picture is worth more than a thousand words, essentially: you get multiple pictures into the model, get it to learn from them, but the way you interact with it is still in text. OK. So we started seeing benefits where very, very large models that had seen a lot of data, coming out of a huge part (not the entire thing, but a huge part) of crawled websites, for example,

a huge part of, you know, the data that is available out there, started behaving in such a way that they had this almost general-purpose intelligence. They could do reasoning up to a certain extent, and that is tested by giving them some mathematical problems. And a model would do derivations, so to say, assuming that it had seen some of these derivations in some mathematical books or writing, for example; it would learn that structure, leveraging the attention mechanism, and be able to derive the answers step by step and give you an answer.
And is that still considered an emergent property if it was being fed different levels of derivation through different text inputs?

That's a very good question. I think the thing that makes that an emergent property is the fact that it's doing it in a multitask fashion. So remember, initially, we would train one model to do one thing. So if it was one model that was trained only on doing the derivation of a specific set of mathematical problems, that would be very simple.

It wouldn't be emergent. But if you train one model that can do that on a mathematical corpus, and at the same time take an SAT exam, at the same time give you a summary of a specific piece of text that you give it, and at the same time write code, and at the same time optimize code and review code, those are the different kinds of emergent properties that a multitask, or say a large, model is able to exhibit.
And are those emergent properties kind of subsets of the attention mechanism? Like, is that the thing that really allows it to do these kinds of things?

One analog that I would give you is, you know, in physics, when you have particles that are moving at a very, very fast pace, so to say, in a contained environment, you start getting temperature, yeah, you get heat. And if they move faster, then you get higher and higher temperature. Temperature itself, or heat itself, is not necessarily a physical artifact.

It's an emergence of that fast movement, but the movement itself is very simple. So similarly, the specific elements that you feed the model get to learn about each other until they reach this interaction mode through which they basically function; they have a simplistic functioning mechanism at a very, very low level. And there's almost this transformation, a phase transition, yes, where the higher-level thing, which is the model, starts giving you these specific behaviors in a multitask fashion.
So it starts doing things you didn't anticipate, that are based on things you did give it, but you didn't realize were connected.

Exactly. A couple of other emergent properties: one of my favorites is called in-context learning, where basically a large model now learns from what we show it. Again, traditionally, you would give an input to the model and then the model gives you an answer; that is a straight input/output relationship. But with some of these models today, you could say, hey, give me an answer that looks like this.

Or: here are four demonstrations of the kinds of questions that I will be asking you; therefore, going forward, I need you to be answering these questions in this manner. And for some reason, it's able to remember that context, learn from the demonstrations that you gave it, and then start giving you answers, going forward, that look like that.
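As an illustration, in-context learning is what few-shot prompting exploits: nothing is retrained, the demonstrations simply sit in the prompt and the model continues the pattern. A hypothetical example (the task and sentences are invented):

```python
# The "training" here is just text in the context window. Given a couple of
# demonstrations, the model infers the task and applies it to the last input.
prompt = """Rewrite each sentence in a formal tone.

Input: gonna grab food, brb
Output: I am going to get some food; I will return shortly.

Input: that movie was kinda mid
Output: The film was somewhat mediocre.

Input: can u fix this bug real quick
Output:"""

# Sent to any chat or completion endpoint, a capable model will typically
# reply with something like: "Could you please address this defect promptly?"
print(prompt)
```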
And why is that something we didn't expect

it to be able to do? Exactly. That's why systems like ChatGPT or Bard are very interesting in that sense, because you can even tell the system, hey, you are a knowledgeable scientist in this field; given that background, start answering my questions. And then it will give you some very interesting answers. And there are many ways you can get creative in that space, right? You can say, you are very funny and creative,

and it starts giving you answers within those specific constraints. The last emergent property I'm going to talk about is what's called chain of thought, or reasoning. I think I spoke about it a bit earlier: the model, or the AI system, is able to give you a step-by-step breakdown of how it came up with the answer.

And that's not something we expected it to do, exactly. OK. So that was a lot. I think there were a lot of answers to that question. But effectively, it seems like AI is sort of the outer layer, where you try to teach a machine, like, human analogs.

And then you've got machine learning, which is a subset of AI, and deep learning, which is a subset of machine learning. And then when you feed these models just this enormous amount of data, you end up with these emergent properties that you weren't really expecting. We're going to get a little bit deeper into the emergent properties, and pretty philosophical, next, so stick around. I think we'll get a coffee first.
With the Amex Platinum Card, you can really be in the now. Access to Resy Priority Notify? Yes. A 4 p.m. checkout with Fine Hotels and Resorts booked through Amex Travel? We needed this. And dedicated card member entrances at select events? Then let's go. Amex Platinum means you can focus on the present moment, with the powerful backing of American Express. Terms apply. Learn more at americanexpress.com/withamex. Card member entrance access

not limited to Amex Platinum Card members. So I think that, because large language models and chatbots and things like DALL·E are sort of the only things that a lot of normal people have seen AI affecting in their everyday lives: what else is the transformer transforming? Like, what industries are being kind of, like, pulled up by AI, and what's actually driving that?

Because I think most people just see, like, oh, we've got ChatGPT; oh, now this random app that I never talk to has a chatbot for some reason, right? But we hear across every industry that every industry is being uplifted by AI. So is that also transformer-based? And how is that working, since it's not using a language model?
Right. So the transformer started the revolution, the ability to have these emergent properties. But that was in 2017.

It's been, what, six years now. Since then, there's been a lot of evolution of that specific architecture, and a lot of creativity around building some of these AI systems: generative AI systems that can generate images or text, or, given some text, give you an image, or, given some image, give you text, et cetera, and applying this paradigm shift, so to say, to many industries and many applications. There are two ways, I'd say, we can look at this. One is: old-school AI is not gone, right? So we're still using that.

We're still applying those techniques in recommender systems. When you go on a website, you're still being recommended products, some things to buy, and/or suggestions of books to read, and whatnot. So many of these initial applications of AI are there, and they're really, really useful for very large companies that have the capabilities.

And this is one thing that I really like talking about: very big companies have the ability to hire hundreds of engineers, or, say, dozens of engineers, highly trained, highly paid, who can build some of these highly tuned systems that scale to, say, hundreds of millions of users. For the businesses that are not the multimillion-dollar businesses, we're seeing new opportunities open up, because these industries can now use some of these generative AI systems.
In the past, you needed about seven to eighteen months to build an application, with programmers, designers, product managers, and so on and so forth. But now, if you have a vision, you can go to Bard and say, hey, this is my vision; help me iterate on it; give me five ideas that are related to it. And then after that, you can say, hey, now write a product requirements and specifications document for a system that might look like that. And then you can say, hey, based on all of these interactions, write a project plan. And you can iterate on that context with the chatbot, so to say. And after that you can say, hey, considering these artifacts, considering everything that we've talked about, help me write a design document that I could use to implement this whole solution. It would do that.

And then you can say, now I need you to help me implement this in Python. You know, design the APIs for me, write the implementations of the APIs

for me, write the system design for me. It could even help me draw some of these things.

And so what you're seeing is that you're moving from a life cycle where you had to spend about eighteen months with a team of ten to even get an idea into good shape, to probably a matter of hours to weeks, working with prompts and being very creative in the way you interact with the bot, or the way a smaller group interacts with the bot, to come up with a solution that's pretty, pretty good. And so what I see is that many industries, many startups and enterprises, are really, really taking advantage of that. I've seen good examples in media. I've seen good examples in health care and life sciences.

I've seen good examples in financial services. Across every industry and every country, I'm seeing a lot of movement.
Do you use these kinds of systems in your own work, to build your own projects and stuff? Do you use Bard for your own work? Yeah,

yeah. I use Bard every day. Every time I have an idea, every time I want to process something, I use Bard to elevate the idea. Wow. I use Bard for outlines. If I need to give a talk, for example, at a conference, usually, for me, the process of creating content is based on the work (it depends on the topic, but based on the work that I do and on some research), and I try to come up with a specific outline that really touches on the points that I would like to talk about. And so I use Bard to help me create that outline.

And then I may refine the outline myself and give it back to Bard: hey, help me summarize this, and/or help me extract specific talking points out of this. And then I can say, hey, make this a bit more creative, or render this in a few different tones.

That is one mode of interaction. Another mode of interaction is the one I spoke about earlier, which is when I have an idea, a rough idea, say, I want to create a system that helps you determine what coffee you are going to drink in the morning, or whatever, some example like that. And so I can formulate specific questions and interact with Bard in that way. And I could have a prototype before the end of the day that works, that is implemented in Python, full stack, front end and back end. Yeah.
Yeah. That's a good productivity explosion. Exactly. I want to reel it back a little bit, because we talked about AI, we talked about machine learning, we talked about deep learning.

But the big thing that's on everyone's mind in the last year is generative AI, which you've talked about multiple times so far. But we didn't really define what generative AI is and what makes it different from those other forms of AI. So can you give a quick explanation of what generative AI actually is?

Remember, we talked about AI overall being a system, not just one thing; machine learning being a set of techniques that are more mathematical in nature; and deep learning being one set of techniques that focuses a bit more on neural networks. So, moving toward something that is a lot more fundamental: generative AI is a deep learning technique. It's still using the deep learning technologies, but generative AI is really focused on generating, or creating, a specific artifact. And that artifact could be an image, it could be a piece of text, or it could be a piece of audio, or it could be something else. That is sort of the simplistic definition of generative AI.
And what is the foundation of generative AI? Like, what allows that to all work? We see things like Generative Fill in Photoshop.

We see generative music now. Like, every single creative industry, and non-creative industry, is being sort of infused with this generated content. What is allowing systems to actually generate content instead of just classifying content?

Yes. So that's a beautiful question, in the sense that there is a very, very strong commonality among all of these things: the transformer architecture we talked about, right? What we've seen is that applying the same technique and then changing the question a little bit gives you exactly the content that you're interested in generating. For example, we can say: using this same transformer architecture, help me generate an image.

You can pose the generative AI problem as: given images of different artifacts, like animals, like cats and dogs or whatnot, create something that looks like some of these things, using, I don't know, interpolation or extrapolation or different techniques, and make it look like the family of things that I've shown you in the past. And it will give you something that doesn't exist in real life, maybe a very high-fidelity image of a dog or a cat that doesn't exist in reality but really, really looks like the samples of the things you've shown it in the past. So that ability for these models to essentially create content in different media, that is the generative ability.
Yes. So we think about, like, large language models being fed into a transformer, right? That's just, like: give me all of the text that has ever been written on the internet, and we can develop relationships between words.

But when you're generating an image, or if you're generating audio, what is being fed into the transformer in that way? Right? Because we see, you know, there's a lot of genetics work being done with transformers too. What kind of data do you feed into transformers to actually make that work in a variety of different fields?
So, in general, you give it data. And text was very easy, or easier, to acquire. That's why you hear about large language models a lot more, right? And also, the results from generating text were a lot more impressive and exciting to look at.

That's why, in my opinion, that field sort of took over. But you're right. You could consider the input to be pretty much anything that can be put into a sequence.

A video, for example, is a sequence of frames, right? So you could give multiple videos, broken down into frames, to a transformer-based architecture. It gets a bit more complex in the way those sequences are processed, or the way the structure is maintained; there are many techniques around the attention mechanism and so on and so forth. But let's consider that to be a black box, and it knows how to do that.

Then what you give it is a set of frames, which are videos, so to say, and then you say, give me something that looks like that. So in that case, you've given it videos, or sets of frames. You could also have a mechanism through which you give it videos and tags, which we do today. There is this encoding model called CLIP, essentially putting together images and videos, I mean, and text,
which is the foundation of DALL·E and a lot of AI image generation.

It's the founding technique of these kinds of abilities, where you teach the model to recognize images and text together as a joint entity. The process through which you do that is by getting the images processed with what we call a tokenizer, and/or an encoder specific to images, and that turns an image into a vector, or what we call an embedding.

And then you go through the same kind of process with the text, where you turn the text into a vector. And once you have these two vectors, you can combine them with, basically, algebra. And then, at a higher level, you have the task, and/or the question, that you want the model to answer. In one scenario, you would want the model to, for example, given an image, explain the content of this image for me. Or you have the reverse problem, which is: given a text, generate an image that contains the information, so to say, that I've provided in the text, which is the business of Midjourney.
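Here is a minimal sketch of that joint-embedding idea. The encoders below are random stand-ins, purely for illustration; a real CLIP-style model learns its encoders from millions of image-text pairs so that matching pairs land close together.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "encoders": random projections into a shared 64-dim space.
W_img = rng.normal(size=(2048, 64))  # image features -> shared space
W_txt = rng.normal(size=(512, 64))   # text features  -> shared space

def embed(features, W):
    v = features @ W
    return v / np.linalg.norm(v)     # unit length, so dot product = cosine

image_vec = embed(rng.normal(size=2048), W_img)  # e.g. a photo of a car
text_vec  = embed(rng.normal(size=512),  W_txt)  # e.g. the string "a car"

# Once both live in the same vector space, comparing them is simple algebra:
print(float(image_vec @ text_vec))   # cosine similarity, in [-1, 1]
```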
Yeah, so, kind of to break that down: depending on the field that you're trying to use transformers in, you are turning data into numbers, and you're comparing those numbers to each other and then getting an output. Right. So because you're able to take video or images or text and vectorize them and turn them into tokens, you can compare them to each other, even

though they're different types of media. So the thing that makes it really work beautifully is that once you take the images or video or audio, you encode them into an initial vector. That process is called tokenization. And once you get to tokens, the tokens can be a bit more complex. For example, the tokenizer could learn to not just use a word-per-token mapping; it could also split a word into two or three tokens if that word has

multiple meanings.

Multiple meanings, exactly, or if it finds

it effective.

So it does some smart tokenization. You may have a situation where a five-word sentence gives you twelve or fifteen tokens. So the concept is more about information preservation within a substructure, which is a vector, rather than a one-to-one mapping between the words and the vector.
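To illustrate subword tokenization, here is a toy greedy tokenizer with a made-up vocabulary. Real tokenizers (BPE, WordPiece, SentencePiece) learn their vocabularies from data, but the effect is the same: one word can become several tokens.

```python
# Made-up vocabulary of known subword pieces (real ones have ~30k-100k).
vocab = {"un", "believ", "ably", "token", "ize", "rs", "a", "b", "l", "y"}

def tokenize(word):
    """Greedily split a word into the longest known vocabulary pieces."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown char: its own token
            i += 1
    return tokens

print(tokenize("unbelievably"))  # ['un', 'believ', 'ably']: 1 word, 3 tokens
print(tokenize("tokenizers"))    # ['token', 'ize', 'rs']
```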
It's the same with images. An image is a two-dimensional structure, which has a third dimension of red, green, blue, right? So if you flatten that entire thing into pixel intensities over that entire two-dimensions-times-three, so to say, then you get a larger vector. But that's just a simplistic tokenization, where you say, hey, I'm going to flatten an image, flatten it across red, green, and blue, and after that I'm going to have a vector representing the pixels.
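A tiny sketch of that simplistic flatten-the-pixels encoding (toy sizes, illustration only):

```python
import numpy as np

# A toy 4x4 RGB "image": height x width x 3 color channels, random pixels.
image = np.random.default_rng(2).integers(0, 256, size=(4, 4, 3))

# Flatten the pixel intensities into one long vector: height * width * 3.
vector = image.reshape(-1).astype(np.float32)
print(vector.shape)  # (48,) since 4 * 4 * 3 = 48

# Real vision models use richer encodings (patches, learned embeddings),
# but the idea is the same: the image becomes a vector a model can process.
```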
From there, you can have a deeper tokenization that may consider the structure, for example, the adjacency of objects, or the distance between objects, or even some deeper level of understanding of the objects within that image. At the end of the day, you go from a type of artifact, like audio (with audio you use spectrograms), and you turn it into a vector specific to that artifact.

So you go from an asset to a vector. Now, there is another step, called embedding, which is basically doing a projection of that vector onto a vector space that is shared by every other piece of artifact, every other piece of data, in that place.

Like a

normalization? Like a normalization. And by that projection, what you essentially do, especially if you have a multimodal model, if you work with an image and you work with text, for example, is you tokenize them each, which is a one-to-one relationship between the image and the tokenizer that works for it, and likewise for the text.

And once you have these two vectors, you do that projection onto that shared vector space, so to say; and you do that through training. The beautiful thing is that once these things are learned within the same space, they become of the same nature, right? So you can start comparing them. So you can start assigning relations, making statements like: a car, C-A-R, written in text form, compared to the image of a car.

It's like you're taking one language and another language, and you're sharing them in a certain way such that, once you have that shared space, you translate them both into a common one, and you can do whatever

you want from there. And the substrate of all these different things is that there is information that is preserved in these different types of artifacts. So you're almost doing an information extraction.
Can you describe that? What do you mean by information?

It may be

a longer conversation, but at the end of the day, the information, and I know you had a whole video about the nature of information, yes, it could be contextualized to the piece of artifact that you're working with. But in the various implementations, information is this entity, or this thing, that can give you, well, it's hard to define information without

using information. Yeah. It

is this thing that can give you a bit of a pattern, right? And we usually base that pattern on the notions of order and disorder, entropy, and so on and so forth. But if you have something that can give you a pattern about the differences and/or disorder in a specific system, then you start having information.
For example, if I tap at a steady rhythm, nothing has changed very much. So if you were on the receiving end of that pattern, you wouldn't really get much information. But if I change the tapping, there is a difference between what I did before and what I am doing now.

You may not understand why I'm doing it, but you would understand that there's a difference between the way, and the frequency at which, I tapped my hands before, and the frequency at which I tapped after. Then you've gained information. Yeah.

So it's the same way that you may understand some differences within an image, for example, looking at a contour: something changed between this and this, so you may realize that these may be two different objects, and so on and so forth. And within text as well, you may have differences between words, or between paragraphs, or between different structures.

So you have some form of information. And the beautiful thing about information is that it can be combined. So what you're obtaining is the extraction of information, and/or of differences in patterns, within different modalities of data, different artifacts.

The beautiful thing is that it can be combined at a certain level, or compared. Yeah. That's what makes it possible for you to essentially extract information out of an image, by how different it is, or how many different patterns exist in that image; extract information out of a piece of text by understanding how many different patterns exist within that text; and then put that together in a common place where you can start comparing them. Yes. And reversing that, you can now combine text with images, and basically build that
relationship. With all of that combined, would you say that would be the fundamentals of, like, a general AI that could do everything?

Are we getting into the realm of AGI?

Yes. I would love your opinion on that, if you feel comfortable talking about it.

Of course. So, what is intelligence, according to you?
According to me? This is such a big question. I've thought about this a lot.

My personal opinion on this, at this point, is, well, for the listeners, we're going to define AGI really quickly. AGI is artificial general intelligence, effectively meaning you can ask an AI to do anything that a human would be able to do, or possibly even more, right? And it would be able to help you with that. Would you agree that's the definition of

AGI? That's an extended definition. That's somewhat why I'm asking the question of what intelligence is, because agreeing on AGI being artificial general intelligence assumes that we agree on

what intelligence is. Sure, okay. My definition of intelligence would be, wow, thanks, the ability to synthesize information and create new actions based on that information without being explicitly told to do so. That would probably be my definition of intelligence. That is a decent definition. But would you
disagree that the context in which you have to do that specific workflow you defined has to itself be defined? That is, you have to do it within the context of, I don't know, literature, or robotics automation in a sub-field, for example, having a robot that can control a specific arm for surgery. And that would be a different thing from that robot controlling an arm in, say, a restaurant.

And so on and so forth. I think that when we talk about the generalization of intelligence, or even of information, we are making a bold claim that goes beyond what we understand so far about the nature of these things. Right,

sure. So, I see. So if I want to break down the problem of AGI: I might have already expressed that I'm not a big fan of that definition, because I don't really think we know exactly what we mean when we say it. Sure. But if we want to get into a practical realm, I think it may be possible to extend the state in which we are now by getting these models to progress in their ability to impact the world as well. So far, we've discussed the software version of AI, which is: you give it data, it can recognize it, or, at this point, it can also generate data.
But what is the software-to-real-world interaction mode at this point? We have many systems, for example in health care and life sciences, that have to deal with the real world: in the way that, say, hospital equipment functions, or in the way that a robotic arm that controls cameras functions. So you get many other things about the real world that may have to do with intellect. So I think a lot of the work that we're doing on improving the quality of these AI systems has to bring things all the way up to the definition of AI that I mentioned earlier, which involves and includes planning, scheduling, acting, and sensing as well. So when you start augmenting these systems with these additional capabilities, and you start training agents that are able to plan and schedule and act in the real world, then you get that sense of AGI

that is closer to the definition you gave, right? But the ability to do that at that level, at that scale, gets challenged by: where are you sensing, what kind of information, and also where are you acting, in which kinds of world environments, right? If you look at the real world in which we operate, and you look at all the types of interactions and actions that can happen, the number of possibilities is larger than the number of atoms in the universe, right? So how would you have a generally intelligent system that knows how to act in this entire world? I find that quite a challenging thing to believe. But if you constrain the problem, if you make the problem as simple as: I want to have a generally intelligent system that learns how to use all the hospital equipment within a hospital system, then maybe you have the opportunity to have an AGI system that can essentially take in the task and execute it effectively. So that is my, um, technical view of the possibilities of AGI: training agents that have world representations, but smaller world representations, constrained by the problem space in which you want these systems to operate, and then having them be able to plan, schedule, sense, and act, on top of the other kinds of capabilities they have.
Okay, interesting. So Dany doesn't think that we're going to have this one omniscient AGI, artificial general intelligence, that's going to be handling everything; he rather thinks that we're going to have these smaller, more specialized AIs that kind of handle different tasks and help us develop a lot faster.

This is actually not that different from that whole conversation around humanoid robots, right? Like, where you could have one robot that's like a human and does human tasks, or you could have a bunch of really small robots that handle the tasks that we already do on a daily basis. It's the same thing. Pretty interesting. In the next segment, we're going to get into the problem of AI hallucination, which is where AI just makes up a ton of random stuff, and that's clearly a problem.

I was very curious about that side of the equation. Plus, we need to see how fast Dany can type. So stay tuned for that.
This episode is brought to you by Google Gemini. With the Gemini app, you can talk live and have a real-time conversation with an AI assistant. It's great for all kinds of things, like if you want to practice for your upcoming interview, ask for advice on things to do in a new city, or brainstorm creative ideas.

And by the way, this script was actually read by Gemini. Download the Gemini app for iOS and Android today. Must be 18-plus to use. Support for the show

today comes from NetSuite. Anxious about where the economy is headed? You're not alone. If you ask nine experts, you're likely to get ten different answers.

So unless you're a fortune teller, and it's perfectly okay that you're not, nobody can say for certain. So the trick is to make your business future-proof in times like these. That's why over 38,000 businesses are already setting their future plans with NetSuite by Oracle.

This top-rated cloud ERP brings accounting, financial management, inventory, HR, and more onto one unified platform, letting you streamline operations and cut down on costs. With NetSuite's real-time insights and forecasting tools, you're not just managing your business, you're anticipating its next move. You can close the books in days, not weeks, and keep your focus forward on what's coming next. Plus, NetSuite has compiled insights about how AI and machine learning may affect your business and how to best seize this new opportunity, so you can download the CFO's Guide to AI and Machine Learning at netsuite.com/waveform. The guide is free to you at netsuite.com/waveform. That's netsuite.com/waveform. Ellis wanted
to hop in and ask you a question.

So, yeah, sorry. I really liked what you said about defining intelligence as it pertains to AGI. And I thought David brought up a really important kind of intelligence: intuition and deduction, the ability to extract not just pieces of information but threads and systems of information from multiple kinds of context. But there are lots of other kinds of intelligence that people like cognitive scientists like to define and classify, things like spatial reasoning, things like engaging in dialectical thinking, and these are all intelligences that we've observed in ourselves. And so when we think about sort of a general-purpose, Swiss-Army-knife AI, do you think that we should be limiting that to the kinds of tasks that our brains do on a daily basis? Or do you see that there are going to be almost, like, new methods of thinking and new cognitive strengths that emerge as these neural networks get stronger?
That's a super interesting question. In a practical sense, I'm actually with you, David, on that definition, right? Because I think that that's the form of intelligence that could be mechanically implemented in a piece of software, as in a program, right? By our own intuition, we can think about doing things like that by breaking them down into steps. The kind of intelligence that you're talking about, to me, is a bit more like that emergent ability, and I don't think we've gotten to the point where we can

perceive what those are. Yeah.

Or intuitively.

Exactly, yeah. We don't intuitively

know what exactly we need to do in order for the models to have that spatial awareness, or that other capability. Now, we can program that, by having segmentation models and distance calculations, and by coming up with a mathematical heuristic through which we can claim that we've achieved that capability.
But I would argue that the way we learn is not exactly the way we teach the machines how to do it, right? So there's definitely a lot, a lot more research. And we may stumble upon it.

We may basically strike luck and find other kinds of scaling mechanisms. It's the way it works in physics these days: you have smaller systems, you have a simple interaction mode, like magnetization or just collisions, and all the different forces that we're working with, about four of them, and based on these simple interaction modes, you get the entire universe. That's where we are with the fundamentals of physics. Yeah, this is the way we know it.

But it may not be the way, right? We may have just settled into our own apparatus of sensing, in that kind of form, and we are able to explain it the way we explain it, but it's still a projection on the screen that we are looking at, not the thing as it is. So I'm, I'm super excited about the possibility of us finding more cognitive routes, so to say, in the way these systems learn.

And right now, the best tools that we have in our laboratory of AI are these deep learning tools: the transformers, and the many other architectures being built around them, things like memory-aware neural network architectures, or things like the ability to pull from a vector store and augment knowledge, the retrieval-augmented generation capability. So I feel like the more we add interaction modes, and information retrieval and utilization capabilities, into these models, the more possibilities we have for these additional emergent capabilities.
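For context, retrieval-augmented generation (RAG) is usually implemented roughly like this minimal sketch. The documents and embeddings are invented stand-ins; a real system would embed text with a learned model and use a proper vector database.

```python
import numpy as np

rng = np.random.default_rng(3)

# A toy vector store: each document paired with an embedding. Here the
# embeddings are random stand-ins; real ones come from an embedding model.
docs = ["The MRI scanner is on floor 2.",
        "Dr. Smith works in radiology.",
        "Visiting hours end at 8 p.m."]
doc_vecs = rng.normal(size=(len(docs), 16))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings best match the query."""
    scores = doc_vecs @ query_vec   # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-scores)[:k]]

# The retrieved text is stuffed into the prompt, so the model answers from
# supplied facts instead of relying only on what it memorized in training.
q = rng.normal(size=16)
q /= np.linalg.norm(q)
prompt = f"Context: {retrieve(q)[0]}\nQuestion: Where is the MRI scanner?"
print(prompt)
```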
That would be a lot more organic than the mechanical way we've been doing things. So I think that's an open question. I think it's a beautiful question. And I hope we get lucky in our lifetime and find a way to get that done. Me too.
We've stumbled upon a lot of random stuff in science, so there's definitely a possibility that that happens, which would be big.

I think it was Richard Feynman who said that science is the belief in the ignorance of experts. So I think that if we really take that as a basic principle, that we could stumble upon some things, and that whatever we know so far may, may, may not be the way, then we have an opportunity to really incorporate new information into our knowledge that can get us faster and further.

I want to bring this back a little bit, back to some practical stuff again. Yeah. No, I love philosophical conversations. I love the philosophical. And I think that one thing people think about when they think about AI is the problem of hallucination. And for people that don't know, hallucination is basically when an AI generates something that just isn't right, or isn't true. With large language models, you can ask a question and it will confidently lie to you sometimes.

So how do you look at how we're going to solve that problem? Because it seems like part of generative AI, and part of large language models in general, is that they're just outputting information based on probabilities, and those probabilities are not always going to be correct. So you're, I'm assuming, working on ways to make these AIs more accurate. Accuracy is obviously going to be a major, a major problem, and something that we need to solve over the next couple of years. How do you look at solving the hallucination problem?
The problem of hallucinations. So, the way these models work now is that you give them a lot of data, and the question you're really asking of a model is (let me simplify token to word and work within the text domain) give me the next word based on the words that I gave you, right?

So if you frame the problem as: write a novel for me, or write a paragraph, or write a summary of something, then traditionally, what would happen is that you would give it the beginning of a sentence and you would say, complete this sentence for me. And it's that sentence completion, so to say, that is based on probability. And even basing that on probability, within the context of a conversation there's a lot more going on.

But the basic principle of how it works is that it works off of the most probable word to follow the words so far: taking the growing sentence as an input and figuring out what is the most probable word that could follow, until it's done. If you simplify the problem to just that level, and I say, complete the sentence: Doctor So-and-so works at Johns Hopkins, or something like that, then it would just put a
name there, right?
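A toy sketch of that next-word machinery; the vocabulary and probabilities are made up, standing in for a trained transformer:

```python
# A toy bigram "language model": probability of the next word given the
# previous word (invented numbers, standing in for a trained transformer).
next_word_probs = {
    "doctor": {"smith": 0.5, "jones": 0.3, "who": 0.2},
    "smith":  {"works": 0.7, "said": 0.3},
    "works":  {"at": 0.9, "hard": 0.1},
    "at":     {"johns": 0.6, "the": 0.4},
    "johns":  {"hopkins": 1.0},
}

def complete(prompt_word, steps=5):
    """Extend a sentence by always picking the most probable next word.
    Note: nothing here checks whether the sentence is TRUE, only likely."""
    words = [prompt_word]
    for _ in range(steps):
        probs = next_word_probs.get(words[-1])
        if not probs:
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(complete("doctor"))  # -> "doctor smith works at johns hopkins"
```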
right? So the the the question you haven't ask is make sure that that name is an existing human being that is really a doctor at china. Options somewhat not right. So fundamentally, it's a different question to ask all of that system. And then we're back on the reason I call these things, the system in the beginnings is because, yes, you may have a model gives you the next word prediction, the next open prediction, but then you still need to do a lot more work on top of that input and not output.
And even that processing sometimes to make sure that the output and the responsible to get out of IT is a truthful one or real one night, oh less toxic one if the answer is toxic and you don't want to to serve toxic to your users. So there are many preprocessing and post processing activities that need to happen. One to uh um make sure that the context that mean the the answer of the model is grounded.
We call that concept grounding: grounded in reality. And the second is to make sure that the output of that model abides by a certain set of responsible AI principles, right? So those are two things.
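As a rough sketch of what those two layers of post-processing could look like in code; every name here is hypothetical, not an actual Google API:

```python
def generate_checked(prompt, model, is_grounded, is_safe):
    """Wrap a raw model call with grounding and responsible-AI checks."""
    raw = model(prompt)        # 1. raw generation: plausible, not guaranteed true
    if not is_grounded(raw):   # 2. grounding: verify against a source of truth
        return "Sorry, I couldn't verify an answer to that."
    if not is_safe(raw):       # 3. safety: e.g. a toxicity classifier
        return "Sorry, I can't serve that response."
    return raw                 # 4. only grounded, safe output reaches the user

# Dummy stand-ins so the flow runs end to end.
model = lambda p: "Dr. Smith works at Johns Hopkins."
is_grounded = lambda text: any(n in text for n in {"Smith", "Lee"})  # fake DB lookup
is_safe = lambda text: "toxic" not in text                           # fake filter
print(generate_checked("Who works at Johns Hopkins?", model, is_grounded, is_safe))
```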
But fundamentally, the way the science works is that it will give you something, and whether that thing is true or not, it's your job to make sure that thing is true. And so the way that happens now is that you need to associate that response with, basically, your source of truth. Right? What is truth?
Yeah yeah.
What is truth? What is reality? And that's another reason why you probably want to contain and contextualize what the model says down to a source of truth. If it gave me "Doctor so-and-so works at Johns Hopkins," then you probably need to have a database of all the doctors that work in that hospital, and make sure that after you get the name of a doctor, because the model will give you one, you check it against the database. And if that person doesn't exist, you can say, fill this specific slot from the list of names in the database, constraining it to that.
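A minimal sketch of that check-then-constrain idea, with a made-up roster standing in for the hospital's database:

```python
DOCTORS = {"Alice Nguyen", "Rahul Mehta", "Grace Park"}  # stand-in hospital roster

def check_or_constrain(model_name, reprompt):
    """Accept the model's name only if it really exists; otherwise re-ask,
    constraining the model to choose from the actual list."""
    if model_name in DOCTORS:
        return model_name
    return reprompt(f"Pick the best answer from this list: {sorted(DOCTORS)}")

# Dummy re-prompt that just takes the first option, purely for illustration.
answer = check_or_constrain("John Smith", reprompt=lambda p: sorted(DOCTORS)[0])
print(answer)  # "Alice Nguyen": drawn from the database, not hallucinated
```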
So that's why Bard now has that Google button, where you can ask a question and then you can double-check it.
That's a lot of context. That's a lot of mechanism for that. But that's not exactly why it has that button. Mm-hmm. Just to land on the concept of hallucination.
It was named hallucination because the model can give you answers that seem real but are not necessarily real. But this is the normal functioning mode of these technologies. The reason it took us a while to release Bard, for example, was not because we didn't have the transformer.
We've had that technology for a long time. It was the additional set of technologies, the principles that we had to build around the behaviors of the model, that really got us there: one, the requirement of building additional technologies, and two, the challenge of making these technologies deterministic, in the sense that you always want a specific answer. So you have to do a lot more validations.
You have to do a lot more checks and balances. You have to add a number of metrics, like: is this model answering your question when it doesn't know the answer? You probably want to qualify that into something that gets checked.
So there has been a lot of work that we've done on, one, really having clear and concise responsible AI principles; two, turning those into technologies and checking mechanisms that can work in conjunction with the operation of a model; and three, making sure that the scores and outputs of those checks are available so that the technology can be used on the cloud ecosystem. For example, we do research to understand what our principles are and how they can be turned into metrics and guardrails and so on. Those get turned into product capabilities that work alongside our models.
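One of those metrics, whether the model answers when it has nothing to back the answer up, could be scored with something as simple as this sketch; the data format here is invented for illustration:

```python
def unsupported_answer_rate(responses):
    """Fraction of answers given without evidence and without abstaining.
    Each item: {'answer': str, 'has_evidence': bool}."""
    flagged = [
        r for r in responses
        if not r["has_evidence"] and r["answer"].lower() != "i don't know"
    ]
    return len(flagged) / len(responses)

evals = [
    {"answer": "Paris", "has_evidence": True},
    {"answer": "Dr. Smith", "has_evidence": False},    # confident but unsupported
    {"answer": "I don't know", "has_evidence": False}, # correctly abstained
]
print(unsupported_answer_rate(evals))  # 0.333...: one failure worth a guardrail
```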
And then these models are exposed, or commercialized, so to say, on our cloud platform called Vertex AI, and you can find them on, you know, cloud.google.com, right? So that's how we're essentially fighting the problem of hallucination. There's a lot more going
on in that space. Okay. Well, I think I'm going to close it out here soon, but I want to end by asking if you think there's anything that we missed, anything that people would gain a lot from hearing about that they're just not hearing in popular media, that's very important to the whole AI story.
Uh, two things, maybe. One is the consumer applications of AI. Bard and ChatGPT are very popular now.
Which is something that I'm happy about, because I think it's really bringing the conversation closer and closer to everyone. We've been working in this field for a while, so we may have been aware of this coming up and coming together. I think it's a massive opportunity that today, people who are news editors or writers or artists, or folks who work in different domains, can use some of these things to help them write better, to generate images that they can use as part of the content they're producing, to write better letters, to do homework and answers, and so on and so forth. So I really love the consumer applications. But one of the things that I don't think is talked about a lot is the developer experience, and also the way the barrier to entry, from a creativity and product-creation standpoint, is getting really, really low with this set of technologies.
And so I really think that we are at the cusp of a new form of economy, where the creation of valuable items, of different kinds and forms, would not just be a matter of a few being able to do it because they have a high level of training and spent years doing an undergrad in computer science and so on. If you bring that level of assistive creativity to the masses... Yes, I've found that people have ideas, right? Like, people are creative.
If you sit down and you tell someone, let me take away the problem of knowing how to implement these ideas, just talk about your ideas, you see many ideas start emerging. So I think that we are already at the border of a transformation where the economy may take a different form, if different people, without the need to really understand in detail how to implement some of these ideas, are able to, one, iterate on the ideas with the assistance of generative AI; two, validate some of these ideas with the ability to prototype them in a matter of hours rather than years; and then three, test those ideas in the ecosystem and maybe find people they could commercialize these ideas for. So I'm very optimistic about the possibilities of this in the future.
All right. Well, the last thing we're going to do: we have a little game here that we play when we bring guests on, where we figure out how quickly they can type the alphabet. It's a running scoreboard. You can use either the MacBook keyboard, or you can use...
So...
I wanted to ask the AI to type this thing for me.
So you get three chances.
Um, what is the most optimized way of typing the whole alphabet?
As soon as you start typing, it starts. So as soon as you type the letter A, it'll start. And do you have to hit enter at the end? No, you don't want to hit enter; as soon as you hit Z, it'll finish.
And no, the AI won't type this one out.
So if you miss a letter, like let's say you miss B and go on to C...
You have to hit every single letter in order for it to count.
It doesn't tell you the letter you're supposed to type.
We don't give people a test run at all, just three chances. Again, three total chances. Okay, right?
I hit a G.
It's harder than it looks. That definitely looks hard.
That's good. You got to J. This is why you get three chances, don't worry about it. I was extremely slow.
Okay, so first run: twenty-six seconds.
Now just hit reset.
Can I change the keyboard? Yeah, you can change keyboards. So now I understand why there are options.
Yeah, we have a mechanical keyboard. We also have the butterfly keyboard that, um, Apple sells.
Nice. Okay.
Twenty-six to nine.
Yeah, twenty-six to nine point eight. Much better. Come on, much better.
Not bad. I was not far off, actually.
Everyone's different. We've seen some things in here that you would not believe.
I'll show you the scoreboard after this. Are you ready?
Ready.
Nice, eight point seven three. Not bad, not bad.
But where is that on the board, David?
So here's the leaderboard. Fastest: Tom Scott, three point five seconds. It was insane. That was crazy.
Um, wow. So let's see, eight point seven three is right above Brandon. Wow, actually faster than David Blaine. He might be a magician, but you're a magician on the keyboard.
Eight point seven three. You also beat Hasan Minhaj.
Hey, Hasan!
So now it goes you, David Blaine, and Brandon. Nice, nice, nice. Cool. All right. Well, thank you again. Seriously, thank you for coming. Where can people find you on the internet?
Um, well, I'm Dean Banga on X now, nowadays, and I'm on LinkedIn as well. So that's Dean Banga, in the description.
And do you want to shout out any projects that you're finishing up or working on right now at Google that people can see?
So the Vertex AI platform is really the platform I'm working on, right? That's where we put our solutions. And I would say, look forward to many more industry- and domain-adapted capabilities around it, because I think that large models are a big thing, and I think it requires a lot of additional technologies to actually make it work in applications.
And I think that this is about the time where we need to come up with things like design patterns, right? Think about the Gang of Four, for example, the book that came along when programming needed some kind of structure. I think we're at the place in time now where we need some kind of structure for how we build and deploy large models into applications and enterprise environments. And that's something we're working on as well.
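To gesture at what such a pattern might look like, here is one possible shape: a simple pipeline pattern that composes the kinds of stages discussed earlier. It is a sketch of the idea, not anything Google has published:

```python
from typing import Callable

Stage = Callable[[str], str]  # each stage: text in, text out

def pipeline(*stages: Stage) -> Stage:
    """Compose stages into one callable: generate, then ground, then sanitize."""
    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run

# Hypothetical stages mirroring the conversation.
app = pipeline(
    lambda prompt: f"DRAFT: answer to '{prompt}'",  # model-call stand-in
    lambda draft: draft.removeprefix("DRAFT: "),    # grounding/cleanup stand-in
    lambda answer: answer.strip(),                  # safety/format stand-in
)
print(app("who works at the hospital?"))
```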
Well, everyone watching or listening at home, if you were surprised that we had an episode today, don't worry, we have a normal episode coming on Friday. This was just a little extra for you, so I hope you enjoyed it, and we'll see you on Friday.