
841: Andrew Ng on AI Vision, Agents and Business Value

2024/12/3

Super Data Science: ML & AI Podcast with Jon Krohn

People
Andrew Ng
Jon Krohn
Topics
Jon Krohn asks how companies should balance their investment between pursuing more powerful models and leveraging more effective agent architectures. Andrew Ng argues that, with the exception of a handful of large AI companies, almost every company should focus on building applications that use agentic workflows. He points out that the cost of using large language models is falling rapidly, down roughly 80% over the past year. He advises companies to prioritize building valuable applications and to consider cost optimization only once an application succeeds and proves too expensive; most companies' generative AI bills are so low that cost optimization deserves little attention up front, and optimizing costs before building something valuable is premature. Andrew Ng further explains that the power of large language models stems in part from the richness of the data they process rather than from algorithmic complexity. He believes modern AI is combining two historical approaches: Marvin Minsky's multi-agent Society of Mind theory and the single-algorithm theory that drove the development of deep learning. Agentic workflows, he argues, allow AI models to specialize for different tasks.

Deep Dive

Chapters
Andrew Ng discusses the balance between investing in powerful AI models versus leveraging effective agent architectures, emphasizing that most companies should focus on building applications using agents.
  • Most companies should focus on building applications using agents.
  • The cost of using AI models is falling rapidly.
  • Companies should build something valuable first and then optimize costs if necessary.

Shownotes Transcript


This is episode number 841 with Dr. Andrew Ng, Executive Chairman of Landing AI.

Welcome to the Super Data Science Podcast, the most listened-to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas, exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better.

I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.

Welcome back to the Super Data Science Podcast. Our guest today is Andrew Ng, who, I suspect, pretty much anyone working in data science knows. Nevertheless, I'll introduce him. Some of his intimidating accomplishments include being director of Stanford University's AI Lab, where his research group played a key role in the development of deep learning, which led him to found the influential Google Brain team, as well as educating millions on machine learning, leading him to co-found Coursera. He is also managing director of AI Fund, a world-leading AI venture studio. He was CEO and is now Executive Chairman of Landing AI, a computer vision platform that specializes in domain-specific large vision models, analogous to LLMs for language.

He also founded DeepLearning.AI, which provides excellent technical training on machine learning, deep learning, and generative AI, as well as many other associated subjects. There's so much I could say about him, but we'll end it off here by saying that Andrew was also co-CEO, co-founder, and chairman of Coursera, which has brought online learning from three hundred universities to over a hundred million students today. Over at the ScaleUp:AI conference in New York a few weeks ago, I conducted this Q&A session with Andrew immediately after he gave a talk, so some of my questions refer back to that talk.

That said, the interview should be easy to understand without being aware of Andrew's talk, because I think I give enough context at each point. But just in case you're curious, we've included the slides from his talk in the show notes so you can check those out. One quirk about this interview is that at the end, Andrew shares his screen to demonstrate cutting-edge vision model capabilities.

Screen sharing, obviously, isn't ideal in an audio-only podcast. But if that section doesn't completely resonate with you, you can check out the YouTube version of this episode to get the full picture. In today's episode, Andrew details why a cheaper AI model with smart agentic AI workflows might outperform more expensive, more advanced models. He provides the surprising truth about AI API costs that most businesses don't realize.

He talks about how the Society of Mind theory from the 1980s is making an unexpected comeback in modern AI. He talks about a groundbreaking new way to process visual data that goes beyond traditional computer vision. And he wraps up by talking about why unstructured data could be the key to AI's next big revolution. Are you ready for this special episode? Let's go.

Welcome to the second stage for your interactive session. That was an amazing talk, as we always expect from you. This is the session that I'm hosting today.

To introduce myself to the audience, as well as to you, Andrew: I'm Jon Krohn. I'm Chief Data Scientist and co-founder of an AI startup, but I'm perhaps best known as the host of Super Data Science, which is the world's most listened-to data science podcast.

And I'm delighted that for three years in a row now, I've been hosting sessions here at ScaleUp:AI. So thanks, Insight Partners, for inviting me back again. In your talk, you discussed how your team found that, with GPT-3.5 and an agentic workflow, it can outperform more advanced foundation models such as GPT-4 with a zero-shot approach. How should companies balance their investments between pursuing more powerful models versus leveraging more effective agent architectures?

I think almost all companies, with the exception of a few large AI giants, should be focusing on building applications using agentic workflows. If you have an extra few billion dollars to spend and you want to train a foundation model, then sure, go spend a few billion on that. But I think for most businesses, there are so many opportunities to build applications on top of these models. And it turns out that, if you look at the use of gen AI, the cost of using these models is falling rapidly. Over the last year and a half, it's fallen by maybe about eighty percent year on year.

So I find that, you know, two years ago, there were teams worried that, oh, GPT-4 is kind of expensive. But the prices are falling so quickly that I would advise worrying much more about building something valuable, and I think there's a good chance that using these APIs for gen AI will just become cheaper over time. There are companies spending many millions of dollars on gen AI APIs, so it can get expensive, but the vast majority of businesses I see that are building applications on top of OpenAI, Anthropic, and these other APIs are getting so much value out of it, and frankly, the bill is so small that you would be surprised how small it is.

I may be surprised, but it makes perfect sense. Yes. So unless people do have billions of dollars to spend, let me move on to a related question, where, assuming that people aren't going to be trying to train their own LLMs themselves, if you're an enterprise, should you be thinking more about always trying to use the latest and greatest LLM, or about grasping the best agentic workflows? It seems like there's kind of a trade-off there between cost and efficiency, because, yes, while costs have gone down dramatically, say by eighty percent, you could save a lot of money by working with GPT-4o mini instead of GPT-4o. And so, if you can be using that cheaper GPT-4o mini and getting better results by leveraging a more effective agentic workflow, do you think that's the way to go for the most part?

You know, I would say don't worry about it. As a general suggestion, I would say don't worry about the price of the LLM.

For development purposes, it's actually... you know, I still like to code myself, right? And sometimes I'll be spending all day on the weekend coding, maybe many hours experimenting, and then find at the end of the day that I just ran up like a five-dollar OpenAI bill, right? Now, it is possible that some agentic workloads can get more expensive; it is possible to run up, you know, tens of dollars, maybe low hundreds of dollars. But it's actually cheaper than you would think. And so what I advise is: the hardest thing is just building something that works. That's pretty hard.

So use the best model, build something that works. And after you build something, if you're so lucky as to build something so valuable that it's just too expensive to use, that's a wonderful problem to have. Very few people have that problem. When we do have that problem, we then often have tools to lower the cost. But I think a lot more people worry about the price of using these gen AI APIs than is necessarily the case. The most important thing, I would say, is: use the best model, use the latest best model, just build something that works, and only if you succeed, and only if it turns out to be too expensive, work on the cost optimization.

And if you are lucky enough to get to that stage, maybe there's a balance of experimenting both with lower-cost options, like moving to GPT-4o mini, say, and with different agentic workflows, and just trying to see which gets you the best results for your use case.
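As a concrete illustration of the kind of agentic workflow being discussed, here is a minimal sketch of a "reflection" loop, where a cheaper model drafts an answer, critiques it, and revises it, rather than answering zero-shot in one pass. The `call_llm` function is a hypothetical placeholder standing in for a real LLM API call; everything it returns here is canned text for illustration.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns canned text for illustration."""
    if "Critique" in prompt:
        return "The draft is missing edge-case handling."
    if "Revise" in prompt:
        return "Revised answer incorporating the critique."
    return "First draft of the answer."

def reflect(task: str, rounds: int = 2) -> str:
    """Draft an answer, then alternate critique and revision passes over it."""
    answer = call_llm(f"Answer the task: {task}")
    for _ in range(rounds):
        critique = call_llm(f"Critique this answer to '{task}': {answer}")
        answer = call_llm(f"Revise the answer using this critique: {critique}\nAnswer: {answer}")
    return answer

print(reflect("Summarize this document"))
```

The point of the pattern is that each extra pass gives the model a chance to catch its own mistakes, which is how a smaller model plus iteration can beat a larger model's single zero-shot response on some tasks.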

Yeah, yes. And just to be clear, there are teams that have found they are spending too much money on these, and they spend time optimizing it so they can use cheaper models. You can take a smaller model and make it do the job.

So I find you need to optimize for your own workload, and there are tools for that. But I think using these tools to optimize costs before you first build something valuable will most likely be premature optimization, and I would shy away from that.

Nice, a great answer. Digging into agentic AI a bit more, which was a big theme in your talk and something you said is maybe the technology we should be most excited about at this time: going back to a Wired article in 2013, you mentioned how, in the early days of AI, the prevailing opinion was that human intelligence derived from thousands of simple agents working in concert.

This is what MIT's Marvin Minsky called the Society of Mind. But then you mentioned later in this Wired article that you stumbled upon the single algorithm theory popularized by Jeff Hawkins, which led you to deep learning. Now, eleven years after that Wired article, are agents, and multi-agent systems in particular, marrying both of these concepts together, where we kind of have both ideas now blending together to provide powerful tools?

Wow, thank you for digging that up. So I think what's been remarkable about the large language model revolution is how much of it is due to one, or a very small number, of algorithms, namely the transformer neural network. And it turns out that the reason these large language models, which are based on the neural network architecture called the transformer, can demonstrate such amazing capabilities is, in large part, the richness of the data we feed them.

So it supports the hypothesis that even human intelligence, or the human brain... that a lot of human intelligence is due to a very small number of algorithms which, when fed all the richness of data from the world, allow us to learn to do all of these amazing things that humans can do. And then, as children grow up into adults, they specialize, in large part, I think, because of the data they were fed. Maybe a little bit is genetics, but really a lot of it is the data: the same infant brain could grow up to become a doctor or an architect or whatever.

And that's the data and the interactions. And I think with agentic workflows (it's always dangerous to make an analogy between AI and humans), I think there is a little bit of that: getting the AI models to specialize a little bit for different tasks, based on how we prompt them or how we feed them additional data to do specific jobs, such as role-based tasks.

Nice. Yeah, and I appreciate that... I wish I could take credit for digging that up, but I have an amazing researcher, Serg Masís, so I have to give credit to him for pulling out that question and that idea.

I want to move on to large vision models now. This is a topic which follows from that five-AI-trends slide that you kept coming back to in your talk. This is particularly related to the image processing revolution that's coming, which you mentioned on that slide. Landing AI has a product called Vision Agent that has led the way in this image processing revolution trend. Can you elaborate on why planning, using multiple tools, and code generation are so crucial for building effective vision AI applications, and how Vision Agent addresses these challenges?

I think the vision revolution is coming a bit after the text processing revolution. And it turns out that large multimodal models, at least today, are kind of OK at interpreting images. But remember what I mentioned about the way we write text prompts in a non-agentic workflow: that's a bit like asking it to write an essay from the first word to the last word in one go. The way that large multimodal models, or vision language models, process images is, imagine that I show it a picture and say, you know, here's a picture, take a glance, give me the answer, right? And some things it can do that way. But, for example, if I gave you the job of counting the number of people on a football field, and I showed you a picture of a bunch of people and asked you at a glance how many people there are, what you would actually do is go and count the number of people one at a time: one, two, three, four, five. And that's more the iterative, agentic workflow, rather than, you know, here's a picture, what's the answer, which would be more of a zero-shot approach: just type out the answer.

And what we found was that if we generate a plan that is expressed in code, to say, these are the tools, these are the function calls, you detect the people one at a time and just count how many are detected, then a very simple plan like that, expressed in code, processes images much more accurately for many mission-critical image tasks. We also found that for a lot of these image and vision workloads, you know, writing the code requires going to find the right library, the right open-source model, and there's a lot of this kind of crufty, annoying coding work that we could do ourselves, but it takes like half a day. So we wrote an agent to write a lot of that code for us: come up with the plan, write code to express the plan, and execute the code. And we found that this really lowers the bar for developers to get a lot of these high-stakes, very important visual AI questions answered, and hence Vision Agent. There's more work still to do there, but I'm actually quite excited at the number of users who are using it successfully to build software for visual AI tasks.

It is exciting to see how this is going to accelerate. I a hundred percent agree with you that the text processing revolution is something we're in the midst of now, and people are only beginning to realize those applications, and things like the work that you're doing at Landing AI are going to be the next big thing for sure, with those extra modalities providing more optionality in real-world applications.
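The "plan expressed in code" idea Andrew describes can be sketched in a few lines: instead of asking a vision-language model to glance at an image and guess a count, the agent generates code that calls a detection tool and tallies detections one at a time. The `detect_people` function below is a hypothetical tool stub with canned output; a real vision agent would generate similar code against an actual object detector.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) bounding box

def detect_people(image_path: str) -> List[Box]:
    """Stand-in for an object-detection tool call; returns canned boxes for illustration."""
    return [(10, 20, 30, 80), (120, 25, 28, 75), (210, 30, 31, 82)]

def count_people(image_path: str) -> int:
    """The agent's generated plan: detect each person, then tally the detections."""
    count = 0
    for _box in detect_people(image_path):
        count += 1  # count one detection at a time, the iterative agentic step
    return count

print(count_people("football_field.jpg"))
```

The counting step is trivially simple; the accuracy gain comes from delegating perception to a specialized detector rather than asking a general model to eyeball the whole image in one pass.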

I have one last question for myself before I get to some audience questions, and this is related to your fifth and final AI trend on that same five-AI-trends slide I was just mentioning: unstructured data. You previously talked about how, by volume, most of the world's data are unstructured. And so, with the rise of generative AI, we're now able to tackle this vast amount of unstructured data with the kinds of technologies you were just talking about, with Vision Agent and other large vision models. How do you see visual AI transforming industries outside of traditional use cases like manufacturing and healthcare? And what untapped areas might benefit most from these vision capabilities?

I think it will be a lot of areas, and I'm going to give an unsatisfying answer: it's like asking where electricity will be used. It's like, boy, it doesn't narrow down because it's so general. But I think definitely manufacturing; I think robotic automation, including self-driving cars; I think healthcare; I think security. And then maybe... but let me pull this up. Can I share something? Can people see this?

I don't know the answer to that question, but looking around the room... I can see it, and yeah, I'm getting a thumbs up from backstage, so we're good. So this is a video retrieval task where we used Vision Agent to write code that retrieves these videos. So let me just, let's see, search for "grey wolf" here.

So this is a little demo. It turns out, you see, a lot of businesses have tons of videos, and they just sit in cloud storage. But, you know, using Vision Agent, we wrote code for this demo to index these videos, to help you find things. And actually, a lot of media companies have a lot of video, right?

And so it shows you, you know, where they are, the parts of the video they appear in. Let's see, "grey wolf". I just tried this over the weekend; let's see if this works. But so, I find that, you know, we actually found a bunch.

So there is a grey wolf. Now the UI shows in green where it's found it. If I click somewhere else, you know, there's no grey wolf elsewhere. Let's also try "bear", or "black luggage", right? I have been traveling a lot these days, but...

I'm going to give you an audience question, Andrew, since we had a number come in, and my personal favorite that came in is: how would you mitigate the risk of users indiscriminately relying on probabilistic answers generated by agents? So, you know, if you have an agent powered by an LLM, you're going to be getting probabilistic answers. How do you mitigate that risk relative to the more deterministic answers they would have retrieved in, you know, a more classical keyword search, like a good old Google kind of search?

You know, I do wonder how deterministic some of the "deterministic" things are. I think machine learning, in various forms, has been working for many of these... ten, fifteen years, and I think a lot of machine learning answers are not fully deterministic. Even web search is based on machine learning, and it's actually hard to predict exactly what a given web search will return.

So I feel like part of it will be user training, which maybe is an unpopular answer because that's hard, but I think some of it will be user training. And I think some of it will be putting in place the guardrails and mechanisms that make these safer, even for less-trained users. For example, a common design pattern in agentic workflows is a confirmation flow, where before, say, placing an order for a user, or taking some other critical and irreversible action, rather than have the AI just say "done", we'll have it generate the API call with a pop-up modal that says: do you really want to buy this? It's about to charge your card this amount; please say yes or no.

And with that type of confirmation flow, it makes you safe from the AI charging your credit card without you explicitly saying yes to it. So I find that these design patterns can act as guardrails and make the AI safer. I think it will be a mixture of software improvements and UI improvements, with guardrails, as well as some amount of user training. And, you know, we often change the UI details for these.
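The confirmation-flow pattern Andrew describes can be sketched as a thin wrapper that refuses to execute a consequential action without an explicit "yes". The `charge_card` function and the `confirm` callback below are hypothetical placeholders; a real system would wire the callback to a UI modal and the charge to a payment API.

```python
from typing import Callable

def charge_card(amount_usd: float) -> str:
    """Stand-in for a real payment API call."""
    return f"charged ${amount_usd:.2f}"

def guarded_purchase(amount_usd: float, confirm: Callable[[str], bool]) -> str:
    """Perform the charge only if the user explicitly confirms via the callback."""
    prompt = f"Do you really want to buy this for ${amount_usd:.2f}? (yes/no)"
    if confirm(prompt):
        return charge_card(amount_usd)
    return "cancelled: user did not confirm"

# Usage: the guardrail blocks the action unless the callback returns True.
print(guarded_purchase(49.99, lambda prompt: False))
print(guarded_purchase(49.99, lambda prompt: True))
```

The design choice is that the agent generates the action but never executes it directly; the irreversible step is gated behind a human decision point.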

There were some bad episodes, right? Like, I guess, the lawyer who used it and it made up cases, and the lawyer got in trouble for that. And that was really unfortunate.

And one side effect of that was, you know, a lot of lawyers learned not to do that. So that actually had a massive training effect across the industry. So, yeah, I think it might be like that.

Oh, and one more thing: I don't want to pretend AI never makes mistakes. It makes mistakes; humans do too, everywhere. In some places, I feel like the talk about how bad this is is overblown. It's far from perfect, and there are problems. But I think the number of times I see AI not get shipped, where we can't use it because of these problems, is actually much less than one would expect.

I think there is a lot of memory of those kinds of early examples, prominent in people's minds, like exactly the lawyer who brought fake cases, which had been generated by an AI system, to a real trial. And so we're kind of used to those stories. But today, the systems are so much better than they were a year or two ago.

Hallucinations are so much less of a big deal, and I think that's going to improve more and more. Andrew, thank you so much for taking the time with us today.

We are unfortunately out of time. I have more questions, and the audience had more questions, but we really appreciate you taking all the time that you did for us today.

Thank you, Andrew. Thank you so much. Thanks so much.

What an experience to be able to interview Andrew Ng in today's episode. The well-spoken icon covered how agent-based workflows using GPT-3.5 can outperform more expensive models like GPT-4 on certain tasks, suggesting that companies should focus on building effective applications rather than pursuing more powerful models.

He also talked about how the cost of using generative AI APIs has fallen by about eighty percent year over year, making them much more accessible than many businesses realize, and about how most companies' gen AI bills are surprisingly small despite relatively high per-call costs. He talked about how modern AI is combining two historic approaches: Marvin Minsky's Society of Mind theory of multiple simple agents working together, and the single algorithm theory that drove deep learning advances.

He talked about how vision agent technology is revolutionizing image processing by breaking down complex visual tasks into smaller steps and generating code to execute them, making it easier for developers to build sophisticated visual AI applications. And he talked about how video and image processing capabilities are expanding beyond traditional use cases like manufacturing and healthcare, with new applications in media indexing, security, and robotics demonstrating the transformative potential of visual AI. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Andrew's social media profiles, as well as my own, at superdatascience.com/841. Beyond social media, another way that we can interact is coming up tomorrow, December 4th.

If you got interested in agentic AI from Andrew today, then I've got great news for you, because I'm hosting a virtual half-day conference on agentic AI tomorrow. It'll be interactive and practical, and it'll feature some of the most influential people in the AI agent space as speakers. It'll be live in the O'Reilly platform, which many employers and universities provide access to.

Otherwise, you can grab a thirty-day trial of O'Reilly using our special code SDSPOD23. We've got a link to that code ready for you in the show notes. Yeah, I'm really psyched for it: we're going to have speakers covering introductions to agentic AI, we will have hands-on Python implementations of agentic AI, and we'll have a product manager come on and tell us how we can build effective products that leverage agentic AI systems. So, one not to miss. Thanks to everyone on the Super Data Science Podcast team: our podcast manager Sonja Brajovic, our media editor Mario Pombo, our partnerships manager Natalie Ziajski, researcher Serg Masís, our writers Dr. Zara Karschay and Sylvia Ogweng, and our founder Kirill Eremenko, for producing another special episode for us today.

Also, a tip of the hat to Ivana, who has been our podcast manager for longer than I've been hosting the show, so for more than four years, and she's been unreal, an absolute linchpin in making sure that we are releasing 104 episodes per year, two a week, on time, every time, to such a high level of polish. So we will miss you very much, Ivana, but we're in great hands with our new podcast manager, Sonja. Welcome aboard, Sonja! All right, now, if you enjoyed this episode, share it with someone who you think might like it, review it on your favorite podcasting platform or on YouTube, subscribe, of course, if you're not already a subscriber, and, most importantly, we just hope you'll keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Till next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.