Zoom CTO Xuedong "XD" Huang on How AI Revolutionizes Productivity - Ep. 235

2024/10/20
The AI Podcast

People
XD Huang
Topics
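XD Huang: Zoom is using AI to reshape productivity, and this is an exciting time. Zoom is integrating AI into the workplace to boost productivity and believes generative AI will bring enormous opportunities. AI is transforming productivity, just as earlier eras of computing did. Zoom uses AI to improve reading, writing, and communication, and thereby productivity. Zoom's AI strategy is to integrate multiple large language models (LLMs) and small language models into a "federated AI" architecture. This architecture differs from other companies' in that it integrates multiple LLMs rather than relying on a single one. The federated approach can select the model best suited to each task and can even combine several models to complete a task. It also differs from traditional approaches in that it integrates with web search engines. Zoom is developing a new kind of user interface called AUI (AI User Interface), which combines conversational AI with the graphical user interface. Zoom's strategy is to integrate AI natively into its products rather than simply adding AI features, with the aim of creating a system that understands workflows, anticipates user needs, and proactively takes action. Zoom's AI assistant can proactively identify action items in meetings and help users track and manage them. Zoom's AI features can help users create reports and presentations in various formats. Zoom offers different versions of its AI assistant to meet different customers' needs and budgets. The assistant learns users' habits over time and provides personalized help. Its ultimate goal is to help people work more efficiently and have more time for what they want to do.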

Chapters
Zoom's CTO discusses the company's approach to integrating AI into its platform, focusing on federated AI and AI agents to enhance productivity and collaboration.
  • Zoom is adopting a federated AI strategy to integrate multiple large language models.
  • The company aims to expand beyond video conferencing to become an AI-first work platform.

Transcript

Hello, and welcome to the NVIDIA AI Podcast. I'm your host, Noah Kravitz. Zoom became a household name in 2020 as it rose to prominence as the go-to video conferencing platform during the COVID pandemic.

Since then, the company has not only been refining its video technology, but also helping us all rethink the way we approach work in the era of digital communications and AI. At Zoomtopia this past October, Zoom took the wraps off a number of new AI-first products and initiatives, all in service of the company's mission to deliver an AI-first work platform for human connection. Here to discuss everything from Zoom's approach to federated AI and AI agents to the future of how we all live and work with technology is Dr. XD Huang.

XD is Zoom's chief technology officer and has a prolific background in artificial intelligence, coming to Zoom from Microsoft, where he founded the speech technology group in 1993 and most recently served as Azure AI CTO and Technical Fellow. XD is an IEEE and ACM Fellow and an elected member of the National Academy of Engineering and the American Academy of Arts and Sciences.

And most importantly, he's with us right now. So let's get to it. XD, welcome, and thank you so much for joining the NVIDIA AI Podcast.

Thank you. I'm glad to be here.

So we're recording this on the Friday immediately following Zoomtopia. Zoom basically announced to the world that you're going all in on AI. We want to hear all about the new stuff, of course, but first, maybe we can set the scene a little for the audience. Can you tell us, kind of broadly, about Zoom's approach to AI and AI in the workplace?

Yes. I think this is absolutely the most exciting time. I started working on AI when I was a graduate student, and that has been forty years now. Generative AI has really, really been transformative. Zoom has had an amazing ride with video conferencing. Since work from home, it is a household name; everyone understands what Zoom is about, right?

It's a very different time now. Zoom is really facing an even more exciting opportunity. Meetings are one of the most important business functions, but we want to expand that capability for people to work happy on the Zoom platform, right? So Zoom Workplace is going to take advantage of generative AI. We believe that generative AI is going to provide a really exciting opportunity. I reflect on my own journey.

When I was writing my first master's thesis in Beijing, at Tsinghua University, my first paper, I was using a typewriter. I remember the expensive Liquid Paper in China.

In Beijing, in 1980, '82, '83, I remember Liquid Paper was expensive. It was a luxury, right? With the typewriter, for any wrong letter I had to use it to correct that. And when I wrote my book, Spoken Language Processing, with my colleagues at Microsoft, I was fortunate that we had Microsoft Word. But even at that time, Word could not accommodate a hundred-page document. Too big, too big. So we had to save a separate file for each chapter. But even with Microsoft Word's challenges, it is hard for me to imagine doing it without Word. We would have had to use a typewriter, and that would have been very hard, because we had a lot of graphs, math, and references, right?

Time passed quickly. One of my colleagues at Microsoft wrote a book with GPT-4. This is amazing. Productivity has really been pushed to a new level. So you can see that reflected in my journey.

Just to stop you for a quick second. If you can think back to when you were writing on the typewriter, could you have imagined where we would be now? You've been in this field for a long time, so perhaps you could. But I'm just curious: thirty years ago, could you picture where we're sitting today, with your colleague using GPT-4 to help write a book? Is that something you thought about back then?

Yes. Yeah, I selected speech recognition for my thesis in Beijing. At that time I had only an IBM PC XT. I don't know if you know what that means.

I remember.

Plus a few Apple II computers.

Ah, yeah, the Apple IIe.

Yeah, that was all we had, right? And I actually told myself that if I could let the computer understand spoken language, I could retire. Forty years have passed since then, and I'm not retired, because the problem just was not solved. Speech recognition is not an obvious problem. At Microsoft, we were the first to reach human parity on the most difficult speech task, Switchboard, in 2016.

Most people didn't believe we could do that. Yes, we did. Now ChatGPT has really redefined and opened up the imagination for what is possible, right? I think OpenAI did just that.

Then you may ask why Zoom is going to grab this opportunity to redefine productivity. Every era of computing created its own productivity tools. Microsoft revolutionized desktop computing with Office.

Office is the productivity suite for desktop computing, as I shared with you when I described writing the book Spoken Language Processing with my colleagues at Microsoft. We loved Microsoft Word. When the web came along, Google took advantage of it and extended productivity to support multiple people working on the same document, right? As we know, Google Docs, Sheets, and Slides all really supported cross-team collaboration.

Right, collaboration.

It took that one step further, right? But that was an incremental improvement, in my opinion. Now we are all on the same level playing field, whether it is Microsoft or Google. Of course, Zoom has a unique advantage in the most important business function, connecting people in meetings. We are the leader, but just having that meeting capability would not be sufficient. If you think about work, we have probably a few key functions, right? One is to consume information. It is human nature to read, to satisfy our own curiosity.

Generative AI can help you read a five-hundred- or eight-hundred-page book, like the one my colleague published, and turn hundreds of pages into one page, right? So, magically, generative AI can do this and create an amazing amount of learning for everyone. We do not just consume information, though that's one important function. We also communicate, to influence and bring people along. So generative AI can help you compose the draft [unclear]. Those are the two most important, fundamental capabilities: to read and to write, to listen and to speak, right?

Consume information, and communicate.

Yeah, those are going to be really, really fundamentally helped, so much that we can take that capability and redesign productivity, not just bolt this capability onto the existing software. That's an opportunity Zoom really, really wants to grasp, right?

On our approach to the productivity suite and our approach to AI, there are three key things I want to highlight. I talked about this at Zoomtopia and explained it in detail.

The first thing I want to highlight is our federated AI stack. We integrate the best from leading AI companies: Anthropic, OpenAI, Meta, Perplexity, and many open-source options. We augment this with web search leaders [unclear].

So we really bring all of them together, in addition to our own small language model, which we are training and developing and which is already reaching amazing capability. We appreciate the small language model, but it still has a long way to go and needs to really work together with these amazing cloud-based large models. So we have this unique approach of combining them together behind the scenes to support the productivity of each individual.

What does the small language model do in the stack? How does that differ from what you're tasking the large language models with?

We are training the small language model like everyone else trains a language model [unclear]. In addition to that, we are also incorporating each individual's unique context, so we can really make it personalized [unclear].

So if I'm using Zoom AI features and I give permission, Zoom can basically digest all my conversations, all the meetings I have, the voice conversations, the documents, the chats, all of that, and use it all as context for the generative AI going forward?

This is a vision that is coming together through AI Companion.

Oh, good, OK.

AI Companion is horizontal and generic, not personalized.

OK.

The custom AI Companion that we will introduce later, next year, will actually incorporate the ability for anyone to customize and personalize it, right? This is actually a very powerful opportunity for the small language model running on the device. It can already do what the large foundation models often cannot, because it knows your personal side: your look, your writing pattern, and so on. So I just want to first highlight that our federated AI stack is unique, very unique in the industry, unlike many of the productivity companies, which use only one model, right?

And so, for the audience who might not be familiar with a federated AI stack: does that essentially just mean that the system can choose which LLM to prompt depending on the situation? What does federated mean? What are you using?

There are multiple ways to federate the large language models and the small language model. The way we are federating is different from federated learning, OK? Federated learning tries to combine a lot of sources together to form a powerful capability that can preserve privacy.

What we do is choose based on different workloads, because AI Companion is almost like a super agent. It is trying to understand you, with different modalities and different memory [unclear]. So we choose whichever model is best for different tasks.

We can also combine different models together. You can think of it like a chain of thought: we think, and then we perform the same task based on what we have learned from a small LLM, for example. So if a small LLM can perform the task very well, then why not? That is sufficient. So it is a very sophisticated mechanism that can actually orchestrate multiple models together.

This has been developed and pushed by Zoom's AI talent. So this is a very unique approach that sets us apart from almost anyone.
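The "small model first, larger model only if needed" idea described in this answer resembles what is often called a model cascade. The following is a minimal Python sketch of that general pattern, not Zoom's implementation, which the conversation does not detail. The `Tier` type, the stub models, the confidence scores, and the threshold are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt, returns (answer, confidence in [0, 1]).
# Real LLM APIs do not report confidence like this; the score stands in for
# whatever quality check a production system would apply.
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: Model
    cost: float  # illustrative relative cost per call


def cascade(prompt: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most expensive and stop at the first answer
    whose confidence meets the threshold.

    Returns (tier name, answer, total cost spent).
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    ordered = sorted(tiers, key=lambda t: t.cost)
    spent = 0.0
    answer = ""
    for tier in ordered:
        answer, confidence = tier.model(prompt)
        spent += tier.cost
        if confidence >= threshold:
            return tier.name, answer, spent
    # No tier was confident enough: keep the answer from the largest model.
    return ordered[-1].name, answer, spent


def small_model(prompt: str) -> Tuple[str, float]:
    # Toy stand-in: only confident on short prompts.
    confident = len(prompt.split()) <= 8
    return f"small:{prompt}", 0.9 if confident else 0.4


def large_model(prompt: str) -> Tuple[str, float]:
    # Toy stand-in: always confident, but more expensive to call.
    return f"large:{prompt}", 0.95


TIERS = [Tier("large", large_model, 10.0), Tier("small", small_model, 1.0)]
```

With these stubs, `cascade("Summarize this meeting", TIERS)` is answered by the small tier alone, while a longer prompt falls through to the large tier and pays for both calls. A real router could also pick a model per task type up front instead of escalating.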

You used the word agent. As we're recording this, there's a lot of hype, a lot of buzz around the world about AI agents and the concept of agentic AI, which isn't new, but it has come to the fore lately. Can you talk a little bit about what that means, what the idea of an AI agent is, kind of broadly, but then specifically how Zoom is using it?

I want to come to that later.

We'll come to that later, OK.

Let me explain why we approach AI in ways that are different from the traditional approach, right? So the first one: if you think about traditional productivity suites, most of the companies are using one model, either OpenAI or Gemini, to augment what they do.

They bolted the capability onto the existing software, right?

Right.

So on the back end, they are mostly using one very good model, like Gemini.

Our approach is different. We mesh OpenAI, Anthropic, Gemini, Meta, and our own smaller model together to offer unmatched performance. So that's the number one thing I want to highlight.

Of course, we also integrate with the amazing web search, work search, and, in the future, personal search. We combine them together.

That's what we are pushing to differentiate Zoom AI through our federated AI. That's number one, OK?

Number two: the user experience is AI-first. This is what I call AUI. We all know the screen-optimized graphical user interface, as devised by Xerox many, many, many years ago. Yes, you go back, right? It was popularized by the Mac and Windows. So both Office and Google Docs are examples of taking advantage of the graphical user interface. That we understand, OK? And ChatGPT redefined the conversational user interface. They reached a hundred million users.

Amazingly fast.

Faster than anyone, yes. So what Zoom is doing is developing AUI, which seamlessly combines the GUI and the CUI together. What that means in Zoom Workplace is that AI Companion will be a persistent panel on the right, OK? And the fancy graphical user interface services, like scheduling a meeting or having a meeting with someone, right?

Yeah.

They remain on the left, OK? Information flows seamlessly between the two in the AUI.

So we are trying to take the advantages of both the conversational user interface and the screen-optimized user interface, seamlessly. The vision is a world where the technology intuitively adapts to your own needs. That's even more personal. That's what is coming with the custom AI Companion.

Sure. When you say adapts, do you mean that the user interface changes, or that you can create a conversational window, sort of text, when you need it? Or could AI potentially just redesign the UI on the fly to match what you're doing? How do you envision that AUI?

The vision is about what information you want to consume and how you want to consume it, OK? That is what AUI is. It is not just the chat interface as ChatGPT defined it today, and not just the graphical user interface, like this meeting as it is defined today. We combine them seamlessly in a multimodal environment, and we learn based on your real needs.

Right now, we've found a way to combine those two categories into one. The GUI is traditional; most software services and applications are GUI-optimized. ChatGPT's conversational user interface is a new category. We just cannot have that be the only one, so we have brought the two together, with information flowing across those two categories seamlessly, trying to understand the user's needs and adapt on the fly, OK?

That is where AUI is going. I am really coining this word, AUI. So this will come. [unclear] telling you in detail what the future use cases will be.

I love it.

Yeah. So that is the principled approach Zoom has taken: embracing AI natively. That's what we call AI-first.

You joined Zoom about a year and a half ago, a little less?

Yes.

I know Zoom has had AI functionality, AI Companion version one, and you can use third-party apps for transcription, and that's been around for a little while. But when you joined, was it that you came in and thought, OK, let's rebuild this from the ground up, AI-centric? Or was that sort of already happening when you joined? I'm just kind of wondering, as you stepped into the role, what was envisioned and how much you've shaped things.

[unclear] So Zoom had invested in AI before. Yes. Since I came, working with Eric and the leadership team, together we defined AI-first, OK? Before I came, it was just adding AI, bolting on AI, like almost every company. We have transformed, built consensus, and pushed the AI-first work platform.

So what does AI-first mean? Three things. First is the AI technology stack. We build our own small language model and stand on the shoulders of the great AI companies out there, whether that is OpenAI, Anthropic, Meta, or open-source companies like Mistral AI. There are lots of them. It would be unwise not to take advantage of all of them, of course. So it is like a committee working together to support the workloads, rather than just using one single model trying to perform the same task.

Right. Two brains are better than one, yeah.

So you see how, because we stand on their shoulders, we want to combine them all together. Then there is the user interface. Some companies say the chat interface is the only way, or the graphical user interface is the only way. [unclear] We combine the two categories of user interface into one that will adapt to your own needs, with information flowing between those two categories seamlessly. That is the second important element. I use AUI as the summary.

That is the second principle. The third thing I want to talk about is what the work productivity suite is. AI-first in the generative AI era, I would say, is all about creating a true system of action.

We exist, we have tasks to do, we take action, right? Of course, you can say you want to be entertained, but that's not productivity. [unclear]

So when we say we are an AI-first work platform, this is about AI Companion being designed to understand your workflow and learn from your patterns. Everyone has a different workflow. Everyone has a different selection of services.

Right? And we use AI to anticipate your personal needs, I emphasize that, and it can take action on your behalf, with your permission or with your participation, to make better decisions than you could make just by yourself. Those are really the soul and the spirit of AI-first productivity.

That's very different from just replacing Liquid Paper with word processing, right? Or just supporting editing, or just formatting a document with nice fonts. It is about those three things: learning from your own patterns, anticipating your personal needs, and taking action on your behalf. Whether it is tracking tasks or managing action items, it is always one step ahead of you, ensuring that productivity flows seamlessly and effortlessly throughout Zoom's whole ecosystem in Zoom Workplace. [unclear]

Right. So if the AI Companion understands my workflow and can then suggest actions to take, either now or going forward, is it a case of the AI saying to me, hey, you should do these things in this order? Or will it actually call up an additional tool to help facilitate getting these things done? How does that work, or how do you envision that working?

We envision that AI Companion can proactively inform you that you are not answering the question right in the meeting, OK? Only you can see it, right?

Right.

No matter how that is going.

In real time.

Yeah, because AI Companion is always optimizing your ability to influence others, to make others like you better. So this is what I want to describe with another phrase. I talked about the federated AI stack being unique.

Yes, I talked about the user interface; that's AUI. This is about action-oriented task flow.

Action, task flow, OK.

It flows to every corner, for the whole life cycle of what you need to do, because it's almost like you have a very expensive executive assistant for the most important tasks you need to pay attention to, for the whole life cycle of the project, until you get that project done beautifully, in a time-sensitive manner, and in a way that delights your coworkers or your family members, for better human connection.

This is the goal of Zoom's AI-first work platform: action-oriented information flow. If it is something you don't need to take action on, we can stop using those tasks to confuse you.

That's OK. And you can decide that you do not want to actually track those actions. We learn that pattern from you, yeah.

And we improve our ability to track. But if you select it, saying, oh, AI Companion told me this action is important, and you check that, then AI Companion will work harder, with eyes open. So a week later, if you receive a piece of email that is relevant to the task you're tracking, AI Companion will work for you 24/7 to update what you need to do and the difficult points, and advise what you have to do better to accomplish the task. This is what I mean by action-oriented information flow, right?
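As a purely illustrative sketch of the flow described here (tracked action items that later pick up relevant email as updates), the Python below attaches an incoming email subject to the tracked item it overlaps with most. Everything in it is an assumption made for illustration: the `ActionItem` fields, the `route_email` helper, and the word-overlap heuristic, which stands in for whatever relevance model a real assistant would use.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ActionItem:
    title: str
    owner: str
    tracked: bool = True  # the user can opt out of tracking an item
    updates: List[str] = field(default_factory=list)


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route_email(subject: str, items: List[ActionItem], min_overlap: int = 2) -> Optional[ActionItem]:
    """Attach an email subject to the tracked item that shares the most words
    with it, provided at least `min_overlap` words match.

    Returns the matched item, or None if nothing is relevant enough.
    """
    best: Optional[ActionItem] = None
    best_score = 0
    for item in items:
        if not item.tracked:
            continue  # respect the user's choice not to track this item
        score = len(_words(subject) & _words(item.title))
        if score > best_score:
            best, best_score = item, score
    if best is not None and best_score >= min_overlap:
        best.updates.append(subject)
        return best
    return None
```

For example, given a tracked item titled "Send Q3 budget draft", an email with the subject "Re: Q3 budget draft feedback" would be recorded as an update on that item, while mail matching only an untracked item would be ignored.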

And so is the AI Companion an example of an AI agent, in your view?

Absolutely. AI Companion already brings in agent-like capabilities. In the meeting, we do not just use speech recognition to understand what is being talked about. If you present your slides, AI Companion today understands what is presented in the slides, or what you wrote on the paper that you share. We have that multimodal capability. Or if you share your points in the side panel with chat, we take that into account. It's like an agent really participating in the meeting as you do.

Then we present the meeting recap. The meeting recap is the most powerful way to grasp and identify the next steps that you or your colleagues need to pay attention to. The next-steps feature is unique. We offer unmatched quality. We worked on this heavily, because you have to improve the next steps, reduce hallucination, and assign the right task to the right person. We are right now roughly, probably, eighty percent accurate. OK, so it is not perfect, but eighty percent is really impressive, right?

I was going to say, in my experience with LLMs and hallucinations and accuracy, eighty-eight percent sounds pretty good.

So you have a meeting, you discuss, and AI Companion follows along and identifies the tasks. Those tasks show up in your upcoming tasks [unclear], and a week later it is still following the whole life cycle. For something you want to track, you will receive relevant email, and Zoom will create an update on your behalf.

If you want to write about the next item, or a status report for your colleague, that information flows into Zoom Docs, which will draft the status report. You are pretty happy with it after a few changes, without doing everything yourself using Liquid Paper or Microsoft Word to format everything.

And you can say, hey, change this into a form that I can present in the next meeting, and it is done in one click. Yes, right. It is not as beautiful as what PowerPoint can do with beautiful pictures. But the key point is very much like when I was in college. When we presented, the information was black and white, right, on transparency paper, and we just used a projector to talk about the key points. That system actually still worked.

It was effective.

Information flowed without beautiful color and animation. So that is the point I want to make. Zoom Docs alone is actually performing most of the functions, because of generative AI with AI Companion.

You can instruct it. You can summarize in the form of a sales report, you can publish this as a blog that's more ready to be consumed by the public, or put it in the simple form of slides so that you can communicate your points in your next meeting. You do not need the last-generation productivity suite as we know it. So those are the three key pillars: the federated AI stack, the AI-first user interface, AUI, and the action-oriented information flow for productivity. That is really the landmark of how the AI-first productivity suite will differ from the document-centric productivity suite.

Or the desktop-centric one, or the web-centric one, where AI is added on as a capability. That's not how Zoom is defining this. It is action-oriented information flow into every corner of [unclear].

My guest today is XD Huang. XD is the CTO of Zoom, a position he's held for going on a year and a half now. Before that he was at Microsoft for quite some time, and it really is just a continuation of an illustrious career that started with [unclear]. With a few minutes left to talk, I want to look at things from the public standpoint, and specifically from business users and the types of customers Zoom has been working with for a while now.

When Zoom is talking to business customers about AI, about adopting Zoom's products and all these wonderful things you're building, but also just about using generative AI, making the spend, spending the time to upskill workers and figure out how we are using these things: how do you help your customers think about both adopting AI and how to measure return on investment? There's a lot of conversation, on the podcast and just generally in the world, about how, as exciting as the past couple of years have been, we are still in the early days of figuring out what gen AI can do. How do we use it? How do we rethink things like productivity suites from the ground up with AI? So when you're talking to customers, to companies, how do you educate them about getting started and measuring performance?

There are a few things. Absolutely, our customers love Zoom. First, Zoom Workplace as a whole offers a value that's just unmatched, whether it is Meetings or Docs or Chat.

The second thing is that Zoom AI Companion 2.0 is offered at no additional cost. That's just startling for most customers, because to use this kind of capability you usually have to pay for it, right? So Zoom offers this capability, and for the higher-end customers we are going to offer a custom AI Companion, for which we will charge twelve dollars per person per month.

So you can get your own patterns into the AI Companion, acting on your behalf, with fine-tuned custom capabilities. So Zoom offers this amazing, horizontal, absolutely game-changing ability to make Zoom Workplace a very viable productivity candidate, and for the higher end we offer unmatched customization. At twelve dollars per person per month, it still offers the best TCO, right? So it is very cost-effective, with unmatched quality. That's what we offer our customers.

And another goal, sort of the vision that customers who use Zoom come away with, is that the AI Companion is just, over time, going to learn more and more about how you work, what your workflows are like, how you sequence tasks, and the colleagues you're working with.

And the Companion will just be there to help you think a couple of steps ahead, help you maximize efficiency, or whatever the buzzword is. Is that right? Because that's a different conversation from ones I've had, read about, and listened to, where companies are saying, OK, we need to start with wrangling all our data, and then we need to figure out how to clean the data, and it's kind of this big, deep investment process. It sounds like with Zoom it's more like: hey, you're already using it for video calls, and now we're going to give you this groundbreaking, change-the-way-you-do-everything companion, and it's just kind of going to be there, and there's not a lot you have to do as the user.

Yeah. So we offer the choice to our customers. If they are comfortable, yes, we have improved customization capability to suit them. If they aren't, they can decide how much data they want to share, or whether they want to turn off some features. Right now that complexity is in the hands of the customers. They can control it themselves.

Right. So giving them the choice.

Yeah. On top of that, I want to emphasize that Zoom never uses any customer data from meetings to train our models.

OK, great.

Let's end on kind of a looking-ahead note, if that's all right. As you envision the next, I'm going to say three years, but you can change that to two years, five years, whatever you think, both in terms of Zoom's mission and using AI and generative AI to help people do things smarter, better, faster in the workplace, and then more broadly as gen AI and other forms of machine learning and deep learning just continue to impact the world more: what are you most excited about in the short term? Again, three years, however long it is. What are you really excited about and see coming down the pipe, whether it's the next transformational moment or just a trend that's really going to catch fire and change the way we do things? What are you looking at?

This action-oriented information flow, with AI Companion. Yeah, this is really just a game changer. I never have enough time. So if AI Companion can really help you get the job done quickly, you can have time to do whatever you want. That's another productivity capability, or some entertainment.

Whatever, yeah.

It will also make this world a better place, one that will let you work happy, live happy, and do what you want.

[unclear] Delighting customers is the core mission.

Excellent. XD, for people who would like to learn more about what Zoom is doing and the announcements at Zoomtopia, perhaps, I don't know, there's a technical blog for developers and people more technically inclined to learn more about how you're approaching everything, federated AI and everything else we discussed: where are some good places online for people to get started and learn more?

Yeah, you can check the Zoom blog at zoom.com. That's actually probably the best place to start. But the even better one is really to try AI Companion in Zoom Workplace. Then you'll know how it works.

Yeah. Fantastic. XD, thank you so much for taking the time, particularly at the end of what I'm sure was a busy week, a crazy week for you. But congratulations on Zoomtopia and on the work you've done so far. And I, for one, am excited to use AI Companion 2.0. If I could have a panel on the side of my screen that's always telling me the next best thing I should do, that'll be a game changer for me personally.

That's productivity software.

Fantastic. Well, thank you again. And perhaps we can catch up somewhere down the line to see what's going on at Zoomtopia next year.

Absolutely. Thank you. I'm glad to be here.