Let's say AWS gave you five billion dollars to be exclusive on their platform. Would you be interested?
But I think a few people I really respect were quite technical, like Bill Gates or Mark. They don't just have fancy degrees; they saw the opportunity, dropped out, and went for it. I think that one of the successes of NVIDIA is actually in their engineering structure: there is less bureaucracy between the layers, and it means they still operate closer to startups.

So for a company of that scale, the way they organize is very impressive to me. I used GPT-2 myself through a Python interface, and I thought how cool those models were a long time ago. So essentially, whatever systems are built using AI under the hood have to have an interface that is very easy for people to manipulate.

What is also challenging here is that when we talk about power limitations and dark silicon, if you actually make everything run, you might overheat, because the chip won't be able to dissipate the heat. But that's where a lot of the opportunity and untapped potential that we see at CentML is actually useful. CentML is the company that focuses on optimizing machine learning workloads, training and inference. Our focus is to make things easy to use, cheap, and efficient for you.
You cited, you know, Zuck and Bill Gates. You could argue that they've become less technical over time, and is that because they are communicating with stakeholders?

Oh, great question. So CentML is the company that focuses on optimizing machine learning workloads, training and inference. Our focus is to make things easy to use, cheap, and efficient for you, whether you are trying to build a new model, fine-tune existing foundation models, or deploy the models of your choice at scale.
Do you struggle... I don't know whether I should say this, but you're by far the most technical CEO I've ever seen in my life. Is that something you struggle with? Because you're the guy who's normally, you know, down in the details, you want to know how everything works, and now you're kind of trying to rally everyone together.

So far, so good, actually. I think that mostly gives me an advantage. And I can save my CTO's time in many meetings, because if people just need to understand what we're doing in general, I can handle most of those questions, right? So they can focus on actually building things.
I think, partially, in the life of a professor you need to learn some of the business aspects, fundraising and, you know, building teams and all of that, and I've been in industry. So I think it helps me to have a foot in both camps, right? And I do like that in conversations with both customers and investors.

I can handle the technicalities, usually as deep as the conversation goes, so I can handle pretty much any technical question. I like that, but we can also cover everything business-related. I have a good enough understanding. I am not a replacement for the rest of the team, but I understand what's doable and what's not, and I have a good grasp of what our team can build.
In an ideal world, if it were possible, do you think every tech CEO should be as technical as you? Or is that too much to ask?

I think a few other people I really respect were quite technical, right? Just look at Bill Gates or Mark, and, you know, quite a few others that were successful were actually quite technical.

They don't just have fancy degrees; they saw the opportunity, dropped out, and went for the opportunity, right? I think they are very technical and they do understand the space. And I actually think it's very important for the success of a company in this space to be technical.

Look at Apple: at the beginning it was all very technical, right? They really did understand the space and the customers. So it is important. It's not mandatory; you can run the company as a CEO without any engineering background, but it's a good feature to have if you can have it.
What are the forces that erode that over time? Because you cited, you know, Zuck and Bill Gates; you could argue that they've become less technical over time. Is that because they are communicating with stakeholders that have a different kind of horizon of understanding? Is it a pernicious, inevitable thing? Or do you think you can hold on to that very high-resolution view of things?
It's almost inevitable. I see it in myself as well. The minute you stop writing code daily, you start to get a little bit disconnected from what's possible, though not to the level where you're completely lost.

But it's not the same as what the engineers have, right? You can't be zoomed in and zoomed out at the same time; you need to accept some trade-off.

So I think it's inevitable. Even in my academic career, at some point I had to admit that I can't look at every line of code my students write. I simply can't. So I need to trust what they do; it's the same as with any management.

In general, I think it's a normal path, and the bigger the company grows, the more true it becomes, I believe. When I was young, we were learning about how Bill Gates, for example, managed the engineering teams.

At the beginning, he was literally looking over every single line of code that went into the product. Obviously, that doesn't scale, right? So you need to know how to trust others, how to build teams that scale and are still successful and deliver. But being able to understand, at a reasonable level, what they do is still very valuable.
Do you see that kind of organization building as an engineering problem?
Very good question. I actually think yes. It is a very interesting problem in general, and I have a lot of thoughts about it, though right now it's not my main focus, because I have other problems to solve.

At the same time, I was really impressed by, for example, what Jensen did at NVIDIA. Look, he is managing one of the flattest structures, engineering-wise, of any company. When I met with him, I asked him how he manages this, and he had around fifty direct reports, something like that, right? It's a very different scale; it's very hard to manage.

Ten people, maybe fifteen people, is more or less the limit, right? And you know what? He manages that many.

So it's a very interesting scaling problem. You definitely cannot do one-on-one meetings; you need to figure out how to meet with teams instead. But people are doing that, and they're very successful.

And I think one of the successes of NVIDIA is actually in their engineering structure, in the general company structure: they are pretty flat, right? That means a lot of people know what others are doing. There is way less bureaucracy between the layers, and it means they still operate closer to startups than a lot of other incumbents.

So for a company at that scale, the way they are organized is very impressive to me. So yes, I would like that. Not that I want fifty direct reports, but I do want a flatter structure; it's very important for efficiency. I see how easily extra layers can make things complicated.
I don't know whether it's fair to say this, but NVIDIA has that kind of structure and it's a bit more centralized, and we'll talk about this a bit more later. But do you think that there's something good about having centralization?
There are definitely good things. It's a very old debate, being fully free and decentralized versus something more centralized, right? Some companies, say Google, come to mind as examples of a more decentralized structure.

I think centralized is a very efficient way when you are on a mission to deliver something. Startups usually start somewhat centralized, right? You have a very limited amount of resources and a vision, and your main advantage over others is speed: you can go much faster, much more efficiently than others. That's why the decisions have to be made the right way, and you don't have many chances for mistakes. So in a way, for some critical decisions, that's a real requirement.

But again, long term it's very hard to maintain, right? I don't think others didn't want that; it's just very hard to keep living at that pace, and kudos to Jensen for being able to keep doing this over so many years. It's really taking a toll, I am pretty sure. Yeah.
Cool. Well, in which case we should slowly kick off the interview. So I want to introduce the audience to Gennady.

As I was just saying, probably the most technical CEO I've ever spoken to. I think it's remarkable actually, especially when you look at the research he has done. He is the cofounder and CEO of CentML and an associate professor at the University of Toronto.

I love that university; I was there recently interviewing a couple of guys. You are a faculty member at the Vector Institute, you're involved in research at MLCommons, you are a founding member of MLPerf, and you have also been a researcher at Microsoft. You got a PhD from Carnegie Mellon University.

You've had internships at NVIDIA, at Microsoft Research, and at IBM. Basically, you've done a lot of stuff, which is really cool. Welcome to MLST.
Thank you so much for coming.

Yeah, well, thanks a lot for inviting me. Really excited to be here and to talk with you.

Very cool. Okay, so why don't we start with the low-hanging fruit: how do you think open-source models compare to proprietary ones?
Well, a very good question, and definitely a lot of different opinions here. What I can say from my experience and my perspective is that open-source models are really improving over time. They really are getting closer and closer on a lot of different benchmarks, to the extent we can benchmark things, and I think they are closing the gap over time, right? And I actually think it's good for society and for the AI space in general.

It's nice that the models are accessible to a wider community of developers, and people in general can use them and build on top of them, right? Why is this happening? I think, fundamentally, once it was clear that AI is so important to the world and the value it generates is right in front of us, a lot of people got interested. So many people got interested that they started to be creative and add value. It's very hard to be just this one company that stays ahead of everyone, releases to the rest of the world, and the rest of the world learns from you and doesn't catch up. No matter how smart your people are, the rest of the world is so much larger. And people have seen this through many other software products: the minute you make it available to the world (this is really important), the world will catch up to you quickly. And I think that's what's happening right now. Yes, there are a lot of great proprietary models right now, but open-source models like Llama and Mistral are also very high quality.
What kind of advances do you see in this space? I've been reading about, you know, there was that paper saying RNNs are coming back, there was that minLSTM and so on. And we've had Mamba and state-space models. Do you think we're going to see any big shifts in this space?
Big shifts are always possible, but having had multiple conversations with many prominent people in the AI space and the pure AI field, I don't see any obvious replacement for the attention-based blocks used in transformers, right? So I think a lot of the focus right now is actually on building on top of that architecture. There is always a chicken-and-egg problem here:

this basic concept has been really good at capturing the different properties of the data we're dealing with, and if it's good enough, and hardware and software evolve around it, it becomes almost like a dependency: it will sustain being around, and it's not going to easily disappear. There are several different types of models that people have suggested building instead, but I don't see any of them as a clear replacement. Whenever they really need to compete on benchmarks, they usually end up using some form of attention mechanism anyway, right? And there is still plenty of room to improve around attention and transformer-based models.
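For reference, the attention block being discussed here is the scaled dot-product attention at the core of the transformer (Vaswani et al., 2017):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are learned projections of the input sequence and $d_k$ is the key dimension. Most proposed replacements (state-space models, linear attention) change how this token-mixing step scales with sequence length rather than what it fundamentally computes.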
So I'm not particularly worried that we are stagnating or anything. I am more excited about what people are going to build on top of it. I was at this event last week called Intelligent Applications, where we as a company actually won one of the Rising Stars awards. And it was very clear that people are excited and see beyond the models.

There were several fireside chats there saying it's not just about the models; it's about building sophisticated systems on top of the models, with multiple different components. That is very critical. The core foundation models are still there.

They're not going to disappear; they are important. But people are looking at what to do next on top of them.

Say, one year ago it was fine-tuning people were excited about; then it was RAG; now people talk a lot about agents and compound systems like that.

Databricks talks about it. These are very exciting things, and I expect they will keep moving. People build not just a model, a new model; people build applications around it, something that the world can use. And I think that's even more exciting than building yet another model right now.
Yeah, one phrase I keep coming back to is this 'progressive disclosure of complexity'. And I love that, but of course you're a guy who is at the bare metal: you're building compilers, you're building kernels, working with the hardware, you know exactly how the hardware works at the very, very lowest level. Yet you were still talking about these sandwiches we're building, that we're going to have abstractions on top of abstractions on top of abstractions to build these systems.
Yeah, absolutely correct. My team, and I personally, have a lot of background in computer architecture, in systems, and in building many different layers of the software stack. This is our expertise; this is what we are very good at. But what the world consumes is actually applications, right? I think that is ultimately where the value is coming from.

People were excited about the models, right? In part because that layer never existed before: all of a sudden we got another layer to play with, and obviously a lot of excitement. We spent several years building those models, but then we reached the situation where we see, OK, it seems like we found something that scales well.

So let's kind of turn to building on top of it too, right? And I actually think whatever we build in that sandwich, as you call it, will be valuable no matter what happens with the model. If tomorrow someone replaces attention with another block, I don't know, say one of the state-space models people are discussing, like Mamba, and others say this will become the thing, that would be just a drop-in replacement, right?

You would just replace one block in the sandwich with another block in the sandwich. Yes, it would change hardware utilization, but mostly that would be visible to us: hardware people, compiler people.

The rest of the world doesn't even need to worry about it, right? They would just get better, higher-quality responses. So it's very important to me that we don't stick to the model alone in terms of what we contribute, that we contribute to the rest of the stack.

People can still evolve new models and build new things, but if the stack is mature enough that people generate value, that is much better, right? If we only ever invest in building better foundation models, at the end of the day there's no use case that people benefit from, right? We're just going to burn a lot of resources without a really tangible outcome. So I'm not worried about what model will come up. System-wise, I think we're going to be much better off, better than we are now.
How much coupling is there in the stack? You know, we like to think that we're building these little decoupled atoms of computation, and we can just kind of build on top of it and build on top of it. But there must be some very complex behavior in what emerges and what properties the overall system has as a function of swapping out those components.
Well, that question has many different layers. At a very low level, our agents, what's running under the hood, are kernels, right? And they're running on the actual hardware. So the entities there are the chips, and the chips communicate over some form of communication medium.

It can be an internal interconnect, something like NVLink, right? Or it can be something that connects multiple nodes, which can be Mellanox InfiniBand cards or something like the proprietary optical interconnect that Google is using right now. At the end of the day, at that level those entities are hardware pieces, right? We program them in a certain way and their interactions are very predictable: they behave in the way we program them to.

There is a certain level of complexity with them, but not a lot of uncertainty; they are very repetitive. You set up a training or inference workload, and even though you run billions of examples, all the runs are more or less identical. There's very little fluctuation.

There are certain things, like frequency, that can fluctuate, but almost everything else is fixed. So once we have programmed the chip, using either manual effort or the compiler, things are predictable.
But it's very exciting to think about what people can do with that hardware and software, and that is, well, less predictable. The compound systems, the general systems built out of agents and other players and models, can be very unpredictable and very exciting. This is a very open field, and I'm not going to pretend to be an expert there.

But I think it's a very, very promising field. It's even hard to imagine where the limits are. With the hardware, we know there is a certain limit to what we can achieve with that technology: once the chip is fully utilized, you are at max, right? And even if you're at eighty percent utilization, you can't jump to way more performance.

But in how the hardware is used, the sky is the limit; there's no limit. So we just need to make sure that people's creativity is not limited by the complexities of that stack, to make sure that even kids can experiment with it. I think many people have said that was the beauty of ChatGPT.

For many people, they were never aware of the technology. I used GPT-2 myself through a Python interface, and I thought how cool those models were a long time ago, right? Not necessarily as good as today's, but very, very powerful. But for the rest of the world it was so mind-blowing, because the interface was a command line, similar to a Google search, right? So essentially, whatever systems are built using AI under the hood have to have an interface that is very easy for people to manipulate.

And it doesn't have to be like a Google search: it can be our voice, it can be manipulating the rest of the world, potentially augmented reality. All of that can be really nice, right? And this is where I think it becomes very open-ended and very unpredictable what humans can build with it.
Just a quick point on predictability down at the hardware level: I interviewed someone from Cohere at ICML, and they've got a paper out saying that there are differences when you run models on different hardware. I don't know whether you've seen that. And apparently the differences can be significant, especially on the long tail for things like fairness. Have you seen anything like that?

So you mean that the results are different depending on which hardware you run it on?

Yeah, yeah. Different.
There are two things here that a lot of people do not realize, so let me clarify. First of all, the way we run training right now, even on the same GPU (forget about different hardware, just a canonical GPU), is quite non-deterministic, right? There is enough randomness there that if you run the experiment multiple times, it ends up different.

What people don't understand is why. It is fundamental to the way we manipulate floating-point numbers, right? Every run, if you add up the same numbers in a different order, the result is rounded differently, and when that occurs, the neural network can go one way or another very easily, because it's a very high-dimensional problem.

So running the same experiment on the same hardware can already diverge. Between different hardware there are even more differences, because they can easily use very different precisions. Both my research group and the company operate on NVIDIA GPUs, AMD GPUs, Google TPUs, and Amazon Trainium chips. Fundamentally, you can get very similar convergence of the models on all of them, but you need to be careful with the precision and everything, right?

So I don't think the hardware is guilty of anything. Hardware does what you built it for, right? And it usually has a certain level of precision.

But if you pick hardware with low precision and you are not careful about convergence, you might get very different results. I don't think the hardware fundamentally changes the way the computation is done. Yes, the computation is not fully deterministic, and because of that you cannot expect completely identical results: if you ask the same question to ChatGPT, it is not going to respond the same way every time, even if it uses the same chip.
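A minimal sketch of the effect he's describing, in Python: summing the same numbers in a different order rounds differently, while integer accumulation is exact and order-invariant, which is why the int8 route he mentions next can restore determinism. The values are invented for illustration.

```python
import random

# The same multiset of values summed in two different orders.
values = [1e8, 1.0, -1e8, 3.14, 2.71, 1e-7] * 1000
shuffled = values[:]
random.seed(0)
random.shuffle(shuffled)

a, b = sum(values), sum(shuffled)
print(a, b, a == b)  # typically False: float rounding depends on order

# Integer accumulation has no rounding, so any order gives the same answer;
# this is one reason int8 / fixed-point pipelines can be made deterministic.
ints = [int(v * 1000) for v in values]
ints_shuffled = ints[:]
random.shuffle(ints_shuffled)
print(sum(ints) == sum(ints_shuffled))  # always True
```

On a GPU the summation order varies with thread scheduling, which is where the run-to-run noise he describes comes from.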
Yeah, it's really interesting. You know, when we build normal computer applications using C++ or whatever, we say to ourselves: we can verify it, we can test it, it's deterministic. And that may or may not be the case. But now we're building the next generation of applications, AI applications, and they have increasingly sophisticated behavior, and we grow them, we don't design them. And we're starting to build agent-based systems and so on. How does that look to you?
It's a very interesting problem, and challenging at the same time. Think about the very critical areas, like automotive. For example, they prefer to have proofs for every little piece that's built into the car.

And now you give them a piece and say, oh, it might be non-deterministic, right? It might change its behavior. So that's challenging, and it's why using AI in some of these areas, self-driving for example, is very difficult, right? So again, we need to find ways to keep those errors within boundaries.

Or, for example, we make sure that we use a precision that is deterministic, or we limit some performance characteristics of the chip: making sure the threads run in the same order, so everything happens in the same order, and using int8, integer precision, instead of floating point. Then things suddenly become deterministic, right? You need to make certain changes, but you lose a lot of performance if you do. In some critical areas, that might be the outcome.

In other cases, I think we just need to make the theory around neural networks in general more robust, so that they are less susceptible to noise. There have been several really great papers in the community where people try to stabilize the training process and make the model less sensitive to noise. So in general there are several problems that need to be solved around the reliability and reproducibility of these results. And all of that is really, really critical.
Just coming back to the open-source thing quickly before we move on: what do you think the implications are of using these models, like Llama and Mistral?
I do see a huge value in them, and not only because it's very valuable for me and a few other companies like mine that actually use them; I think it's valuable for our customers, for a lot of the enterprises. I think they are critical for AI to actually generate significant value for the world.

So again, when OpenAI or Google invests into, you know, a nice shiny toy, they can burn billions of dollars with no outcome. For me, unless there is benefit for the second wave of adopters, the Fortune 500 companies for example, that's not really success yet. Those companies don't have as much AI expertise as the companies I mentioned, who were the first wave.

So most of them are probably not going to go and build their own foundation models. And they also have a lot of their own sensitive data. When you combine those two things, you see they have a choice: either give the sensitive data to someone else, like a hyperscaler or an AI startup, or actually leverage the open-source models and go down the path that requires far less expertise, like fine-tuning or RAG with their own data. This way, they generate far more IP, right?

And I think more and more companies are going in that direction. So that's why having models like Llama and Mistral not only exist but also perform on par with models like GPT-4 is very critical, right? Because people can build their specialized models on top of them.

And by not depending on any particular company's proprietary technology, they can build their own IP, and they feel much better about it because they control it internally, right? They don't feel like their business is fully dependent on something else.

Otherwise they would have to give customer data to something that is, you know, less under their control. So this makes it easier for enterprises to adopt the technology, and that is very important, because they do see all these risks. Remember, it took them a long time to adopt the cloud, and not all the data is in the cloud even now, as you probably know.

And again, the same is true for AI: people want to be sure that they can benefit from the technology themselves and build their IP on top of it. So I think having open-source models is super critical for that. Models like Llama, Mistral, and Falcon are very, very important for this trend to exist and for people to actually benefit.

But what do you think the performance landscape is?
It was quite interesting that OpenAI shifted the narrative from 'we're building AGI systems that can reason and do anything'. In the new version they released a fine-tuning API,

and they're kind of hinting that, actually, if you give us a thousand examples of some labelled imagery, it will do better. And of course you can do the same with Llama, right? You can just fine-tune it, just give it a thousand examples of your thing. And I guess by increasing the specificity you can make it, for that application, every bit as reliable, maybe even more reliable, than GPT-4 would have been.
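As a concrete sketch of that workflow (the classification task, file name, and example content are invented; the calls are the fine-tuning endpoints of OpenAI's v1 Python client, and the base-model snapshot is illustrative since availability changes):

```python
import json
from openai import OpenAI  # assumes the openai v1 Python client

# ~1000 labelled examples in chat format, one JSON object per line.
examples = [
    {"messages": [
        {"role": "user", "content": "Classify: 'ATM withdrawal $200'"},
        {"role": "assistant", "content": "cash_withdrawal"},
    ]},
    # ... roughly a thousand more task-specific examples ...
]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative snapshot; offerings change
)
print(job.id)  # poll the job, then call the resulting fine-tuned model by name
```

The trade-off raised next in the conversation is exactly this: the training file leaves your infrastructure.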
Yeah, so obviously everyone is going to adjust to this reality, and OpenAI is adjusting as well. It ultimately boils down to whether you are comfortable giving your data to them, individual data points that might be sensitive or not.

It depends on what the model is. If people know there exists an open-source solution that is reliable and verified by the community and can be used instead, and the quality is comparable, then you would probably just go with it rather than taking on the extra risk, right? At least that would be my choice. It would be a different story if there were a huge performance or quality gap there.

But I think this gap, as I said before, is shrinking all the time. Right now it's even hard to say there is a significant gap: on some benchmarks Llama 3 was as good as, or even better than, GPT-4. Obviously they keep improving too, but others also keep building new models.

What I've seen over the last couple of years is that the gap between open-source and proprietary models keeps shrinking, right? I don't think it's growing, and I don't see that trend stopping.
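For the open-model path he describes, a minimal sketch of parameter-efficient fine-tuning with Hugging Face transformers and peft; the model ID is illustrative (Llama weights are license-gated), and the training loop itself is omitted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Train small low-rank adapters instead of all the base weights, so the
# proprietary training data never has to leave your own infrastructure.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights
# From here, run a standard Trainer loop over the in-house dataset.
```

This is the "way less expertise" route mentioned above: the sensitive data and the resulting IP both stay in-house.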
On o1, I don't know if you have any interesting intuition on how it's done, but people are talking about it being process supervision or something like that. Do you think something like that could be done on a model like Llama?
A very good question. I think it's definitely a step forward; it's a very nice model. I played with it a little bit as a user, without going deeper. It's interesting that for something very basic it takes many, many seconds to think and then gives you a basic answer that GPT-4 would give you much faster, right? So I find that quite funny. But on some other logical problems it's very interesting how it can do it, right? So again, to me it's a step forward, but it's very hard to say how much of a gap it opens over the rest of the world.
You know, there are these scaling laws, and you can introduce what those are, but they seem to suggest that there's almost no ceiling. We can just scale the data, we can scale the training, we can scale the model size. And now they're saying we can just scale the inference compute. There's almost no limit to where this can go. What's your take on that?
Well, I think part of that is said and done more for investors than for engineers and scientists, who understand that all these exponential laws have limits, right? And you usually hit them relatively fast. So I think the reality is that the training scaling laws are starting to diminish, in a way, limited by the amount of data people are ready to produce.
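For context, the training scaling laws in question are usually written in the saturating Chinchilla form (Hoffmann et al., 2022), with model parameters $N$ and training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $E$ is the irreducible loss and $A, B, \alpha, \beta$ are fitted constants. The gains decay polynomially while training cost grows roughly linearly in $N \cdot D$, so each further increment of quality costs more, which is the diminishing-returns point being made here.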
We also sometimes don't have enough compute, and people understand that a lot of the benefit of adding extra data, if it's not high-quality data, diminishes quickly or even hurts, right? So the biggest is not always best. I saw one recent benchmark, literally this morning, where the Llama 13-billion model, the old version, has worse performance than a Llama 1-billion model, right?

So it means it's not just the scale that matters: the quality of the data, the cleaning, how you run the whole training process, all of that matters a lot. Same for inference. I don't think there's infinite scaling going on here, right? Yes, you can keep scaling the model and it keeps getting improved.

But then there is the question: do you really need it? I've seen some of these benchmarks where people speed up inference for a Llama model by another fifty x. But when I look at the baseline, it's already faster than a human can ever read, right? What's the benefit of that? So it's very important to always put things into perspective.

What's the benefit of that for me? It's far more impressive if you not just improve the raw latency of inference but actually do it in the most cost-efficient way, because at the end of the day that is a very, very important factor in the adoption of these models. So it's not a matter of how far we can scale the inference of the models or how big a model we can still run.

It's a matter of whether those models actually do the job well and whether you can get that done cost-efficiently, because that's what the customers want. At the end of the day, they are not going to run the largest model available. They're going to run the smallest and most efficient model that does the job for them, right? And that's what's most important.

And I think that's not covered as much, because it doesn't sound as sexy as talking about the biggest possible models. But at the end of the day, this is what the customer wants: they just want to make sure that they get the best quality for the minimum amount of money.
Yeah, but I was watching Lex Fridman interview the Cursor team the other day, and it's really interesting hearing about how they designed a whole bunch of models. So there's a tab model and then there's the apply model, and naively I thought, oh, they just use one big model and kind of generate the entire code file.

And for that reason I thought, yeah, it would be great if we had more tokens per second, because I could just generate my code faster. But apparently, to make it work reliably they've had to do all these ridiculous optimizations: using speculative decoding, partitioning the code file up into all of these different chunks and running it in parallel, and in speculative decoding you run it in parallel with a smaller model because, they say, code is lower entropy, or whatever. It's just an example: we have this notion in our minds, oh yeah, we just have a big model and it will just do all of the things. But actually a lot of the innovation is when you take a specific task and you just optimize the hell out of it.
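Speculative decoding, which comes up here, is easy to sketch: a cheap draft model proposes a few tokens, and the large target model checks them in one batched pass, keeping the agreeing prefix. This is a toy greedy version with hypothetical target/draft callables; production implementations accept or reject proposals probabilistically so the target model's output distribution is preserved exactly.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=64):
    """Greedy speculative decoding sketch.

    target(seq) / draft(seq) -> next-token id (argmax). In a real system
    `target` scores all k positions in one batched forward pass, which is
    where the speedup comes from; here it is called per position for clarity.
    """
    seq = list(prompt)
    goal = len(seq) + max_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target model verifies the proposals position by position.
        accepted = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft agreed: a nearly free token
            else:
                accepted.append(expected)  # disagreement: take the target's token
                break
        seq += accepted
    return seq
```

Low-entropy text like code makes the draft agree more often, so more tokens come through per expensive target pass, which matches the Cursor observation above.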
Yeah, yeah. A lot of it is really exactly what you said. Ultimately, this is where the value is coming from. The value is not coming from one randomly selected metric benchmarked to death; it's from actually working under multiple constraints. That is much harder, because if you just have one metric it's much easier to optimize than when you say: oh, you need to be fast enough, and have good throughput, and be the cheapest possible.

And low power, and low emissions, and all the other constraints that exist, right? Ultimately, when all these things go to scale, the constraints will be there. We already see power limitations and compute limitations; people will have to obey those limits, right? So you actually need to build your system optimizations knowing that those constraints are coming.

And in a lot of real cases it's not, oh, what's the best I can do on the best, most expensive available chip? The company would say: I have some compute on-prem, I have some compute in the cloud, and I have all these different workloads given to me by my data scientists and product teams.

What's the best way for me to run them? Whatever is critical to run on-prem runs on-prem; whatever can run in the cloud overflows to the cloud; and whatever is free to go anywhere goes wherever. How you orchestrate all of that and how you deal with that complexity is what you need to solve. And this goes way beyond optimizing a single model for a single metric. This is what we are building in the company: we are trying to help people address these multiple, different complexity metrics. This is how a typical enterprise or mature company works, rather than someone just playing with one model.
So I interviewed Subbarao Kambhampati, and he was one of the biggest LLM skeptics. He had all of these papers out, you know, testing self-verification and chain of thought and so on. And he was using this Blocksworld problem, which is a planning problem.

And he found that when you do Mystery Blocksworld, where you change the tokens, so it's the same problem but you randomly shuffle the tokens, or put random descriptions in the tokens, all of a sudden the models don't work, because, he was saying, they're basically kind of approximate information-retrieval systems. But on o1 he seemed like a changed man. I was surprised; he was kind of saying, oh look, they now have non-trivial performance.

They're solving Blocksworld problems, sometimes many steps ahead. And he tried it on some problems that are NP-hard, and still, you know, for a very small size of the problem, it still did something. So it's doing something. What is reasoning, and what is it actually doing?
It's a complicated problem in general. I think there was definitely one thing that was missing from classical LLMs. To me, essentially, there was this aspect of what you can memorize from the world: like a great filter that memorizes a lot of things from natural language.

But to me, it never could actually reason in any complicated way. A good example you've seen in some of the presentations: it cannot do three-digit multiplication well; good luck doing it with an LLM, right? It would memorize a lot of corner cases, but it cannot do proper multi-step reasoning. That was missing, and there were always people trying to build AI around the reasoning aspect, but I felt like there were two separate camps that didn't really talk much to each other. Right now they are starting to come together nicely.

And I think this o1 model is probably a good step in the right direction here. Yes, for some very basic things it takes a ton of thinking, but at the same time it actually starts to look more human in the way it starts to do some basic reasoning, right? Which is good. It's not just memorizing things and filtering. Even in our brain, not everything is so sophisticated: some parts of the brain are similar to animals', doing things like fast object detection, right? And for that you don't need any sophisticated reasoning.

You need to do it very fast so you can survive, right? That's critical.

And there are parts of the brain responsible for that, but there are other parts of the brain that developed much later in evolution, which can actually do basic math and things like that, right? And that takes much more deliberate processing. Look at how we teach kids.
I have small kids, so I know. You realize that they learn object detection very early in their life, and they are very good at it, and they don't need a billion examples.

You just show them a couple of examples and they get it, right? That's another direction of improvement for those models. But teaching them math takes many, many years, right?

It requires far more examples, and getting to where they can solve any analytical problem is a very long-term process. So I think it's the same case here. We need to develop AI that can do this.

I think that, in addition to these reasoning capabilities, what's missing to me in modern AI is connecting it to the rest of the world. I think that's what slows down the training process right now: it doesn't have the context; it doesn't understand what the world is like.

And because of that, it can make very basic mistakes, and all of these could have been avoided if the model had the ability to have senses in the world, collect inputs, and adjust accordingly. It doesn't have that experience, and because of that a lot of what it does looks stupid or dumb, still to this day. I think that's another missing piece. So reasoning capabilities, definitely, but also the ability to connect with the physical world that we as humans have. As we add more sensors and the ability to sense the world, to process it, and to have a notion of the world and its physics somehow be part of the model, it would help tremendously to improve these models' quality.
Yeah, I completely agree with you on the importance of embodiment. I think it comes down to the semantics gap. Clearly, we have access to rich semantics, because there's a huge causal chain of things that happen outside of us, and that helps us understand and actually cognize. But the other thing is that there's a real problem with ambiguity that I see in these language models, which is that even with o1, sometimes it can't get the right answer simply because it hasn't understood you, and you get better at prompting over time.

Of course, you learn to specify: it can only move this way, and they can only move between adjacent positions; you kind of get better at it. But we have this remarkable ability to deal with ambiguity. It's a real hallmark of human cognition.

And I'm not really seeing that yet. I think when we do overcome this kind of semantics gap and ambiguity gap, potentially the models could be more autonomous. But the other thing I want to comment on: as well as this semantics gap, I think there might be a computational gap. My co-host Keith Duggar is always at pains to point out that these things are not Turing machines.

You know, there's a kind of difference in kind with a Turing machine, which means there is a space of computation that they can't learn with stochastic gradient descent, much less perform. Do you think there's some weight to that? Or are you one of these people who says, well, in practice it doesn't matter, because we can just build systems which are Turing complete?
I'm probably more in the second camp. I don't see fundamental limits on what can or cannot be learned, right? I don't think we have a computer inside our brains.

We just have a very good adaptive system, right? And I don't think there is anything we have that cannot be mimicked to a reasonable extent. It's very complicated; it's going to take many years.

I don't think we're going to have AGI next year, but I think fundamentally it can be learned, it can be improved, and we can look at how we interact with the world and how we learn as humans.

And a lot of that can improve the models we build as well. Plus, they also have their own capabilities that we don't have: they can remember everything they have seen, right? They don't forget things as easily as we do. We have a great way of dealing with the complexity of the world:
we filter things heavily, but because of that we lose some context. Machines won't, right? That's their advantage: more precision. So there are pros and cons here.

I think we fundamentally have the ability to compute everything we need. Yes, maybe certain things, like stochastic gradient descent, are not perfect; certainly that can be done better.

For me as a computer architect, I'm less concerned about what is computed than about how it is computed. I think we are ridiculously inefficient when I compare how much power our brain consumes with how much power the training process consumes in modern data centers, right?
We're starting to talk about bringing nuclear reactors back, and, you know, in the US I think there's not enough power and things like that. I mean, it kind of scares you.
Well, we've gone quite aggressively in one direction, and we probably want to make some adjustments, right? It means we need to learn how to do things differently: be less aggressive about crunching numbers without thinking about what you're doing.

You need to be more selective. And it has many different dimensions: it's the data, it's what exactly you compute, it's not doing the same computation too many times. For example, right now every single foundation model keeps starting from scratch on more or less similar data, on everything they can get from the internet.

And it's quite redundant, I would say, doing this again and again and again. So hopefully we can do this in a more incremental fashion, so that future models are not built from scratch but actually built from a corpus of prior knowledge, right? You just add to it. So I think that would be really, really important.
I've got so many things in my mind. Just to close the gap on what we were saying before: you know, there's this ambiguity thing, and at the beginning of an interview, in a way, it's a bit nerve-racking because it could go in any direction. And just like with a language model, when you have a long conversation with it, the mutual understanding increases, because it's almost like the entropy goes down, you're on a well-trodden path, and it's a similar thing with our discussion.
The models also train you: you train the model, and they train us to ask the questions in the way they understand. You very quickly realize when you don't ask questions in the right way and don't get a high-quality answer. But if you learn the protocol and how to communicate with this thing, you get what you want very quickly. So that's very impressive. Humans also adapt their behavior to the machines.
Yes, exactly. So it's all about knowing what question to ask. And it's so common in software engineering. The Cursor guys made the observation, I don't know whether this is true or not, that code is lower entropy. And I have noticed this recently: you put a load of code in, and straight away it kind of snaps into some kind of understanding. But still, if you ask it to get something done: if a level-seven engineer from Facebook went in there, they would know what they were building, they would be using the right kind of abstractions, they would be designing things correctly. It's a mistake to think that anyone can go in there and just write some software, because it just gives you back exactly what you ask for, and that's quite often the wrong thing.
Yeah, I think it's a misconception when people say it will replace humans. I think it's more like pair programming, in my opinion. In the system it's still a very high-quality assistant.

You still do the design; you're the one who knows what's missing. But, for example, you don't need to memorize the particular algorithm to do something, like the best shortest-path or graph-traversal algorithms, right? You no longer need to memorize them from a textbook. You just know roughly what it should look like, and you know roughly the complexity.

And then you just ask the question, and it gives you the code. So instead of you figuring out how to write it and debug it, something gives you those very basic concepts. And a lot of what we do as software engineers when we manipulate data, we have been doing again and again.

And this thing learned from us doing it, and learned what the best way to do it is. So you just need to know that you need to do this task, but you don't need to be the best at doing it anymore, because there were thousands of people who did it better than you, and it's already part of the corpus it's going to give you. So you start building on top of very good, reliable pieces.
But without the design at the top, it's not going to be able to: good luck asking it to build an operating system, right? Just good luck. It's not about just adding a few for-loops here and there; it's actually quite sophisticated stuff.

But yes, if you want to build a very basic operation with a web page, right, you don't need to learn all the standard APIs; it will all be there, because it's very standard. So it will save a lot of the time spent learning those things, right? Granted, it creates some danger: if there's something wrong, and something fundamentally buggy gets into its corpus, it's going to suggest it to everyone, and it's going to be everywhere, right? But it's the same problem with any library: everyone who uses it would be susceptible. So I still think of all these coding things not as a replacement but as a very solid assistant, at the very best.
Yeah, and there's this thing as well: right now, generative coding is still human-centric. So one of the problems the Cursor guys were talking about is the verification problem. You're telling it to do something.

You ask it the right question, and now we need innovative ways of just diffing many, many files, because it might have generated four hundred lines of code. And the Git experience, like on GitHub: we've got this multi-file diff experience that hasn't changed in, like, fifteen years.

And now we might have layers of LLMs that are kind of helping us focus on the bits that changed. So we're becoming verifiers. But still, what we don't want is a system that just tears up the code base every single time.

You know, John checks some changes in, and now there are changes across fifty files, and that creates this understanding debt, because now all of the other developers need to figure out what's going on. So there is something to be said for having a very modular architecture which is understood by everyone, and then, even though we might be introducing inefficiencies, we make changes in a targeted way because we want to minimize damage.
Yes, you are absolutely correct. The design is not just engineering, in a way: designing a complicated system that functions, functions efficiently, and scales is an art, right? You need to do it.

And I don't think we're at the stage where we can delegate that responsibility; we delegate some of the more mundane work. There are definitely some risks associated with that, which you started to touch on, and I see it this way: for many, many years we have relied on people going through the ranks in the development of engineers, the junior ones. They don't go straight into the design of the whole system, right? They learn from basic tasks.

Right now, some of that functionality is starting to be replaced by assistants of different sorts. We need to make sure we keep a good basic learning curve for that. If you want to raise a generation of architects, the people who build these systems and rise from L3 all the way to L7 or L8, right, we just want to make sure they have the ability to grow in the new environment where a lot of the basic skills are replaced by automatic tools.

I think it's doable; we just need to find a way to do it. The whole process needs to change. I think even the interview process has to change: right now we test people a lot on memorizing basic concepts. Why would you even need that now, right? You need to test more on design skills, on how you use these components, rather than on memorizing the best algorithms.
Yes. And also surveillance. As I was saying last time we spoke, I'm a huge fan of the actor model, and you have to become an expert in surveillance.

So all of the actors are logging their activities, and they happen asynchronously, in a different order. And, you know, we need to build the next generation of platforms just to figure out what the hell these systems are doing. But it's also more flexible, right? Because we can do simulations and counterfactual analysis and God knows what.

But we are moving away from this old school of software engineering, and that's kind of what I wanted to talk to you about, because of enterprise adoption. At the moment, you know, the enterprises have been building software in a certain way for a very long time. There have been ways to make it more accessible, like low-code and no-code platforms and so on. They're starting to adopt AI. What are you seeing? What's the state of play with enterprises adopting AI?
Yeah, there are several things that we see as a company here, and again, it helps to have a company behind me to gather a little bit of experience. The majority of the companies we talk to in the enterprise segment are definitely on board with GenAI. They see the potential of the value for them.

They very frequently have budgets already allocated to start deploying the technology internally. Where things become complicated is what exactly the first use case should be. They also sometimes don't have the right conceptual understanding of what's doable and what's not with this technology, right? Because usually the way it looks is: someone in the company says, okay, let's start to use GenAI internally; the decision is made; then they pull some people from the internal organization,

a data scientist who would help adopt the technology. The data scientist is excited about it, but they have never actually built a sophisticated system. And there is all this complexity: we scientists never built complicated systems ourselves before; we were more in the AI space, and out of nowhere we need to do it.
And the whole thing is very complicated: the system is complicated, all the pieces have to fit, it's very hard. And you also need to find that killer app within your org to justify your existence. So it's very easy to come and say, oh, give me five million dollars to train another GPT for company X.

And then we have our own model and our own API. That's great, but does it really build something helpful for the company? That model will be worse than what they could have just used, right?

Instead, you should probably do fine-tuning. And then, if you do fine-tuning, who's going to prepare the data? Select what's useful, select which data is sensitive and which is non-sensitive? All of that has to be solved. So there are a lot of challenges in how the technology has to be adopted. The good thing is that it seems pretty much everyone is on board that it has to be used right now, right?

People do see the value. It doesn't look like, oh, there's hype and we don't know what to do with it; people have a general feeling that it will help them in certain use cases, and across different segments of enterprise, financial sector, insurance, automotive, they all see that there is value for them in this technology. But what's complicated is what the exact first use case should be, and also, after you narrow that down, how to go and implement it and develop the skill, because they typically never had an org that would be able to do this sufficiently.
Remember, they're not, you know, AI-first companies; for them it's a new toy, and they have some pieces of the expertise but not the rest. So a company like our CentML is actually trying to close one of these big gaps for them: how to build the system from their data. Basically:

there are foundation models, we have our data; who's going to help us milk the data, get the IP, and get these models deployed and taken care of for us,

in a way that's also cost-efficient, right? Also, what we see is that the people who use it once successfully, I feel, never go back. They might criticize the AI as being unreliable and grumble about it.

But if, for example, you used to spend ten million dollars to find fraud in your organization, say in bank statements, and now you use a model to do this, and that model costs you two million dollars, you're going to keep complaining to me, as the company responsible for optimization, that it's so expensive at two million dollars. But are you going to go back to ten million dollars of expenses with humans? Never. You're going to try to optimize what you have, but once you make that step, there is no step back. You're always going to use some AI now, some form of automation, because you're never going back to doing this with humans, right? Because that has its own downsides.
Yes, you start to forget. When you use AI you say, oh, it hallucinates. Well, the problem is humans also make things up, also hallucinate sometimes, and also make mistakes, right? So yes, there are challenges, but we need to admit what the challenges are.

We need to build a good roadmap for enterprises to adopt AI, right? And I think the good thing that helps right now is that there is general consensus across many of these organizations that you need to be part of it, and the use cases come naturally if you look carefully. They're not randomly picked: you talk across the teams, you see where your data is, where your company's unique proposition is, and you try to use AI to reinforce it, right? In almost all the conversations we have right now with enterprises, the first few conversations, around 'do you really want to do it?', they're on board, right? That part is actually relatively smooth. The challenge is what exactly to do as the first step, because they have never done it before.
One of the things I've found is that when you have a great software engineering culture, there are fewer other problems. You know, we've been on this journey from the individual lone-wolf data scientist, then we started thinking about MLOps, and then about joined-up approaches to data architecture, data engineering and so on. And it all sounds great in principle.
And then we start to build templates, generalize them, develop frameworks. And we say, guys, everyone, you need to use this framework, and now we've got a team building it. And if everyone uses this template, then everything will be good. But of course that doesn't happen, because people tend to build applications just for themselves. So it's a really big step.
Isn't it, moving towards a centralized way of thinking, of having functions and platforms? So for example, if I'm doing pricing, I might have a team whose job is to maintain this pricing platform, which has a fixed API as a standard interface, and they make it really optimal, and it's centrally managed. That seems quite a difficult step.
Yeah. So I think this managed platform that a lot of people support right now is just a step; it's not the final solution.
I don't think, and maybe I'm wrong, but I don't see a lot of people who just want to use vanilla Llama for everything; that's just a few people. It's just a building block of the system, right? And I think those systems will have many of those building blocks, with a lot of fine-tuned, specialized pieces doing what's needed. You probably need guardrails around things in each of these components; you need a lot of things. So it's way more complicated than just saying, let's take a model, run it, and deploy it somewhere, right?
So in reality, I think what you need moving forward is to build an infrastructure that can run those complicated systems reliably, right? Essentially, a system will be built out of multiple agents, for example, and multiple different models with different constraints, potentially connected to different databases of knowledge, a vector database or something like that, that people can utilize.
So you need to build very different UIs for those systems, where people can really plug and play components, experiment with them, test different things. You're not testing one model against another model; you're testing two systems against each other. You want to pit two complete stacks, one against the other, and see how they perform.
And the scenarios are also very complicated. Remember, these interactive things can be non-deterministic: based on one input, you cannot conclude that one system is better than another. In fact, you need a very sophisticated test environment to see how things behave. It's very easy to test on a single metric, like how much faster or how much cheaper it is. But remember, multi-dimensional problems are way more complicated, right? So you need to make sure the environment captures that properly and you can evaluate what is actually good for you, right? I think that's a very exciting area to work on, and I think the community will move there over time.
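To make that concrete, here is a minimal sketch of the kind of multi-metric, repeated-trial comparison being described; `pipeline` and `judge` are hypothetical stand-ins for a full stack and a quality scorer, not anything from CentML's tooling:

```python
import statistics
import time

def evaluate(pipeline, prompts, judge, trials=5):
    """Score a whole system end to end: repeat runs because outputs
    can be non-deterministic, and track more than one metric."""
    latencies, scores = [], []
    for _ in range(trials):
        for prompt in prompts:
            start = time.perf_counter()
            output = pipeline(prompt)  # the full stack, not one model
            latencies.append(time.perf_counter() - start)
            scores.append(judge(prompt, output))
    return {
        "p50_latency_s": statistics.median(latencies),
        "mean_quality": statistics.mean(scores),
    }

# Pit two complete stacks against each other on the same inputs:
# results = {name: evaluate(stack, prompts, judge)
#            for name, stack in [("stack_a", pipeline_a), ("stack_b", pipeline_b)]}
```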
Last time we spoke, you used the term "dark silicon", which I thought was very interesting. What did you mean by that?

It's a simple concept if you think about it. Imagine that you have a limited budget of power you can supply to a chip, but you have the ability to put way more transistors on that chip.
So essentially you live in a world where, at any point in time, you cannot power all of it; you can power only some of it, and you have to decide what to power and what not to. For a long time, people were saying that we would never build things like that.
We were always going to utilize everything the CPU had built into it, right? But at some point the end of Moore's law and other limiting factors led to a situation where Dennard scaling broke down; coming back to computer architecture terms, we are no longer able to keep power constant as we shrink the dimensions of the transistors. Why is that a problem? Because I, as a hardware company, still want to sell a big chip with more and more transistors, but there is not enough power to supply them all. So either I make the chip physically bigger and bigger, or, if I'm limited by the physical dimensions of the chip, by signal propagation, and by interconnects, I have to limit what I power. So this dark silicon limitation essentially takes many different forms.
Either you reduce the frequency you're running at, so instead of fully turning off some parts of the chip you run them at a lower frequency, right, or you essentially under-utilize the resources you have. Remember, from my computer architecture background: we were already dealing with this on phones, and phones have been very power-limited for a long time. For example, it was very common on a phone to have a certain memory bandwidth available but only be able to use it for a very short period of time, because otherwise the device would have to power off or throttle from the heat. So the hardware was there, but nobody could fully use it. It means you need to become much smarter about how you use this hardware. That problem didn't disappear. The chips right now are extremely heterogeneous.
Take, for example, NVIDIA chips. It's quite interesting to look inside one, even with just the publicly available information. It supports many different precisions and has many different specialized pieces, right? For example, even for AI alone, you have FP32 precision and FP16 precision, which can be combined so that two FP16 inputs generate output into FP32. Then there is the bfloat16 precision, and there are integer precisions. And there are special dedicated units called tensor cores. They don't use vector instructions; they do direct matrix multiplies. All of these are independent units.
Most of the time, your software is not written to use all of it. What does that mean in practice? It means that, one way or another, a lot of those resources can easily be wasted on your chip. Whether you power them off or reduce the frequency is a choice the computer architects building the chip have to make, and it's a very tricky problem, because power becomes the limiting factor for many of these components, right? So yes, it's a very interesting problem that anyone who builds a modern chip has to deal with: prioritizing one thing over another, and when to power different parts of the chip on and off.
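To illustrate how software steers work onto those specialized units, here is a minimal PyTorch sketch (an illustration of the general point, not CentML's stack): by default every matmul here runs in FP32 while the half-precision tensor cores sit dark; autocast routes the matmul-heavy ops onto them.

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(64, 4096, device="cuda")

# Default path: FP32 matmuls; the FP16/BF16 tensor cores stay idle.
y_fp32 = model(x)

# autocast sends matmul-heavy ops to half-precision tensor cores
# while keeping numerically sensitive ops in FP32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y_fp16 = model(x)
```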
Part of the problem here is that people are unaware of how much compute we're wasting. Why is that the case?
The first time in my life I faced this was when I had just finished my PhD and was at Microsoft Research, right? And back there I saw a lot of people excited to train large models. We're talking about 2016, 2017, quite some time ago. People were using NVIDIA GPUs, but the old ones, Maxwell, the generation before the P100; the P100 had only just been built back then, and essentially there was a line of people waiting for the chips.
And I thought, interesting. I'm an optimization guy, and I was very curious how well it was all running, because all of a sudden there were all these frameworks, like TensorFlow and CNTK back then, that people were using. I got permission to access the cluster and profiled some of the workloads people were running. My mind was blown: utilization could easily be around ten percent. I went back to the people and said, do you understand that when you run those workloads, the utilization of the chips can easily be only one tenth of their potential? And they were confused by what I was saying, because to most people that sounded like a ridiculous claim. What do you mean? The GPU is running, it's busy; how can it not be utilized? So that's the gap in understanding, the difference between the level of abstraction at which people from the ML community typically think about compute and how systems people think about compute. To this day, it's actually very hard to get ninety-nine percent utilization out of a chip.
Usually when you see that, it means one of the metrics is lying to you; it's telling you only part of the story of what's possible, right? So there's a huge opportunity to actually use all that compute, and there are many different sources of waste. One of the sources of waste is very basic: the GPU is not even running all the time, because your model logic technically sits on the CPU and launches kernels onto the GPU, right? And when you do that, there are periods when nothing is running on the GPU yet. So you want to make it properly pipelined, so the GPU is always running something, but that's not always the case. Then there's another problem: even when you are running, you're not using all the potential of the compute available, because the chip has so much compute, and models are flexible and diverse, and the kernels don't always map perfectly onto what you need. That means there's a lot of waste there; so many different sources of waste.
And even once you've solved that, it may still be that your workload is only using one precision at a time, or compute units of only one type. But NVIDIA may fully power the whole chip to run at full throttle, everything included, and you're just wasting that compute by not running anything on it, right? So that's a very important thing.
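The first step is simply measuring the gap between "the GPU is busy" and real utilization. A minimal sketch with PyTorch's built-in profiler (again illustrative, not any particular vendor's tool):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    *[torch.nn.Linear(1024, 1024) for _ in range(8)]
).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        y = model(x)
    torch.cuda.synchronize()

# Gaps between kernels on the CUDA timeline are idle periods the chip
# is still burning power for; per-kernel times show what is actually busy.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```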
And remember, what's also challenging here, when we talk about power limitation and dark silicon, is that if you actually make everything run, you might overheat the thing, because it won't be able to dissipate the heat. So you need to be careful not to overdo it, and to do it in a smart way. So it's a very interesting problem to solve, and that's where there are a lot of opportunities and untapped potential that we at CentML actually use for the benefit of our users. One example I really like to mention: we have a paper that was recently accepted and will be out in a few months. Essentially, you can run your training and inference at the same time on the same chip. And the trick is just using all the gaps that are otherwise wasted, and also using the different precisions that are otherwise not utilized. You can essentially run training and inference back to back, almost for free, right? It's possible if you orchestrate the whole system properly. So there's an opportunity there, all this potential that can be utilized.
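The fine-grained scheduling across idle precision units described here is from the forthcoming paper, not a public API. The closest off-the-shelf approximation is CUDA streams, which let independent kernels overlap whenever the chip has spare resources; a toy sketch:

```python
import torch

device = "cuda"
train_model = torch.nn.Linear(1024, 1024).to(device)
infer_model = torch.nn.Linear(1024, 1024).to(device).eval()
optimizer = torch.optim.SGD(train_model.parameters(), lr=1e-3)

train_stream = torch.cuda.Stream()
infer_stream = torch.cuda.Stream()

x_train = torch.randn(256, 1024, device=device)
x_infer = torch.randn(32, 1024, device=device)

# Kernels issued on different streams may execute concurrently
# whenever SMs or other units would otherwise sit idle.
with torch.cuda.stream(train_stream):
    loss = train_model(x_train).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

with torch.cuda.stream(infer_stream):
    with torch.no_grad():
        preds = infer_model(x_infer)

torch.cuda.synchronize()  # wait for both streams to finish
```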
Yeah, that's amazing. I mean, it's tempting to think about the very high-level abstractions we have, like Python, for example. Of course, a lot of deep learning doesn't actually run in Python, and as you know, Python is hideously inefficient, but by the same token it has democratized computation, right? So many people can write Python code. But what you're talking about is a bit more nuanced than that. I think you're saying that when we start to think about this systemically, when we have a big cluster and all of these different workloads on this fabric, we can start to do optimizations that wouldn't have been possible if we were thinking about them individually.
Yeah, absolutely correct. And I'm glad you also brought up higher up the stack, because since we were talking about dark silicon I'd been focusing on the low-level stuff, but there's huge waste, and huge opportunity to do things right, at the top level too. I remember when David Patterson, one of our scientific advisors, received his Turing Award; in his Turing Award talk he showed the example of matrix multiplication written in Python versus properly optimized C++, and then optimized all the way down to the silicon. It's three or four orders of magnitude of difference, right? So pure Python is very inefficient, but Python is very easy to use, and developing in C++ requires a whole different level of skill. What PyTorch and the Pythonic ecosystem did very effectively is hide everything that requires really sophisticated computation away from the parts running in Python itself. You use the Python interface, but under the hood it's running in C++-based libraries, and you don't need to worry about it; all that complexity is hidden from you. That's the beauty of it.
Because, and this is also very critical, writing everything in C++ is not the way to go if you want this technology adopted by a much wider audience. Some of the inventions of our group and our company were around automating compiler development. For example, for many years at NVIDIA, people wrote very carefully crafted CUDA kernels to get top performance, and that was considered the best way to go. But several years ago a lot of us realized that's not a sustainable way to do AI: there are so many kernels and so many hardware targets, and all of it is very diverse. So we advocated doing this more automatically, using machine learning compilers instead. In that case you still give an input describing what kernel you want, but you give that input through a Python interface, and all the sophisticated CUDA code under the hood, even the PTX, is generated automatically, right? It's feasible, and people are benefiting from it right now. And companies like Google built compilers like XLA that also do some of those things quite efficiently, in a more automatic way.
Now, this is one of the products that you have, actually. You can plug this thing into your PyTorch code, it's just another few lines you need to add, and you can compile the model; you can say which optimizations you want to apply, and it's going to make the model run significantly faster. Can you talk us through that?
Yeah. It started a couple of years ago, when the PyTorch ecosystem began to evolve and became more open for other people to contribute to its development. It became possible to use something called torch.compile, right? Essentially it allows you to take your model as an input and use your own backend to optimize things. So our research group at the University of Toronto, supported by our company, built a new machine learning compiler called Hidet that allows you, pretty much within a single line of PyTorch, to optimize a model significantly. And since then a lot of people have gone in that direction; one very successful project in the space is the Triton compiler. So again, people realized that writing kernels manually is not the way to go, and you need a more automated approach.
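The single-line usage looks roughly like this, assuming the open-source `hidet` package is installed (`pip install hidet`) so that it registers itself as a torch.compile backend:

```python
import torch
import hidet  # importing registers the "hidet" torch.compile backend

model = torch.nn.Sequential(
    torch.nn.Linear(768, 768), torch.nn.GELU(), torch.nn.Linear(768, 768)
).cuda().eval()
x = torch.randn(16, 768, device="cuda")

# The one extra line: route compilation through the Hidet backend.
compiled = torch.compile(model, backend="hidet")
y = compiled(x)
```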
I just want to remind you, though, that this is only one piece of the puzzle: this is kernel-level optimization, as opposed to whole-model optimization. And generative AI workloads complicate the process even further. Why? Because they typically require something called dynamic shapes, and compilers are not very good at optimizing a loop with unknown boundaries. That's very difficult, right? That's essentially what makes these compiler optimizations hard, but it's still a solvable problem, and we solved it in our company: how to run dynamic-shape workloads very efficiently. But that's at the low level. Why is it important? Because it unlocks the ability of other people to contribute to the success of PyTorch, or any Pythonic ecosystem, right? And I think people realized this is the way to go, and a lot of people contribute in this space. Right now it's very heavily supported, and NVIDIA, among others, heavily supports building the open-source community around machine learning compilers.
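In stock PyTorch, the dynamic-shape problem surfaces as recompilation: a new sequence length can trigger a fresh compile unless the compiler is asked to treat dimensions symbolically. A minimal illustration:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()

# dynamic=True asks the compiler to keep dimensions symbolic, so
# varying sequence lengths don't each trigger a recompilation.
compiled = torch.compile(model, dynamic=True)

for seq_len in (128, 384, 1000):  # generative workloads vary like this
    x = torch.randn(seq_len, 512, device="cuda")
    y = compiled(x)
```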
You know, with C++ we have a handful of compilers, and if I'm compiling something on macOS I use the same compiler every single time. But I'm getting the read that machine learning workloads are far more heterogeneous, that there are specific optimizations for specific jobs. Can you explain that to me?
Yeah, it's very important. For example, to start with, training and inference are very different workloads. People say, aren't they very similar, both doing lots of matrix multiplications and all that? And the answer is yes and no. There are common kernels, but a lot of things are very different between the two, including the scale. Then there's even more nuance in how you optimize for a particular target. For example, your target might be latency, your target might be throughput, or cost, or, more commonly, a combination of those. Then you have a bunch of tricks at your disposal that you need to balance, and almost none of them simply makes everything better; each one sacrifices something for something else. For example:
You mentioned speculative decoding today. Very cool trick, but there's nothing for free. Speculative decoding means you do something speculatively somewhere, and that means you have to burn compute that you could have used for something else. When does that make sense? Well, if latency is the metric you care about, you speculate on the outcome so that you may have some intermediate values ready to use, and that optimizes latency. But it means you need to burn some compute somewhere, and that compute might be burned for no reason if the speculation was wrong, right? So there's a trade-off: when you improve the latency, you make your throughput worse, which makes the cost of the process higher. There is no free lunch in many of these optimizations.
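A sketch of the greedy variant of the idea, with `draft_next` and `target_greedy` as hypothetical stand-ins for the small and large models; the "burned compute" is visible as the draft tokens that get thrown away on a mismatch:

```python
def speculative_step(draft_next, target_greedy, prefix, k=4):
    """One round of greedy speculative decoding (a simplified sketch).

    draft_next(seq)    -> next token from the small, cheap model
    target_greedy(seq) -> list whose entry i is the big model's greedy
                          prediction for the token following seq[:i+1]
    """
    # Burn cheap compute: the draft model guesses k tokens ahead.
    guesses = list(prefix)
    for _ in range(k):
        guesses.append(draft_next(guesses))

    # One big-model pass verifies all k guesses at once (the latency win).
    preds = target_greedy(guesses)

    accepted = list(prefix)
    for i in range(len(prefix), len(guesses)):
        if preds[i - 1] == guesses[i]:
            accepted.append(guesses[i])    # guess confirmed
        else:
            accepted.append(preds[i - 1])  # target's correction; the rest
            break                          # of the guesses are wasted work
    return accepted
```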
Another common example: people like to reduce the memory footprint of their model, because memory bandwidth and memory capacity are among the biggest limiting factors for running these models on existing chips, NVIDIA's, AMD's, and so on. There are several different optimizations you can do to reduce the footprint, but none of them is completely free. For example, you can recompute values you would otherwise need to store, but recomputing means doing the same computation multiple times. So you spend compute to save memory. Why would you ever do that? Sometimes, on some chips, you have more compute and less memory, or the other way around, and then it's a good trade. Or you might do compression of certain parameters. Compression, lossless or lossy, introduces other trade-offs: the quality of the model can change, and there's extra computation associated with compressing and decompressing, but it can shrink your parameters. It's like that in each of these cases, and there are many tens of these different optimizations.
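Both trades exist as near-one-liners in PyTorch, which makes them easy to see concretely (illustrative choices, not a recommendation for any particular model):

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(2048, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 2048),
).cuda()
x = torch.randn(32, 2048, device="cuda", requires_grad=True)

# Recompute trade: skip storing intermediate activations and redo the
# forward pass during backward. Less memory, more FLOPs.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()

# Compression trade (lossy): int8 dynamic quantization of the weights.
# Smaller footprint, but quality can shift and there is (de)quantize work.
qblock = torch.ao.quantization.quantize_dynamic(
    block.cpu(), {torch.nn.Linear}, dtype=torch.qint8
)
```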
And this is what makes it hard: for every model and every use case, the optimal frontier changes dramatically, and you need to find the best set of techniques for it. It's very hard to do that fully automatically for every model, every workload, and every chip you have at your disposal. This is one of the things we pushed, both through the research community and as a company: it has to be done more automatically, right? Because a lot of the early solutions were really hard-coded to run a given model on given hardware, when in reality it can be done in a more automatic fashion.
That means when a customer brings a new model, you can handle it quickly and do it quite well. I'll give you one example that I think highlights what we were very proud of. Last year, when Llama 2 came out, a lot of famous companies said, oh, we need to optimize our stacks to run it efficiently, and it took many, many weeks for some of them. It took us less than a day to make Llama 2 run on our software stack, because everything is a very automated process. And this year, when Llama 3 was released, the 405-billion-parameter model, we did it live. We didn't have a partnership with Facebook and Meta on that, unfortunately, so we didn't have early access to the model, but within one hour it was working for us, right? Why? Because everything is automated. We anticipated certain characteristics of the model, we knew roughly how big it could be, we had certain trades prepared, then we found the optimal mappings and made it available to people pretty much immediately.
And that's very important when there's such diversity of models and the world keeps changing. Another very important factor, specifically for generative AI, is that even if the model is fixed, the input itself can be dynamic. You change the sequence length and all the other parameters, and there are many of them, and that changes the best way to map the model onto the hardware. I like to show in my lectures that for a basic attention block, depending on the parameters, batch size, sequence length, window size and the rest, you might actually want to fuse these kernels instead of those, or slice the computation this way rather than that way. It's a very complicated problem for a human to solve, and you obviously don't want to do it manually every time someone changes the input to your model, right? So this is something a compiler has to do to make it efficient.
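PyTorch's fused attention entry point shows this dependence directly: the runtime picks among several kernels based on shapes, dtype, and hardware, and you can pin one to compare. A small sketch (the backend-selection API shown is the one in recent PyTorch releases, 2.3 or later):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend  # PyTorch >= 2.3

# (batch, heads, sequence length, head dim): changing any of these can
# change which fused kernel is fastest.
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Default: the runtime chooses FlashAttention, memory-efficient
# attention, or the plain math path on its own.
out = F.scaled_dot_product_attention(q, k, v)

# Pin one backend to measure how the choice depends on the input shape.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out_flash = F.scaled_dot_product_attention(q, k, v)
```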
Yeah, you've articulated that really nicely. So what you're saying is that this is very situated; we're in a non-stationary environment. There are many moving parts: we might have different batch sizes, different input sizes and so on, and we might also be in the context of a system. So we might have this big cluster doing all these different jobs, and that actually gives us a whole margin for optimization, where we can select the correct optimizations and the correct kernels to use.
Absolutely correct. All that complexity is there, and this is why, when we talked about enterprises earlier in the conversation: remember, it's just hard for them to do it, because if you do it manually you need very, very sophisticated engineers who know how to do it, and it's expensive to hire those people, right? And it actually isn't required, because that work can be replaced by software, right? This is where all that complexity comes from. The workload itself is so dynamic, so complex, and as you said, the system around it has all these external components that keep moving. So you want to make sure this whole thing interplays really well. You optimize not for a particular benchmark, not for a particular model; you optimize for this use case, for that customer, in the environment they're running in right now, right? This is why it has to be more dynamic, and why you need to build a systems stack that is able to handle that complexity.
Yeah, it's a bit like the no-free-lunch theorem applied to machine learning optimization.
Agreed, agreed. Yes.
We were saying that we now have this new paradigm of model development where we have this big compute fabric and we can do a whole bunch of optimizations against it. I mean, who are your customers, and how is this working?
Yeah, so we have several enterprise customers. And the thing that's important at this stage is that you really need to build the relationship together with the customer, because most of them have never actually had generative AI in production before. It's a first for them. There are certain expectations, there's certain knowledge in their team, and you want to make sure you augment their team's knowledge with everything you have externally. They know their workloads best, right? They know their data best, and they have a rough understanding of what they want to build. We understand very well how to build the system, and we have a pretty good understanding of what data they might have, and we're trying to build that bridge. So if you ask me where the challenge is: it's how long that bridge takes to build right now with an enterprise customer. A few of these enterprise customers are more like partners: you spend a lot of time nurturing the relationship, and they might become your channels.
And we're building multiple of those. Right now we also have another customer, one of the biggest Dell hardware resellers. They have the hardware they've built, and they want to make their hardware AI-ready. They have accelerators, NVIDIA GPUs, but not necessarily H100-class or A100-class GPUs. And they want, almost magically, these boxes to suddenly start running all the LLMs that people normally use H100s for. So someone has to find all the needed transformations and balance the models so they work well on that hardware, right? And it's not a trivial thing, because these boxes not only have smaller GPUs, they also don't have fast interconnects. That means you need to figure out how to communicate less data, while making sure all of these smaller chips, which are cheaper and smaller but have less compute and less memory, stay busy doing useful work, and that you get the output at a speed that's actually good enough for the real use cases.
So we get a lot of these chat-with-your-data interfaces for companies like that. Another similar use case for us was a company called Arcadis, an engineering company that also has multiple boxes with GPUs and wants to use them in an LLM-plus-RAG scenario; that's very common. We also have multiple ongoing partnerships with big vendors like GCP and AWS, where we help them better utilize their compute and unlock their own customers as well. So that's all ongoing. Ultimately, when we talk about enterprises, the issue right now is the time it takes to go from research to real adoption at scale.
That's important, because there's sometimes a wrong perception among data scientists: if I found some really cool model somewhere online, ran it on a few inputs, and the quality looks good, I'm ready, I can go and deploy it. It turns out the tool you used to deploy it might work on an A100 but won't work on a large-scale cluster operated like that; you didn't actually optimize it by batching other workloads; and you didn't realize that the way you run it can only handle one input at a time because you're out of memory, and if you run one at a time, the price of running it is enormous, which kills the whole point of the use case to begin with. So you need to educate people: no, you need to be careful about that. You tell us what the constraints are, and we help you do it, or do it for you. We try to make this process as smooth as possible, and for us as a company it's very important not to become another services company; that's what I'm trying to avoid.
We do some of this manually in the initial engagements, but wherever possible we automate the process learned from each new engagement, to make sure it becomes very repeatable. So we also have several GenAI startup companies, like Equi, for example, and a lot of other conversations, where the learning process with the bigger enterprises helped us formalize the process. So when they arrive, we give them the whole software stack and they can use it out of the box, with no special features needed. So that's another interesting thing: there are different segments. Some require customization, but some are fully automatic. And where it's fully automatic, it's cheap: cheap for them to use us, and also cheap for us, because it's very easy to onboard customers like that, and we can support a lot of them. Why is that important? It allows us to keep prices for our software stack very low, because we don't need to spend extra engineering time, just basic support. It also means they can be successful, because the software stack they're building on top of the models isn't that expensive for them anymore.
Yeah, I agree, it's good to avoid the consultancy trap. Of course, you guys want to build this platform and scale it up, and there's always this tension with focusing too much on individual installations. But what is your relationship with the cloud providers?
Yeah, as I mentioned, we have very good partnerships with several of the major cloud providers, notably GCP and AWS. We're official partners with each of them, we're on their marketplaces, and we have several strategic engagements directly with their engineering teams, and also with some of their customers directly. Then we have engagements at different stages with other cloud providers; you can think of all the usual suspects in the space. I'm not going to name them until we publicly announce, but there are a lot of these conversations. A lot of them are looking at the software stack being built on top of theirs. A lot of them have very good expertise in how to build a cloud service and how to provide bare-metal access to their hardware, but they want distinguishing characteristics: they don't want to be just a bare-metal supplier, right? It's a very competitive market right now, with the margins on hardware going down and down. So they want a software stack that attracts customers and keeps them on the platform, right? That's very important to them right now. And we help them get good enterprise customers and connect those customers with the right level of expertise, because the majority of them are not bare-metal experts who can use the hardware directly. We also try to help them utilize the hardware better, so they understand that you can co-locate things, you can do this and that, and you can save a lot of the compute you didn't realize was being wasted.
Interesting. But you don't feel that you're competing with them in any way? I guess this comes to the moat question. Let's say, I don't know, if AWS gave you five billion dollars to be exclusive on their platform, would you be interested?
Well, five billion dollars is a lot of money, right? It all depends on the stage, right? A deal like that would obviously come with a lot of strings attached, but honestly speaking, we will always try to build something that is not over-specialized for any cloud provider, or even any particular hardware. We always wanted a platform where all cloud providers are equally represented and all the different hardware is equally represented. And I think that's very important for our users, because at the end of the day we want the best performance for the lowest possible cost, and we want a place with an honest opinion about how to optimize my workload, for my model and my use case, and map it the best possible way, right? For that, it's important to be on good terms with all the key cloud providers and all the key hardware vendors. This is why we have a very good partnership with NVIDIA, who are among our investors and joined our previous round last year. But we also work with Google, which has TPUs; we talk to AWS, which has its Trainium chips; we talk to other vendors. We're trying to be equally friendly with all of these companies, because I think they're all important, and in most cases people don't want one dominant monopoly of some sort. They prefer to have a choice, which also makes the market healthier, right? So we're trying to help customers find the best chips, optimized for whatever constraints they have. If they have some older NVIDIA GPUs and some of the newer AMD GPUs, we'll try to make sure they can leverage everything they have at their disposal.

There's always the question of being embedded in one cloud. But I suppose you could go one level of abstraction up, because it's great that you have this agnosticism about hardware: you have this incredibly intelligent system that can optimize for the hardware the cluster is running. But could you, in principle, do that at the cloud level as well? So you could do cost optimization by setting up clusters, you know, this bit on AWS and that bit on Google, or something like that.
It's technically possible, and we already do multi-cluster things right now within one vendor. In principle it can be done across vendors; the problem is the communication between them, the cost and latency of inbound traffic and egress. That can be expensive, right? But technically it can be done. Now, at the beginning you raised an important question: how do you integrate with all these different infrastructures? I want to say there's some level of complexity, but it's not dramatic, because ultimately we picked our level of abstraction at the level of a container, right? All of them support containers in one form or another; it's a very basic concept. So we make sure that whatever we deploy is based on the concept of a container, and we build on top of that. That means we can easily move between GCP, AWS, and all the other providers, including the bare-metal ones: they just need to support, say, basic Kubernetes on top of their infrastructure, and that's good enough for us to build on. So really there are a few specialized cloud providers, the hyperscalers, and from our level of abstraction everyone else is more or less identical: they all offer bare-metal access, and we can manage them all quite well.
Yeah, I think part of it as well is that you need this pragmatism, because you need the models to be close to where the data is. For example, I think you're built into Snowflake, and from a database query you can actually call into the cluster and say, do some sentiment analysis or something. But for these architectures to work efficiently, you actually need to respect the data architecture boundaries as well.
Yeah, absolutely right. Since you brought up this case: yes, we partner with Snowflake as well, and we're available on their marketplace, so we can accelerate the models of their customers. Why would we do that? Because there are people who really like the Snowflake infrastructure; the data is already there, and they'd prefer to do things closer to where the data resides, right, rather than necessarily moving it to a different cloud provider. So we give them that flexibility: the ability to run whatever model, and whatever applications are built on top of standard ML models, right where the data resides. That's what's convenient for them.
So, moving over to the agents discussion now. I'm very excited about agents, and I think Duggar is too. I'm not excited in a kind of "it's AGI and the singularity" way; I'm just excited in the way that I love building distributed asynchronous systems. I think it's really cool to think about these as asynchronous units of computation at the application level, but of course using a shared computation fabric, which is what you guys are doing. So how do you think this paradigm of software engineering is going to change things?
Yeah, I'm definitely bullish on it. I also like building distributed asynchronous systems, and what's opening up is very exciting. Going back to the event I attended last week, the Intelligent Applications summit, a very nice event organized by Madrona, AWS, and Microsoft: it was clear that a lot of people were running around saying, what about agents? What are you guys doing in this space? Everyone is excited about it, similar to how they were excited about RAG, say, six months ago, right? But you want to find the killer app for it, and the problem is this: one challenge is that it's not used in production as heavily as we would hope, so it's a little bit early. Another thing is that when people hear a nice buzzword, they use it for everything. So whatever they're doing, "agent" becomes almost the whole application, and that's not very helpful; you need to understand what kind of intelligence you're encapsulating, and what you mean by an agent in general.
But I do think this is the direction we should go. I'm not as excited about a future where we use GenAI by running individual models here and there, always with a human in the loop; that's not very efficient. I want these systems to be able to exclude the human from the loop, with the human being the designer on top, overseeing the process, but not looped into every single conversation. It's the same as managing a company: if I, as a CEO, had to be in every single meeting just to orchestrate everyone's communication, I'd have a failed company very quickly. I rely on people holding proper meetings without me and being efficient; I don't need to be there, and I must not be the bottleneck. It's the same here. How to build these sophisticated systems out of those agent components is very interesting and exciting, but there's still a lot to be proven about how this will be used in real examples. Remember how people talked about fine-tuning: fine-tuning got used to a certain extent, but it didn't solve all the problems. It turns out that fine-tuning these models is also not very cheap, right? In many cases you still need hundreds of chips to fine-tune, and you need expertise to do it. So a lot of people said, even fine-tuning might be too expensive for me right now.
Then RAG came along, and it helps with certain things, but you realize that RAG without fine-tuning is not as good as RAG with fine-tuning. People looked at it: it helped within certain guardrails, but not with everything. Still, we have seen real use cases where people benefit from it; it's just that neither concept alone is the answer. They're all pieces of the puzzle that can help, depending on the use case, and ultimately, composed together, they let you build something more exciting, and that becomes agents. I think everyone has started talking about agentic systems; I've lost track of how many startups are claiming that, and how many funding rounds there have been recently in that space. But again, I want to see a customer telling me "we're building this"; then I'll be excited. Remember, we talked about enterprise customers. These guys will always have a certain lag in adopting things, so I don't see agent adoption from them yet. It will take some time, and I think we as a company, CentML, need to be ready when it comes. It helps that we interact with these GenAI startups, because they're the drivers of the novelty in the technology, but ultimately the revenue and the future of the technology, the real use, comes from enterprise adoption, in my opinion, and from cloud provider adoption. So we need to be ready there. Right now, I think a lot of this is just an experimental phase.
And I've talked to a few companies that are doing agents: guys, do you need our help? Do you need optimization? And they say, oh no, everything is just a few requests here and there, and we're calling ChatGPT under the hood, you know, the OpenAI API. Obviously it's not at a scale where we can help yet, right? Generally, they're just proving that something can be done, which is an important step, but it's not the scaling step yet.
But still, I'm a big fan of your methodology, which is: let's say I've built this big distributed asynchronous system with the actor pattern, and I'm looking at the logs, there's a logging actor, and you can start to see patterns that can be optimized, which you wouldn't have been able to see before, when building these AI applications as well. What I've noticed is that part of the blocker is that complexity grows very quickly if you try to do it monolithically. Even with a RAG system: if you're building an information retrieval system, typically you're sticking together query auto-completion systems, re-ranking systems, and you're talking to all these heterogeneous data sources. The footprint of the thing is very complex; it's very difficult to monitor and deploy and so on. And the reason people don't use, say, the actor pattern is, I guess, that from zero to one it's actually more complicated. Yet when you look at Twitter and LinkedIn and all these big globally distributed systems, they're already doing it. So it's almost like we need to say to people: you need to start by building these distributed systems. I know there's a bit of a learning curve and it's going to take you a while to get there, but you need to do it. Clearly, though, we need to come up with the new abstraction. I don't know whether that's a new type of programming language or some new way of building these systems. What might it look like?
Yeah. So I think there are several different directions we can go with that, and I've seen some attempts to solve the problem. There are definitely people at Berkeley working in this space, and Databricks, working on the compound AI systems I mentioned before; it looks promising. I've talked to a few of my graduate students about it as one of the next interesting big problems to work on. So I think we will need, maybe not a new programming model, but a new level of abstraction to work with these systems. And you're absolutely correct: we not only need to build them, we need to be able to monitor and debug them. Remember the old saying about systems: if you build a system as cleverly as you can, you're never smart enough to debug it, right? So that's a conventional problem: it's not only a matter of building it; you're going to build it with issues, that's for sure, and you need to be capable of monitoring and debugging it afterwards. Everything that's asynchronous and distributed is always difficult to debug; making it deterministic is a known, fundamental problem from the production systems we've been building for many years. So essentially we need to build this in as part of the system: scalable monitoring and profiling, so that we can actually debug these systems at a reasonable pace.
All of that is still an open problem, right? I think some of the things we learned building classical distributed systems will help, but ultimately there are going to be new problems. First of all, I believe those systems are going to be very, very heterogeneous: they'll consist of things running on CPUs, GPUs, and other accelerators, TPUs and other components. It might turn out that each piece runs in a different environment, right? A particular model might reside on this cloud provider, and so on, and you need to orchestrate all of that. So it will be like building software that has to be orchestrated across the globe, potentially running in different cloud environments. You also want to build smart versions of migration, right? People worked on that for a while in the CPU world, but now you'd need to do it in the AI space. You need checkpointing mechanisms and so on; a lot of things are needed to manage that complexity. It's a very open, interesting problem. It's very interesting to build products around it, and it's also a very promising area for future graduate students looking for projects to work on, in my opinion. A lot of very exciting things will come out of it. And I'm only touching the systems-level aspect; I'm sure there's a lot of excitement on the AI side about what can be built with things like that.
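For readers who haven't met the actor pattern discussed here: a minimal asyncio sketch of the shape being described, with a mailbox per component and a central monitor queue standing in for the "logging actor". A toy illustration, not anyone's production design:

```python
import asyncio

class Actor:
    """Minimal actor: a mailbox plus a processing loop. Components talk
    only via messages, so a monitor can observe every interaction."""

    def __init__(self, name, handler, monitor):
        self.name, self.handler, self.monitor = name, handler, monitor
        self.mailbox = asyncio.Queue()

    async def run(self):
        while True:
            reply_to, payload = await self.mailbox.get()
            await self.monitor.put((self.name, payload))  # central log
            result = await self.handler(payload)
            if reply_to is not None:
                await reply_to.put((None, result))

async def main():
    monitor = asyncio.Queue()      # the logging "actor", simplified

    async def summarize(text):    # stand-in for a model call
        return text[:20] + "..."

    summarizer = Actor("summarizer", summarize, monitor)
    worker = asyncio.create_task(summarizer.run())

    inbox = asyncio.Queue()
    await summarizer.mailbox.put((inbox, "a long document body " * 10))
    _, summary = await inbox.get()
    print(summary, "| monitored:", monitor.qsize(), "message(s)")
    worker.cancel()

asyncio.run(main())
```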
Yeah, indeed. And of course there's no such thing as a free lunch there; there are always bottlenecks. I love that book, The Mythical Man-Month. It says that even with a perfectly partitioned task you still have this curve, right? At some point you can't just add more and more developers to the problem and have it go faster, and I'm sure there will be similar bottlenecks here. But it's still cool to have a whole bunch of engineers who can work on different units of the system, work independently and deploy independently. That's really exciting.
But I wanted to move on to MLPerf and MLCommons. You're a founding member of MLPerf, and the research co-chair at MLCommons. This is all about benchmarking in AI. Can you tell us about that?
Yeah. So, what was the motivation for building it, before it was created? My own journey went like this. As I said, at some point, in 2016, I was at Microsoft Research, right, and I wanted to understand the utilization problem, how well these workloads were actually running, because I had a theory that the people running those experiments at scale had never actually built sophisticated systems themselves, so they'd never looked; and it turned out to be true. And the second problem, back in 2016 and 2017, was that almost everyone claimed to be somehow better than everyone else. When that happens, you know someone is cheating, and in the majority of cases everyone was cheating: they were cheating because they were using different datasets and non-traditional methodologies, and there are a lot of ways to cheat around a system. We had seen this before: when there was an explosion of CPUs being built in the seventies and eighties, at some point a benchmark called SPEC emerged that helped sort it out, and there were people around who had clearly lived through that experience.
So I thought: it seems that benchmark doesn't exist for ML. When I started my academic journey at the University of Toronto, my first four graduate students all worked on building a benchmark, because essentially I said: look, I'm a systems guy; I want to optimize, but I can't optimize until I know what's happening. I need to know what we can do right now, what's a problem and what's not. There were models, there were different datasets I didn't understand. Everyone was claiming ridiculous things online; it couldn't all be true. So we started working on that.
And we released the first benchmark suite, called TBD. It stood for Training Benchmark for DNNs back then, but it was also called TBD because the models were changing so quickly: every time we finished an iteration, there was a new model that had to be included, so it was always "to be done". So we called it TBD; I think the website is probably still alive somewhere. We made TBD available to the public. And in parallel, I was talking to my friends at Google about potential collaboration on a different research topic.
Somewhere among all the cool things I was doing, I mentioned: oh, I'm also doing this benchmarking for ML. And one of the guys there, Cliff Young, who is one of the architects of the TPU, said: that's very, very relevant; there's something happening between us, Baidu, and Stanford that's not announced yet. You seem to be a good fit. Do you want to partner, or do it on your own? I said, I'm absolutely open to partnering; I didn't know you guys were doing this.
So that's how I got into this whole thing. After a few months of emails, there was the first meeting at Stanford. Back then it was pre-COVID, we're talking 2017, 2018, so everything was in person. I was flying in very regularly, and we met as a small group, trying to identify what the benchmark should be. It's a very different workload from traditional CPU benchmarks, very different from anything the world had benchmarked before, and the scale is very different as well. There's training, there's inference, there are so many different use cases, large scale and small scale, so many categories that it's very hard to pick one particular thing to measure. So we sat down and started building working groups and methodologies.
Several of the early benchmarks in MLPerf were built by my graduate students; we contributed one of the first speech benchmarks, based on Deep Speech. A lot of the early work was done by the academic community, with people from Berkeley, people like Dave Patterson, involved, and then people from all the industrial companies you can think of gradually started to join. So it started with maybe fifty people in a room, and now I think MLCommons counts thousands and thousands of people, and many of the working groups are quite big. It was half and half at the beginning, academia versus industry, but obviously, once it started to get traction, it became dominated by industry. It's a huge organization now, right, and it does a lot of good things for people. It provides proper measurement, a way to establish where you really are. You can no longer just claim, "I'm so-and-so and better than NVIDIA and Google." If you claim that, people say: come to MLPerf and benchmark it, and MLCommons will release the results if you really want. And usually the answer is no, because they're cheating somewhere, right? So those claims have gradually disappeared; essentially, people are not making them anymore, because everyone knows what reality looks like.
We've got this big problem with benchmarks: they don't represent real-world usage. I mean, it's been said about Kaggle, it's been said about MMLU and LMSYS Arena and so on. And of course benchmarks get Goodharted as well: when a measure becomes a target, it ceases to be a good measure. How do we actually make these things faithful?
It's a very hard problem that people have tried to solve for decades. The question is, can you make them perfect? No; I just want to get them reasonably good for the purpose, so they're reasonable. When we built them: perfect? No, I cannot make them perfect; it's too complicated a problem. But I try to make them reasonable and fair, because the alternative was the wild west that existed online, where everyone claimed to be better than everyone else and potential customers were completely lost about what can actually be done. So there are all these benchmarks, and it's the choice of the company presenting its results how honest it wants to be. We create rules so that the playing field is reasonably honest. For example, for training, early on,
we realized that some companies have way more resources than others, so they could win on convergence just because they had better hyperparameter tuning: they could spend millions of dollars in advance finding the right hyperparameters, and then converge faster. So, to make things more equal, we said: look, if someone is doing that, others can take your hyperparameters and your intermediate run results from the submission. So if you do end up exploiting this, say you're a big company, a hyperscaler or NVIDIA, that's okay; it's totally legal under the rules, but others can then use the same hyperparameters as you. That way we try to equalize the playing field as much as we can. Obviously, the big companies like NVIDIA and Google still have way more engineers to do the manual tuning, but at the same time it's important to make the field as equal as possible.
This is why, I think, we've been very successful: in MLPerf you have big companies submitting results, and small entities, and even academics submitting some results, right? Again, it's very far from perfect, but it tries to keep up with what's going on. For example, the current MLPerf has the 70-billion-parameter Llama for inference. That's not the best Llama on offer anymore, but the next MLPerf, in February, will have Llama 3 with 405 billion parameters as an inference benchmark. The goal for MLPerf was never to chase the state-of-the-art, latest-and-greatest model, but something perceived by the community as a stable, really good model to score against. It means that by that time there might be another Llama around, but you're still benchmarking the chips and the software stack on something genuinely reasonable, right? So no miracles there.
We don't solve all the benchmarking problems. You can still overfit; you can still put a hundred developers on manual tuning and get great results on exactly these models, and then your customer comes to you with a very different model and gets very poor performance. That's on you as a company, right? We cannot stop that from happening, but we create a fair field for people to demonstrate results on a representative set of models. And the ability to do performance cheating is getting harder and harder, because there are so many models across the different benchmarks; good luck hand-tuning hundreds of them.
That plays into reproducibility, and the Goodhart thing we spoke about earlier, because you could over-promise on accuracy: we tend to optimize for headline metrics and then maybe lose something on performance. But have you noticed that the diversity of submissions has actually influenced the evolution of the benchmark?
That's definitely the case. Obviously, I've been running the company for more than two years now, so I'm not working with MLCommons as closely as I used to, but I clearly see the community growing. And it has created a lot of different camps: people doing algorithmic benchmarking, people even starting to do data benchmarking specifically for ML, people doing storage benchmarking, all the different aspects of the system. They're separated into different working groups; that's how you manage the complexity of so many people being involved. I was involved in several different groups at different stages, and obviously I was helping with research and promoting MLPerf on the academic side.
And if you ask me where MLPerf and MLCommons have been most successful, to me it's this, and I'll tell you why. Every time you now see an announcement from Jensen about a new chip, look at the numbers being cited. They are not citing special internal NVIDIA numbers; they're citing the MLPerf results, right? And there's a good reason for that: those were not just measured by NVIDIA; they were validated by their peers and competitors. It's not perfect, but someone else gets to look at your numbers and can, to an extent, reproduce and rerun what you did. So they can see that you got those numbers legitimately and where they come from, right? And all the tricks become public, which actually pushes the science further, because all of it is visible to the rest of the world; that's very important. It also affects the fate of companies.
I remember one example where one of the hardware startups decided to participate. They had what was supposed to be a very good chip, but it turned out to be not as good as some others; they brought up their solution, and the results were not great. And the investors said, oh, we don't believe in this anymore, and they took the money away, gradually. So it is a powerful tool, and a merciless one too, because before, they could tell investors that they were better than everyone else. But now they came and got measured, and everyone has seen that they are not. They may still have a good system, but it's, say, two times worse than a Google TPU, and because of that, people just didn't like it; it's not something that was perceived as attractive. That's very important, because otherwise it's very hard to validate these claims about how good your performance is and what kind of system you have. I think after that, the amount of BS online about performance dropped dramatically.
Because if you make a claim like that, people will tell you: go to MLPerf and validate your numbers. If it's so good, why didn't you submit it, why didn't you make it visible? Even before, people were open-sourcing things, but open source with no proper methodology doesn't matter. Think of how many different ResNet results we had, on different datasets, with performance claims that contradicted each other; there was always some methodological flaw. Here, everything is under control: we limit the number of variables you can cheat around. We fix the parameters for everyone, so you cannot game the comparison against others with those knobs.
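To make the "fix the variables" point concrete, here is a hypothetical sketch of a closed-division-style rule check: the benchmark pins the model, the dataset, the target accuracy, and the allowed knobs, so submissions can only differ in hardware and software stack. The field names and numbers below are invented for illustration; MLPerf's actual rules are far more detailed.

    # Hypothetical illustration of pinned benchmark rules. A submission that
    # swaps the model, misses the accuracy target, or uses a disallowed
    # numeric format is rejected, which is what limits performance cheating.
    REFERENCE_RULES = {
        "model": "resnet50-v1.5",
        "dataset": "imagenet-2012-val",
        "batch_size_max": 2048,
        "target_accuracy": 0.759,          # no trading accuracy for speed
        "numerics": {"fp16", "bf16", "fp32"},  # allowed precisions
    }

    def validate_submission(sub: dict) -> list:
        violations = []
        if sub.get("model") != REFERENCE_RULES["model"]:
            violations.append("model was swapped or modified")
        if sub.get("batch_size", 0) > REFERENCE_RULES["batch_size_max"]:
            violations.append("batch size exceeds the allowed maximum")
        if sub.get("accuracy", 0.0) < REFERENCE_RULES["target_accuracy"]:
            violations.append("target accuracy not reached")
        if sub.get("numerics") not in REFERENCE_RULES["numerics"]:
            violations.append("unapproved numeric format")
        return violations

    # A compliant submission yields no violations:
    print(validate_submission({"model": "resnet50-v1.5", "batch_size": 1024,
                               "accuracy": 0.7612, "numerics": "bf16"}))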
So I wondered what your thoughts are on the state of industry research. We were talking about this last time: Noam Brown tweeted, and I think he got into a bit of a back-and-forth with Yann LeCun, but he was essentially saying, look, we've got millions of users, maybe even more than that, using our stuff, whereas as an academic you've just got to convince an area chair and a couple of reviewers. How do you contrast the kind of industrial research that you are doing versus academia?
Yeah, all my life I've liked to sit on both sides of the fence, and I'm very glad that my family supported me in that, because it has actually taken a toll. There is a reason for it. There is a benefit to being in academia, because you work on clean, nice ideas without industrial biases or immediate deliverables. Remember, in a company, at the end of the day, we are for-profit; we need to generate revenue, and a lot of decisions have to be based around that, which is good and bad at the same time. In academia you're solving problems, and we are usually very honest, we're supposed to be very honest, about how we do it.
And yes, you can criticize the academic system and how papers get selected, but that depends on the community. I belong to the systems community, and in some of our top conferences, in order to get in, you need five to nine reviewers to accept your paper. So it's actually not as random as it may sound, and the bar to get into a top conference is quite high. It's very different from the ML community, which has tens of thousands of submissions and thousands of accepted papers, but that's part of the difference between communities. So I think it's unfair to say that everyone necessarily has a low bar for accepting papers. And academia is very important, because the innovation comes from there. Remember, it's not really industry that invented a lot of these advances.
Deep learning and many of those things were actually invented in academia, because academics thought it had potential; they believed in it and built it, and it didn't require tons of resources to make it work at the beginning. Industry is very good at making things scale, so when people realized what deep learning's potential was, Microsoft Research, Google, and OpenAI, for example, started to scale the technology. Google, for instance, is behind the invention of attention models and BERT, and OpenAI took a lot of that and built and scaled it further. So there are good things that industry can do.
Industry was always important to me because in academia there is this risk that you end up solving an irrelevant problem. You can become like a pure mathematician: just pick a random problem that looks fancy, solve it, and publish a paper. I never liked that myself; it always bothered me. I always wanted to be close to industry, because that's who has the right problems to solve. This is why, throughout my career, at least one hand, or one leg, whatever the right way to say it is, was in industry, because I wanted to understand what the real problems are. That's the way to contribute. And the thing is, industry usually has the right problems, but they don't always have the time and engineering focus to solve them properly; they solve whatever is needed for their customer by the next deadline.
This is why for many years I was taking problems from industry and solving them in academia. And then I had to shift my focus slightly, because I realized that the solutions I came up with for problems that exist in industry could not be scaled to enterprise quality within academia. That's the limitation of academia: I can hire twenty graduate students, maybe, but I cannot hire fifty, and they're not going to generate the code that's needed. They have a different goal: to publish their papers and get their PhDs and master's degrees, rather than actually building artifacts. We do a little bit of artifact building, but it's a very tight constraint. So I reached the stage where I thought, well, I've built something really interesting, and it has now become practical, and I realized that if I don't do it, others will kind of ignore it, thinking it's not important. So I figured I had better build the company and do it the right way. And remember, that was all before ChatGPT.
So it's a very nice interplay between the two fields, and I think they need each other; people move between the two very frequently. And I always like to think that we need some of the best leading labs in the world, like Bell Labs was in its time. All those labs were a place where the brightest graduate students wanted to go and develop new stuff, and it's always nice to make sure we have something like that. Microsoft Research was like that for a while. I want us to have something like that going forward as well, and I would hope that someone steps up. I think Meta did it to a certain extent with FAIR, with Yann LeCun pushing a lot of radical stuff there in the AI space. But I would love to see this beyond AI as well, in other areas. That interplay between the two worlds is very important, and I've spent all my life being either on one side or the other, but always staying connected with the other side.
And what do you think about the trade-off between exploration and exploitation? I always feel that there's a bit of a basin of attraction; Sara Hooker called it the hardware lottery. And I suppose that's kind of what we're talking about now: so much of the deep learning revolution was influenced by the investment in research and hardware and so on, and you can build this whole directed acyclic graph and do a bunch of graduate student descent on top of it. But do you think that just having a few random people working on completely crazy ideas might lead to something?
You need to select those crazy people carefully, but I think you need to have some of that. Remember, that was the reason why Geoff Hinton actually won a Nobel Prize today, right?
Oh, yes.
It's a great achievement for everyone, you know, for those who know Geoff, and for Geoff personally. But talking about that: at some point in the US they just didn't believe in this whole field of AI, and I'm glad there was a country like Canada that was actually willing to invest in people with these crazy ideas, to see the potential, and to believe long enough for it to reach the stage where it's actually practical and, all of a sudden, changes the world. So you need people like that; the fundamental research has to be there, but you also need to see where there's a chance for it. It's not just any crazy person: you can't give resources to everyone. There should be a selection process, and there is: you select the people who have a chance to change the world one way or another and give them the resources to make it happen. That's why grants exist, and why industry also gives money through research grants, to give us the chance to innovate. I received some of those awards from Google, Facebook, and others.
Those awards were also a very helpful tool to advance research in areas you otherwise couldn't afford to work in, because you have the budget. But in terms of exploration versus exploitation, I think it's a very good problem in general, and there's even a resemblance to what we have in the ML space right now. I think of spending money on training as exploration, and of inference and deployment as exploitation. For a long time I was very puzzled that we were mostly doing exploration with no exploitation at all. At some point that had to change; people had to start actually deploying the models, and I'm glad to see that starting to happen at a reasonable scale since last year, and very significantly this year. There's a clear trend I see in NVIDIA's reports and others that people want to deploy more. And we've reached the stage where, yes, pre-training can probably only be done by a dozen companies in the world, and fine-tuning maybe by a few hundred. The rest of the world is going to use those fine-tuned models and RAG, building on top of that, and it doesn't limit creativity; it's actually still a good thing for the world. But someone needed to say that we don't need to retrain the same thing ten times over, again and again.
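The exploration/exploitation framing borrowed here is the classic multi-armed bandit trade-off. A minimal epsilon-greedy sketch, purely illustrative and not anything from CentML: "explore" means trying an arm at random (training something new), "exploit" means pulling the best-known arm (deploying what already works).

    # Epsilon-greedy bandit: with probability epsilon we explore a random
    # arm; otherwise we exploit the arm with the best estimated payoff.
    import random

    def epsilon_greedy(true_payoffs, steps=10_000, epsilon=0.1):
        n_arms = len(true_payoffs)
        counts = [0] * n_arms
        estimates = [0.0] * n_arms
        total_reward = 0.0
        for _ in range(steps):
            if random.random() < epsilon:      # explore
                arm = random.randrange(n_arms)
            else:                              # exploit
                arm = max(range(n_arms), key=lambda a: estimates[a])
            reward = random.gauss(true_payoffs[arm], 1.0)
            counts[arm] += 1
            # Incremental running mean of observed rewards for this arm.
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            total_reward += reward
        return total_reward / steps

    print(epsilon_greedy([0.2, 0.5, 1.0], epsilon=0.1))

With epsilon near 1 you keep paying for exploration forever; with a small epsilon the average reward converges toward the best arm's payoff, which is roughly the shift described above from everyone training to most of the world deploying.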
Imagine going back in time to the nineteen-eighties. I think Hinton was in Toronto even back then, right? I'm not sure when he moved over. Imagine telling him about all of this stuff he was working on; he would have been intrigued that we're now having the opposite conversation, talking about graduate student descent on deep neural networks as the default. And imagine telling him he's going to win the Nobel Prize.
Yeah, but if you told him he was going to win the Nobel in physics, he probably would have been puzzled. It's interesting, though: he was always a strong believer in what he built; that was evident from his whole career. But the scale of it, I feel, would impress even him. The last time I talked to him, on one of the flights from Toronto to San Francisco, he had very serious concerns about where these ideas can go, because those models are improving at a very rapid pace, and we need to be very careful about where they can take us. So he had a very valid concern about how powerful it is. It's a little bit like the story of
Oppenheimer and the nuclear bomb: you invent it, and then you're not able to control it. Again, the technology is now in the hands of the whole world; you contributed, but you have no control. You can voice your concern, but you need to convince others; you don't have control over the technology, and it would never have become what it is if you hadn't made it available to people. But I think it had to happen one way or another; it might just have happened later. It had to be invented; the world was asking for it. We needed to learn; we were looking for a way to learn differently, and it was imperative to have something like that.
Yeah, it's interesting, the point about Hinton's fears. In a way, no one disagrees; I think everyone agrees that this is transformative technology and the world will never be the same again. Some people think it's reducing our agency or dramatically changing society, and of course some people think AI might recursively self-improve and become superintelligent, or something like that. But everyone seems to agree that this is a very dramatically transformative technology.
Yeah, it's hard to talk about that. For a long time the question was: is this just hype, or is it real? As a scientist, I know there is real substance behind it; hype is always layered on top, but there is real substance. And now, yes, I agree with you that it's really transformative. We are never going to be the same; you can't just forget what we've learned we're able to do with this technology. Try telling all the students who use ChatGPT to pass their exams that it doesn't exist anymore and they need to go back to memorizing things. They're not going to go back; forget about it. So essentially the technology is there; we just need to understand how to operate with it. And I don't think it would be fair to say that humanity has never had anything disruptive like this.
We had the Industrial Revolution before, and people were worried about machines then too. Like those pictures from the eighties of high school teachers demonstrating in front of the White House with calculators, arguing that calculators shouldn't be allowed in high schools because kids would stop learning how to count and wouldn't be as intelligent as they used to be. The perception at the time was that this was a danger. Now we smile at it, but the answer was simply to adopt it and learn how to use it: kids today are taught to use calculators at the proper time, at the proper age, and we don't try to outcompete the calculator. I think that will be the situation with AI for a long time as well. I don't personally think we are that close to superintelligence yet. Yes, it's an exciting technology to play with, but there are a lot of missing pieces. It's hard to say how long it will be until we get any form of real intelligence out of it, but it's already very capable, and I'm saying it's going to be a gradual process. I don't feel like we're going to wake up one morning to a Terminator scenario where everything has been captured and the intelligence has suddenly escaped our control. You have to build this intelligence as a process, and we are, in my opinion, in the early innings of building it.
It's interesting how our perceptions have changed over time, because there was a similar thing with chess. In the late nineteen-seventies it was thought that if we could build a chess computer that could beat the best human, then that thing would be generally intelligent. And of course, with experience, we've changed the way we think about things. But just a final point on this, the physics thing: you can think of deep learning as a form of physics, in the sense that physics is modeling the world we live in. But Chomsky said he thought these language models are not a theory of linguistics, because a theory should explain; it should carve the world at its joints and actually tell you something about what it's not, as well as what it is. So what do you think about this as a form of science? Do you think it's legitimate?
It's legitimate. In general, I like it when awards like that are given to interdisciplinary work that touches multiple different fields: physics, biology, computer science in this particular example. I think that's where the biggest breakthroughs happen, not in one narrow field but across different areas, and it requires a much bigger vision. I'm really excited that the whole field got that recognition; an award like that definitely carries credit. And I think it reflects the fact that we're all excited about this technology right now, and people want to recognize it because it's so transformative. And again, the people inventing the fundamentals of, say, biology and physics deserve it as much as the people inventing the methodologies, the measuring tools, or even the modeling and simulation tools; those are just as important. For example, when we think about the brain, it's important to study it from the physical perspective, but there's only so much we can do that way. Modeling matters just as much, because modeling allows us to do things we could never do in the physical world. So it's important, especially when it's proven to be so disruptive. It's no longer just a small experiment; it's actually changing the world in a way that was hard to imagine before. If you had asked someone ten years ago how disruptive AI would be, it would have been very hard to imagine.
Do you think that, say, NVIDIA's stock price is going to continue to go up? Do you think compute demands and GPU demands are just going to continue to explode, or do you think it will level off?
A great question. Well, the short answer: I have some NVIDIA stock myself and I still hold it, so I still believe they will grow; I think that's the most honest signal. I can come up with any explanation, but the reality is what I'm doing myself. How far it can go is a hard question. But I do think they deserve to be where they are right now, because they believed in this technology early on. They got lucky in a way, in that the chip they had at the time ended up being a good fit. But a lot of people miss such opportunities, and they didn't: they understood it, capitalized on it, and made a huge investment into the software ecosystem beyond just building chips, which was great vision by the company and the founders about what's really needed.
And I think they are reaping the benefits of that. How far they can go depends on a lot of things; the stock depends heavily on people's expectations. Sometimes they show phenomenal numbers and people say, oh my god, they didn't grow by as much as we expected, so supposedly they're not doing well. It's ridiculous: their revenue and everything else is growing by huge numbers, and the stock can still go down the next day. I look at that as absurd; there is a lot of gambling that goes into stock prices. But I think as a company they have a chance to continue being a transformative company in the space. I also think they're not going to be the only hardware provider moving forward. There will be other players with hardware, as we already see, and the world can utilize all of them. But for the next, safe to say, three years, I can't see anyone surpassing NVIDIA at the top. I think they're going to be the dominant player in the space for years to come, and definitely for the next few years I don't see anyone easily topping them. I don't know what that means for the shares and the expectations built into the stock, but I think they're going to be one of the top companies for some time now, and for good reason.
Gennady, I've thoroughly enjoyed this conversation. Thank you so much. Folks at home, if you want to work for Gennady, please reach out to CentML. I'm sure they're hiring good people.
We're growing and always hiring, so please reach out.
Thank you so much. It's been a pleasure.
And for me as well, thank you.