
John Palazza - Vice President of Global Sales @ CentML ( sponsored)

2025/3/10

Machine Learning Street Talk (MLST)

People
John Palazza
Topics
John Palazza: I believe that for enterprises to use AI effectively, they need a unified, top-down approach. Senior leadership has to set the vision and take ownership in order to create a culture where machine learning permeates every facet of the business. Moving from innovation to production to scale, companies have to change the looseness of the innovation phase. Paying attention to GPU utilization and cost control is critical, so that the cost of compute doesn't bankrupt you. CentML is dedicated to helping enterprises optimize their infrastructure so they can get started with, and use, machine learning and generative AI more effectively, solving the challenges infrastructure creates. Our founding grew out of the need to use compute efficiently, especially for large language models and AI. Enterprises are starting to value the practicality and efficiency of compute rather than chasing scale for its own sake; they need to move from theoretical builds to practical builds to get the best financial and technical utilization out of machine learning. Much of today's compute is underutilized, and improving GPU utilization and efficiency is a key first step in addressing the compute shortage. CentML's goal is to help enterprises use the compute they already have as efficiently as possible as demand keeps growing. Even though enterprises are trending toward platforms, individuals still take part in decisions, and in practice that is common; the lack of team collaboration and centralization leads to wasted resources and inefficiency. Large enterprises need a top-down approach, with a vision set by senior leadership, for machine learning to be applied effectively. We talk with every level of an enterprise organization to understand their strategy around generative AI and large language models; demand keeps growing, but adoption paths and stages differ. Many CTOs have yet to pay enough attention to compute utilization and efficiency; from innovation to production to scale, companies need to control cost to avoid being bankrupted by compute. Greater efficiency frees up resources, which drives innovation and growth. I think the best opportunities in the near future will come from open-weights models: enterprises prefer them because they're more flexible and easier to scale, and as the Llama models have progressed, more and more enterprises are paying attention to open weights. Enterprises shouldn't settle for "good enough" solutions; they should pursue more breakthrough innovation. Partnerships with NVIDIA and Deloitte give CentML market insight and technical support. CentML isn't worried about cloud providers building similar products, because it has strong relationships with them; working with cloud providers raises customer satisfaction and delivers more efficient solutions. CentML has no plans to build its own cloud platform and instead focuses on working with existing cloud providers, committed to giving customers the flexibility and freedom to switch models and infrastructure easily. CentML is excited about the many startups in the AI space and sees it as a dynamic field full of opportunity.


Shownotes Transcript


There's always a pendulum swinging between sort of like decentralization and centralization. But then there's the question of who owns AI in an organization? Because if we're going to leverage AI more efficiently, we need to have a joined up approach, surely. An executive with a vision that has ownership from the top level creates a culture that allows the adoption of machine learning to permeate through each facet of their business. And I think it's been proven that the companies that are doing that are more successful for both the adoption as well as kind of the future vision of where those are going.

Even though there's a lot of excitement, I think there's a lot of individual excitement, not necessarily that next phase of collaboration. But this transition from innovation to production to scale is when that looseness that happens in innovation needs to start changing. And all of a sudden they're saying, you know what? It's great that we're running this solution on an H100 that we've been testing, but now we're gonna roll it out to 45,000 users and we're concerned that this capability is gonna bankrupt us.

As Llama has progressed, so many of the conversations I have, customers are predominantly looking at the more open weights that are being available to them. John, welcome to MLST. It's great to have you here. Pleasure to be here, Tim. Thank you so much. Tell me about your background. Wow. So I started a small farm. No, I'm just kidding. So I originally started out as an officer in the military. So I went to the Naval Academy in the US. So

I was an officer in the Navy. I started exploring early fundamentals of machine learning and aspects around technology right then. So, kind of ended up selling software. I kind of went through a startup cycle for a long period of time. Was the first salesperson in about four or five startups. Really enjoyed kind of building that product market fit, understanding where

and how new technology was being used when I transitioned from military to civilian sector. And then for the last 10 years or so, I've been working through the challenges around machine learning and artificial intelligence. First with a startup that I advised for called GraphLab, which was acquired by Apple, which was very exciting. And then early on in the MLOps space with a company called Algorithmia and another company called Converge, where we were dealing with

I think the challenges and adoptions of getting started with machine learning, and that's kind of what drove me to CentML, where we're going through such a similar narrative now when it comes to generative AI and large language models and understanding how people use them and why they do. And the excitement that was being built here at CentML was very attractive to me, and that's where we are today.

Amazing. Well, CentML, so it was founded out of the University of Toronto in 2022, if I understand correctly. That is correct. What is it? I mean, what are your products? What do you guys do? We are really the first platform to look at optimization across multiple levels. Optimization is such a cliched term, so I apologize for using it. Let's look at it a better way. The hardest thing to do sometimes is actually get started with machine learning or with, in this case, generative AI and large language models.

CentML realizes that some of those burdens or challenges are with the infrastructure itself.

How hard is it to use so much compute, such heavy and large amounts of that environment? And we said, what if there was a better way? What if there was a way that we could, one, meet the customer where they are in their journey, and we can kind of go through what that means? And also, once they are at that journey or thinking about it, how do we make sure that they're using the right compute at the right time for the right workloads to help people optimize and adopt large language models? And at the core, that's really where CentML is and what it is.

And I always like to say academics sometimes have the biggest challenges to face. And so for the people behind CentML at the University of Toronto, some of their biggest challenges were how do we actually use our grant money efficiently to actually test all of this machine learning? And in that case, it was we couldn't just use the biggest and the best hardware. We had to use the most effective hardware.

And they always say necessity is the mother of invention. In this case, the necessity of the right compute to generate these large language models and to work with large workloads for artificial intelligence is what led to the foundation of CentML.

So do you think we're starting to move towards more pragmatism with compute? You know, there was this kind of modus operandi of let's just build bigger and bigger models and there is no concern whatsoever to efficiency and the environment and so on. Even more of a problem when we deploy into enterprise because of course there's heterogeneous compute environments, there's lots of different applications being used by different people on different cost centers. So it's really becoming an issue now.

Yeah, no, I completely agree with you on that. And I think there's always this desire and I would say almost fear of missing out where they want the largest infrastructure they can get. Let's get bigger. Let's get faster. Let's get larger.

But then all of a sudden it looks at how usable is this, right? So I always think it's wonderful to have the biggest, the baddest, and the largest that you can have on anything. But when it's not usable and it actually starts becoming a burden and a hindrance, I kind of feel like that's the transitional moment where we are now. People are saying, how do we go from a theoretical build of machine learning to a practical build that's utilized the best both financially and technically? And that's, again, where...

The conversations we're having on a daily basis at CentML focus on, right? It's kind of this elastic band snapback a little bit where people are saying, sure, there's opportunities for it, but I see that there's a shift coming. How do we take advantage of that shift? How do we start using things in the right way, in the most efficient way? So demand for compute is increasing significantly. I mean, I read a survey the other day. Apparently by 2026, we're going to need something like 33% more GPUs. There's a GPU shortage.

And just generally speaking, the demand for compute is increasing. The applications are increasing. I mean, soon people will be generating real-time virtual games, virtual worlds on their VR headsets and so on. So what's your perspective on those kind of industry trends? Well, first, it's an exciting time, right? I would love to see some of those things come to fruition. I benefit from the use of AI on a daily basis already. So the more and more that I get to see of it, the more exciting I get.

I would say, though, that it's interesting. You know, this shortage of compute is not changing, right? You know, I always think back to some of the stories going back to even the storage days, right? And, you know, a company I was at was acquired many moons ago by EMC, which is a pretty large storage company. And the big concern at EMC at that time was that they were going to go away. Storage was going to go away. Everything was going to be 100% in the cloud.

And then I remember two or three years after I left, I met with a VP over there, and I asked casually how things ended up progressing. And he said, it's been our best year in history. And I'm like, but that's impossible. All this cloud spend has gone up. He's like, yeah, you know what? So has the storage spend. He's like, things just keep going. It's the law, right? And I think that's true with compute. So I do think that the demand isn't going to decrease.

But I think that one of the things that we can see, and one of the things that we not only see but need to demand, is better efficiency. So much of the compute that we have available today is not 100% utilized or utilized correctly.

So we're seeing people consistently using 30%, 40% of an available GPU and then moving on. And I believe there's so much value in that space. So to be able to get 80%, 100% utilization and efficiency on a GPU, that to me is step one on solving some of this compute crisis that we're dealing with.
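A quick way to sanity-check utilization numbers like these on your own hardware is NVIDIA's NVML bindings. Below is a minimal sketch, assuming the pynvml package and an NVIDIA driver are installed; it's generic monitoring code, not CentML's tooling.

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

samples = []
for _ in range(60):  # sample roughly one minute of activity
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    samples.append(util.gpu)  # percent of time a kernel was executing
    time.sleep(1.0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"mean GPU utilization: {sum(samples) / len(samples):.0f}%")
print(f"memory in use: {mem.used / mem.total:.0%}")
pynvml.nvmlShutdown()
```

If the mean hovers around 30-40% during what is supposed to be a busy workload, that is exactly the headroom being described here.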

And that's also, again, kind of foundational to where we feel CentML fits and where the space itself is. There will always be a need for more compute. Our goal is to just help people use the compute they have as efficiently as possible as that demand constantly ramps up, as it needs to.

Yeah, there's a few things here. I mean, first of all, GPUs are more expensive than CPUs. So even now on the cloud, you know, the CPU utilization isn't that great. It's better if you use PaaS services, of course, but there are still plenty of people using IaaS services. The other thing is GPUs are not yet virtualized. I think there are some startups doing this and maybe the cloud vendors will start to do something like this. But at the moment, it's a little bit like the 1990s where we're sort of talking directly to the metal.

And the other thing is we're moving towards this platformification of data science, which is that five years ago, individual contributors were kind of building models and tinkering around and putting stuff into production. And now we need to have economies of scale simply because we can't afford to have all of these random GPU machines burning money everywhere. How do you see that?

It's such an interesting question because I feel that meeting with so many customers all the time, and I think I've probably done close to 5,000 customer engagements in AI and ML over 10 years, which is pretty hefty when you think about the conversations that we're having out there. But I think that even though everyone wants to move closer and closer towards platforms, the ability for individuals to still make direct decisions is very practical and very much happening, right? Whether we want it to be or not. And I think sometimes when you give

too many keys to the kingdom to too many people, you have situations where not always the right compute is used for the right reasons, not necessarily the right models. There are orphan models. I always like to say that I had a meeting, this was probably about five years ago,

with a very, very large financial services company, and we were discussing kind of the state of machine learning. In that meeting, there were about eight different CIOs in the same organization, but they all had a CIO title. And we were discussing kind of use cases for machine learning. And one of the things I'd used as a really, really just basic example was sentiment analysis, right? Such an easy use case, especially to understand customer feedback.

All of them were working on their own projects for sentiment analysis. There was no collaboration among teams. There was no centralization. There was a series of different snowflakes. And I think right now, even though there's a lot of excitement, I think there's a lot of individual excitement, not necessarily that next phase of collaboration,

unification at large enterprise customers. So I think that we're still a little bit away from that. And I think that's part of what we're seeing, right? We're seeing still, even though the platforms are where we're going, that adoption needs to accelerate. We need to see people start looking at it from more of a

a unification or a best-in-breed approach or actually leveraging those workloads instead of individuals working in a silo. I think those things should change. And I think it hasn't happened yet, but we are moving in that direction. Yeah. It's so interesting contrasting the two approaches because there's something to be said for the Amazon culture where they have these two-pizza teams, they have autonomy and it's great because you can have like a flywheel and people doing things.

But then you kind of get a lot of people duplicating the same effort. Also, with language models, the cool thing is that sentiment is so easy. You just ask the language model, what's the sentiment of this thing? And you can just describe it using natural language. And it's not really like there's much of a technical lift. So you don't need to sort of reuse something that someone's done before.
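For illustration, this is roughly all the "technical lift" amounts to today. A minimal sketch against any OpenAI-compatible chat endpoint; the base_url, api_key, and model name below are placeholders, not a specific vendor's values.

```python
from openai import OpenAI

# Point at any OpenAI-compatible server (hosted or self-hosted).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def sentiment(text: str) -> str:
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instruct",  # hypothetical model name
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text. "
                        "Answer with exactly one word: positive, negative, or neutral."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic output for a classification task
    )
    return resp.choices[0].message.content.strip().lower()

print(sentiment("The support team resolved my issue in minutes. Brilliant."))
```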

But then there's the question of, well, who owns AI in an organization? Because you know, before, for a while, we had these chief data officers and chief AI officers and so on. And there's always a pendulum swinging between sort of like decentralization and centralization. That's right. But I genuinely believe that we need to have a top-down approach in these large enterprises. If we're going to leverage AI more efficiently, we need to have a joined-up approach, surely. I agree with you completely. I think...

Companies that have adopted machine learning the most effectively are companies that start from the very top as a cultural decision, as a unification or company decision, and then kind of an implementation that trickles down from that top-down approach. And I think, you know, you always like to think you start from the top, but on a lot of levels, you start with the people that are engaged in the moment, working with them, helping build credibility of the technology.

But within AI and the wider space, I really do feel that an executive with a vision that has ownership from the top level creates a culture that allows the adoption of machine learning to permeate through each facet of their business. And that's really where I think it needs to go. And I think it's been proven that the companies that are doing that are more successful for both the adoption as well as kind of the future vision of where those are going.

And to your point, it's a lot easier to have a decision on some of the more complex models when it comes from a vision of a company saying that we're adopting or enabling or making this part of our standard and practice. And I think that's what really pushes the boundaries of what we can be successful with.

What kind of conversations do you have with your customers? I mean, for example, when you're talking with prospective customers, who are you talking to? Are you talking at all levels of the organization or do you tend to focus on a particular role? Yeah, no, that's a great question. I mean, we're speaking to all areas within the organization, right? In an ideal setup, we are speaking to the senior executives and senior leadership within an organization and explaining and understanding what their strategy is towards generative AI, large language models, right?

Are they building their own? Are they leveraging open source? Are they leveraging closed models? Are they incorporating a structure where they're consuming or are they building? So every customer's journey is a little bit different. I always like to say at every startup, the perfect world is that there is a director or a VP who's in charge of your product.

Right. Like the VP of CentML at any company would be a very easy person to call upon for the conversation. Unfortunately, that hasn't quite existed yet. So we do tend to have our conversations with people that are in the generative AI space or executive leadership. But some of our best sponsors have been people that have been dealing with the problem.

You know, and that problem may be integrating a large language model into an application and actually leveraging how to use it at scale. And those people may just be a senior ML engineer. And having a conversation with them and showing them what we can build and how it can be a little bit easier for them, that satisfaction is fantastic. And it also resonates because it's a technical solution. And so it's definitely across the organization, but there's value in conversation points for each.

How do you see this working out? I mean, to give you an example, you know, so Oscar Wilde said that fashion was a form of ugliness, which was so intolerable that we have to change it every six months. And data platform architecture is very much like that. So, you know, we had...

the monolithic data lake and then we had like, you know, the slow path and the fast path and then we've got the mesh lakes and, you know, the lake house and all this kind of stuff. And, you know, these are very expensive projects and they tend to slowly get buy-in from various different parts of the business. So, you know, you've got the finance guys over here and you've got the retail guys over there. And in a way, it's good to have a joined up approach, but it also means that there are so many stakeholders you need to get on board.

A lot of people say, well, you can't use my data for this thing. You need to pass me a token. You can only use my data when someone is actually authenticated or whatever. So you just get this kind of explosion of complexity. How do you kind of bite that off? Sometimes what's hot or what's now is even less than six months in technology or in AI, right? And so...

a solution that could be of the moment when it's funded is no longer. I joke, one of my first startups,

was a company called BackWeb and we had developed something called Polite Push technology. And I'm dating myself here, but the big competitor at that moment was a product called PointCast. And at one point it was the biggest bandwidth user on the internet. It was bigger than Yahoo or bigger than AOL, whatever it was at that moment. And it was great until corporate networks realized it was unbelievably invasive and then it got banned. And all this different technology was there. So it was hot one moment, was not hot the next. And

you know, we went public and that technology ended up not necessarily landing as well, you know, sitting as well two years later. Now with AI and machine learning, I think one of the things that we're encountering is that there's a need that's been shifting over time, right? I think the universal concept of adoption, integration, and

leveraging into production is still there. I mean, I think some companies are achieving it. Some companies are looking to achieve it and some are just getting started on their journey. So there's still a lot of companies with different evolutionary traits. I also think the approach is what's changing. If you look at

Seven years ago, eight years ago, a lot of companies, again, were working independently, trying to work on it. You found companies that would start their projects living on the cloud and then move from a cloud to on-prem and then to a hybrid cloud and to a hybrid on-prem. And there was all these different levels of adoption and different approaches, all of which had their own merit and some of which had their own challenges. I think today what we're seeing is that there's a universal need or understanding that large language models and generative AI

have a tremendous impact both for business and socially, and it's going to happen. The question now, or what we're dealing with, is how our customers and how our companies in the enterprise space see the best path forward. So a lot of companies start and they say, you know, we just want to put our toe in the water. We want to start with generative AI, maybe serving it up as an endpoint. So is there a way that we could try a Llama model as an endpoint? However, you know, that journey doesn't begin and end there. So when a company starts there, we're having conversations with other companies who are like,

That's wonderful, but what we really want is to bring some of this in-house. We want to run this on our AWS environment or GCP. We have some on-prem GPUs that we want to leverage. Our biggest struggle is around the inference capabilities of it, where inference has been growing. Can you help us solve that? And the answer is, yeah, we can. We focus on the ability of optimizing inference and building that process. And then we get to the next phase. We've met with some companies who want to build and train their own large language model.

right? So completely move away from leveraging the existing architectures and building something new. And I think to your point, I think where people are on that journey is what's shifting every six months. Yeah. I mean, the opportunity is certainly huge. I was reading the Bain report a couple of days ago, and they were saying that there are these huge efficiencies that AI could bring to businesses, you know, in software development. I think they estimated it to be 15% or 30% if you do it correctly. And

you know, across sales, for example, it could be about 30% call centers, about 25% just off the top of my head. So the opportunity is huge, but there is reticence. And I think it's a little bit like, you know, with the cloud as it was about 10 years ago, no one wanted to be first.

And I guess that you've seen so many customers that you must see patterns. You must see, well, there's a certain type of culture. There's a certain type of organization that is adopting AI better. You know, Google, for example, they have done lots of centralized optimization. They've got a centralized mono repo. They've got a centralized build system. They've got all of these economies of scale because they really managed to work together and figure the thing out. And I'm not yet seeing that in a lot of other businesses.

Does that imply that we need to have amazing software engineers or do you think that there will be some increasing level of abstraction which will allow us to build AI applications more easily? Yeah, I mean, I think that's a...

a great question. I also think it's a trend that can't be dismissed. I think companies that have great engineers and great software engineers are pushing the boundaries of what's available and they're setting an expectation of what's capable. And I think that's amazing. But I think the majority of companies

It's very difficult to build an engineering team that large with that much skill set, right? I remember going on a sales call with a startup that was very, very successful. It was fantastic technology. We met with Salesforce and we had that conversation.

Salesforce was like, wow, this is an unbelievable product. We're building something similar in-house. And I remember, like any good salesperson, I said, absolutely. But we have a team of dedicated engineers that are building just this and we're specialized on it. And we have a team of 25 engineers. And they're like, that's awesome. So we have 380 engineers.

that are on our team at Salesforce. And I was like, okay, so my 25 specialized engineers, a little less, right, than what you have at your capability. But I think what you'll see is those –

unique snowflakes that are working at the engineering level that can develop and work in these larger teams and these large organizations, they will always exist at those organizations. And the work that comes out of those can actually be influencing what you can benefit from at a smaller company that has a team of 12 engineers.

Because some of those functionalities can become productized. You know, if you look at the MLOps space at Uber, they created a solution called Michelangelo, right? And it was very obvious that that was their MLOps solution. It didn't stop, you know, customers from benefiting from using Algorithmia or Converge,

which were two MLOps platforms that I had led sales for. The reason why was simple: people needed it, but they didn't have the infrastructure to build their own; rather, they wanted to benefit from solving those same problems. And we're dealing with that now in the optimization space, right? You know, and if you go to a larger company or larger enterprise where they have 300 engineers doing that every day and tuning it and working on it, absolutely. But most companies don't have that. And utilizing an open-source product, which

still takes a huge amount of effort, is not always the right answer for everybody. So solutions such as CentML and other products in different spaces kind of help fill in those gaps when the large centralized engineering team that you have at Google, you don't have at a local big box retailer.

Yeah, I remember Michelangelo from Uber. That was wonderful. I know some of the people who used to work there actually. But it was really inspirational for a lot of folks in large enterprise to start thinking about platformification. Agreed. Building data products, creating this kind of information architecture, right? Because you were saying that, yeah, there's all of this complexity to deal with.

But the smart people can create a schema. They can create ways of doing things, you know, data connectors and standard forms of processing and whatnot. And of course, these can be templatized and reused downstream. I'm also seeing a lot of innovation around the user experience because, you know, Gen AI is so new that it took a long time to move from the chat interface.

And now we seem to be progressing to the Canvas interface. So ChatGPT, of course, they've got this Canvas interface and Salesforce have just released a similar thing today. And I think over time, we're going to see this evolution of interface design with respect to

generative AI and that might be the secret to kind of creating that low code, no code, you know, accessibility in the enterprise. I also think taking it to the next step, right, is what I think will also help drive adoption, right? So many companies

they want to benefit from it and they want to use it, but they're not 100% sure what the right path to use it is either, right? So just some basic questions, like whether you're using RAG or fine-tuning, whether you integrate it into something or you create a different front end, right? Like, do you take it to an agent level and deliver it to there, which I think is

really where everything will end up going very shortly. And I think those types of innovations and changes are what drive adoption and allow companies to get more pervasive in the use of it.
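To make the RAG option mentioned above concrete, here is a deliberately toy retrieve-then-prompt loop. Real systems use an embedding model and a vector index; the keyword-overlap scoring below is only a stand-in to show the shape of the pattern.

```python
# Toy retrieval: score documents by word overlap with the query.
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# RAG step two: stuff the retrieved context into the prompt.
def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = ["Refunds are processed within 5 business days.",
        "Our support line is open 9am-5pm EST.",
        "Premium plans include priority routing."]
print(build_prompt("How long do refunds take?", docs))
```

Fine-tuning, by contrast, bakes this knowledge into the weights instead of the prompt; which trade-off is right is exactly the question companies are wrestling with here.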

And to your point, I agree that the multiple options to engage, multiple formats, and that evolution keep pushing the adoption and making it a little bit faster, a little bit easier, and hopefully a little bit more pervasive, because I'd love to see every company being able to benefit from it. So you use the magic agent word, which is one of my favorite words. It's so exciting when we think about these levels of abstraction. Of course, you folks have done incredible work

optimizing the, you know, the compilers, the kernel, all of the hardware and so on. And then we're just going to build on top of that foundation with things like agents. And I think that might itself become a new user experience concept because right now on ChatGPT or whatever, you do a search and what you actually want to do is do a kind of hierarchical search.

So you want to say, I've got these five PDFs and I want you to go away and do some research and summarize each of them. And all of the respective results should be injected into my thing here. So, you know, you're kind of doing this compositional processing because this is how we think. And currently, weirdly, it's not really possible to do that kind of thing. And I think there's a few reasons for that. But what do you see as the future with agents? How are we actually going to start leveraging that?
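A sketch of the compositional fan-out Tim describes, assuming pypdf for text extraction and an `llm` callable standing in for any chat-completion client; the length cap and prompts are illustrative, not a prescribed recipe.

```python
from pypdf import PdfReader  # pip install pypdf

def extract_text(path: str) -> str:
    # Concatenate the text of every page; extract_text() can return None.
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def summarize_all(paths: list[str], llm) -> str:
    # Map step: one summarization call per document.
    summaries = []
    for path in paths:
        text = extract_text(path)[:8000]  # crude cap to fit a context window
        summaries.append(llm(f"Summarize the key claims of:\n{text}"))
    # Reduce step: compose the per-document results into one answer.
    return llm("Synthesize these summaries into one briefing:\n" + "\n---\n".join(summaries))
```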

So, you know, we at CentML are actually working on multiple areas around extending our solution to engage with the agent level. And I think, you know, the first logical step would be around troubleshooting and engaging within a software model itself, right? It's such an easy way to, to your point, to diagnose and to deliver what would be about the next step of taking an action and enabling those actions to be run. The impact that you could have on troubleshooting, trouble tickets, working in a data center,

Amazing, right? And you can take those same use cases where we've worked on similar areas of taking that step towards the healthcare industry, working towards doctor information, information around prescriptions, and taking the logical steps through those processes or areas that we both engaged on at work. So much of what we're seeing with generative AI in the business place today is a step one or step two, where agents can take it to full usability, step three, step four, step five.

with an accelerated rate and actually start having higher business impact, right? So many of the use cases we deal with customers on today, the ones that see the most impactful results, are around chatbots specifically. I just feel like so much more business impact

is available than chatbots. Not that they're not valuable, 'cause they are, but it's that next phase. And I think agents are really what's gonna be the focal point to deliver that value to the enterprise. - By the way, as an aside, I got Cursor, you know, like the Gen AI code program to generate me an app that I now use for interviews. It's called Interview Notes.

It's got a timer and I've got a good bit, bad bit reference and a text box. So when I type something in, it will generate an SRT captions file. And then my editor can overlay that on the recording and they'll just know where everything is. That's awesome. And it's so cool with Gen AI because you get an idea come into your mind and you're like, oh, you know. Yep.
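For the curious, the SRT format is simple enough that a utility like the one described could write it in a few lines. A hypothetical sketch; the note structure here is a guess at what such an app stores, not the actual tool.

```python
# Convert seconds to the SRT timestamp format HH:MM:SS,mmm.
def to_srt_timestamp(seconds: float) -> str:
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    ms = int((seconds - int(seconds)) * 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Each note is (start_seconds, end_seconds, text).
def write_srt(notes: list[tuple[float, float, str]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(notes, 1):
            f.write(f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n\n")

write_srt([(12.5, 15.0, "good bit: open-weights discussion")], "notes.srt")
```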

And before you just wouldn't have been bothered to do it. But now there's so much innovation because you're like, oh, I could try this. Oh, I could try that. And it just takes 30 seconds. So you just do it. It's awesome. It really is. Okay, cool. So CentML, you guys have been cooking some very interesting stuff. Can you tell me about your core product stack? So we have C-Serve, which...

which is really a trailblazer in the LLM ops space, allowing you to very easily integrate and serve, pre-configure, and even get insight into how models will run on specific hardware, giving ideas around cost, optimization, and efficiency. And then we also have another platform, which is called C-Train, focused specifically on the training aspects within each of those areas. We then have a

Underneath that, a product called C-Cluster, all of which fits again into what makes our platform solution unique. And that incorporates our components around compiler, network capabilities, and really providing yet another level of optimization and enhancement to that experience. Then at the very bottom, our platform is what enables us to provide our serverless endpoints

on a variety of Llama models that have a highly efficient and optimized endpoint where customers can engage on an API level and get their first taste or even scale out to production-level hosted large language models. Let's talk about the cluster management first. Sure. That's pretty cool, right? So I remember, I mean, I used to use Databricks, for example, and that was quite similar in the sense that it was...

a standalone startup and it became the incumbent in many of the cloud providers like GCP and Azure and so on. The weird thing is, because we can talk about moats as well here, Databricks were just moving so quickly that no one could out-innovate them and everyone loved using them and so on. And in a way, I see a similar thing with what you guys are doing. So you just have so many smart people working really, really quickly, optimizing these things

And you've almost become a kind of incumbent in many of the clouds and people can just kind of like, you know, set up and just build stuff with CentML. But the cluster management is really interesting because it's an example of what we were saying that rather than going straight to the metal, you now have this self-healing, self-scaling cluster that just automatically does a whole bunch of optimization and runs jobs for you. And of course you can pause it, you know, when you don't want to pay for it, but it's just this increasing virtualization of compute.

Right. And, you know, to the early part of our conversation, so much of the GPU environment has not yet been virtualized, right? But what we believe is our approach allows

the right workloads to run at the right time in the right fashion. So it's a very similar concept in terms of the benefits that are received from it. And I think what's also pervasive about it is we can run that anywhere. So it can run on any of the major cloud providers and can run on premises as well. So regardless of where customers are running their workloads, they have the ability to run it

and actually see the difference between running it in their own environments and their own infrastructure. And I think that's very unique. It also allows us to run on a variety of different GPU types, both manufacturers and levels. With our platform, we actually have the ability to leverage and run some of those workloads that you would see running traditionally on an A100 or an H100.

can be augmented by your L4s or A10s or other environments. And we're actually giving that insight. And so to your point, I think that's very valuable. You know, we have a solution that's available on the Snowflake App Store, right? And what's unique to it is that we run on Snowpark container services, right? Which is something you don't see every day. So our functionality extends not just to

traditional cloud providers, but even some of these other customers. And our goal is to become that standard to allow efficient workloads to flow to where they can run most efficiently. Yeah, I love this abstraction of a job or a workload. You know, we were talking about having a new interface, you know, a new platform.

And, you know, in the olden days, we would just say, OK, well, it has to run on an NVIDIA H100 or whatever. And now we've got this interface where we say, I want to have a job, and just do what you sort of predict will be best for this job. The reason I'm saying this is I think that there's this notion of good enough performance

with ML models, right? You know, so sometimes people are massively overestimating the hardware they need, maybe even the model that they need for a particular job. And having some kind of virtualization layer in the middle means that through experience, perhaps we can cleverly do some routing and mapping. We can, you know, run it on optimized hardware, maybe even swap out a model and share another model that someone else is using. And I really think we need to do that.
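A minimal sketch of that routing idea: given a latency budget, pick the cheapest hardware/model pairing that meets it. The catalog entries and numbers below are invented for illustration, not CentML benchmarks or real prices.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    gpu: str
    model: str
    p95_latency_ms: float  # measured or predicted for this pairing
    cost_per_hour: float

# Hypothetical catalog; real numbers would come from profiling.
CATALOG = [
    Deployment("H100", "llama-70b", 120, 4.00),
    Deployment("A100", "llama-70b", 210, 2.40),
    Deployment("L4",   "llama-8b",  180, 0.70),
    Deployment("A10",  "llama-8b",  260, 0.60),
]

def route(latency_budget_ms: float) -> Deployment:
    # Keep only deployments that meet the latency target...
    viable = [d for d in CATALOG if d.p95_latency_ms <= latency_budget_ms]
    if not viable:
        raise ValueError("no deployment meets the latency budget")
    # ...then take the cheapest of those.
    return min(viable, key=lambda d: d.cost_per_hour)

print(route(200))  # cheapest option under a 200 ms p95 budget: the L4
```

The point of the design is that "good enough" becomes a stated constraint rather than a guess: the H100 only wins when the budget actually requires it.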

I think what prevents things from happening in that way, and I agree with you, by the way, but I think what prevents it is if it's not easy to do and it actually requires a decision and an effort and a movement, people are reticent to take it, right? What it has to be is seamless.

And so our approach is, you know, we don't want to have a customer have to rewrite their model 150 times to take advantage of every different flavor of GPU that's available. What we want them to do is have the ability to just have educated decisions that allow them to make that change. You know, if you give everybody the ability to run their workloads on an H100,

They're going to run it on that if they could. If they didn't care about costs and they could just run it on the fastest, it's like if I could drive a Ferrari to go get groceries every time and it fit on the front seat, I would do it. That's not always the most efficient way to do it. With CentML, the customer doesn't have to worry about the effort it would take to choose between a Tesla or a Ferrari based on gas consumption to get groceries 20 miles away.

They can just automatically have that chosen for them based on how long they want it to take, right? So if it's, I want to deliver this model with this latency, within this performance zone, there's 16 different ways to do it, and you can enable CentML to decide the most cost-efficient or the most, you know, financially capable based on a cost parameter with other platforms. So it starts allowing you to make business decisions without having to choose the lesser of two evils on this. And I think

Ultimately, that's what enables people to make the right decision. If it's seamless, it's painless, and it doesn't require a huge amount of change, then you can benefit from those infrastructure pieces. And to your point, there's an unbelievable amount of models and infrastructure that aren't married correctly. There's a lot of people that should probably not be married together in those environments, but because they don't have a path to change them or an understanding of what would happen if they did run it,

and they didn't have the insight to it, they don't do it. And with CentML, we hope we can start having those workloads running on what makes the most sense, both financially and performance-wise, and given some of the higher capacity workloads, the necessity to run on those environments. Yeah, and there's a lot of money to be saved here as well. I mean, I read on your website that in some circumstances, I think you can save up to 60%. And if you think about it, I mean,

large enterprise, they must be spending an incredible amount of money on cloud compute. I think NVIDIA might say not enough.

There's no such thing as an insane amount of money. There are so many startups that are coming out today that are trying to be a large language model for [insert generic use case here]. I think some of them are the most creative ideas and are unbelievably changing the way business will get done in these areas. But their single biggest cost is compute, period. So the startups are working on it and saying, you know, if there's a way that we can extend our runway by another six months,

because we can reduce our compute cost by 60%. So many GPUs are underutilized to begin with, right? So people aren't running at 80% or 90% utilization rates, and through no fault of their own, just by ways that things were programmed, things were coded,

the way that individual developers work versus collaborative developers. CentML looks to, one, increase the efficiency and actual utilization rates, but then also increase the efficiency on both where those models run and how those models run.

With a lot of our unique and patented approaches, we're actually able to drive that consumption down. And I always say a website is a dangerous place to put statistics because you always have to be able to prove them. And I actually see that our numbers are low based on what our actual customers are saying. So it's a multifold approach, and it's been very, very impactful and very, very beneficial to both our enterprise customers and significantly impactful for the startups.

How much of this is in the consciousness of a typical CTO in large enterprise? I mean, I can imagine as the years roll on, we're talking about climate change because the compute costs will just continue to increase. But I have an intuition at the moment, I mean, certainly in my recent experience, that a lot of senior leaders aren't really thinking enough about utilization and efficiency. So I think that the CTOs and leaders that listen to your podcasts, they

They are thinking about the efficiency because I think you bring awareness to some areas that people aren't paying attention to. But unfortunately, I tend to agree with what you're saying. I think so many CTOs have yet to feel the bite of that problem. And I say this, you know, having been in startups for almost 25 years, it's

It's an interesting kind of adoption curve. And I think where we are right now is so much in innovation. So a lot of companies, when a CTO looks at it, they look at an innovation budget and say, I'm willing to spend X amount of money

to see where this goes, right? So it's almost like an experiment. But this transition from innovation to production to scale is when that, I would say, looseness that happens in innovation needs to start changing. And so the customers that we're dealing with and the customers that we're having our best conversations and best success with as a company are customers that were in that innovation stage and are moving into this production and scale. And all of a sudden they're saying, you know what?

It's great that we're running this solution on an H100 that we've been testing, but now we're going to roll it out to 45,000 users and we're concerned that this capability is going to bankrupt us and incur a huge amount of cost. And that's when it becomes real, right? It becomes real when it goes from innovation to production. And I think...

Good CTOs and CTOs that are forward leaning, they're thinking about that today. And that's why we're having so many great conversations. And the ones that haven't gotten to that point, I anticipate having those conversations with them in the coming months and years.

as they get closer and closer to that moment from innovation to production to scale. Interesting. Do you think when we do make this more efficient that it will lead to less compute being used overall? I mean, the reason I say that is I had a bit of a weird thought the other day that in this Bain report, they said that, you know, we're going to be spending like 25% less on call centers. That creates a bit of margin because it's like we were spending this amount of money before. We've now got some spare money in the budget.

And if it were me, I would be adding more features. That's right. Right. I've now got all of these LLMs and I've got Gen AI and we can write code quicker. And, you know, instead of just kind of doing what we were doing before with the call center management, I would be adding

adding in many more automations. So it's almost like it'll just keep blowing up, but at least we can get more done with the money. And that's exactly what I think is going to happen. And I think it's been throughout all innovations, it's played out over time. The efficiency is what scales excitement. It's what scales innovation and it scales the next generation of movement. So right now, people are stuck in the mud a little bit with the amount of costs that it takes to run it.

If you're able to drive higher efficiency and free up some of that capability, even free up some of that time, right? If you're able to free up some of the developer resources that are used

to kind of get to that innovation period and go to production and scale, free up those individuals, the developers, free up the creative content capabilities, free up the cost. That opens up the next stage for growth, the next stage for efficiency, the next stage to expand upon your value. And to me, that's been a consistent thread, right? So I always feel that when you're at a solution that can help drive innovation as it goes forward,

and you drive innovation by driving higher adoption and optimization or efficiency, you're providing value today, but what you're really doing is unlocking the value of this technology for the future.

And I think there's no more valuable technology in the future than generative AI and LLM today, right? I think this is the biggest area of potential impact that we have in the next three to five years. I'm sure there'll be more that comes out from there, but in the short horizon. And if we can free that up and accelerate it, I'm very proud to do that at Sentinel.

So, John, I want to get your thoughts on the open source kind of situation. I should say open weights. It's not technically open source, but folks like Meta, for example, they're giving away Llama. And in the large enterprise, they are reticent to send their data to the likes of OpenAI and Anthropic for obvious reasons. So they want to have control. They want to have pragmatism. But then there's the thing of, well,

You could argue that the capabilities are better on these proprietary models, but the gap seems to be shrinking. And in many ways, it's better to have the flexibility, right? Because I can do my agents and I can do all my optimization that you guys are talking about. So how do you see that playing out? I think it's an interesting narrative, right? I think when a few years back, my previous company was acquired, Converge, but

But we were working on an understanding of what LLMs were, right? So, you know, an MLOps platform starting to understand what we think will be the next direction, right, for our platform and our product. When we investigated LLMs at that time, so much of the focus was on building your own large language model.

It wasn't ChatGPT or OpenAI, it wasn't what was being thought about, Meta wasn't there. People were really just thinking, we're going to have to build our own large language model. So the companies I engaged with at that time were trying to build it. Very time-consuming, very hefty lift. All of a sudden, this eureka moment comes out with ChatGPT, which I thought was just really going to be used for my kids' homework. And then it turns out that it's being used in corporate environments and it's pervasive and everyone's hearing about it. And then it kind of shifts and

And, you know, it's very, very much a conversation around Llama. And I think that that's actually the right direction to go. And I think it's the direction that most enterprise companies seem to be leaning because, to your point, one, I think a lot of companies that want to think about how to utilize their generative AI solutions or large language model solutions the best way for their environment and their company think about it as a base and

And then you use that base to expand and kind of extend your reach. I feel like it aligns very well with the developer community and the community of these companies to innovate along the lines of something that's open. And I think that as Llama has progressed in capabilities, accuracy, and parity with what's available today from Anthropic and

ChatGPT, I think it allows people a little bit easier use. And I don't think that's going to change. I actually think that that's going to be the standard. And I think that the companies that are currently engaged today will utilize that as a base. And I would say from my conversations, so many of the conversations I have, customers are predominantly looking at the more open weights that are being available to them. And I think that will be consistent as it goes forward as well.

So interesting because I think we are starting to have more awareness of how to develop applications on language models. So a lot of people, they just go and use, you know, Claude Sonnet 3.5 or whatever. And to give it its due, I think that's a threshold for performance. You know, I've always been quite skeptical about LLMs, but I think Sonnet 3.5 is...

It's just good enough to do a lot of things because for me, like the difference in capabilities between the different models is their ability to deal with ambiguity. Yep.

So you can, you know, zero-shot on Claude 3.5, do some very useful stuff and it works a lot of the time. Now, that's not to say that you couldn't take any smaller Llama model. You just need to do a bit more work, right? So you just need to do more prompt engineering. You need to give more examples. You know, the more specific it is and the more specifically you target it, the more reliable it is.
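As a concrete contrast, here is the same extraction task posed zero-shot versus few-shot. These prompt templates are illustrative only; the point is that the smaller model gets explicit examples where the frontier model gets none.

```python
# Zero-shot: often enough for a frontier model.
ZERO_SHOT = "Extract the customer's requested action from this email:\n{email}"

# Few-shot: the extra specification a smaller model typically needs.
FEW_SHOT = """Extract the customer's requested action as a short imperative phrase.

Email: "I've been billed twice this month, please sort it out."
Action: refund duplicate charge

Email: "Can someone walk me through setting up SSO?"
Action: schedule SSO onboarding call

Email: "{email}"
Action:"""

print(FEW_SHOT.format(email="Where is my invoice for March?"))
```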

And that's just an architectural concern. There are so many reasons why it's better to actually control the whole thing inside your own organization, right? Because you can do all the optimizations you're talking about. You can do all sorts of interesting routing and agents and like layers and layers and layers. You can build this thing from the ground up. But it's just interesting that at some point there's such a thing as good enough. And I think we're very close to good enough already, if that makes sense.

You know, I always like to say the worst enemy to a startup sometimes is do nothing. And the second worst enemy is good enough, right? Because there's always some solution out there or some capability that's, ah, it's good enough. Do we really need to change it, right? Like I always like to say, you never want to be at a startup that makes a Ferrari go one mile an hour faster, right?

you want to be at a startup that makes it go 400 miles an hour faster or some order of magnitude of craziness. But to your point, I totally agree with you on that. And I think that

We're really at this moment now where the best opportunities in front of us, I feel, are going to come from those open-weight models. Now, you folks are partnered with NVIDIA, if I understand correctly, and Deloitte. I mean, how does that affect your go-to-market strategy? So, you know, we're fortunate to have some wonderful investors in our company. And, you know, in addition to the companies like Deloitte or NVIDIA,

who both use and invest in our company, which I think are fantastic. It allows us opportunities to work with them, to partner with them, to get engagement, as well as to get pulses of where directions are going. When you work with one of the larger service providers in the world,

You definitely get a feeling about what projects or areas of focus and attention are hot buttons, both for their customers and for the space. Working within their innovation labs and their teams has really given us insight into the thought patterns of what could happen within verticals or specific companies. And that's been an unbelievably wonderful partnership for us, as well as a wonderful opportunity for us to partner technically on some solutions that they're bringing to market. And I think that's been a very, very valuable one for us.

With NVIDIA, the opportunity to work with them, to understand, to get feedback on our solutions has been unbelievably valuable. I also think that they're invested in the success of the space, not just the success of CentML, but the success of optimization, the success of adoption of utilization rates on their platforms, because they do see products like ours being such an additive value to their

stack, to their solutions, to their customers. So that partnership has been wonderful, right? It wasn't just financial but rather technical, and the overall support of both companies has been tremendous in accelerating both our development of a solution as well as accelerating our adoption with our customer base. Are you concerned at all about, let's say, Azure or GCP or AWS building something similar into their own platform? It doesn't keep me up at night.

And not because they're not capable, but because of our relationships with each one of those as well.

So we're a partner within GCP. We're available within their community. We're making it easier to work with GCP. Our platform makes it a very easy process and goal. And the one thing that I think permeates through each one of those vendors, and by the way, we're also working with AWS. We're available in the marketplace for both AWS and for GCP, is that every one of those companies, GCP, AWS, Azure, they care about their customers a lot.

And they care about their customer satisfaction. And they realize that, you know, adoption sometimes can be limited by a feeling, right? You know, if a customer feels that they're spending an unbelievable amount of money for the infrastructure that they're providing or that there's efficiencies that could be gained or improved,

utilization that could be increased. They don't see it as, if we can run a model more efficiently on their infrastructure, that customer will spend less money. I think what they realize is that customer will be happier for the efficiency that's gained, and they'll find other more exciting and innovative use cases to leverage on their platform. So we find that companies that are using CentML have a higher satisfaction rate.

And I think that extends as well to the cloud providers that they're using and partnering with us on. - Yeah, I mean, in a way, that's one of the great things about the whole cloud provider thing. I mean, I used to work at Microsoft and there was an old story that back in the Ballmer days,

If you were caught holding an iPhone, he would threaten to fire you or something like that. And then they embraced the cloud. And of course, they're just making money on the hardware. So at some point, they started embracing open source. And they're like, yeah, I don't care if you run Linux. And they just became a much more open, cool company, actually. And that's reflected across the board. But by the same token, there are some negatives to these big behemoths, these hyperscalers, that they control everything.

Right. And, you know, it's almost like you're an incumbent in their sovereign cloud. I mean, would you ever start your own cloud? I mean, how do you feel about that kind of relationship? I think there's always a desire to see where and how you can expand your footprint, how and where you can help customers better. Right. And if you see a gap, I think it would be remiss at any startup to not try to step into the gap to provide value if you can.

But I feel that the cloud space itself is such a dynamic space, right? You know, you had made the Oscar Wilde reference around fashion being every six months. I think cloud might be every three months because I think if you Google anything and who knows what sponsors this podcast may have or others,

It seems that there's a new AI-focused cloud coming out every three weeks, right? And I don't see a need for us to step into that area, but I do think that there's a need or a larger growth that can happen. I was actually listening to a podcast a couple of weeks ago and heard an advertisement on it, and it wasn't even a technically focused podcast. And they were mentioning the type of hardware that they had in their cloud.

And I was like, that is a very specific audience, right? You know, there were four people in my car and only one of them knew what those references were. And so I think that there's definitely growth that's going to happen anyway. And I would just say from my own conversations, you know, we've had multiple conversations with

newer cloud providers or unique, specialized cloud providers that are looking at building out or expanding those scenarios. And I do think that's something that's relevant. Yeah, it's interesting because part of the lock-in on traditional cloud is you would need to rewrite most of your code to kind of port it over. I mean, of course, there are all of these like portability things, you know, but you know what it's like.

So what they do is they give you all of these free credits and they say, come and build on my cloud for free. And then they know that you're not going anywhere. But the thing is, though, it's

kind of like the same thing with LLMs. Let me just think out loud here a little bit, because there's this perception that these models, they're just tokens in and tokens out, and you can just hot swap them. And it's not really like that, is it? On so many things, people start down a path and they get stuck with a path. And it's funny, I don't know if it's maybe my mom being a hippie or how I was raised, but I don't like to work at companies that lock people in. And it

Every startup I've been at has been about freedom of movement. Our platform is built on the idea that people will move, right? People can change models. People can change infrastructures. And they shouldn't be burdened by doing so, right? And CentML actually makes it easy to even change your model,

change the architecture of the model that you're using. Going from one transformer to a different transformer model, going from AWS to GCP to going on-prem shouldn't be something that is so scary. It should be something that makes sense and is usable, because this is

about the customer and about the best way a customer can engage. And to that moment and that freedom of movement, that's something that's paramount to CentML and something that's built into it. And to your point, I think everyone should embrace that, right? I think

No one likes to be in a relationship where the person is trying to lock you in. You want to be in a relationship for best service and the best experience and the right way to feel. And I think anyone that goes a different approach is not necessarily the best one, right? And I was reflecting back on this thing we were talking about earlier that when you have more margin, you can do more. Because I was thinking that right now, transformer models are hideously inefficient.

It's insane, right? They've got this quadratic layer-wise time complexity. And that means that even if we want to do open source stuff, it's like the crumbs off the table. We wait for Meta or Cohere or something like that. We take one of their models, we do a bunch of fine-tuning, and we start building the whole thing out. And of course, your technology allows us to do that much more efficiently. But possibly next year, we might see these state space models. We might see these minLSTMs. It might go back to RNNs and things.
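A back-of-envelope check on that quadratic claim: self-attention forms an n-by-n score matrix per head per layer, so doubling the context length roughly quadruples that term while the rest of the network only doubles. The layer and head counts below are illustrative, not any specific model's.

```python
# Self-attention builds an n x n score matrix per layer per head, so the
# score-matrix term grows quadratically in the sequence length n.
def attention_scores(n_tokens: int, n_layers: int, n_heads: int) -> int:
    return n_layers * n_heads * n_tokens * n_tokens

base = attention_scores(4096, 32, 32)     # e.g. a model at 4k context
doubled = attention_scores(8192, 32, 32)  # same model at 8k context
print(doubled / base)  # -> 4.0: double the tokens, four times the score entries
```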

Because what we're losing a little bit is the alchemy that we had five years ago. We want to have, and we do have people out there just building stuff, but we want to have more people out there just kind of building stuff. Do you know what I mean? I think that's one of the things that keeps the soul of a startup, right? Is that you have to be aware that innovation doesn't stop just because you decided to build something today.

And I think the key to a successful startup is kind of our ability to have our own alchemy that's able to keep pace with the changing market because we're focused on that and we're looking at the same trends that you're mentioning now for the models that we're supporting and the way that we're optimizing. You know, even some regions have different approaches to models that are their favorite. Right.

And so as we sell in multiple regions globally, we have to be prepared for each one of those model types and challenges and infrastructures. And I think it's that flexibility as well as the soul of a startup that keeps CentML so relevant and fresh. And I think that all good startups have to have that DNA. Otherwise, we'd be kicking people out for using an iPhone. And we don't want to be that.

Final question, I mean, just kind of casting the gaze outwards a little bit. I mean, obviously, you're an incredibly exciting startup. I think you had, was it a $27 million funding round in October 2023? And what do you think about the other startups in the space? Not necessarily in your direct space, but what excites you in the broader AI space at the moment? Yeah, I mean, I think it's a wonderfully exciting time where there's a lot of companies that are out and being funded and coming online

around what is this emerging new space. And everyone's take is a little bit different around what they're solving. Some people are looking at enterprise customers like we are. Some people are looking at just people that are getting started, individual users, business to consumer. I think what we're seeing is that this space is so dynamic and some of the brightest minds in the world are coming at it to try to solve

how do we optimize, how do we engage, how do we drive adoption and enhancement for large language models and generative AI? And at any moment, I wouldn't be surprised by any direction a startup could come out with. And I think that's a great thing. And I also think that one of the unique traits is that our team is such a cohesive unit, right? Coming from University of Toronto,

We've been able to recruit some of the best and brightest minds from University of Toronto to join our team, people that have established working relationships. So it's like a company within a company. I think that's fantastic. But also in the nature of what they've done before, we've been able to grow at a very quick pace, a very large and diverse engineering team. And I think that is very exciting. But if you look at the other startups in the industry, they're kind of going down similar paths.

Right. And it's so exciting to see so many teams of developers coming together to build and develop and advance this cause. And I think we're at the beginning of what's going to be a very exciting and dynamic run. And I'm very excited to be part of it at CentML. Amazing. John, thank you so much, man. This has been really great. A pleasure. I love it. I look forward to it. Thank you, Tim. Yeah, absolutely love it.