Nexus Cognitive is a composable and agnostic organization that modernizes data and AI infrastructure, enabling enterprises to achieve AI-powered outcomes at speed, value, and scale. It uses a modular approach with its Nexus One control plane and managed service offerings to simplify integrations and deliver outcomes within days or weeks.
Acceldata is a data observability platform that ensures enterprises provide trusted, high-quality data to AI models, whether structured or unstructured. It monitors data quality and characteristics throughout the pipeline, proactively preventing issues like model drift and ensuring data accuracy for AI predictions.
Small data errors can lead to significant financial losses for enterprises, as they impact the accuracy of AI models used for critical decisions like loan approvals. Enterprises can prevent these errors by using data observability platforms that detect issues early in the data pipeline, ensuring high-quality data feeds into AI models.
Infrastructure agnosticism allows enterprises to avoid being locked into a single cloud vendor or hyperscaler, enabling flexibility in choosing compute and storage solutions. This is crucial as the cost driver in data management is shifting from storage to compute, and enterprises need the optionality to use multiple vendors for better outcomes.
Data governance involves managing how data is used within an enterprise, ensuring security, privacy, and compliance with regulations. It is evolving from a centralized, committee-driven process to a more decentralized approach where governance is integrated with data observability, allowing rules and policies to be applied dynamically wherever data is being used.
This is episode number 846 with Anuj Jain and Mahesh Kumar.
Welcome back to the Super Data Science Podcast. I am your host, Jon Krohn. Today's episode features the highlights of a session I hosted on managing data to embrace an AI-first mindset for enterprises. And that session had not one but two guests from the C-suite of fast-growing venture capital-backed startups. Namely, those guests were Anuj Jain, who's CEO of Nexus Cognitive, and Mahesh Kumar, who's
CMO of a company called Acceldata. And he's an interesting CMO because he has an engineering background and he still writes code. Today's short episode should be interesting to folks looking to make AI implementations effective in large organizations that have lots of data. In today's episode, Anuj and Mahesh detail how a tiny data error can lead to millions of dollars in losses for an enterprise. They have a specific example. They also talk about why data storage isn't a major cost driver anymore and what is.
And they fill me in on what the heck data governance actually is and why it matters.
Ready? Let's jump right into our conversation, which was recorded at the ScaleUp AI conference in New York a few weeks ago. That conference is hosted by Insight Partners, so you'll hear that gigantic venture capital firm mentioned in today's episode. You also may hear, well, you will hear the name Andrew, and that refers to Andrew Ng, who was someone that I interviewed earlier in the day at the conference. If you want to listen to that, go to
the recording of that interview with that superstar Andrew Ng. That's in episode number 841. All right. That's everything. Let's go.
Welcome back to the second stage. We're here for a session on managing data to embrace an AI-first mindset for enterprises. My esteemed guests for this exciting session are Anuj Jain, immediately to my right. He's Nexus Cognitive CEO. And to his right is Mahesh Kumar, who's the Acceldata CMO.
Let's start off talking about Nexus Cognitive, Anuj. Absolutely. It's Insight Partners' first services-automation business in its portfolio. It modernizes data and AI infrastructure, allowing outcomes within days or weeks,
providing speed to value by simplifying what you described to me when we talked last week, simplifying the ball of yarn of integrations. Tell us a bit more about Nexus Cognitive. Well, first of all, you're a great spokesperson. You're hired. No, so look, at the end of the day, we are a composable and agnostic
data architecture and ecosystem that integrates and automates the workflows that really help us drive data-driven outcomes, or really AI-powered outcomes, at speed, value, and scale. How do we do that? We do that through our Nexus One control plane and our managed service offering. That really helps us get to flexible options that meet our clients' needs wherever they are.
So when you describe services automation, you're taking services that would traditionally involve some kind of manual process. We were earlier in Andrew Ng's talk; I don't know if you saw his keynote. He was talking about how AI doesn't displace jobs, it displaces tasks.
And so if you're looking for opportunities to streamline your operations in some role, look at the different tasks that make up that role and try to identify which will be easiest to automate. And so that's what you're automating: individual services that a human might have historically done. That's correct. All the hard work of integrating all the pieces of the infrastructure, all the way through the data, the integration, all the way through the outcome.
Nice. All right. So let's move on now to get an introduction to Acceldata from Mahesh.
Acceldata is literally used within the Nexus platform, so there's a bit of a bridge there. But Acceldata also stands alone as a data observability platform for enterprises. Tell us a bit more about Acceldata and the role of data observability in AI success, such as proactively preventing model drift, for example. Sure. Pleasure to be here with both of you.
If you look at today's discussion, a lot of it was about the applications and the importance of building good AI applications. What powers that? It's the data, right? And what Acceldata does is allow you to provide very trusted, high-quality data to all your AI models. Whether it's structured data or unstructured data, we manage both.
Let me illustrate with an example. One of our customers is a data provider: they provide business data to others, and they get data from over 130 different countries, over 100 data points. All of that has to come together and goes through about 30 or 40 different transformation steps. Eventually, it's consumed by hundreds of thousands, even millions, of other businesses and government entities.
So the ability for them to provide that trusted data, including through the AI models that give business risk, financial risk, and other kinds of information about a business, becomes super critical. Before Acceldata, if they had a problem, it would take them weeks to find the root cause. With us, it takes them hours. So you can imagine how their business was completely transformed with us. So we observe the quality of data
and various other characteristics of the data throughout the pipeline, from the landing zone to the consumption point, and allow you to then manage that in a very proactive manner, so you provide trust to all your AI initiatives. Nice, makes a huge amount of sense. And it also makes a lot of sense why that data observability piece would be such a key component in something like the Nexus Cognitive solution. So
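To make the idea concrete, a pipeline-stage quality check of the kind Mahesh describes could be sketched as follows. This is a minimal illustration, not Acceldata's actual API; the record schema, field names, and the 1% null-rate threshold are all assumptions for the example:

```python
# Minimal sketch of a data-quality check at one pipeline stage.
# The rule threshold and record schema are illustrative assumptions.

def check_stage(records, required_fields, max_null_rate=0.01):
    """Return a list of quality issues found in this batch of records."""
    issues = []
    if not records:
        return ["empty batch"]
    for field in required_fields:
        nulls = sum(1 for r in records if r.get(field) is None)
        null_rate = nulls / len(records)
        if null_rate > max_null_rate:
            issues.append(f"{field}: null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")
    return issues

batch = [
    {"company_id": "A1", "revenue": 120.0},
    {"company_id": "A2", "revenue": None},   # quality problem: missing revenue
]
print(check_stage(batch, ["company_id", "revenue"]))
# → ['revenue: null rate 50.0% exceeds 1.0%']
```

In a real observability platform, checks like this would run at every stage from landing zone to consumption, so an issue is flagged where it first appears rather than weeks later at the consumer.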
You talked to me last week, Anuj, about how Nexus has this modularity, how it has a building-block approach where you can slot in a solution like Acceldata alongside other modules in the platform. It's more like, by working with Nexus Cognitive,
it's like buying a car as opposed to buying individual car parts and trying to integrate all those together yourself. Tell us a bit more about that modular, Lego-building-block approach. Yeah, absolutely. So we use the word composable: we're a composable data architecture. What does that mean, composable? Fantastic question. Yeah.
We're basically building the entire data mesh through Lego blocks. That's any of the open-source tools, or even some of the closed-source vendors that are out there. But we ride on the rails of open standards to put those pieces together and integrate them as one platform, one outcome. So back to the question around car parts versus buying the car as a whole. We have clients on both ends of the spectrum. We have those today who are managing massive technical debt. They have old infrastructure, they like parts of it, and they want to upgrade and modernize other parts. For those folks, we'll come in and really provide the newer car parts, if you will, but then have that fully integrated with the observability plane.
Other clients, and this is where we're seeing huge advantages, say: we have net-new workloads, we want to drive AI outcomes at scale and at speed, we don't want to wait six months to get infrastructure up, and we don't want to wait nine months to hire and build a team to get to a real outcome. So here, the car comes to you. All the parts are built, it's stood up in days, and you're getting outcomes in weeks.
Very cool. I love that approach. Mahesh, over to you with a question about small data errors. So even with a data observability platform like Acceldata, obviously you're monitoring for data issues.
Something that might not be immediately obvious to everyone is that even very small data errors can impact the AI models those data feed into. So how can enterprises adopt strategies to prevent these errors from snowballing into big business problems? Sure. I think there are two aspects to AI models. One is the building of the model itself and the other is running the predictions. In both those cases, you need really good, high-quality data feeding into the models.
For example, if your data is skewed, say you don't have data from one particular source, then obviously there is a change in the model and how it predicts. So let me give you one more example. One of the largest banks in the world essentially uses AI to decide cash loan offers, credit card offers, and such.
And in their instance, one of the things that they found out was the pipeline that's feeding credit scores was not getting updated properly. So now you can imagine when you're trying to predict, should I give this person the loan or not? And the credit scores aren't up to date, you've got a huge impact. You're talking about many tens of millions of dollars over the year. And some of these problems can go undetected because there is just, if you could think about
hundreds or thousands of pipelines, so many different data sources, data feeding from so many different places. - The data might look right in a circumstance like that. - Exactly. - You're getting the credit score in the right format, and so nothing breaks. - Yes. - Nothing noticeably breaks. - Yes, so that's where I think observability plays a big role, because we're able to catch each of these issues at the source, shift left in terms of detecting the problem and fixing it, so you understand it very early
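The credit-score example is exactly the failure a format check misses: the value parses fine but is stale. A "shift-left" freshness check might look like this sketch; the field names and the 24-hour threshold are assumptions for illustration, not the bank's actual rules:

```python
# Sketch of a freshness check: a credit-score record can be well-formed
# yet stale, so validate recency, not just format.
# Field names and the 24-hour threshold are illustrative assumptions.
from datetime import datetime, timedelta, timezone

def is_fresh(record, now, max_age=timedelta(hours=24)):
    """A record passes only if its score was updated recently enough."""
    return now - record["score_updated_at"] <= max_age

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stale = {"credit_score": 712, "score_updated_at": now - timedelta(days=30)}
fresh = {"credit_score": 698, "score_updated_at": now - timedelta(hours=2)}

print(is_fresh(stale, now))  # False: the score "looks right" but is a month old
print(is_fresh(fresh, now))  # True
```

Running a check like this at the pipeline stage that feeds the model is what catches the "nothing noticeably breaks" class of error early.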
and then you're able to prevent it from snowballing. The other thread I want to pull on a little bit: we had a discussion around agentic workflows and such. If you can imagine a series of agents performing a larger task, an error in any of them, due to bad data or many other reasons, but primarily because the input data is not very good, you can imagine the compounding effect:
you know, bad decision on top of bad decision on top of bad decision. Pretty soon, four or five agents down the road, you're so far divergent from your ideal scenario. In the AI era, with more AI agents being built for a lot of different tasks, it becomes that much more critical for you to have a handle on data and be able to provide very trusted data
to build your models and also trusted data to make the predictions. Your customer 360 database has to be perfect or as close to perfect as possible because that kind of feeds into the model and then you get a prediction on the other end. So it's an ongoing process and that's why you need something like an observability tool to actually manage all of this. We operate both on on-premise data and cloud data.
We are agnostic to all the data platforms: Snowflake, Databricks, AWS, Azure, Google, name the hyperscaler, and even smaller data platforms; we work with them all. So the ability to span multiple platforms, and to have observability across both on-prem and cloud, becomes very critical, and that's where we excel.
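Stepping back to the compounding-error point about chained agents: if each agent's step is independently correct with some probability, the chance that the whole chain is correct decays multiplicatively. The 95% per-agent figure below is an assumption chosen just to show the shape of the decay:

```python
# If each of n chained agents is right 95% of the time, independently,
# the probability the whole chain is right is 0.95 ** n.
per_agent_accuracy = 0.95
for n in range(1, 6):
    chain_accuracy = per_agent_accuracy ** n
    print(f"{n} agents: {chain_accuracy:.1%} end-to-end")
# 5 agents already drop end-to-end accuracy to about 77%
```

This is why "four or five agents down the road" the result can diverge so far from the ideal: even small per-step data errors compound across the chain.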
That leads perfectly into my next question, Anuj, which is the importance of infrastructure agnosticism. So could you give us some of your thoughts on avoiding getting locked in to a particular cloud vendor or hyperscaler? Why does that matter? Yeah, great question and great example. As you were speaking just now, I was reminded of what Andrew was talking about maybe an hour or so ago:
you know, data gravity has been very real for a lot of our clients, but that's reducing. And what we're seeing is that the real cost is not data storage; it's really data compute. So when we talk about locked-in vendors, what we do at Nexus is remove that lock-in: we're decoupling compute from storage. So we're able to say, hey,
you're on Databricks today, you're on Snowflake tomorrow, you're on an open-source compute layer. It's the ability to decouple all of those pieces of the engine and really get to the outcome. Our view is it's a world of open compute, an open world, open standards, and you should be able to take your compute where you want to. Nicely said. Anything else you'd like to add to that, Mahesh? I think the world is moving, obviously, as you said, much more to an open environment. I think
The cost of these models and compute is also changing quite rapidly. Enterprises more now than ever want this sort of optionality.
of multiple different vendors as opposed to getting locked into one. So the portability of your infrastructure, and your ability to analyze data and choose the right place to put it, becomes very important. And I totally agree with what you're talking about. - Nicely said. All right, next topic for both of you gentlemen is data governance. This is something we promised would be covered in this session.
Yeah, I don't even really... I've been working in data science for, like, my whole life. I did a PhD in neuroscience, and I've been working commercially in data science for over a decade, and I still don't really understand what data governance is. Do either of you want to explain that to me?
I'll take a crack at it. I think probably the reason why you don't understand to some extent is it's always been up until now a very ivory tower type of situation where there's a committee that kind of decides how data can and should be used within the enterprise, obviously for good reason, because you want good standards, good control, security, privacy, all of those things.
Rules and regulations, laws in many cases, have to be adhered to. And then that sort of gets percolated down to the data organizations, and they use those things day-to-day. What's happening, from an Acceldata standpoint, what we are saying is that going forward,
Governance is not going to be a centralized part of the whole equation. Governance has to sort of metaphorically move with the data. You have to govern the data wherever it is rather than in a very centralized manner.
And I think Ali today pointed out three things: people, process, and product, and how increasingly people, or even whole organizations and departments, are taking charge of AI initiatives now, with the ability to produce code and things of that nature. In that scenario, the building of these data products gets decentralized, right?
Now, you cannot have centralized governance trying to manage something that is so distributed. So you have to have an architecture where the data management platform essentially understands the state of the data wherever it is, for whatever purpose it is being used, and is then able to apply the right rules and policies to make sure that whoever is using the data is using it in a way that's appropriate, from a corporate standpoint and also from legal and ethical standpoints.
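One way to picture governance that "moves with the data" is a policy check evaluated at the point of use rather than by a central committee. The sketch below is purely illustrative; the data tags, purposes, and policy table are invented for the example and don't reflect any particular platform:

```python
# Sketch of decentralized governance: a policy check applied wherever
# data is used, rather than enforced by a central committee.
# Tags, purposes, and the policy table are illustrative assumptions.

POLICIES = {
    # data tag   -> purposes allowed to read it
    "pii":       {"fraud_detection"},
    "financial": {"fraud_detection", "credit_scoring", "analytics"},
    "public":    {"fraud_detection", "credit_scoring", "analytics", "marketing"},
}

def may_use(data_tag, purpose):
    """Allow access only if this purpose is permitted for this data tag."""
    return purpose in POLICIES.get(data_tag, set())

print(may_use("pii", "marketing"))        # False: blocked at the point of use
print(may_use("financial", "analytics"))  # True
```

Because the check travels with the data's classification rather than living in one central gatekeeper, each decentralized team that builds a data product gets the same rules applied automatically.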
So I think data governance is due for a huge shakeup in the near future where people are not going to be looking at it from a committee ivory tower. Of course, there'll be inputs there, but a lot of the action is really going to be very close to the data and where it's being used. Just to add to that, I think when we talk to our clients, data governance, I'd like to say everyone talks about it, but no one's really doing it today. And
What we find is they've had so much technical debt, so many different tools, that it's literally impossible for them to think about how to follow their data from source to digital twin to mesh, warehouse, applications, whatever it may be. And what we're finding is that, as we've adopted a composable architecture with open standards and observability, we're able to start to automate a lot of those governance features, so that the heavy, process-intensive, people-intensive part of this governance world is going away. And getting that meta-information visible is creating a ton of value. Excellent. Thank you so much, both of you. Anuj Jain, Nexus Cognitive CEO. Mahesh Kumar, Acceldata CMO. Thank you so much for this great session on managing data to embrace an AI-first mindset for enterprises. And yeah, hopefully we'll catch up with you both again soon. Sounds good. Thank you. It's been a lot of fun.
All right, I hope you enjoyed today's conversation with Anuj Jain and Mahesh Kumar on making enterprise data ready for AI. Be sure not to miss any of our exciting upcoming episodes. Subscribe to this podcast if you haven't already. But most importantly, I just hope you'll keep on listening. Till next time, keep on rocking it out there. And I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.