This is episode number 877 with Shirish Gupta, Director of AI Product Management at Dell. Today's episode is brought to you by ODSC, the Open Data Science Conference.
Welcome to the Super Data Science Podcast, the most listened-to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.
Welcome back to the Super Data Science Podcast. Today we've got a fan of the show joining us as my guest for an episode about efficiently designing and deploying AI applications that run on the edge. So like on local laptops, workstations, that kind of thing.
That guest, who's also a fan of the show, is named Shirish Gupta. He has spent more than two decades working for the global technology juggernaut Dell in their Austin, Texas headquarters. He's held senior systems engineering, quality engineering, and field engineering roles at the firm. For the past three years, he has been director of AI product management for Dell's PC Group.
He holds a master's in mechanical engineering from the University of Maryland. Today's episode should appeal to anyone who's involved with or interested in real-world AI applications, which I'm assuming is just about every listener to this podcast.
In this episode, Shirish details what neural processing units, NPUs, are and why they're transforming AI on edge devices. He provides four clear, compelling reasons to consider moving AI workloads from the cloud to your local device. He talks about the AI PC revolution that's bringing AI acceleration to everyday laptops and workstations, what kinds of large language models specifically are best suited to local inference on AI PCs,
How Dell's Pro AI Studio Toolkit will drastically reduce enterprise AI deployment time. And he provides plenty of real-life AI PC examples, including how a healthcare provider achieved physician-level accuracy with a custom vision model. All right, you ready for this illuminating episode? Let's go. ♪
Shirish, welcome to the Super Data Science Podcast. Where are you calling in from? I am calling in from sunny Austin, Texas, Jon. Excellent. So you've been a longtime listener to the show. How long have you been listening to it, Shirish? I'd say about a year and a half.
Nice. Any particular, I'm putting you on the spot here a bit, so no worries if this is too much pressure, but any standout episodes for you? Oh yeah, I have one. I don't remember the number, but it is the one in which you go deep into what is a transformer. Oh yeah. I really enjoyed that one. I've listened to it multiple times. Yeah, that's 747.
747.
And a fair number of people came in as listeners to the show because of that. Cool. Well, it's great to have you on the show, to have a fan of the show on the show. And so we're going to talk in this episode a lot about neural processing units, NPUs, which I don't know that much about.
I know that they are an AI accelerator alternative to a graphics processing unit. So we have had episodes on the show in the past that have talked about alternatives to GPUs as an AI accelerator. But this episode, we're going to talk a lot about NPUs, specifically neural processing units. And that is not something that we've talked about on the show specifically before. So fill us in on what neural processing units are. It would be my pleasure to, Jon. So...
They're fairly new. NPUs stand for, as you said, neural processing units. They're maybe a year old in terms of being out on the market. The very first ones that came into the market were with Intel's Meteor Lake chipset, which launched sometime around this time last year. So I'd say a fairly new kid on the block. Why they're exciting is because if you think about...
the devices that they are incorporated into, which is your average laptop or desktop even, right? But a PC that is used by your everyday knowledge worker, it's used by individuals for their personal use, but just your typical PC. For the longest time, we've had CPUs in these PCs. For many, many years, we've had GPUs or graphics processing units.
And those are integrated and discrete. Integrated graphics processing units or IGPUs are more common, have been more common for many, many years. The NPU is the newest kid on the block. So what they do is it's a very, I'd say purpose-built architecture that is designed to do one thing really, really well, which is matrix math. So just because it is almost hard-coded to that extent,
It is extremely efficient in terms of power consumption for those kinds of multiplications and additions in a matrix, which are essentially the building blocks, as you know, for AI and ML workloads. So this is super important for your average PC because, yes, you can run AI and ML on the box on the CPU and the GPU, but your battery isn't going to last very long if you keep that up.
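To make that matrix-math point concrete, here is a tiny illustrative sketch in NumPy, not Dell- or NPU-specific code, of the multiply-and-accumulate pattern that a single neural-network layer boils down to; it is exactly this operation that an NPU is hard-wired to run efficiently:

```python
import numpy as np

# A single dense layer is just a matrix multiply plus an add:
# activations (batch x features_in) times weights (features_in x features_out), plus a bias.
x = np.random.rand(32, 768).astype(np.float32)   # a batch of 32 input vectors
W = np.random.rand(768, 768).astype(np.float32)  # the layer's weight matrix
b = np.random.rand(768).astype(np.float32)       # the layer's bias vector

y = x @ W + b  # the multiply-accumulate pattern NPUs are purpose-built to accelerate
print(y.shape)  # (32, 768)
```

Stack thousands of these operations per token or per image and the power-efficiency argument for a dedicated matrix engine becomes clear.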
Gotcha. So it's more efficient. So an NPU...
It would be a replacement for a GPU for training or inference-time running of a machine learning model, probably particularly a deep learning based model, like a large language model. Most kinds of foundation models out there are based on a deep learning architecture. Those have tons of matrix multiplication and addition, as you say. And the NPU is designed for that specifically, instead of being general purpose like a GPU, which was originally designed for, and still today is used for, all kinds of uses like rendering graphics (it is a graphics processing unit, after all) or mining Bitcoin. An NPU wouldn't necessarily be great for rendering graphics or mining Bitcoin, but if you're training or running an AI model, an NPU will be more efficient. So on your PC, for example, it'll save you battery power,
it might be more efficient as well, perhaps in terms of time? The NPUs, at least for now, I think the future holds a lot more possibilities. But for the time being, for the foreseeable future, I'd say the NPUs are most suitable for inference workloads. I think you still need GPUs for training, fine-tuning. NPUs are going to be amazing for AI in production or consumption by people, right? So
your data scientists, AI engineers, app developers, they're going to use a workstation to build these AI capabilities. Guess what? They're building it for someone. Someone's going to use them. And it's, you know, think about the average knowledge worker. They're going to start using on-the-box capabilities that allow them to do
anything from, you know, using AI to really accelerate productivity, from just asking questions to an assistant that can quickly give them the answer from a vast knowledge base, to AI even embedded into a workflow, right? Which is where you get a little bit more into agentic and multi-agent workflows, which, again, I foresee in the future. But the point I'm getting to is,
you're really going to use an NPU for inferencing. And that's where your efficiency matters, right? That's where your average knowledge worker is going to use it. You don't want to start offloading AI features to a device only to tank its battery life. That's going to lead to a pretty bad experience in the long term, right? So it's pretty important to
Keep that in mind, that while you still need GPUs and high-powered workstations to build your solutions, NPUs are perfect for those being consumed by the average person. I got you. So we're talking about taking capabilities that today might require you to have an internet connection and depend upon some cloud service,
in order to get some kind of, say, large language model or other foundation model capability. But instead, with an NPU, you could potentially have the operations, the inference time calls, instead of going out over the internet...
and using cloud compute, you can have it running locally on device. So you're also probably going to get lower latency. You have fewer dependencies. Yeah, talk us through some of the other advantages of being able to now do things on the edge instead of having cloud reliance. Yeah, I think this is a perfect segue. In fact, this is a mnemonic that I came up with myself.
The term that is being thrown around for these devices with NPUs is an AI PC. I'm sure you've heard of it, right? So to think about the benefits of an AI PC, I've created a mnemonic with those four letters. So A is accelerated. Basically, you now have a local hardware accelerator that gives you that low latency, real-time performance for things like
Translation, transcription, captioning, and other use cases where latency is super important for persistent workloads. That's A. I is individualized. Again, this is great because if you have an AI that is on your box, it has the ability to learn your styles. Let's say if you're creating emails, if you're using it to generate emails, it's learning your style, it starts writing in your style.
It's great for, you know, we had a healthcare customer that we've been working with on a use case where, you know, there were, there's two parts to it. I'll talk about the second part. The first part is even more interesting, but I think it's related to a different example that we'll come back to later. But the second part of the AI solution is that they were taking, you know,
information from a physician's diagnosis of a patient in the ER. And they were using that information to auto-generate the physician's report. You know, mundane stuff, physicians don't like spending time on that. They'd rather go to the next patient, have that interaction, you know, increase their ability to spend time with patients. The feedback they gave was, you know, with this solution, now that it's started seeing the way that I'm changing and editing its initial draft, it's starting to take on my style. And now it just sounds like me. And I love it because I don't have to do this report generation. It does it for me and I've got more time for my patients. So that's the individualized value. The third is P, it's private. Like you said, the data doesn't have to leave your device and its immediate ecosystem.
You don't have to send it back and forth to a public tenant or even a private tenant for that matter. You may have confidential information with PII that you have access to, but you don't want to merge with even a private tenant. There is sensitive information like that or unclassified information depending on your vantage point. So that inherent privacy of data
and the inherent security of running the model locally on your device gives you that assurance that this is more private than it would be. So that's P. And C, this is really important because I hear this from customers, it is an important cost paradigm shift. And I'm starting to hear this from some of our early, maybe earliest adopters of on-device AI,
which by the way is not ubiquitous today, right? In terms of enterprises building out their own AI capabilities and using on-device accelerators for offloading that. We're at the tip of the spear with Dell Pro AI Studio and we'll come back to that later. But the early adopters, what they say, and I had a FinServ or financial services customer tell me, "Shirish, my developers are using code gen
and they're using our data center compute, 15% of my data center compute is going to these developers that are using it for code gen or code completion or writing test cases, unit tests, what have you. They all have PCs. I want to get them to an AI PC with a performant NPU
So I can take that offload. I mean, I can offload that compute from my data center because they don't need H100s to do that code completion. I think I can do that with the NPUs on your Dell devices. So that's a real opportunity as well. Just because you have the compute doesn't mean you should use it. It's like the right compute or the right engine for the right workload at the right time, right? So there's plenty...
of use cases where offloading from even your private data center to a on-device capability makes a ton of sense. And then if you're actually using the cloud, you're paying for every inference, right? It's tokens and API access. So now that you've got an AI PC, it's no cost to you. You built your solution, you're using it on the device,
That's it. So cost is a big factor. Now, you'll argue that the cost of inferencing in the cloud is coming down. It's scaling very fast. But again, I get back to the point that it's the right engine for the right use case at the right time. Just because you have it doesn't mean you should use it.
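As a rough back-of-envelope sketch of that cost argument, with purely illustrative assumed figures rather than real Dell or cloud pricing:

```python
# Back-of-envelope comparison of cloud API inference cost vs. local inference.
# All figures below are assumptions for illustration only.
developers = 200                  # assumed number of developers using code completion
tokens_per_dev_per_day = 100_000  # assumed daily token usage per developer
workdays_per_year = 250
price_per_million_tokens = 2.00   # assumed blended $ per 1M tokens for a hosted API

annual_tokens = developers * tokens_per_dev_per_day * workdays_per_year
annual_api_cost = annual_tokens / 1_000_000 * price_per_million_tokens
print(f"Assumed annual hosted-API cost: ${annual_api_cost:,.0f}")
# Running the same workload on NPUs already in the fleet shifts this spend to a
# fixed hardware cost with no per-token metering.
```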
Excited to announce, my friends, that the 10th annual ODSC East, the Open Data Science Conference East, the one conference you don't want to miss in 2025, is returning to Boston from May 13th to 15th. And I'll be there leading a four-hour hands-on workshop on designing and deploying AI agents in Python.
ODSC East is three days packed with hands-on sessions and deep dives into cutting-edge AI topics, all taught by world-class AI experts. Plus, there are many great networking opportunities. ODSC East is seriously my favorite conference in the world. No matter your skill level, ODSC East will help you gain the AI expertise to take your career to the next level. Don't miss out; the online special discount ends soon. Learn more at odsc.com slash boston.
Something that comes to mind for me when I think about this, initially as you started describing this, I was thinking about kind of myself sitting at a desk with a laptop. And I'm sure that a lot of users of this kind of AI PC paradigm with an NPU in it are doing that. But when you talk about cost and lots of inferencing, is it the case also that these get used for commercial or industrial applications where you have...
an AI PC that could be sitting there in a factory or in some kind of commercial setting where it could be basically continuously analyzing images or audio. And so if you were sending that kind of high bandwidth information,
images, video, audio. If you were trying to send that over to the cloud, there are huge bandwidth costs. If you had a bunch of machines doing it, then you'd need to make sure that you had a network that could support all that. And then you'd also have much bigger costs on the cloud side as well. So is that kind of use case also relevant here? - Very relevant. In fact, there are vision models like YOLO, just for an example, right? Where you're totally right. It makes a lot of sense to do that at the edge, right?
and it's real time, and it's much cheaper. So those are use cases we're looking at in manufacturing. We even have a customer, an insurance company, right? They want to use a capability like this to take pictures of damage, use a model that is good for image classification to be able to go back, refer to that database and look at, okay, for this kind of damage,
what kind of category am I looking at, right? Is this going to cost me, you know, t-shirt size, small, medium, large, just to get the adjuster, you know, maybe 70% of the way there before they have to make their judgment. So that's one use case we're looking at with another customer. Manufacturing is an absolutely very valuable use case for customers to offload to the device right there in the factory.
so that they can do real-time anomaly detection of defects. Right, right. Yeah, real-time anomaly detection. Perfect application example there. Cool. So thank you for giving us that rundown of your AI PC mnemonic. So accelerated, individualized, private, and cost. Effective. Cost effective. Yes, that makes more sense.
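For the factory-vision example just discussed, here is a minimal sketch of what local, on-device detection might look like, assuming the ultralytics package and a generic pretrained checkpoint; the model name and image path are placeholders rather than a Dell-validated configuration:

```python
from ultralytics import YOLO  # assumes: pip install ultralytics

# Load a small pretrained detector; in practice you would use a model
# fine-tuned on your own defect or damage images.
model = YOLO("yolov8n.pt")

# Run inference on a single captured frame; no network call, no per-image cloud fee.
results = model("frame_from_camera.jpg")
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, bounding box
```

In a real deployment the detector would be fed frames straight from the line's cameras and only the flagged detections, not the raw video, would ever need to leave the device.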
I didn't quite finish my note there. So very cool. Makes it easy to remember some of the advantages. Are there also compatibility issues that are resolved in this kind of framework where if you have different chips from different manufacturers like Intel, Qualcomm, AMD that are also handled in this kind of paradigm? Great question. So you're actually touching on one of the reasons why we built Dell Pro AI Studio.
So without going into that, I would say this: because the NPU is such a new architecture, you know, we are working very closely with our silicon partners to get this array of models that support this variety of use cases ready and compatible with the SoCs, right? So that itself is a really important point for us to touch on later, which allows us to democratize access for developers to use a variety of silicon targets
without having to start over every time they are faced with a new silicon architecture. That's pretty important. Nice. Okay, so now let's talk about the mechanics from the perspective of the data scientist or the software developer who's using these tools, or maybe even from the perspective of a point-and-click user, like you say, a knowledge worker, who is not necessarily coding, but they're taking advantage of some application that was built on an AI PC. In these kinds of scenarios,
You've said a number of times how it's a PC, obviously, so I assume it's Windows. Some of our listeners will primarily have been doing their work on Macs or other Unix-based machines like Ubuntu. Why should somebody be considering using a Windows computer? And as a follow-on question to that, if this isn't too much that I'm putting in there,
I'd love to understand what the mechanics are like. If I'm a data scientist or a developer using AI models or developing an application with an AI model, what kinds of tools am I using on my AI PC? This question came to me as you were answering the compatibility question, because you were talking about how you only get set up once and then you can work across all these different silicon providers. And then I was like, oh, what does that look like? What does it look like when I'm doing the work on a PC? Yeah, and it's a great question.
Again, keep in mind that we're talking about inference time here, right? It's all about outcomes that your AI engineers and developers are driving for. And so you think about who's going to consume the applications that are going to run on these NPUs and AI PCs.
it comes down to a Windows environment. That's where they're most likely going to get consumed. So if you start with that paradigm, then, you know, if you've got to actually build a solution that has to run on an AI PC NPU that's running Windows, you've got to make sure that your model, your runtime, and your ability to call the model and run the local host all work on the AI PC.
You can do some of your development work on a Unix-based machine. Absolutely, right? Because ultimately, what are you doing? You're going to take all of these AI bits. You're going to take the model. You're going to combine that with the ability to run it on a local host, right?
on an AI PC, but you still have to build your app and then you've got to integrate these bits into it. So you have the option of using your IDE of choice to do all of that piece. But when it's time for you to actually integrate your model into the app and set up the local host and run it and
and make sure that it's working, you're going to have to do that piece on a Windows-based device because that's where your app's going to run.
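As a sketch of that last integration step, here is what calling a locally hosted model from application code might look like, assuming a local inference server exposing an OpenAI-compatible chat endpoint on the device; the URL, port, and model name are illustrative assumptions, not the Dell Pro AI Studio API:

```python
import requests

# Hypothetical local endpoint served on the AI PC itself; nothing leaves the device.
LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local-8b-instruct",  # placeholder name for a locally deployed model
    "messages": [
        {"role": "user", "content": "Summarize this incident report in three bullets."}
    ],
}
response = requests.post(LOCAL_ENDPOINT, json=payload, timeout=60)
print(response.json()["choices"][0]["message"]["content"])
```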
Does that make sense? It makes perfect sense. And you're absolutely right, Windows is still, in the corporate world, in a lot of enterprise applications, in a ton of industrial applications, Windows is the default operating system. And so it makes a huge amount of sense to me. But I, I will add one thing, Jon. Yeah, yeah, yeah, go ahead. We do... I think I'm getting ahead of myself here, so
We should come back to this when we talk about Dell Pro AI Studio. So I'll let you ask your follow-up. Okay, okay. So when... I would still love to have the question answered for me, if you may, around... There must also be, because I don't have experience developing applications. So let's say... So in this...
For this episode's purpose, we're talking about inference. So a large language model has already been trained, or maybe I'm taking some open source model weights, some DeepSeek model weights or some Llama model weights, and I've got them on my local PC. If I want to build an application as a data scientist or a software developer on that PC, I'm going to need to have a lot of data.
What do you recommend as the kind of, what are the best kinds of tools for a data scientist or a software developer on a PC in that scenario? I mean, there's a variety, right? I think the choice is the developer's. This is where I think our intention is to arm developers so that they can actually integrate into their apps, with their own choice of IDEs, the tools, the framework,
and the blueprints to be able to get workloads to run on the AI PC NPUs. Okay, fantastic. All right, so, you know, you apologized for interrupting me, but you didn't really interrupt me. It's just a conversation. And you said, you know, we can come back to some point that you wanted to make when we start talking about Dell Pro AI Studio. So let's talk about that now. So the Dell Pro AI Studio, tell us
what that is. It's a specific implementation, as far as I know, of an AI PC that incorporates neural processing units, NPUs. But tell us more about it. Tell us why they were developed. Tell us why they could be helpful to our listeners. So very important to note, Dell Pro AI Studio is a toolkit. So think of it as an SDK. But it's got an array of tools, models, a framework,
actually frameworks and recipes. So it's the tools and how to use them. So I get asked this question very commonly, right? It's not an app that's gonna be shipped on the Dell AI PC from the factory. It is targeted towards developers and IT pros that want to build, deploy and manage
apps with AI features on their fleet or subset of their AI PC fleet. So that's what it's for. So when I say toolkit, what is it? So it's got three parts to it, if I really boil it down at a high level. At least our initial release will have three parts. And each of them solves a problem or a pain point that we
heard from our customers as well as experienced ourselves as we try to bring workloads to the AI PC for inference time. So one, we have Dell validated models. So we have curated or hand-selected a range of open permissive models that cover a variety of use cases from language, speech, and vision to enable a variety of use cases.
Well, you can get those from Hugging Face, you'd argue. And the answer is yes, you're right. They're actually going to be available on Dell Enterprise Hub, which is on Hugging Face. We already have that capability today and models for enterprise today that are targeted to run
in containers for training, fine tuning on servers, on Dell servers. So we already have that. We're gonna add the models that run on the AI PCs right alongside those server models, if you will. These would be smaller models, right? Think for a language model, I'd say today's NPUs, perhaps the most you can fit, and I say fit, it's a loosely used term,
I'd say the performance that you'd think would be acceptable for the average user would be around 15 to 20 tokens per second, right? So to get that target LLM output, I'd say you would probably not be able to go beyond like an 8 or 9 billion parameter model with quantization, right? But that's still pretty darn good. Today those models are extremely capable.
And so this inflection of NPUs getting more performant, with 40-plus TOPS on the NPU, and these smaller LLMs getting way more performant and accurate, that inflection point has really enabled local inferencing on the device. So that's what we wanted to democratize for customers.
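A quick worked check of why roughly 8 billion parameters with quantization is the sweet spot described here, assuming 4-bit weights and ignoring activation and KV-cache overhead:

```python
# Rough sizing for an 8B-parameter LLM quantized to 4 bits per weight.
params = 8e9
bits_per_weight = 4
weight_memory_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_memory_gb:.0f} GB of weights")  # ~4 GB, fits alongside the OS in a 16-32 GB laptop

# And 15-20 tokens/second is comfortably faster than reading speed:
tokens_per_sec = 17.5   # midpoint of the 15-20 range quoted above
words_per_token = 0.75  # common rule of thumb for English text
print(f"~{tokens_per_sec * words_per_token:.0f} words/second generated")  # ~13 words/s
```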
Did you know that the number one thing hiring managers look at are the projects you've completed? That's why building a strong portfolio in machine learning and AI is crucial to your success.
At Super Data Science, you'll learn how to start your portfolio on platforms like Hugging Face and GitHub, filling it with diverse projects. In expert-led live labs, you'll complete an exciting new project every week. Plus, through community-driven projects, you'll tackle real-world multi-week assignments while working in a team. Get hands-on experience with projects like retail demand forecasting, building an AI model from scratch, deploying your own LLM in the cloud, and many more. Start your 14-day free trial today and build your portfolio with superdatascience.com.
I'm going to repeat that back because it seems like a really important point that this paradigm that we're talking about today of having NPUs running locally or on the edge, same kind of idea,
on an AI PC at the time of recording in early 2025, that's ideally suited to kind of a 7, 8 billion parameter model. Okay, perfect. And so 7, 8 billion parameter model, yeah, absolutely. If you're using a Llama 7B model, they are hugely capable today.
And you're saying that it will pump out 15 to 20 tokens per second, which corresponds to about 10 to 15 words per second, much faster than somebody can read. And so, yes, absolutely. That sounds like a sweet spot. Perfect. Yep. Yep. You got it. So I kind of went on a tangent there, but I was talking about the elements of Dell Pro AI Studio. So back to the Dell validated models.
We realized one of the key pain points was for anyone trying to build AI features that run locally on the device, they either have a model that they've picked out and they don't know what system it can run on, or they have a system in their fleet and now they want to use it and they will figure out, okay, what kind of model can I run on this? That's typically the way that I'm seeing customers approach this, right? And we'll get to like custom and fine tuning later. Let's keep it simple for now.
But let's just say you're using a base open permissive model. And if you're a developer, you're going to have to kind of iterate. You're going to take a model and a system and you're going to figure out whether they can actually coexist. Can it run? Can it give the performance I need? Is it going to give me that 15, 20 tokens per second performance? How big does the model need to be? I mean, let me try a 14 billion parameter model. Let me try a 30 billion parameter model.
you quickly realize, oh, I don't have enough memory. Let me go to 32 gigs. Let's try again. Oh, it runs now, but oh God, it's like two tokens per second. That's not going to work. But believe it or not, every iteration takes a lot of work for them to get to that point. And that's insane, right? Anyone faint of heart is going to say, okay, this is too much work. I don't think I'm ready to build for the NPU quite yet, right? So it's a huge deterrent today.
So by solving that problem and having models and systems paired up, like, hey, here's the performance you're going to get. It's been fully tested by Dell. It's fully compatible with silicon and a variety of silicon. It doesn't matter what SOC you got. If it's a Dell system with this specification, you are good to go. And all of these use cases are enabled by this curated set of models. So big pain point. It seems trivial, but it's a big pain point that we've solved with
that set of models on Dell Enterprise Hub. The second element of Dell Pro AI Studio is its enterprise readiness, which is massive, right? Today, there are plenty of developer apps out there. I won't name them, but there are some really good apps for people who want to test and see what running a model locally on the device and chatting with it looks like. And I encourage people to go find those apps. And those apps are great for bringing a model, running it locally and chatting with it. But that's it. It's a POC. But what do you want if you're an enterprise developer?
You want to be able to turn that into value for your end users. You want to drive productivity. You want outcomes. So you've got to test it. You've got to figure out how you're going to use it in an actual example that is going to make your users productive, like the ones we talked about, like the manufacturing anomaly detection. And there's several others I can give you for the average knowledge worker. If you really want to now take that capability
and deploy it across a fleet and control it completely end to end, including the data that goes in and out of those apps, that app has to be enterprise ready, which means your model's got to be enterprise ready. It has to be attestable so that you're not injecting risk into your enterprise. And then you have to be able to control it through its entire lifecycle. That's what enterprise ready means. And then you've got to, as an IT pro,
you want to be able to manage every aspect of it. Who has permission? What models get deprecated, updated? How is it being used? How are the apps being used? What's the telemetry? None of that exists today. So that's the other piece. Via Dell Management Portal, you have that enterprise-ready capability where every element of your solution that uses the Dell Pro AI Studio is ready for the enterprise. It's ready to be embedded into your workflows
and your data is completely in your control, right? That's the part two. Part three is our middleware, right? It's the framework that runs on every device that will be in the fleet where these AI apps are deployed. And it does all of the cool stuff, right? Model operations are automated through this.
system and model discovery is automated through it. So, you know, it can detect the silicon on the device. It knows the app, it knows the model that the developer intended for that silicon. It pulls that from the registry and it, you know, slots it in. It can slot in, slot out, you know, load, unload, secure it by verifying attestation, all that stuff, right? Which is, if you're a developer, you would have to build that into your app.
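To give a feel for what that middleware takes off a developer's plate, here is a small illustrative sketch, using ONNX Runtime as an assumed example, of the kind of hand-rolled silicon detection and model loading the framework is described as automating; the provider ordering and file name are placeholders:

```python
import onnxruntime as ort

# Which execution providers (CPU, GPU, NPU backends) are available on this machine?
available = ort.get_available_providers()
print(available)

# Naive hand-rolled selection: prefer an NPU-backed provider if present.
# Provider names vary by silicon vendor; these are examples, not an exhaustive list.
preferred_order = [
    "QNNExecutionProvider",       # Qualcomm NPU backend
    "OpenVINOExecutionProvider",  # Intel backend
    "DmlExecutionProvider",       # DirectML on Windows
    "CPUExecutionProvider",       # always-available fallback
]
providers = [p for p in preferred_order if p in available]

# Load a (placeholder) quantized model with the best available provider.
session = ort.InferenceSession("my_quantized_model.onnx", providers=providers)
print("Running on:", session.get_providers()[0])
```

And this is only the loading step; keeping it updated, secured, and swapped per device across a fleet is the part that quickly balloons, which is the point being made here.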
And that's not trivial, right? You're talking about multiple libraries of code with hundreds and hundreds of lines of code that you've got to write, debug, test, and manage, right? And at the end of it, if you have a change in your features, if you want a new model, you've got to spin a whole new version of that app.
It quickly becomes very unsustainable, right? If you're an enterprise-grade developer or IT pro, you just can't sustain that over the long term. So that enterprise-grade capability that is enabled through Dell Management Portal and that AI framework on the box
is pretty special. Nice. So what I'm hearing is that with the Dell Pro AI Studio, you're allowing your users, developers, data scientists of AI applications to be able to dramatically accelerate their timeline from proof of concept to enterprise application. Yes. Accelerate and simplify.
And in fact, for lifecycle management, it's a big enabler because it doesn't exist today for the PC. Nice. Yeah, that actually sounds really game-changing.
That's really exciting. Okay, so if we have these accelerated capabilities, what kinds of use cases are you seeing for people using Dell Pro AI Studio, using your software development kit, your SDK? What are the kinds of practical applications that you're seeing happen? So we already covered code gen, right, and code completion. We covered manufacturing anomaly detection. Let's talk about a couple that are really cool, right? I talked about the insurance agent,
or adjuster using image capture to get started on their claim for damage. We've been talking to a shipbuilding company or ship inspection company; they could use the same in the field, like having the ability to go take this app, scan the parts of the ship, and automatically check for damage,
so that you're not completely reliant on the human. But do a quick scan through, document the damage during inspection, and again, auto-generate the report. So those are all really cool use cases. Another one that we're looking at is first responders. If you have EMS or police that's responding to a distress call, they go on site and they realize the person or the victim doesn't speak their language.
Now you have real-time translation on the app, with no latency or low latency, which is super important, without having to relay that info to the cloud. And then on top of that, if you couple that with taking the transcript of that entire conversation and converting it into auto case generation, that's massive for the officer or the EMS personnel.
Saves them a ton of time and paperwork that they have to usually do after hours or between calls. Improves accuracy of all the information captured. And guess what? The raw transcript can be discarded once the report's created. So you have privacy and you're not necessarily sending information that you shouldn't send to a cloud somewhere. I mean, I can go on and on. And you talk about general use cases within the enterprise. There's one that we're looking at.
Within Dell, actually, I'll give you two. You have a young engineer that comes across a defect while they're developing a product. And they want to check, have we seen anything like this before? And what did we do? What were the possible causes and how did we solve it? They can quickly go into the VA, the virtual assistant, and say, here's a summary of my issue. Tell me if there's anything like it in our knowledge base.
It thinks for a moment and comes back with five Jira tickets with the links and says, these are the owners of these tickets. Go talk to them, right? Go to the tickets and figure it out; maybe it's a good starting point. I mean, the possibilities are endless. And you always go back to, why does it make sense to do them on the device? The truth is not every use case is meant for the device. But if it meets that AI PC mnemonic, you should consider it.
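For that knowledge-base assistant example, here is a minimal sketch of the retrieval step, assuming the sentence-transformers package and a small local embedding model; the ticket text and model name are placeholders, and a production version would pair this with a locally running LLM to draft the answer:

```python
from sentence_transformers import SentenceTransformer, util

# Small embedding model that can run locally; the name is illustrative.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder knowledge-base entries standing in for real Jira tickets.
tickets = [
    "JIRA-101: Intermittent thermal shutdown under sustained GPU load",
    "JIRA-205: Fan curve misconfigured after firmware update",
    "JIRA-342: Display flicker when docking station is hot-plugged",
]
query = "Device powers off when it gets hot during stress testing"

ticket_vecs = embedder.encode(tickets, convert_to_tensor=True)
query_vec = embedder.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_vec, ticket_vecs)[0]

# Surface the most similar past tickets as starting points for the engineer.
for score, ticket in sorted(zip(scores.tolist(), tickets), reverse=True):
    print(f"{score:.2f}  {ticket}")
```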
Accelerated, individualized, private, and cost effective. That's really cool, all those applications that you just ran through. Of course, something that I say on the show pretty frequently, especially on Friday episodes recently, is if you are listening to this episode, you could be a hands-on developer, data scientist, or not. You could be a more commercially oriented individual listening to the show,
but you're interested in having somebody, like you just listed a whole bunch of examples, you can go and talk to your favorite large language model. So for me right now, that would be Claude 3.7 Sonnet from Anthropic. When I'm looking to brainstorm on ideas for a particular application or use case, I would put in context. So if I was you, listener, I'd be putting in the specific context about the situation that you're in
what business you're in or what business interests you have or what academic interests you have, what background you have, what is your special niche or what are special data that you have access to that other people might not that can form a bit of a moat. And then just talk to an LLM about what kinds of applications you could be building. And then something like Dell Pro AI Studio could allow you to go from idea to proof of concept to
to enterprise application very rapidly in, I guess, a matter of weeks or months. And while you're doing that, it sounds like, based on these neural processing units that would be used in an AI PC, you mentioned earlier, Shirish, this idea of kind of a 7 billion, 8 billion parameter model being ideal.
And so I just wanted to, I highlighted already when you were speaking earlier that Llama models from Meta could be great, but some other small options that could be great for this kind of scenario and have a lot of capability are the Phi series of models from Microsoft, as well as the Gemma models from Google. And I'll have links to all three of those, Llama,
Phi, and Gemma, in the show notes as great options that people can be using. And potentially, I mean, I haven't gone and looked to see exactly which models are kind of in the pre-validated Dell Enterprise Hub that's available on Hugging Face, but I suspect that those are the kinds of models that would be. Yes. And I should point out that Dell Pro AI Studio is currently in development. So you are going to
see it maybe mid-year, right? We did announce it earlier this year, but we're building it. We're in early access right now. So I would say this, that if you're listening and if you're interested in early access and doing a beta program with us on it, please do reach out and I'll share the information with Jon to add in the show notes.
Eager to learn about large language models and generative AI, but don't know where to start? Check out my comprehensive two-hour training, which is available in its entirety on YouTube. Yep, that means not only is it totally free, but it's ad-free as well. It's a pure educational resource. In the training, we introduce deep learning transformer architectures and how these enable the extraordinary capabilities of state-of-the-art LLMs.
And it isn't just theory. My hands-on code demos, which feature the Hugging Face and PyTorch Lightning Python libraries, guide you through the entire lifecycle of LLM development, from training to real-world deployment. Check out my generative AI with large language models hands-on training today on YouTube. We've got a link for you in the show notes.
Yeah, fantastic. Appreciate that. And we'll also, at the end of the episode, we'll get, you're a regular listener, so you know that at the end of the episode, my final question is always how people should get in touch with you. So you may get bombarded through that as well. Nice. All right. So
Given these kinds of large language models that we could be using, given the kinds of applications that you've discussed and that people can get from their own discussions with a large language model of their choice, I've kind of alluded here, I've made the assumption
With something like the Pro AI Studio SDK that is in early access now and will be fully available mid-year 2025, I've assumed that we're talking about weeks or months timeline to developing an enterprise AI application. Fill me in on whether I was right with that assumption. What are the real timelines here? If I'm just using...
If I'm trying to do things on my own and trying to build all the kind of glue and security that I would need for an enterprise application on my own, as opposed to relying on something like the Pro AI Studio to do that, what's the difference in timeline? Great question, Jon. So in our estimation, to take a typical app from POC to in production, running on the AI PC NPU,
could take you about six months. And this includes all of the discovery and iteration with identifying the model and the device that you would run on for your use case. It would involve all of the development to enable the runtime, compatibility, and all the model operations on the device.
And then finally you come to deployment, right? Which will again be a very manual process of deploying it onto a fleet of devices. So up to that point, and I call that point time to initial value, right? So from discover to build to deploy, that's about six months in our estimation for a typical app, nothing too complex, like a RAG chatbot, for instance. With Dell Pro AI Studio, you can
shrink that down to under six weeks in our estimation. So that's about a 75% reduction in time. The time that you're going to still spend on is the time that you should spend on, which is picking the use case, making sure the solution output's tuned for your users, is delivering the accuracy that you need
and validating the outputs, right? So that's where we want developers to focus their time, which is really what they own, right? All the parts that are painful, that they just have to do to get from point A to point B but that they don't really need to own, that's what we're automating for them with Dell Pro AI Studio, right? So it's all of the stuff that they shouldn't have to do. We're working to solve that. And so again, I repeat,
For that typical app to go from discovery to deployment, time to initial value, we can reduce that from six months to six weeks.
That's wild. I believe that. With the right kind of tooling, it's definitely possible. And it's nice to think how that could make processes repeatable. It just makes it so much easier for an organization or for an individual to pick up an SDK like this and be able to iterate, be able to develop more applications. It's really exciting. That 75% number is huge. All right, so we understand the benefits today, I think now, of a framework like this.
What do you think these kinds of tools like Dell Pro AI Studio that let you rapidly go from prototype to enterprise AI application, as well as technologies like NPUs running on AI PCs, which again, with Windows being the de facto standard across most of the commercial and industrial world, what does that mean for the future?
how are AI development workflows and applications going to change in the coming years? Great question, Jon. I think, you know, if I look to the future, I'll touch on what I see as the AI workloads of the future on the device. I think you'll see a lot more agentic behavior on the device. And what I mean by that is you're going to have the AI
have some sort of agency to take action on your behalf. And it really comes down to the parameters you or your enterprise define. We're not there yet, but that is the next wave, where I see AI assistants working for users and/or in collaboration with others to get things done for you and coming back to you to ask for your input based on the parameters that you've set. So again, to me,
that's what's going to enable true productivity gains when it's automation within your workflow for the average end user. Whether it's personal uses, you might have your own travel agent that you set the parameters and say, "Hey, go buy my tickets for this concert," or something like that. You've given it enough parameters so that it can take action on your part.
And you tell it you want the best deals, you want it on these dates, and it goes and scours everything, finds it for you, and comes back to you and says, hey, this is what I found. Is this good? Do you want to pay? And then if you've authorized it to pay, you just say yes, and it goes and does it. Now, that's a space that's not available today. I think models have to evolve further. You need to get to reasoning models that get small enough that you can run them locally. That might take some time.
But I do think that is the future, right? The other thing I see is hybrid compute, where you may not be completely black and white. You may not just work only on your device or only on your private cloud tenant. You may seamlessly use compute locally where it makes sense and then
the cloud steps in as soon as something requires more compute than your device, than your PC, can offer. That's going to be more ubiquitous in the future. So I see a lot of that hybrid workload orchestration happening in the future as well. I also think, yeah, you know, the, um, Dell Pro AI Studio
is going to keep abreast of all of these developments. So not only will we continue to expand the model set that we support to keep enabling more use cases as models become more performant and NPUs become more capable, we'll expand models, we'll expand silicon supported architectures, or our support will widen for architectures. And then we're going to have more automation in the future. Whatever today
starts with a manual process. You can think of it this way: anything that's manual can ultimately be automated, and if it can be automated, it will be automated. So that's what I definitely see happening in the future, and we'll keep evolving Dell Pro AI Studio to keep up with that evolution. Fantastic. That is a really cool vision for the future.
Before we wrap: I'm broadly aware of a wider AI ecosystem at Dell, something called the Dell AI Factory. How does the Dell Pro AI Studio, this SDK that we've been focused on for most of this episode, fit into that broader AI ecosystem? Particularly for our listeners,
what are the most valuable takeaways for them from this broader Dell AI factory ecosystem? Great question. So this absolutely fits into the Dell AI factory. Let me touch on the Dell AI factory construct itself for a moment here for those that may not be aware. So the Dell AI factory
came to life about a year ago. It was announced at Dell Tech World in 2024, and it was a really important announcement and capability for our customers to take Dell infrastructure, right? Whether it's client devices, edge devices,
data center, compute, storage, networking, fabric, what have you, but take all of that infrastructure, combine that with an open ecosystem of tools, models, frameworks, essentially the software layer, and combine that with services, right? Because not everyone wants to just take the hardware and the software
and build it themselves. Now you can, but for those who don't want to DIY, you have Dell professional services to help you along your journey, right? They can either build it completely for you or consult with you on the parts that you need help with, right? So these three components of the Dell AI Factory, the infrastructure, the open ecosystem, and the services, together
enable customers who have data and ideas and want to get to outcomes, use cases, and productivity or efficiency for their company or their employees. To go from that left side, which is ideas and data, to the right side, which is outcomes and value, you use these three components of the Dell AI Factory to get from A to B.
So it's everything. How do you manage your data? How do you store your data? How do you organize your data? How do you use the best set of models and tools and tool chains and frameworks on the right set of hardware coupled with services as needed to get to those outcomes? So that's the Dell AI factory construct. Dell Pro AI Studio,
fits into this because it is truly our client story of the Dell AI factory, right? So you couple client devices, which are the AI PCs with the open ecosystem, which is the Dell Pro AI Studio itself, which has open permissive models, the tools, the frameworks, and other recipes. And then we will also be standing up Dell Professional Services
to enable customers to get there where they need help. Fantastic. That's how Dell Pro AI Studio fits. Yeah, and I think a key thing here to note is before I ever started, because it's now been about a year since I started doing television ads, actually, for Dell AI Factory, I didn't know that Dell offered services. And so, you know, I think of Dell as a hardware company, as making PCs,
as making servers. And so this professional services angle that the Dell AI Factory encapsulates, yeah, I think that's something important to highlight for people who might not be aware, you know, that Dell is an absolutely enormous company and that it has this professional services arm. Yeah, it's a big business as I've discovered in the past year. And,
And yeah, cool that the Dell Pro AI Studio, which will particularly appeal to our listeners out there who are developing AI applications. But it's nice to know for them as well as all of our listeners, I guess, that if they want to be churning out AI applications out of the factory, you have all the services needs covered for them as well. Yeah, so something that I didn't know. Absolutely, yeah. And you know, this is,
You made me remember one more thought that I would share with the listeners. If you juxtapose the Dell AI Factory and overlay it along the journey of an enterprise developer or IT pro, the flow always starts with your data scientists. You know, that's where they do all of the training, model development,
fine-tuning, and take the base PyTorch versions of the model. The AI engineers come in and apply toolchains to go from the general base to maybe a more targeted model for the specific runtimes.
And then you have the software developers that come into the picture where they now take those targeted models, combine or integrate into apps and create those solutions that can be consumed by end users. That's the kind of very simplified version of a flow for AI. The way I see it in every customer conversation, this becomes pretty relevant. Like, how do I start?
And we'll always tell them, like, take your smartest people, you know, maybe your data scientists, put them in the room with the people where you need that improvement, right? So let's say if you're trying to improve healthcare delivery in an ER, take your data scientists, let them sit and coexist with your ER staff for a month or however long you can afford it.
Before they start killing each other, that is. But the idea is to let them coexist, let them really study the problem. And then you put the most powerful workstations in the hands of those data scientists and tell them, okay, go figure out a solution to this, right? Go do your POCs.
You now know what the problems are. You know how they work, what the flow is. Go figure out the solution. And so that's what I tell everyone is to take the Dell Pro Max workstations, put them in the hands of your smartest people, let them study the problem, do the POCs. Once they understand what their use case needs, you can go scale that,
on server infrastructure, right? Like you need to do some serious training or fine tuning. You want to really prove out your solution. You may need infrastructure for that. And then once you have your infrastructure, I mean, once you have a solution ready and deployed,
Then you have the opportunity to assess, okay, now that it's in production, do I really need my H100s or B200s or B300s to really run that, right? Or can I actually use a smaller model, use AI PCs? And if that's the case, let's offload that workload to my fleet of AI PCs and
and let all the users consume them right there, right? So the flow that we are encouraging our customers to take is: start with the AI workstations, scale to the servers, and deploy to the endpoints, the AI PCs, right? So it's almost like the client devices bookend
our infrastructure on either side as part of the Dell AI Factory. And I do want to share that example from healthcare, which I told you about earlier, which I said was really cool. The first part I said I'd hold back for later. This is one customer that has actually done a pretty impressive job
in doing exactly what I just said. They took their data scientists, put workstations in their hands, and they went to the ER and studied the problem. They created a custom vision transformer model using their own radiology images from the ground up. They trained it on their own radiology images. And now they were able to take new patients, get those radiology images,
compare them to their training data set, and get an initial diagnosis for the physician to look at. And they started seeing that they're getting physician-level accuracy in the model-predicted diagnosis, which is pretty impressive if you think about it. And then this is the part which I said was super cool, right? They built this custom vision transformer, and they're very keen to bring it down to the AI PC, because today all of this lives in a HIPAA-compliant data center that they can't replicate
across the globe and they want their entire hospital system and doctors in mobile areas that are all over the globe to have this capability. And I'm like, the best way to do this is if I can do this on a PC.
And, you know, this is the part where I had said earlier, they took that diagnosis and converted it into an auto-generated report as well, right? So that's the two-step process that I talked about. Yeah, so just to wrap up, I think making it real with Dell Pro AI Studio is really what we're trying to do for our customers. And as you saw in that example, it's all about taking your use case and how you are enabling value for your business. Again, I hark back to the four reasons why you should consider potentially bringing the workload and running it on an AI PC: accelerated, individualized, private, and cost-effective. So if any of those apply to your use case, once you've stood it up and it's in production, you should consider
whether it's right for you to offload it to the PC using Dell Pro AI Studio. Yeah, makes sense. Fantastic. All right. So you've also got another great thing for our listeners in addition to that consideration, which is that you have the beta program for this AI Studio that our listeners can get into, or can apply to get into. So I'll have that link for them in the show notes.
Before I let my guests go, as a regular listener, you'll know that I always ask for a book recommendation. So do you have something for us? It's not data science related, but it is a book that I am reading right now. I'm terrible at author names, but I think you'll find this. It's called Manifest, and it's all about manifesting the future that you envision for your life.
And actually, you know, there's another one I'm reading concurrently. I'll give you two book recommendations. Another one that I really like is Five Types of Wealth by Sahil Bloom. They're very similar, kind of reading them concurrently, but it's all about manifesting, you know, the future you want for yourself. And Sahil Bloom's book is all about, you know,
It's not just about one type of wealth. There are actually five types of wealth in your life. So if either of those books interest you, they're both highly recommended. Nice. Fantastic, Shirish. Thank you. And as I mentioned would happen earlier in the episode, how can people follow you after this episode? Yeah, I am on LinkedIn. So if you do want to follow me, I'm on LinkedIn. It's...
LinkedIn.com slash Shreej29 is my handle, but you can search me up. I don't think there's another Shirish Gupta at Dell right now. We'll have a direct link in the show notes as well. Perfect. Yeah, fantastic. All right, Shirish, thank you for coming on the show and opening our minds to the possibilities
with having AI PCs, something that is actually really important because of how widespread Windows operating system is across the world. And so when you're thinking about deploying into that very common environment, it makes a lot of sense to me to be using a tool like Dell Pro AI Studio that allows you to accelerate and have lots of compatibility, scalability, enterprise readiness,
Very cool. Thank you for filling us in today, Shirish. And maybe we'll catch up with you again in the future. Sounds good. Thank you very much for having me today. It's been a pleasure talking to you.
Well, I learned a ton in that episode and had my attention drawn to the importance of considering deployments of AI applications to edge devices running the globally ubiquitous Windows operating system. In today's episode, Shirish covered how neural processing units, NPUs, are specialized chips designed specifically for matrix math operations, making them highly efficient for AI inference workloads. He also talked about how NPUs are optimized for running inference rather than training, ideal for deploying AI to end-user devices with better battery life.
He talked about the AI PC advantage and how it can be remembered with his mnemonic: A, accelerated, low latency; I, individualized, learns your style; P, private, data stays local; and C, cost-effective, no cloud fees. He talked about how current NPUs can effectively run 7 to 8 billion parameter LLMs at 15 to 20 tokens per second, making local inference practical for many applications.
He explained how the Dell Pro AI Studio may reduce AI application development and deployment time from six months to six weeks, a 75% reduction, by automating model discovery, compatibility, and lifecycle management. And he provided lots of real-world AI PC applications, including manufacturing defect detection, insurance damage assessment, real-time translation for first responders, and medical image diagnostics.
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Shirish's social media profiles, as well as my own, at superdatascience.com slash 877. And if you'd like to engage with me in person as opposed to just through social media, next month you can meet me in real life at the Open Data Science Conference, ODSC East,
which is running from May 13th to 15th in Boston. I'll be hosting the keynote sessions, and with the extraordinary instructor Ed Donner, who's also a longtime friend and colleague of mine, I'll be delivering a four-hour hands-on training in Python to demonstrate how you can design, train, and deploy cutting-edge multi-agent AI systems for real-life applications. It should be exciting indeed.
All right. Thanks, of course, to everyone on the Super Data Science Podcast team: our podcast manager, Sonja Brajovic; media editor, Mario Pombo; our partnerships manager, Natalie Ziajski; researcher, Serg Masis; writer, Dr. Zara Karschay; and our founder, Kirill Eremenko. We can never forget him.
Thanks to all of them for producing another illuminating episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors. You can support the show by checking out our sponsors' links, which are in the show notes. And if you want to sponsor the show, you can find out how; just head to jonkrohn.com slash podcast. Otherwise, help us out by sharing the show with people who would love to learn about edge AI applications. Review the show on your favorite podcasting app or on YouTube. Subscribe, obviously. Edit videos into shorts if you want to. But most importantly, just keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.