
Federated learning in production (part 1)

2025/5/30

Practical AI: Machine Learning, Data Science, LLM

AI Deep Dive AI Chapters Transcript
People
Patrick Foley
Topics
Patrick Foley: Traditional machine learning relies on training with centralized data, but that approach runs into trouble when the data is privacy-sensitive or simply too large to move. Federated learning offers an alternative: instead of centralizing the data, the model is sent to where the data lives and trained there, enabling distributed training while preserving privacy. The approach is closely related to traditional distributed training, but it places much greater emphasis on privacy. In federated learning you have to pay particular attention to verifying that the model does not leak sensitive information about the data, and to threats that can come from multiple parties who may not trust each other. Federated learning is therefore not just a technique; it is a strategy for protecting data security and privacy in multi-party collaborations.

Patrick Foley: In practice, a federated learning framework first needs a shared notion of what the model is, and then distributes it to the participants. When an experiment starts up, a central server (the aggregator) communicates with all participants, assigning tasks and providing the latest model weights. The clients (the collaborators) run code locally and train according to a federated learning plan, which contains the model's hyperparameters, network configuration, and other important details. To keep data safe, a strict vetting process is required, especially in settings like hospitals with stringent data-privacy requirements. During training, model weights are restricted to NumPy bytes in transit to avoid potential code-injection risks. After training, each participant sends its updated model weights back to the aggregator to be combined, for example by weighted averaging. In this way, federated learning can approach the accuracy of centralized training while protecting data privacy.

Patrick Foley: There are many aggregation methods in federated learning, with FedAverage being one of the most commonly used algorithms. To deal with data heterogeneity, other algorithms handle differences in data distribution between clients better. For example, FedOpt takes the loss terms of the individual collaborators into account in order to converge faster to a global model. This is still an active research area, but by applying these more advanced aggregation methods, federated learning can usually reach performance comparable to centralized training. In practice, federated learning has already been deployed successfully in healthcare, finance, and other fields, for example for brain tumor segmentation and text prediction. These cases demonstrate federated learning's potential for efficient model training while protecting data privacy.

Transcript

Welcome to Practical AI, the podcast that makes artificial intelligence practical, productive, and accessible to all. If you like this show, you will love The Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for The Changelog wherever you get your podcasts.

Thanks to our partners at Fly.io. Launch your AI apps in five minutes or less. Learn how at Fly.io. ♪

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at PredictionGuard and joined as always by my co-host, Chris Benson, who is a Principal AI Research Engineer at Lockheed Martin. How are you doing, Chris? Doing great today, Daniel. How's it going?

It's going pretty good. I would say my mind is a little bit scattered today, maybe distributed over various topics, jumping from peer to peer between different meetings. Thankfully, you know, we're just going to continue that theme today into a little bit of a discussion on federated learning because I'm really happy to have Patrick Foley here with us, who is

lead AI architect who's focused on federated learning at Intel. How are you doing, Patrick? Doing great. Thanks for having me on the show. Yeah, of course. I was saying one of our engineers at PredictionGuard, Aishwarya, shout out to her. She spoke at the Flower Conference over in London not too long ago and I think bumped into you.

Good to get that lead. But it's been maybe a little while since we talked about federated learning, which we have talked about in previous episodes. But I'm wondering just for the audience at large who's maybe been hearing a lot about

LLMs and only LLMs or Gen AI for however long now. Just circling back to that topic, could you set the stage for us and give us kind of the explainer on federated learning generally and what that means? Yeah, absolutely. And just before we continue, the views and opinions that I'm sharing today are mine alone and don't necessarily reflect the position of Intel Corporation.

So the main training paradigm for machine learning has been having your data centralized and then training your model on that local data.

There's a lot of cases where you can't centralize your data due to privacy concerns or maybe even the size of the data is an issue. And so there's a different technique where instead of sending your data to a central place, you send your model to where the data is and you train it there. So it's closely related to distributed training, as you could probably tell from the description there. But there's a much higher focus on privacy

concerns. And so you have to think about how you can verify that the model is not encapsulating something about the data, and who the threats are, because it's not just a single person that is controlling all of the infrastructure, but multiple parties who might not trust each other. That's where a lot of the variation in how we need to focus on those concerns comes from.

And just to kind of dig in, maybe just a small bit deeper there. So if you're bringing the model to...

to this distributed data in what way, maybe just walk us through kind of a flow, I guess, of training. So you send the model to these places that have the data, what kind of happens in that training process, or how does it iterate in a different way than maybe what people are used to hearing about?

Yeah, absolutely. So there's a number of both closed source and open source federated learning frameworks that are out there. I lead the Open Federated Learning, OpenFL, open source project. And there's a number of people that do this in the same way. But really what it involves is first having a shared notion of what that model is.

And then there might be a distribution phase for the workspace or the code ahead of time so that everyone has a record of what the code is that's going to be running on their infrastructure. And so at the time that the experiment starts up, there's a server or what we call an aggregator. That's the central point where everyone is communicating with that server for work.

what tasks they should be doing or what the latest model weights are that they should be training on. And then the client side is what we term as the collaborator. So everyone has a view of what that code is and

we have this concept of a federated learning plan, which includes everything outside of the code itself. So this might be hyperparameters for the model, some of the network details that you might want to know, whether there's TLS being used, mutual TLS,

And a lot of other things that you might care about if you're a hospital that wants to be running the software on your infrastructure and you don't want to be exposing your data because of HIPAA or GDPR considerations. So there's this vetting process that's really important to happen ahead of time. And then once this vetting has happened, then there's an opportunity to actually launch the experiment.
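To make the idea of a vetted plan concrete, here is a minimal illustrative sketch of the kinds of settings such a plan might capture, written as a plain Python dict. The field names are hypothetical and not the actual OpenFL plan schema.

```python
# Hypothetical sketch of what a federated learning "plan" might capture.
# Field names are illustrative only, not the actual OpenFL plan format.
federated_plan = {
    "rounds": 10,                              # number of aggregation rounds
    "model": {
        "architecture": "unet_3d",             # shared notion of the model being trained
        "hyperparameters": {"lr": 1e-4, "batch_size": 8, "epochs_per_round": 1},
    },
    "network": {
        "aggregator_address": "aggregator.example.org:50051",
        "use_mutual_tls": True,                # both sides present certificates
    },
    "tasks": ["train", "validate"],            # what each collaborator is asked to do
}
```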

And what this means for the aggregator or the server is launching that application that starts a gRPC server or some kind of REST server. And then for the collaborators, they are just starting their local process and making the connections to that server. So the flow is, this is really all of the setup for the experiment actually taking place. But the aggregator has a...

initial model weights for what everyone is going to be training on for that first round of the experiment. And so then everyone receives those model weights. And it's not the entirety of the model; the way that we divide things into this provisioning phase and then the runtime phase is so that we can limit what actually gets sent

across the network. We don't need to be sending Python objects, which are much higher risk in terms of being able to send code that could then exfiltrate your data and it's not necessarily vetted ahead of time. So there's very small windows of information and

we limit that communication path to NumPy bytes. And the great thing about doing things in that way is that if you're just dealing with model weights, then that means that you can train across a bunch of these different deep learning frameworks. So we can work with PyTorch models, TensorFlow models, etc.,

And you can send those model weights across the network. You can populate your Python code that's already been shipped to you ahead of time, do your local training. And then, based on the updates that you have from your local data, you send your updated model weights back to the aggregator, and then they get combined in some way.
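As a rough sketch of what restricting the wire format to plain arrays can look like (the helper names here are hypothetical, not OpenFL's actual API), only numeric tensors ever cross the network, never pickled Python objects:

```python
import numpy as np
import torch

def weights_to_numpy(model: torch.nn.Module) -> dict:
    # Convert framework-specific tensors into plain NumPy arrays for transport.
    return {name: t.detach().cpu().numpy() for name, t in model.state_dict().items()}

def load_numpy_weights(model: torch.nn.Module, arrays: dict) -> None:
    # Rebuild tensors locally from received arrays; only numeric data arrives, never code.
    state = {name: torch.from_numpy(np.copy(arr)) for name, arr in arrays.items()}
    model.load_state_dict(state)
```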

In the simplest case, this can be something like a weighted average based on the number of data sets that you might have locally for each of those collaborators. And then this is really what constitutes a single round of federated learning training. And then what we've seen is that just by using kind of these simple methodologies, you can get to a point where you have somewhere in the realm of 99% accuracy versus a model that's been trained on centralized data alone.
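In code, that simplest weighted-average combination might look roughly like this, assuming each collaborator's update is a dict of NumPy arrays keyed by layer name (a sketch, not the exact OpenFL implementation):

```python
import numpy as np

def fedavg(updates: list, sample_counts: list) -> dict:
    """FedAvg-style aggregation: weight each collaborator's update
    by the amount of data it trained on locally."""
    total = float(sum(sample_counts))
    return {
        name: sum((n / total) * update[name] for update, n in zip(updates, sample_counts))
        for name in updates[0]
    }

# Example: two collaborators with different amounts of local data.
site_a = {"layer.weight": np.ones((2, 2))}
site_b = {"layer.weight": np.zeros((2, 2))}
global_update = fedavg([site_a, site_b], sample_counts=[300, 100])  # 0.75 everywhere
```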

I'm curious, just as you were talking about the aggregation of each of the data back to the main server, and you talked a little bit about different ways of aggregating and stuff. I'm just curious, are there a lot of different ways

approaches algorithmically to that aggregation? Or does that tend to follow the same mechanism most of the time? And do people tend to choose different ways of aggregating data? I'm just wondering how much variability is typically found in there among practitioners?

Yeah, that's a great question. So we've seen that FedAverage works pretty well in a lot of cases. So FedAverage is the original aggregation algorithm for federated learning that was coined by Google. This was back in 2017. And they actually coined the term federated learning originally at that time. But the...

but there's others that are out there that deal much better with data heterogeneity between the different client sites that might have different data distributions. And so when that's the case, you might need to ignore some of the outliers or incorporate their local updates in a different way that allows you to capture that information or converge faster to what a global model would be that would perform well

on all of these different data distributions. So there's a number that do try to capture some of this information. So FedOpt is one of those that incorporates the loss terms of the different collaborators that are out there. And this is really a hot research area, but it really varies, is what we found. But by applying some of these top methods, you can generally get to a pretty good point in convergence versus centralized data alone.
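To give a flavor of that family of methods, here is a rough FedAdam-style sketch (one of the FedOpt variants described in the literature): the difference between the averaged client weights and the current global weights is treated as a pseudo-gradient and pushed through an Adam-like update on the aggregator. This illustrates the general idea under those assumptions, not the exact algorithm described here.

```python
import numpy as np

def fedadam_step(global_w, avg_client_w, m, v, lr=0.01, beta1=0.9, beta2=0.99, tau=1e-3):
    """One server-side FedAdam-style update (sketch). m and v are running
    first/second moment estimates, keyed the same way as the weights."""
    new_w, new_m, new_v = {}, {}, {}
    for name in global_w:
        delta = avg_client_w[name] - global_w[name]        # pseudo-gradient for this round
        new_m[name] = beta1 * m[name] + (1 - beta1) * delta
        new_v[name] = beta2 * v[name] + (1 - beta2) * delta ** 2
        new_w[name] = global_w[name] + lr * new_m[name] / (np.sqrt(new_v[name]) + tau)
    return new_w, new_m, new_v
```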

So, Patrick, I'm curious about if we could just talk through maybe a couple of example use cases kind of pointing out the actors in the process. So we've talked about kind of the central aggregation. We've talked about these.

you know, clients or collaborators, I believe you called them. So this distributed set of collaborators who have the model and are doing updates to the model, which are then aggregated back together. If you could just maybe highlight, hey, here's an example use case in this industry with this type of model. Here's who the party would be that would be the

you know, aggregator party and where that infrastructure would run. And here's the parties that would be the collaborators where the model would be distributed. That would be very helpful. Yeah, absolutely. So I'll take one of,

really the first real world deployments of federated learning that my team took part in. So back in about 2018 or so, Intel started collaborating with the University of Pennsylvania on trying to deploy federated learning in hospitals for the purpose of brain tumor segmentation. So this was very recently after Google even released

their seminal paper on federated learning showing that this had high success for text prediction on Android phones. And this was the healthcare application of federated learning. And so this progressed to a point where we were able to demonstrate that we were able to achieve 99% accuracy versus a centrally trained model. And then this really expanded out to

a much larger real world federation where we were able to train across roughly 70 different hospitals across the world. And so each of those hospitals represents one of the collaborators in the architecture that I was speaking to earlier.

And then the University of Pennsylvania served as that central point, or the aggregator, where the initial model was populated from. And it was a 3D convolutional neural network, a segmentation model. So coming in with DICOM data, and then trying to get an estimate of where a glioblastoma brain tumor was based on that image.

And so there's the collaborators and the aggregator. And then that's really the high level of what this looks like. But then there's a lot of other details that had to be dealt with beyond just this more kind of...

I would say vanilla federated learning architecture. And really where that came from was there's a lot of issues with figuring out how to identify mislabeled data when you have privacy that's at stake. And so this really requires experts in data science or someone who has a background in federated learning to go and dive into how you're identifying

these convergence issues that might pop up. And so UPenn was taking on a lot of that responsibility. There were Intel engineers who were very involved with a lot of those calls as well, and trying to get on the phone and have these Zoom calls with,

I mean, these different IT admins and data owners at each of the hospitals, just trying to figure out where there might be a mislabeled data set or that type of thing. But it really exposed that there were gaps in the total participant layout. And we needed to have more of this kind of shared platform for how you can exchange this information and get access to that data in a secure way. And that's one of the things that we've been working on ever since this study came out.

Well, friends, NordLayer is the toggle-ready network security platform that's built for modern businesses. It combines all the good stuff, VPN, access control, threat protection, and it's all in one easy-to-use platform. No hardware, no complex setup, just secure connections.

And full control. In less than 10 minutes. No matter if you're the business owner, the IT admin, or someone on the cybersecurity team, NordLayer has what you need. Here's a few use cases. Business VPN. How often are you traveling and need to have secure connections from one endpoint to another? Accessing resources. Preventing online threats. Preventing IP leaks.

This happens all the time. What about threat protection? Being in a place where you want to prevent malware, where maybe there's a high risk. You're at a coffee shop.

Malware, ransomware, phishing, these things happen every single day and users who are not protected are the ones who get owned. And what about threat intelligence? What if you could spot threats way before they escalate? You can identify, analyze, prevent internal and external risks. This is like dark web stuff all day. Data breaches, breach management,

Serious stuff. Well, of course, our listeners get a super awesome deal. Up to 22% off NordLayer yearly plans, plus an additional 10% off the top with the coupon code using practically-10. Yes, that's the word practical, then L-Y-10. So practically-10. And the first step is to go to nordlayer.com slash practical AI.

Use the code PRACTICALLY-10 to get a bonus 10% off. Once again, that's NordLayer.com slash Practical AI.

Well, Patrick, I'm wondering, you know, you gave a really good example there in terms of the healthcare use case, the distributed collaborators being these hospitals, the aggregator being the university. Certainly, there's kind of other details that are relevant in that that I'm sure, you know, were a lot of difficult things to work out and research. And I'm wondering,

One of the things that I'm wondering, and this might be something that's on people's mind,

just in terms of the climate that we're in around AI and machine learning, is what are the types of models that are relevant to federated learning? It might be somewhat of a shock to people just coming into the AI world that, hey, there are still a lot of non-gen AI models. Actually, the majority of AI models, quote unquote, or machine learning models out there are not gen AI models.

So it may come as a shock to them that there's still a lot of that going on. I assume based on what you said before, that those types of non-gen AI models are relevant to the federated learning procedure or framework. But could you give us a little bit of a sense of the kinds of models that are relevant and maybe tie that into some of the, I guess, just the real world constraints of

managing one of these federated learning experiments in terms of the compute that's available or the network overhead or whatever that is, and what that kind of dictates in terms of the types of models that are currently feasible to be trained in this way. Yeah, absolutely. So I would say most of the real world deployments of federated learning have focused on non-gen AI models up to this point.

So the example that I had was this 3D segmentation type of use case. There's been a lot of other deployments of these classification models. Really, where federated learning has focused from the framework support perspective has been around neural networks. And a lot of the reason for that is not just because of all of the advances that have, of course, happened for neural nets over the past 10 to 15 years, but it's been because you have a shared weight

for all of those models across each of the sites where they're going to be distributed. And really what I mean by this, and just as a comparison point, so say support vector machines or random forests are going to have something that is

going to be based fundamentally on the data distribution that you have locally at one of those sites. So with neural networks and using that for federated learning, that allows us to have much clearer methods for how those weights ultimately get combined for the purpose of aggregation without knowing quite as much about the data distribution ahead of time. I will say that there are some methods for how you perform federated learning on these other types of scenarios. So federated XGBoost

is something we recently added support for there in OpenFL. There's other types of methods out there that have actually performed pretty well. And I mean, getting back to the Gen AI piece of this, that is, of course, a big area of interest for federated learning too. And we have a number of customers who have been asking about

how they can incorporate, I mean, these large foundation models, generative AI models, for the purpose of federated learning and this training in a privacy-preserving way. And to get to your point or the question around

the size constraints that we run into, it's of course an issue for these large gen AI models. We're very lucky to have techniques like PEFT and quantization that can be applied so that you don't necessarily need to be training on the entirety of, you know, 70 billion weights at a time and distributing those across the network because as you scale the federation, there's a

of course, a lot of network traffic that can result from that. So by shrinking that in any way that you can, we can still support those types of models, but it's still, I would say we're having to use these additional methods instead of just base training because size and the

The time that it takes to actually train them is, of course, always a concern. Yeah. And just for listeners that are maybe more or less familiar with certain terminology: PEFT, this is parameter-efficient fine-tuning methods, where maybe only some of the parameters of a model are

updated during the training process, which creates some efficiencies there, and quantization being methods to limit the precision or the size of the total parameter set by kind of reducing the precision of those parameters.
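One common pattern this enables, sketched below with hypothetical helper names, is to freeze the base model, leave only a small set of adapter parameters trainable, and exchange just that subset each round instead of the full parameter set:

```python
import torch

def freeze_all_but_adapters(model: torch.nn.Module, adapter_keyword: str = "adapter") -> None:
    # Hypothetical convention: adapter parameters are identified by name.
    for name, param in model.named_parameters():
        param.requires_grad = adapter_keyword in name

def trainable_update(model: torch.nn.Module) -> dict:
    # Only the small trainable subset crosses the network, not billions of base weights.
    return {name: p.detach().cpu().numpy()
            for name, p in model.named_parameters() if p.requires_grad}
```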

I'm wondering, we've kind of naturally got into it, Patrick, but you started talking about, of course, requests to add features and that sort of thing. Obviously, in your context, I think we're mostly talking about OpenFL. I'm wondering if you could just give us a little bit of an introduction. Now we've talked about federated learning more broadly, what it is, kind of some of

some use cases, that sort of thing. Obviously there needs to be frameworks to support this process, and OpenFL being one of those. Could you just give us a little bit of an introduction to the project at a higher level? Yeah. So OpenFL, Open Federated Learning is what that stands for, has been around since about 2018, and it came out of this research collaboration that we had with the University of Pennsylvania. So what other federated learning frameworks have done is they've really started from a research focus

and then expanded into real world and production deployment. We kind of took this in the opposite direction. We had to deal with the real world issues that come from deployment of this framework into hospitals and the challenges that can really result from that. And when I say we, I mean, this is a collaboration between, I mean, my team at Intel, which is more focused on the productization side of how you take these technologies and

bring them into products, University of Pennsylvania, but then also Intel's security and privacy research lab. So they're, of course, very focused on research as well and have been thinking about security and privacy and confidential computing for quite a long time. So this was really a natural collaboration to bring together research with the experts in this healthcare and brain tumor segmentation type of deployments

to really bring the right features into this framework that started off as largely a research project at Intel, but then has since become a much larger framework that's focused on how you can actually perform this across companies, or across very large

types of deployments that involve academia, as well as, I mean, just how you bring different parties together. Yeah. And obviously it's called OpenFL. I'm assuming that people can find it somewhere in the open source community. And also I see there's kind of an association with the Linux Foundation, if I'm understanding correctly. Could you talk a little bit about those things and just sort of the

I guess the ecosystem where people can find things, but also a little bit about kind of who is involved and some of how that's developed. Yeah, absolutely. So OpenFL started first as an Intel closed source project, and then we open sourced it around 2020. We've since donated it to the Linux Foundation,

the AI and Data subgroup of that. And the reason is that open is in the name. We wanted this to be really a community-driven and community-owned project. And that's the way that we saw this gaining the most traction and success over time. So we didn't want Intel to be in the driver's seat, having complete control over what the direction of this was going to be. In order to be truly successful as an open source project, you need to be thinking about the community and

addressing really those concerns and letting them take the wheel and steer this in many cases. So Intel still has a large representation on the development and roadmap for OpenFL, but we have a technical steering committee that's governed under the Linux Foundation. So I'm the chairman of that steering committee, but then we also have

Flower Labs, who supports the Flower Federated Learning Framework, is also a participant on that technical steering committee. We have representatives from FATE, who is actually another competitor slash collaborator of ours, Leidos, and then University of Pennsylvania as well.

Their faculty has actually since moved over to Indiana University, but they still represent the original collaboration that we had. And they're longtime collaborators of ours who continue to have a strong vision of where federated learning is most applicable for research purposes.

And I guess in terms of usage, sometimes that's a hard thing to gauge with an open source project. But, you know, could you talk a little bit about that? And maybe, you know, you were just at the Flower Conference, you're engaging the community in other ways, I'm sure, at other events and, you know, online events.

Could you maybe talk a little bit about what you've seen over the past however many years in terms of actual, you know, real world usage of federated learning and kind of engagement in the OpenFL project and kind of what that momentum has looked like, how you've seen that maybe shift in certain ways over time and how you see that kind of developing moving forward?

Yeah, absolutely. So I think that it's really picked up since about 2020. We had the world's largest healthcare federation at that time. And we published a paper in Nature Communications demonstrating the work that we had done. But

It's really become evident that there's a lot of real world federated learning that other frameworks are starting to get into as well. So on my involvement at the Flower Summit: my team at Intel and OpenFL, we've been collaborating with Flower Labs for the last three years or so.

And we're jointly very interested in interoperability and standards for federated learning. So I think that one of the things that we both recognized early on is that federated learning is pretty new compared to just deep learning as a study.

We've kind of seen that things are heading the same direction that they did with the early deep learning frameworks that were out there, where you have a proliferation of them at the very beginning. And then over time, there's more consolidation across those frameworks as one ecosystem becomes more mature or they specialize in really different ways.

So we've been working closely with Flower and other groups on how we can build this interoperability between our frameworks and try to get to a point where we have a defined standard for some of those lower level components, because ultimately we're solving problems.

The same problems over and over again between our different implementations. And there's not really a need to do that. If you've done it once, then if you've done it the right way, then you should be able to leverage that core piece of functionality and then just import it into whatever library you want to. That's really the open source ethos is building on top of the shoulders of giants.

So that's the direction that we're hoping to head. And so at the Flower Summit, we've gotten to the point now where we can actually run Flower workloads. And this is a competitor slash collaborator of ours, but we can run their workloads on top of OpenFL infrastructure. And

Getting into the pieces where we specialize and we do have differentiation. So Flower has done a great job building a large federated learning community. They've done wonders, I think, for the scaling of federated learning and the visibility that's on it. And they have a very close research tie as well. So they're seeing, I think, the gamut

of different things that people want to do for privacy-preserving AI. For OpenFL, because of our history in security and privacy, confidential computing and how you really think deeply about preventing threats for federated learning and these distributed multi-party workloads, that's an area that we've been thinking through for quite a while too. And we have the benefit, being from Intel,

of actually having invented a lot of the technologies for confidential computing, like Software Guard Extensions. So you can run OpenFL entirely within these secure enclaves, which means that even local root users do not have visibility into what is actually happening in the application. And if you engage other services on top of that,

like Intel Trust Authority, that allows you to actually remotely verify that someone else is running the workload that they're supposed to. So part of the vision here and why we're so excited to be working with Flower is that now you can run

As part of the Flower community, this very large community, you can run these workloads now inside of these confidential compute environments on Intel hardware using OpenFL. So there's kind of a chain of how all of these things flow. But that's one of the directions that we're really excited to be undertaking with the wider federated learning community that's out there.

So Patrick, that was really interesting for me. I'm learning a lot. And you got me thinking, I'm kind of starting to think about OpenFL in my own life, in my own world. I'm really kind of focused on kind of agentic use cases.

and, you know, out on the edge with kind of physical AI, physical devices that are doing that. And you really got me thinking about all the ways that we could apply federated learning in those environments. I'm kind of wondering, since that

is, you know, obviously a big wave of activity we're especially seeing in the last year or so, what is kind of the story around doing federated learning across, you know, not just within different data centers and stuff like that where you have it, but edge devices where you're storing a ton of data on those devices and you're running agentic

you know, operations on those, and you're wanting to try to apply federated learning to that environment. What's the thinking about where that's going and, you know, where it's at now and where it might be going forward? Yeah. So I mean, it's going to be a big area.

And we're fully anticipating that this is something that we want to go out and support. So for agentic, you have the neural network as one of the components, and then you have the tools that are actually performing operations based on whatever information is coming from that neural network. So at a fundamental level, we can absolutely support these agentic use cases by training that neural network and doing this in a privacy-preserving way.

So I think one of the areas that's not necessarily that well studied yet, and I think there's more and more focus on this, but how LLMs can

memorize data in a way that certain other neural networks cannot. And so that's really a hot research area. But depending on, I think, how you train these models and then ultimately how they're deployed. So if you're using privacy enhancing technologies on top of just this architecture where you're training at the edge already where the data is, then you're going to get a lot more confidence that there's not going to be your information that's somehow exposed where the

model ultimately ends up going. Yeah. And this would be, like, in terms of memorization, what you're talking about here would be like, hey, I'm training on, you know, on this device, let's say it's just a bunch of people's clients and there's communications on those clients that have personal information, in theory, right.

an LLM could be trained in a distributed way, but leak that data through the centrally aggregated model. Am I understanding that right? That's exactly right. And we have customers come to us all the time and ask, how can we get assurance that my data is not leaking into the model?

And the best things that we have to deal with this are different types of technologies that are out there. You have differential privacy, which can apply noise in such a way that you're trying not to expose anything fundamentally about your data when you share those model weights.
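As a very rough sketch of that idea, a collaborator might clip and noise its update before it ever leaves the site. The parameters below are illustrative only; a real differential privacy deployment needs carefully calibrated clipping and noise:

```python
import numpy as np

def noisy_update(update: dict, clip_norm: float = 1.0, noise_multiplier: float = 0.5) -> dict:
    """Clip the update's global L2 norm, then add Gaussian noise before sharing.
    Illustrative only; not a calibrated differential privacy mechanism."""
    rng = np.random.default_rng()
    flat = np.concatenate([v.ravel() for v in update.values()])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    return {
        name: value * scale + rng.normal(0.0, noise_multiplier * clip_norm, size=value.shape)
        for name, value in update.items()
    }
```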

You have other techniques like, I mean, homomorphic encryption, where you're encrypting those models ahead of time before they're actually even sent for the purpose of aggregation. But really, none of them is completely foolproof. There's no free lunch, as we say. And then confidential computing has the benefit that you can actually train in these completely constrained environments where not even the root user has access to this little protected,

encrypted memory enclave. But that ultimately requires that you have hardware at the edge to go and be able to perform that type of thing. So that's really where the challenge lies. And there's other statistical measures of how you can estimate data leakage into the model. We have support in OpenFL for a tool called Privacy Meter

that actually lets you train a shadow model based on the local training that you've done and then get some kind of graph around what the percent risk is based on the local data distribution that you have and that exact model topology that you've trained on. So there's, I think, increased visibility on how you can try to quantify that amount of data leakage

But with some of these technologies there's some cost to the accuracy of the model overall. So it's really on a per-experiment, per-model, and per-data-distribution basis that you have to tune these things. And that's where there's a bit of work and recommendations that need to be made by people who have experience in this domain.

And I have maybe sort of a strange question, so humor me on this one. While you were talking, I was kind of reflecting on the fact that maybe the landscape is shifting a little bit around privacy in general and AI in the sense that.

For whatever reason, people seem to want to send a ton of their data to third-party AI providers now. And I think gradually people are becoming more sophisticated in that and sort of understanding the implications around that.

sending your data to third parties in the sense of using third party AI model providers from model builders and not running that in their own infrastructure. But there's definitely a wider, like this has opened up the topic of privacy to a much wider audience. And maybe people that aren't so

Before, there was sort of this, maybe this discussion around federated learning amongst data scientists, researchers, those that are trying to train models to be better and better. It seems like now there's this wider discussion about privacy and, you know, AI providers and a lot of people talking about this. And certainly, you know, we've seen people that

we're engaging with, of course, to build out private AI systems of their own. But I'm wondering from your perspective, you're kind of in the weeds or in the trenches, I guess is the best word in terms of helping people with their actual privacy concerns. Have you seen the landscape or perception change in one way or another around kind of AI plus privacy post the kind of

you know, ChatGPT era, if you will. Yeah, absolutely. So OpenFL, this is the open source project that my team directly supports, but there's another kind of division of where my responsibility lies, and that's building on top of OpenFL to really address

a lot of these customer concerns. And my team is actually building a service on top of OpenFL called Intel Tiber Secure Federated AI that makes it a lot easier for corporate customers to go and deploy secure federated learning. And so for a lot of the people that we're talking to, they're really concerned about

I mean, they have these foundation models that perform really well on their local data sets, but they ultimately don't have access to the data that's being generated at the edge or by some of the subcustomers that they're working with. They're not necessarily experts in federated learning ahead of time. And so we've heard from many different parties that if...

there was a service that could actually provide a lot of the infrastructure and recommendations for them ahead of time to go and deploy this easily, then this is something that would make it just a lot easier for them to actually perform a lot of these experiments and vet whether this is something that's going to work for them over the long term. So I talked about the use of confidential computing earlier and how that can be successful for this type of thing.

That's an area that we've been trying to really specialize in and make easier for a lot of our customer base. So if you have technologies like Intel SGX that are available across all of the parties that are participating in this federated learning experiment, then that gives you some really nice properties. Not only can you remove these untrusted administrators from the threat boundary, but you can also verify that your model IP, so

the model weights, but even the model topology itself, is not something that is divulged to anyone that shouldn't have access to it. So how do you protect your intellectual property? I mean, that being of course the data, and that's really one of the main focuses of federated learning, not revealing that to prying eyes, but the model itself too. I think for a lot of our healthcare customers, they'll spend millions of dollars going through FDA approval. And so

having that divulged to someone represents a risk to all of the work that they've done prior to that point. So we've been hearing this from a number of customers for years, but I think there's a

as you've mentioned, more visibility on it because of generative AI. And I think the doors that it unlocks for what the benefit is of actually deploying these models in the real world. I'm curious, I've learned a lot through this conversation. And as we, I think I probably came into it, and we've had previous federated learning conversations in the past with folks

And I think I was still kind of stuck a little bit on kind of distributed data being the driver of federated learning. And you mentioned earlier that, you know, it was that, but more than that, it seems to me in this conversation that these concerns around privacy, which can take many different forms, you know, from protecting, you know, individual personal data to IP protection, to regulation, to whatever, you know,

would it be fair to say that these might be the primary drivers of federated learning? Because it seems like that's really where this conversation has gone over time, rather than what I was expecting, which was more just distributed, you know, and I brought up the edge thing a little while ago. I'm just wondering, am I on the right track in terms of getting what the drivers are these days? Absolutely the right track. And when I talked earlier about the different

participants in the architecture for OpenFL, where I mentioned the collaborators and the aggregator, that's really sufficient for a single experiment when everyone inherently trusts each other or there's some central body. And so the parallel here with the University of Pennsylvania and the Federated Tumor Segmentation Initiative, which was this world's largest healthcare federation, everyone trusted the University of Pennsylvania that was ultimately deploying these workloads.

As you scale federated learning and you have people that you don't necessarily know that you're welcoming into the mix, you need to have some other way of establishing that trust. And so governance is really the piece that's missing from OpenFL. And that's where we built on top of this with the service that we've established. So how you can vet the models ahead of time, how you have a central platform of actually recording that information

that different parties have agreed to the workload that is going to run on their infrastructure, and having this unmodifiable way of establishing what the data sets are that you're going to be training on, and who the different identities are that are actually participating in the experiment. Governance is a huge concern for a lot of the customers that we've been talking to. And if you want to have cross-competitive types of federations where you might have

two different pharma customers who have a lot of data they've generated internally. They have mutual benefit by working together, training either one of their models on their competitor's data. And they might have some kind of agreement that's set up for whatever ultimate model is generated, like a revenue sharing agreement or that type of thing. Having a platform for being able to establish that type of collaboration in a competitive environment is really where we

see federated learning going over the long term. And we're trying to figure out a way to get there. And yeah, you already were kind of going to maybe a good place to end our conversation here, which is really looking towards the future. You've been working on OpenFL and

And these other efforts for some time now, and been engaged with the community. As you look forward, what's most exciting for you in the coming years? Yeah, what I think is really exciting is, I mean, the collaboration between the different parties that are out there right now. It's really exciting

and, I think, motivating for me personally, because there's this spirit right now where everything is new and exciting for people who are deep into this field, and people want to figure out how to just push everything forward. And I think generative AI has really been a catalyst for that. And in terms of figuring out how we can get people

access to this siloed data that's out there, and how we can do it in a way that actually enables industry to take up these things. Because we don't want federated learning to sit in the research world forever. We want to actually take this forward and make it one of the main methods of how you do machine learning at scale when you have these privacy concerns that are, of course, extremely common. They're common for companies. They're

common for individuals. So opening up those silos is really one of the things that I think there's going to be a lot of benefit in doing. And that benefit's going to come in the form of, we expect, much more accurate models over the long term and much more capable models, because of just the increased access to data.

Awesome. Well, that is very exciting. I hope to have you back on the show very, very soon, you know, next year, whenever we see some of that playing out. Appreciate your work and, you know, the team's work, the wider community's work on what you're doing. And yeah, keep up the good work. Thanks for taking time. Thank you for having me on the show, Daniel and Chris. Really appreciate it. Thank you.

All right. That is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com slash news. There you'll find 29 reasons, yes, 29 reasons why you should subscribe.

I'll tell you reason number 17, you might actually start looking forward to Mondays. Sounds like somebody's got a case of the Mondays. 28 more reasons are waiting for you at changelog.com slash news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.