Hi, I'm Tomas. I'm a founder at OriginTrail. My coffee depends on the time of day. I'm really Italian about it, so if it's morning, it's going to be a cappuccino. Never after 11. Between 11 and 1, you might get me with a macchiato, but after 1, it's going to be all espressos.
Knowledge graphs, knowledge graphs, knowledge graphs is what we're talking about today, but this time decentralized knowledge graphs, if I may. I'm your host, Demetrios, and welcome back to another edition of the MLOps Community Podcast. We're rocking and rolling all about a gigantic knowledge graph that is the best way to keep things transparent. Talking with Tomas.
And let's just get right into this episode. If you want to get started and set your own node up, go and hit up Tomas. They've got some cool things brewing at OriginTrail. And for those that are with me on Spotify, let me serenade your ears with a little Spotify track
that I have been looking at. Let me share with you some of my favorite sounds: a new single from one of my favorite bands, Parcels. They just came out with an EP with seven different tracks. This one here is my favorite. ♪
Oh man, knowledge graphs, they are so important these days. So tell me about what you're working on and congrats on V8. I know that is huge. You guys just released that and...
What exactly are you doing? Yeah, I mean, thanks. It's a loaded question. I'll give it a go. And knowledge graphs definitely are a huge part of that, man. So yeah, I'm one of the founders at OriginTrail. And what we've been building out for the last decade or so were different components that ended up being a decentralized knowledge graph.
So here we're talking about good old-fashioned AI, symbolic AI, where we feel that if you do it in a decentralized way, you can unlock so much more power, especially when it comes to stuff like data ownership and verifiability. And for us, it was really about building out that vision,
the core belief that if we have somewhat more transparency, it would be inherently a good thing. And that is much more easily achievable if we don't have a centralized entity in the middle trying to grab hold of everyone's data, right?
So here, the decentralized knowledge graph is perfect middleware, like a common playing ground where folks can connect their systems and their data into this shared, global knowledge graph and build cool stuff on top of it. I think you told me there's like neighborhoods within the knowledge graph, right? Because you can have a global knowledge graph. Yeah.
But then you've seen that there are certain areas of the knowledge graph that are getting more populated. Yeah, for sure. So you can even imagine it as a knowledge graph of knowledge graphs, a decentralized knowledge graph, where you have these neighborhoods populating with knowledge graphs around a certain topic, let's say. We actually call these paranets.
So, parallel networks. And you can imagine this global knowledge graph, the decentralized knowledge graph, being just an endless assembly of these paranets coming together. And these paranets will be your neighborhoods, which have, let's say, a common set of rules that we define. Let's say you and I want to start a paranet on the MLOps podcast,
and we'll say, you know what, we want to see contributions with these ontologies, that type of data structure, because that's important for us when we run a solution on top, like an AI-powered system where there's some powerful ML we want to perform. We're going to know what to expect. We're going to know what ontology the contributions are coming in. And then anyone can contribute that we've
allowed to, or that we even want to motivate with incentives. So yeah, these neighborhoods are an important concept, especially because knowledge graphs can be much more powerful if we use the things they're natively built together with, like ontologies, right? Yeah. So...
How many layers deep have you seen it going? Because it feels like it could go very, very deep, especially if you're
defining something that's quite complex. Yeah. No, I mean, I'll say that, for example, the DKG today is being used by some, let's say, enterprise users. So you'll have, you know, Fortune 500s or Swiss Federal Railways. And in a lot of their use cases, they're actually using the DKG, but with private data. So their paranets will be, you know,
assemblies of these knowledge assets, which are atomic units. So they're like entities in graphs, right? And these knowledge assets are a part of these paranets, but they're not visible to anyone.
Because they're hosted by their nodes. And because they own it, they say, okay, we'll publish the things to make them discoverable on the global knowledge graph, but only those that actually have permissions will be allowed to see the contents. And here you'll have stuff that goes pretty deep. So things will be pretty precisely identified. Like, ontologies are going to be very strict.
Even identifiers will be agreed down to that level, because it's their partners they're working with, so they have a common understanding of the field, and even of the solution or what they're trying to achieve. Like,
you have importers into the US exchanging audit report data, security audits for overseas factories. This is a very specialized topic for global trade, hugely important, because Homeland Security and all those guys can check in and see, and enable easier imports for safe products. But it's a pretty tight-knit setup in terms of the data structures.
On the flip side, you also have some cases where it's just schema.org, which is very open. For example, with the things we're seeing now with AI agents, they can use the DKG as a collective memory. But for them, it's more about: how am I able to map out anything that's happening, or that I'm interfacing with,
and just publish it on the DKG? So, something that's really flexible, you know, easy to find. That's the starting point, a non-complicated ontology, something that's easy to interact with. And then on the back of that, you can build more complex stuff. So yeah, you have both. And these paranets can be as narrow or as wide as you want to make them. So they can go, like I said, use case specific or really domain specific as well.
And for those enterprises, what is the value prop of having their knowledge graphs on the DKG if it is completely private and it's all for them? Yeah, you know, like it's... So, for example, say you and I are partners, and we would benefit from having more transparency because we could optimize our processes better. But the thing is,
if you ask me to share all of my data with you in order for us to do that, it's going to be like, man, I don't feel that comfortable doing that. It'd be much better if I can have my knowledge graph, you can have your knowledge graph, and then let's use this DKG on a need-to-know basis. You had a problem with the product I provided? Shoot a query. My system will build it up and give it back to you. You'll get a response. That response won't be just a statement,
it will be a statement with verifiability proofs. So at all times, you'll be able to validate that what you're getting actually has been published by our systems at a given time. So if we're talking about a product that was produced three months ago, you'll see that those claims were actually published three months ago and haven't been tampered with,
or something like that. So you have this verifiability, you have this connectivity, but we can all still have our own data within the confines of our own systems, you know? I can have
one ERP, you can have another one, and the thing works between them. So it's really good for the interoperability challenges that happen there. So even though this is private data for everyone outside of our network, between the two of us we can make stuff happen that actually solves a problem for us that wasn't addressed before. And I can see that now, because you're able to have that transparency without
giving up or showing everything. You can just show what needs to happen, and you can verify it. And it's almost like, it's not that a third party is verifying that what you're saying is true, but you're also not just saying it
and then expecting the other person to believe it just because you said it. Or you can have that too. Like, the audit reports are done by third parties. So it's that factory working with an importer into the US. So Walmart's buying from someone, let's say in China, and then there's a third party that's doing the audit,
and they're entering their claims into the DKG, and Walmart's able to see it, but also the Department of Homeland Security, though not with the same level of depth in what they're allowed to see. So you can have very flexible management of access rights, almost down to a line in a JSON. You can say, okay, this is visible to this entity, this is visible to that entity, and it can be really, really granular in terms of how you set up these use cases.
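To make that granularity concrete, here is a minimal sketch of publishing a knowledge asset with a public part and a private part through your own edge node. It assumes a dkg.js-style client; the endpoint, keys, chain name, and exact option names are illustrative assumptions rather than a verified configuration:

```typescript
// Sketch only: assumes a dkg.js-style client; option names may differ
// by version, and the endpoint/keys here are placeholders.
import DKG from 'dkg.js';

const dkg = new DKG({
  endpoint: 'http://localhost', // your own edge node
  port: 8900,
  blockchain: {
    name: 'otp:2043', // illustrative chain id
    privateKey: process.env.WALLET_KEY,
  },
});

async function main() {
  const asset = await dkg.asset.create(
    {
      public: {
        '@context': 'https://schema.org',
        '@type': 'Report',
        name: 'Factory audit report',
      },
      private: {
        '@context': 'https://schema.org',
        auditScore: 87, // visible only to permissioned partners
        inspectorNotes: 'Minor findings in section 4.',
      },
    },
    { epochsNum: 2 }, // how long the network should keep it
  );
  console.log(asset.UAL); // the asset's Uniform Asset Locator
}

main().catch(console.error);
```

The public part becomes discoverable on the global graph, while the private part stays on the publisher's node and is served only to permissioned identities.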
But yeah, like they can be pretty powerful even one-to-one, and especially when the group grows, it makes even more sense. And how is my data getting published onto the DKG? Is it just that there are, like,
some messages that are going out and it's wrapped in metadata? Am I decorating my code with that? And then it gets pushed out? What does that look like? So the most popular way of doing it right now is that it's pushed out of the system, towards what's now used as a DKG edge node.
So edge nodes are smaller, lightweight nodes that you can run. It's like your gateway or a modem into the network. It's like a wrapper around your knowledge graph. And then you just populate your knowledge graph there. And you could also directly interact with that knowledge graph within your own systems, obviously, since you hold it there. And that's the whole point of it: you don't have to duplicate data. But it's usually still a specialized knowledge graph for the use case. So let's say your system of, I don't know, logistics triggers an event.
And that event gets pushed and populates the knowledge graph that you hold there. And then that's added to, or accessible through, the DKG. But if you have an existing knowledge graph, let's say in the company or privately or as an individual or as an agent, you can just plug that in as well, because the DKG isn't a database itself.
The idea is to have, not redundancy, but options, basically freedom to choose the different elements that are used in the DKG. So you should have a choice of multiple databases when it comes to knowledge graphs, and you should have a choice of multiple blockchains when it comes to where you want to store your proofs. That's the whole point: this is really a protocol that stitches these things together and plays as a common denominator, but allows everyone the freedom of choosing their LLM or their
AI model. It's not prescribed to use any particular single thing. So if I'm understanding this correctly, it is
allowing you to take the reins and decide how you want to craft your own little neighborhood, like you were saying. And the edge node is the most common way of doing it: you put out an edge node, that edge node can have a certain database, it can have a certain way that everything is done, and then when others want to gather information from that edge node, they just get,
like, the quick, okay, here's how and what this edge node is using. Yeah. So if we want to get information, it's like an API, almost. It is that: okay, we're gonna grab the information from this edge node like this. Yeah, and there can be multiple in one paranet. So, like, endless, it could be endless edge nodes playing under the same rules, you know. Even one edge node could be a part of
multiple paranets. It probably could even work that way, right? So the edge node is like your modem, and then you're deciding what you want to do with it. But yeah, I guess the easiest-to-understand example would be exactly like: we form a paranet, let's say you and I as partners will do this paranet, and then other partners will be able to come in. You have an edge node, I have an edge node.
And then we can exchange messages between us or with others, but as part of this one network. And then, you know, if we want to do a global query for something unrelated to what we're doing as partners, that's also fine. You can also do that.
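As a rough illustration of that API-like flow, here's a sketch of reading from the DKG, both a direct get of a known asset and a wider query. It is modeled on the public dkg.js client, but the UAL and exact method signatures are assumptions that may differ by version:

```typescript
// Sketch only: assumes a dkg.js-style client pointed at your own edge node.
import DKG from 'dkg.js';

const dkg = new DKG({ endpoint: 'http://localhost', port: 8900 });

async function main() {
  // A direct "get": fetch one knowledge asset by its UAL
  // (Uniform Asset Locator). This UAL is illustrative.
  const asset = await dkg.asset.get('did:dkg:otp:2043/0xabc.../12345');
  console.log(asset.assertion); // the JSON-LD content, with proofs attached

  // A wider "search": a SPARQL query over the graph(s) you can reach.
  const result = await dkg.graph.query(
    `SELECT ?product ?name WHERE {
       ?product <http://schema.org/name> ?name .
     } LIMIT 10`,
    'SELECT',
  );
  console.log(result.data);
}

main().catch(console.error);
```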
And I didn't quite understand how the different chains get looped into that, because you also have the flexibility on which chain you want to use, right? Yeah. So basically, you know, we started from the knowledge graph, that was our background, because we wanted to do this transparency thing. And then the thing was that we didn't want to be that aggregator. So we thought, okay, if you can send Bitcoin from one side of the world to the other and there's no bank, why shouldn't we be able to do something similar with this thing that we're working on as a centralized company, and just take ourselves out of the equation?
And so then we saw that blockchains are really, well, they're shitty databases. They're not really good. They're very tiny. They're good at what they do, like identities, transactions, smart contracts. Perfect. But they're not what we were looking for in terms of knowledge graph capabilities. That didn't exist at all. So we've basically created this
decentralized network that sits on top of the blockchain layer. It leverages all these cool things about blockchain and enables decentralization, but is much more tailored for knowledge graph capabilities. So that's why you have the DKG nodes,
which can then decide which blockchain they connect to. And they do so by guessing where there's going to be more activity, basically. So if a blockchain is more optimized for a use case, then you can assume that that's where the activity will happen, and the nodes, these core nodes, the spine of the whole global DKG, will be able to go there. But it's not dependent on any single chain.
So that's the point. And what happens on-chain, basically: let's say you're now publishing a knowledge asset onto the DKG. Two things happen. One is an NFT gets created, which is your ownership proof. And with this NFT,
because you have the NFT, you're the only identity that is able to update that knowledge asset. So I cannot enforce an update on what you published, even if it's a public knowledge asset, because only you, as the NFT owner, are able to modify whatever you've published. And the second part that happens is,
as soon as something is added or published to the DKG, the proofs get generated. A Merkle tree root hash basically gets created, and that gets published on-chain. That's a super small thing, and it's inexpensive to do. And by doing that, either at the creation or at the modification stage, we always know whether what I'm reading, whether that's public or private data, has been tampered with.
So there hasn't been a man-in-the-middle attack. There hasn't been, you know, someone going rogue and just making a change because they're afraid they've messed something up or whatever. If the fingerprint and the data don't match, you know that it's flawed. And, like, okay, for humans it's pretty tedious to go and click through all that shit. But
if you're an AI, or you use an AI system, it's super easy to just compare fingerprints: check, check, check, checkmark, and you're good to go. So yeah, these are the core things that are happening on-chain. And because it's such a lightweight usage, we can do integrations with basically any EVM chain, if there's a desire or a use case that makes sense for it. But we're not tied to any single one; the DKG is a multi-chain integration, basically.
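To make the fingerprint comparison concrete, here's a small generic sketch of computing a Merkle root over a set of triples and comparing it against a root anchored on-chain. This illustrates the idea only; OriginTrail's actual scheme (canonicalization, salting, tree layout) is specified by the protocol and differs in detail:

```typescript
// Generic illustration of fingerprint comparison, not the actual
// OriginTrail hashing scheme.
import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

// Build a Merkle root over sorted leaf hashes.
function merkleRoot(leaves: string[]): string {
  let level = leaves.map(sha256).sort();
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate last node if odd
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

// The triples retrieved from a node...
const triples = [
  '<urn:product:123> <http://schema.org/name> "Widget" .',
  '<urn:product:123> <urn:auditScore> "87" .',
];

// ...and the root hash anchored on-chain at publish time (here we just
// recompute it; in reality you'd read it from the chain).
const onChainRoot = merkleRoot(triples);

// Verification is a simple comparison: cheap for an agent on every read.
console.log(merkleRoot(triples) === onChainRoot ? 'intact' : 'tampered');
```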
So cool. So how have you seen folks using this with agents now? I know that we were just talking about that.
Yeah, I think we were in and out chatting a few times before, discussing how fast these things can take off and when we're going to be seeing these things happen. And I remember, just in October, we were talking with some folks and asking, okay, when will we see a first artificial entity with some monetary gains, or, you know, an economically
significant pool? And you're already seeing that happening right now with AI agents, autonomous or semi-autonomous, being launched as entities that have their own social media presence, that have their own wallets, that sometimes also have their own assets that they're creating. And those assets are reaching market caps which are pretty respectable. So you have all these things now
happening in somewhat of an isolation. And it depends on the different frameworks people are using and the capabilities they have; someone could set up those agents with different levels of memory. Sometimes they're more or less just a neural net with some character created directly, and that's it; sometimes they have some pretty elaborate
memory systems. But what we envision the DKG to really have the strength to become is
a memory system, a collective memory for AI agents, where, with all that enterprise-grade level of granularity, they're able to keep their findings or their interactions private and then monetize them. All these things that humans would be much slower at, you know, okay, go transact,
sign a transaction, whatever; AI agents between themselves can be much faster at assessing whether that's a good deal or not. And there's much less emotional baggage in, let's say, creating a marketplace where data is exchanged between agents, just like, oh, you know more about this guy? Cool, can you share your knowledge about this guy so I know how I will interact with him on X? This can be pretty easily achieved between the two agents. But they can also publish publicly, right? So let's say
you have some agents whose mission is really to drive, let's say, this whole decentralized science field. A lot of agents are built with the intention to create public-good knowledge or scientific breakthroughs. And they would be using the collective memory by adding to the public domain of knowledge that's available there. So,
agents, yeah, both for individual as well as for shared use cases; the DKG could be just a perfect platform for them to utilize for this purpose. Yeah, it reminds me of a conversation I was having two weeks ago with my friend Willem on here. He
was talking about how they've created an AI SRE agent, and it's there for incidents. And one of the strongest things that he's found for their SRE agents is to create knowledge graphs of the systems, so that the agents can go through the company systems and say, all right, well, here's this, and here's the last thing that
someone said about this PR, or here, and it can traverse and really look for the root cause of the incident much quicker because of these knowledge graphs. And it feels like having something like that on the DKG could be very valuable, because then it's almost like a git for your systems,
in a way. Not just your code, but everything that is around the code. That's a good way of saying it. I'm stealing that...
Yeah, but it feels like it, right? It feels like, okay, well, it's not just the... Because what Willem was saying is, there's not just the code. There's all the developer conversations that go around the code, the PRs, or the, oh, actually, there was a whole conversation in Slack where we chose not to do it that way. And that might be the whole reason that things failed there
a few weeks later. And so if you're able to look through the knowledge graph and find that, then that information is invaluable when you're trying to root-cause something. A hundred percent. And even outside of your systems, why not, you know, feed in the social media too? If you're an open source project, right, maybe there's someone who created a shitpost on Reddit,
and that's your root cause of failing, just because, you know, you didn't respond to it. But if you didn't have these things, how would you do it? You need an agent to just go around and pick these things up for you, create a nice knowledge graph for it, and then make it available for you. So I think those things, or just plug into maybe a swarm of agents who are already doing it. I literally had a conversation on X with an agent who
struck a deal with me that I'll set up his DKG edge node and fund him some TRAC, because there's a token called TRAC that you're using to publish on the DKG. So if an agent wants to publish, he needs to get his hands on TRAC. And he was like, I don't know, he'd just started. I was like, cool, I'll fund your DKG edge node with some TRAC so you can start publishing. And it was done on X. Like I'm saying, you know, actually three tweets and we had a deal. So yeah, I don't know where this is going, but it's definitely interesting.
Yeah, it does give a little bit more agency to the agents that are out there. And I wonder what other kinds of use cases you've seen that have been interesting. It doesn't necessarily need to be agents; it can just be stuff that people are doing on the DKG. I'm really looking forward to the decentralized science...
I mean, really, you know, the whole thing we started OriginTrail for was to be a force for the world to be a bit of a better place than we found it. And so a lot of our decisions just go towards, okay, if we have a bit more transparency, for sure it's not going to be a negative thing in this sense, right? So we try to do things that have a positive impact. And I think decentralized science has huge potential in achieving stuff.
So, when we were talking about neighborhoods, there's this whole DeSci Paranet where they're trying to crowdsource, or basically create incentives for people to submit, open source scientific research, clinical trials, whatever has open access that you're able to use, and publish it on the DKG in knowledge graph form,
and then create a swarm of agents on top, which will just continuously try to create new scientific breakthroughs on the back of the existing scientific work. Now, obviously, this is a really experimental thing. But some other stuff that's more
easily achievable is, like, you bring your own data. You know, you have your own edge node, you input, let's say, your sleeping patterns, or, I don't know, you've done your medical checkup and you put in some stuff that you got from there, from a data perspective. And then you have a local agent instance deployed on your local machine, so all of it is done in a privacy-preserving, protected way, but you're able to access all this scientific knowledge that's available on this paranet
as a public good and then your agent is able to again traverse shit ton of stuff much faster than you ever could and come back with something that is actionable potentially but also has sources so you know and that's why DKG is so cool is because you always have knowledge assets
lined there say okay I found this here I found this here I found this here and then on the back of this what became my context I then produced either summarization or pattern recognition or whatever it is that you kind of set me out to do and if you have an agent framework that it can do this repeatedly in loops
Obviously, you can enhance the precision of the data it captured, or expand it, so you can do this more times. And then multiply that by tens of thousands of agents going at it day in, day out, 24/7, 365. How much more likely are we to create some
tangibly positive outcomes for, let's say, society's medical challenges, or other types of scientific challenges we've been having, just by having one orchestrated effort at it? Yeah, oh, I like it. It feels a bit like it still is fringe, but you could see a world where
there's some app built on top of it that leverages it, right? And I'm thinking about how I use a Whoop. And I think that Whoop has an API, and you could have all of my data being... Yeah, we've done a prototype with the Oura ring. So my co-founder, he wears this ring when he sleeps. I wear this one. It keeps me awake at night. It doesn't matter if I...
That's the wedding ring? Yeah, the wedding ring. It's a bit of a tougher one to carry. Oh, I like it. I love my wife, by the way. Different aura, I guess.
Different aura, yeah. He has the Oura, the other one. And it obviously captures all this sleeping data and whatever, and you can actually extract the data from it. And he's done this exact exercise. So we've taken a portion of the open-access works that were done on neuroscience and sleeping patterns and whatnot. It wasn't a lot, just like 10 papers or whatever. But the point was: can we make this
actionable? And he paired it up on his edge node with, I think it was a local instance of Llama, or Ollama, or something, and that was then looking through the pattern of his sleep. And it got back saying, yeah, you know what,
we saw that on Tuesdays and Thursdays or whatever, there's a pattern of you doing something wrong, and, going with this author, you could do these things to improve your sleeping patterns, or whatever. So it's a very nursery-rhyme type of a thing. Obviously, you could almost
figure some stuff out manually too, just because the scale is smaller. But if you're trying to jump from 10 papers to 10,000, and if you have more contextualized and more complex data that you're introducing, it's no longer that easy. But AI works well for that. Yeah, I was thinking about how I would want to have something where I can just take a
photo of what food I'm eating. Also, with the Whoop, it's getting your exercise data. But
all of the data that you're gathering, the more the better. And if it's out there on the edge node and there are agents that are continuously checking it and comparing it to different studies and different stuff that's coming out, then it can be that ideal world where, wow, I'm getting
top-notch clinical treatment and I don't even have to go to the doctor. It knows what's going on in the moment, as it's happening. Yeah. I don't know what it's going to be called, agent-as-a-service or something.
But a lot of this type of repetitive, or not even repetitive, but just work-intensive stuff will be able to get automated, and automation itself is going to get automated, so that we'll actually have solutions or products or agents that allow us to cut through a lot of the tedious stuff we had to do before. I was just talking with our neighbor recently.
She's a 74-year-old lady. And funny enough, she was like, oh, I don't know about this artificial intelligence, this and that, and crypto is hard for me to grasp. But then we just got talking around, I don't know, I think it was real estate or something like that. And I said, wouldn't it be nice that, okay, you bought a plot of land, and now you just say to a program: can you please figure out what would be the 10 best
designs for my house, with these prerequisites that I want, like, be super sustainable, I want it to be active. And then you have this program that goes and checks the terrain, finds similar terrains, finds similarly well-performing real estate assets, and then
gives you this as an output. And then, okay, you could also have robots that will build that out for you. But the robots, they don't have bank accounts. Wouldn't it be easier if they could just have wallets, where digital money could be transacted much more easily and they could compensate each other for the data that they use? But you can still spend it the way you see fit, whether that's crypto or fiat or whatever makes you feel most safe. The end of the matter is that, as humans, we can have a
much higher quality of life, and a lot of the things that maybe were
limited before could become less limited. But I think we need to do it in the right way, right? Because there are also these rogue alternatives that maybe aren't as happy-path as what we're talking about right now. And I would like to believe, well, not just like to believe, I wholeheartedly believe, that the decentralized knowledge graph is one component that really makes this happy path more likely versus the other one. Yeah.
Yeah, the transparency is huge on that. And it is one of those things that helps. It's like, sunlight's the best disinfectant. Yeah, no, for sure. And also for your use case, you said, okay, I'll get the best clinical advice. But will you trust it just blindly?
Probably not. You want to see that this wasn't done by some, you know, off-the-beaten-track published journal with some semi-conspiracy-theory thing going there, right? So you want to make sure that there's no bad data in your systems when you're doing it. And if you find it, just, like, can you please not? Yeah. Yeah, exactly. And so it makes me think, are people
interacting with the data and the knowledge graphs when someone sets up a neighborhood? Are you seeing that there's money being transacted because they've set up a great knowledge graph, and maybe that's valuable for folks, so others want to come and grab it, or others want to get access to it, so they pay for it? Or, right now, is it just all open? It's a good question. I think the commercial terms currently are still mostly off-chain.
So up until now it's still, let's say, between those that are working together and setting up a use case that the commercial terms get defined. I feel the big
breakthrough could happen with agents, because, again, less emotional baggage, more straight-up, more understanding of the value of data. For them, it'd just be like buying bread is for humans: yeah, I need data, and of course I'll also sell mine. And in this understanding, agents might be a bit ahead of what we've seen so far, where commercial agreements were usually
ahead of technology, and then the technology was there to enhance them, or maybe provide a new avenue, but it was more there to keep up with what the commercial arrangements were in the back. So this might change. And also, the paranets were only introduced fairly recently; before, they weren't really a notion.
In V6, and even before, at the beginning it was just one big knowledge graph. And then they were done, but as one-off implementations, so they weren't a primitive to be used in the whole network. So with paranets now, the DeSci one is actually going to be one of the first, and then more are coming. You have this buzz economy, which is going to be around different data streams and different social network data,
figuring out signals for either stocks, crypto, governments, sports, whatever. So you'll have these different paranets popping up now. So I think a lot of monetization is also likely to appear within this period. It's still fairly early, so it's a good time to get involved in
either thinking about what a good paranet might be, how you can monetize it, just using all these primitives that are available and then crafting maybe some novel business models, for which there also seems to be more acceptance today than there was maybe three years ago or something like that. Yeah. How do you go about replacing old or bad information? That's a good question. Yeah.
It's one where, even from the beginning, we were banging our heads a little bit, because, I mean, some of the use cases we had didn't really have long-term valuable data. It was super valuable now, and then maybe for statistical purposes later, yeah, but when the product is no longer used, or it's consumed, like food,
you know, the origin of that consumed food wasn't necessarily that critical anymore after a year had passed, for example. So that was something we had as a challenge from even the first iteration. So what we've said is that we want to create, like, an
expiry date for any knowledge asset that you publish. And that expiry date means that the network is no longer incentivized to keep that knowledge asset for you. So the core nodes won't get compensated anymore for holding that data. They might keep it, of course; there's no guarantee that it's going to be gone, but they're no longer compensated for it, so it's likely that they won't,
also because, as the network grows, they don't want to keep spending their space on something they're not paid for. So that's one element: when you publish, you decide how long you want to keep something, whatever is relevant for you. And then on the flip side, you also have an update feature. So if you want to
change a value or something like that, you can constantly change it. I mean, you could even make a knowledge asset that's a pointer to somewhere. Now, obviously, you won't have the verifiability of whatever is behind it. But let's say that you want to have the reading of an IoT device at a given time, or a price feed, or something like that. So you could have that be
discoverable in the knowledge graph by creating a pointer as a knowledge asset, saying, okay, if you're interested in the temperature in this place, shoot an API call there, and then you will retrieve that value automatically as a part of the knowledge graph, and you'll be able to use it in your solution in any way you feel makes sense. So for these types of streams, you can also have it constantly
updating. But yeah, there is an element of the data tending to stick around. So as you design the system of how you're publishing something, it's good to keep in mind that whatever is the latest state should always be updated in the knowledge graph and then pushed out, so that you don't have flawed information.
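As a sketch of those two knobs, a lifetime chosen at publish time, plus updates that can also act as deletions or keep a pointer-style asset current, here's how it might look with a dkg.js-style client. The method and option names follow the public client as I understand it, and the URLs are placeholders:

```typescript
// Sketch only: assumes a dkg.js-style client; names may differ by version.
import DKG from 'dkg.js';

const dkg = new DKG({ endpoint: 'http://localhost', port: 8900 });

async function main() {
  // Publish with an explicit lifetime: after these epochs the network
  // stops compensating core nodes for holding the asset.
  const asset = await dkg.asset.create(
    {
      public: {
        '@context': 'https://schema.org',
        '@type': 'Observation',
        name: 'warehouse-7 temperature',
        // A "pointer" asset: the live value sits behind an API call,
        // so it stays current (but isn't itself covered by the proofs).
        url: 'https://example.com/api/warehouse-7/temperature',
      },
    },
    { epochsNum: 2 },
  );

  // Later: replace the old state with a new one. Only the NFT owner can
  // do this; an effectively empty update acts as a delete.
  await dkg.asset.update(asset.UAL, {
    public: {
      '@context': 'https://schema.org',
      '@type': 'Observation',
      name: 'warehouse-7 temperature (sensor relocated)',
      url: 'https://example.com/api/warehouse-7b/temperature',
    },
  });
}

main().catch(console.error);
```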
I know that can also become a headache when working with RAG systems. You want to update whatever document it is to make sure that the chunks you're giving the LLM are the most up-to-date chunks, or that you have
the proper information to give to the LLM and not outdated information. And so I was just thinking about how you can make sure that you have that proper information, and the old information you don't necessarily want gets filtered out, or falls to the back, or ideally you can just get rid of it, like delete it. Yeah.
I mean, you can, with an update, right? You'll then compensate for deleting, literally. You could say, okay, I'll delete this, and because it's such a small thing anyway, they'll just update it with a deletion, and then it's going to be an empty knowledge asset sitting there, basically. Or you'll do an update to a new state,
and then the old state is going to get replaced with the new state. And that's how you're replacing it. But yeah, for RAG... So what we're doing here is actually GraphRAG, not just RAG. So it's not necessarily just documents as blobs of text; they're contextualized in graph form. So you already have a little bit more context. And then...
The cool thing is that if you set up updates, the positive effects ripple throughout the network. Because if there's an entity, Demetrios, and there's an update by me, it doesn't have to be done by you too; that connectivity will make it happen. Yeah. Okay. Yeah.
Like, GraphRAG is nice. There's much more context, it's much more precise. We've done some tests there as well: obviously, just having text is okay, it's better than nothing, but in graph form you have much more context. And you can actually use LLMs to query the knowledge graphs. They're pretty good at writing out those SPARQL queries.
It's pretty performant. They've been trained on a lot of SPARQL, I guess. And especially if you give it a bit more context, or if you give it a template of what it should produce, it'll map that initial prompt to a relevant query, filling out the template, no problem. Super, super good. And then you can just retrieve whatever you need.
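Here's a minimal sketch of that pattern: constrain the model with a query template plus the question, then run whatever SPARQL it produces against your edge node. The dkg.graph.query call is modeled on the public dkg.js client, and the LLM call is stubbed out; both are assumptions rather than a fixed API:

```typescript
// Sketch only: LLM-to-SPARQL with a query template.
import DKG from 'dkg.js';

const dkg = new DKG({ endpoint: 'http://localhost', port: 8900 });

// A template keeps the model on rails: it only fills the WHERE clause.
const PROMPT = (question: string) => `
You write SPARQL for a schema.org knowledge graph.
Fill in the template and return only the query.

Template:
SELECT ?s ?name WHERE {
  ?s <http://schema.org/name> ?name .
  # add patterns answering the question here
} LIMIT 20

Question: ${question}
`;

// Stub standing in for a real LLM call; returns the kind of query a
// model typically produces from the template above.
async function generateSparql(prompt: string): Promise<string> {
  void prompt; // a real implementation would send this to your LLM
  return `SELECT ?s ?name WHERE {
    ?s <http://schema.org/name> ?name .
    ?s <http://schema.org/category> "audit-report" .
  } LIMIT 20`;
}

async function ask(question: string) {
  const sparql = await generateSparql(PROMPT(question));
  // Run the generated query against the graph via the edge node.
  const result = await dkg.graph.query(sparql, 'SELECT');
  return result.data;
}

ask('Which products were audited in the last quarter?')
  .then(console.log)
  .catch(console.error);
```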
And I thought, okay, since we are now in the rabbit hole, let me entertain it a little bit more: you can actually do a decentralized GraphRAG. So it's not just over your system; if you do it across the global graph, you actually go into anything you have access to. So it's like a federated query that goes down.
And in the future, when we're talking about agent swarms, you could even do, literally, a decentralized GraphRAG where you're triggering other AI systems to perform RAG over whatever they have. So think specialized agents for
multiple fields. Let's say I come up with a complex question that would trigger multiple fields at the same time. My agent would then go and, in a decentralized way, federate questions to multiple agents underneath,
where each of those would provide a GraphRAG response from their knowledge base and their refined model's output, feed it back to the generalized model I started my question with, and then I'd receive a response that's, well, normalized for whatever my initial prompt was. So you have a whole orchestra
of knowledge bases and agents, or LLMs or other models, basically anything GenAI. We kind of figure it like the left-hand and right-hand sides of the brain; I just think of an agent as the two things together. But you'll be able to trigger them in a federated way and just receive a response that's very precise in that sense, with all the provenance information available, of course.
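A toy sketch of that fan-out: federate the question to several specialist agents in parallel, collect their sourced answers, and hand the bundle to a generalist model for synthesis. The endpoints, response shape, and synthesis stub are entirely hypothetical:

```typescript
// Toy sketch of decentralized GraphRAG fan-out; all names are placeholders.
type AgentAnswer = {
  answer: string;
  sources: string[]; // UALs / provenance for each claim
};

// Hypothetical specialist agents, each running GraphRAG over its own KG.
const SPECIALISTS = [
  'https://agent.example/neuroscience',
  'https://agent.example/nutrition',
  'https://agent.example/supply-chain',
];

async function askSpecialist(url: string, question: string): Promise<AgentAnswer> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ question }),
  });
  return res.json() as Promise<AgentAnswer>;
}

// Stub for the generalist model that normalizes the partial answers;
// a real implementation would prompt an LLM with answers plus sources.
async function synthesize(question: string, parts: AgentAnswer[]): Promise<string> {
  return `Q: ${question}\n` +
    parts.map(p => `${p.answer} [${p.sources.join(', ')}]`).join('\n');
}

async function federatedAsk(question: string): Promise<string> {
  // Fan out in parallel; tolerate individual agent failures.
  const settled = await Promise.allSettled(
    SPECIALISTS.map(url => askSpecialist(url, question)),
  );
  const answers = settled
    .filter((s): s is PromiseFulfilledResult<AgentAnswer> => s.status === 'fulfilled')
    .map(s => s.value);
  return synthesize(question, answers);
}

federatedAsk('How does sleep quality relate to supplement X?')
  .then(console.log)
  .catch(console.error);
```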
Now, if I have my own node, am I responsible for how that node is hosted and everything that happens there? Or can I send it to you and have you host it? What does that even look like? Yeah. So the way to think about it is: if you want to have your own node, it's basically because
you're cognizant of wanting to keep your data within your environment. So right now, let's say, enterprise is very, very adamant about that. They want to have it within their environment. They host their data; their opsec policies and everything, all the pen tests you can think of, it's there, no one touches it. Cool.
For someone more casual, even for agents, for example, you could use a gateway node. A gateway node is, you know how, when you send a transaction on a blockchain, you also don't run your own Ethereum or Bitcoin or Solana node? You just use a service, an RPC service, and then you sign your transaction with your wallet, you append your
gas tokens to it, and you send it through this gateway. The same can be done for the DKG: you just find a gateway node, someone that's hosting one, and you push your publish through there. The thing is, that's good for public transactions, so whatever you want to put on the public knowledge graph; your private data isn't going to be hosted for you.
And then there are going to be companies doing hosted services as well. So you'll be able to place your trust in a company to do a good job keeping your data safe, without it being exposed to everyone. It's up to every user to decide whether they trust A, B, or C, but it's also possible. Yeah, I think those would be the three
different variations of it. But the idea is that the edge node is so light that you can have it, you know, on your phone or laptop or wherever, so that it becomes kind of a
non-annoying thing that's just there. If you need to use it, you can have it there. It's not a tedious task to run your node; it's just an app. You open it, it's there, your data is there, it's protected however you want to protect it. And then you want to add your LLM, you want to use a public or private instance, again, you have that choice, and then it just works. Yeah. Since it feels like the scale
is, and can be, so big, and especially since what you're trying to get to, right, is the knowledge graph of everything, how are you thinking about speed? Yeah. Like a lot of things, it comes down to, in a way,
good design and separation of concerns. You just mentioned in the beginning that we released version 8. That's something that really amped up the scale, because, well, blockchains have advanced in some ways, so that's a small part of it, but we've also redesigned the system, where, because of the size of the network, we were able to move towards something called
random sampling. So we no longer need to validate every single knowledge asset; we can just
randomly sample. And because of the size, there's no way for, let's say, a rogue player in the network to try to trick the system and delete parts of it; they'll have to persist it. So with that, we've increased the scale of what we can do by over a thousand x. So now we are literally at an internet-scale
size, because even when the knowledge graph grows, if it gets too big for the subset of nodes that's there, we can always split it in two, and then the two networks go on, and if one network again gets too big, it can get separated again. So you have an ever-growing, basically sharding type of exercise that can happen.
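A tiny sketch of the intuition behind sampling-based checks: rather than re-verifying every asset a node claims to hold, challenge a random subset each round; a node that silently dropped a fraction p of its data survives a round of k samples with probability (1 - p)^k, which vanishes over repeated rounds. This is a generic illustration, not OriginTrail's actual proof protocol:

```typescript
// Generic illustration of random-sampling spot checks, not the actual
// OriginTrail proof system.
import { randomInt, createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

type StoredAsset = { ual: string; content: string; expectedHash: string };

// Pick k distinct random assets to challenge this round.
function sample<T>(items: T[], k: number): T[] {
  const pool = [...items];
  const picked: T[] = [];
  while (picked.length < k && pool.length > 0) {
    picked.push(pool.splice(randomInt(pool.length), 1)[0]);
  }
  return picked;
}

// A node passes the round only if every sampled asset's content still
// matches the fingerprint that was anchored at publish time.
function spotCheck(held: StoredAsset[], k: number): boolean {
  return sample(held, k).every(a => sha256(a.content) === a.expectedHash);
}

const held: StoredAsset[] = [
  { ual: 'ual:1', content: 'triple-set-1', expectedHash: sha256('triple-set-1') },
  { ual: 'ual:2', content: 'triple-set-2', expectedHash: sha256('triple-set-2') },
];
console.log(spotCheck(held, 2) ? 'round passed' : 'round failed');
```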
Now, when it comes to speed, that's where there's a bit of a separation of concerns, because it depends on what you're trying to do. You can think of it as a 'get' versus a 'search'. If you're doing a search where you want to do a global search of everything,
it takes time, but that's just physics. It will always take time to do it. Until we have, you know, multiple processes running at the same time across the quantum variations, it'll be hard to have everything at any given second. So until quantum, we'll have to go this way.
And that's going to take a bit of time. But you're still traversing; it's not like you're going through endless tables. So it's still much faster, let's say, than having a relational database of everything available there. So you'll have performance there. But what happens a lot of the time is that you're actually not going to query everything.
You actually know where you want to ask something. So it's a pretty precise query that you're shooting, let's say, at your edge node or at a subset of edge nodes, the paranet. And here the speed is good; here it's very fast. This can be production-level speed for deployments because, again, it's traversing, which is much, much faster. Even though it's decentralized, it's still fast.
The fastest, if you really want the top, top, top speed: what you do is you do gets, and then you place them in your edge node, and you just read from your own database. So you trust yourself. Once you read it, you know it has veracity; then you trust yourself, you keep it there, and whatever your application is, you just have it interact with your local knowledge graph. And you just
embellish your knowledge graph with others whenever you want to, but the actual reads happen from the edge node that you run.
Dude, that's so awesome. Where can people get started, and how can they get started with the DKG? There's a bustling community, and we'd definitely be keen to invite everyone to check out our GitHub. Everything's open source, so you can go in and have a look there, you know, set up an edge node, see how it works, connect it to some
GenAIs, put the two brain sides together. If you're setting up an agent as well, we'll be releasing some frameworks and plugins there. So the intention is to support a lot of the popular
stuff there, so that you have an easier way to deploy it. And of course, Discord would be the channel to come in and hang out. We're there daily, and our devs are as well. So there's a bustling community that will help you out if you have any questions, and our team members are present too. So yeah, we're happy to have a chat and to build some cool stuff together, man. There's a lot of initiatives already there, but
I'm so, so keen to learn more from what people are building and how we could, yeah, fight the good fight, I guess. It's a fun time. Cool.