
Giving New Life to Unstructured Data with LLMs and Agents

2025/6/6

AI + a16z

People
Anant Bhardwaj
Derek
Guido Appenzeller
Topics
Anant Bhardwaj: I believe AI will drive automation in a significant way, especially for unstructured data. Traditional RPA struggles with unstructured data because it cannot adapt when the data changes. We are betting that AI can overcome these limitations and enable much more efficient automation. I believe AI can not only analyze unstructured data but also act on it, fundamentally transforming enterprise workflows.

Derek: Managing and making use of unstructured data has been a long-standing goal of enterprise IT. Unstructured data is anything that cannot be put into a SQL database, such as PDF documents and images. This data is critical to business operations but very difficult to process and search. AI agents can analyze documents and act on them.

Guido Appenzeller: Enterprises' reliability requirements for AI are changing; they no longer expect absolute perfection. They care more about predictability: knowing where the AI will make errors and how to handle those errors. Enterprises can accept lower accuracy as long as they can predict which parts need review. AI will distill information and extract the key points.

Chapters
This chapter explores the challenges of processing unstructured data using traditional methods like RPA and introduces Instabase's innovative layout-aware models, which leverage LLMs and coordinate encoding to extract insights from complex documents. The discussion highlights the shift from rudimentary techniques to advanced AI solutions for data analysis.
  • Legacy robotic process automation (RPA) struggles with unstructured data.
  • Instabase developed layout-aware models to extract insights from PDFs and complex documents.
  • Encoding X and Y coordinates along with word position significantly improves LLM performance on document understanding.

Transcript

So robotic process automation is literally: if a human had to do something, you basically open some browser or whatever, take some data, put it into some other system, click some button, and all that stuff. So it records that human's clicks on the desktop and tries to keep repeating them. So you kind of get that automated. And the hard part they had is you can't do robotic process automation for unstructured data, because it's not fixed. It changes. So anything will be very, very brittle. The bet that we are taking is that AI will drive automation in a significant way.

RPA would be fully eaten by AI automation, and the future is likely going to be more of a decentralized, federated execution.

Thanks for listening to the A16Z AI podcast. I'm Derek, and I hope you're ready to talk unstructured data. For a long time, optimally managing and utilizing and even being able to locate unstructured data was a holy grail of enterprise IT. And what is unstructured data? As this episode's guest, Instabase founder and CEO Anant Bhardwaj explains, it's basically everything that's not nicely housed in rows and columns in a SQL database. Text files, bank statements, passport photos, you name it.

It's the stuff that's critical for any number of business operations, but that, until recently, was quite difficult to process or even search for without significant manual effort.

So, in this episode, Anant sits down with A16Z Infra partner Guido Appenzeller to talk through Instabase's history with automating the management of unstructured data, from Anant's early research at MIT through to the revolutionary advances brought by large language models. He shares some exciting new use cases, like an Indian bank approving loans via WhatsApp, and, as you just heard...

his vision of, and strategy for, building a future where AI agents can make the leap from analyzing documents to acting on them. You'll hear it all, starting with some of Anant's personal journey, after these disclosures.

As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. For more details, please see a16z.com slash disclosures.

I'll just give you a little bit of history from when I was doing research at MIT. I think big data was a big thing in 2015. Everybody was doing this. And so first, let me just define unstructured data, because people have different definitions of unstructured data. My definition is very simple. Anything that...

cannot be put into nice database tables where you can run SQL. Anything that is not that is unstructured data. So like a PDF document or an image or- Or anything, yeah. Anything that cannot be put into a nice table that you can run queries on. And we already knew how to answer questions when data is nicely in a structured format. So at MIT, the question that we were trying to ask was: how do you answer a question when data is not in that format? When it's very heterogeneous, basically doesn't have any schema, and you don't even know which questions are relevant or not.

So that was the key sort of hypothesis. And we were building this product called Data Hub, which had the ability to mount different kinds of things. You could mount file systems, you could mount databases, and you could mount something called an application node, because some data also lives in random applications. And then: can you ask any question?

So that was a big research project. I was like, this could be very valuable. So I dropped out, without having solved the whole problem, and came here to Silicon Valley. And then I started talking to a bunch of companies: tell me, what is your unstructured data problem? Because we had to figure out the business, where to sell, and where the real value is for organizations, especially enterprises.

And we got pulled into this gnarly problem, which is: here are all my images and documents and Excel and PowerPoint files. Can you help me answer questions? So my first question was, why do you even care? What question do you want to answer? We need to understand that. And they're like, we run a bunch of processes that...

receive a ton of unstructured data, and we have to make a decision. For example, immigration: when somebody applies for immigration, they submit a bunch of things, and someone has to make a decision whether to give them a visa or not. Or you apply for a loan, you submit a bunch of things, and they have to decide whether you should get the loan or not. So we were like, sounds interesting. Let's think about how to solve it.

And you won't believe it, the techniques at that time were very rudimentary. There were four common techniques that people used. Number one, they called these templates, where they would simply say: here is a template for a passport, and if you want the passport number, go look 10 pixels below and 10 pixels from the right and draw a 20-pixel-long box, and whatever you find is your passport number. Good luck with that, yeah. It's very, very brittle, right? Because as soon as you scan differently, things would break. The second technique was basically people writing

different kinds of rules, like: go and look for the keyword "period beginning," and anything to the right of that is the start date or something. Doesn't work; it just breaks. The third technique is

people were trying to train these ML models by writing features for a specific document type. And what features do you write for, like, a pay stub? It's just very, very hard. So those also didn't work. So at that time we started doing research, which we killed after two years, called program synthesis. We were basically like: if I had access to amazingly intelligent people, how would I solve the unstructured data problem? I would ask them to write code on the fly. So can I basically...

ask the computer to synthesize a program on the fly? It was very, very hard for computers to write programs at that time; LLMs weren't a thing. But we figured most of the data can be extracted from documents by writing some form of regular expressions and those kinds of things. So let's synthesize these regular expressions based on the input-output combinations that you give. And that is the answer. And that worked reasonably well as long as your input had a similar kind of structure, because the problem with a program is that it's

deterministic. So if your input changes, it will break. It still produces reliable results, but not good enough that we could solve many problems; we could solve some part of the problem. This was 2017, and then the Transformer paper came.
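
To give a flavor of the idea, here is a toy, self-contained sketch of example-driven regex synthesis: enumerate candidate patterns and keep the first one that reproduces every labeled extraction. The candidate list and examples are illustrative assumptions, not Instabase's actual synthesizer, which searched a much richer program space.

    import re

    # Candidate patterns to enumerate (illustrative, not the real search space).
    CANDIDATES = [
        r"\b\d{2}/\d{2}/\d{4}\b",   # dates like 01/31/2017
        r"\b[A-Z]\d{8}\b",          # passport-style identifiers
        r"\$[\d,]+\.\d{2}",         # currency amounts
    ]

    def synthesize(examples):
        """examples: list of (document_text, expected_extraction) pairs."""
        for pattern in CANDIDATES:
            matches_all = all(
                (m := re.search(pattern, doc)) is not None and m.group() == want
                for doc, want in examples
            )
            if matches_all:
                return pattern  # deterministic: a new document layout can break it
        return None

    print(synthesize([("Period beginning: 01/31/2017", "01/31/2017")]))

As the discussion notes, the synthesized program is deterministic: great for reliability on inputs that share the training structure, brittle the moment the layout shifts.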

And I think around the Transformer paper, they also released a model called BERT. So we were super excited, because that was state of the art, like the best sort of model to understand natural language. So we basically applied BERT to these unstructured documents. We took a bunch of those tokens and fed them in, and that produced really bad results.

really, really bad results. At that time, actually, I sent a note, which Martins would have a copy of, where we were like: seems like this problem is not solvable unless somebody solves AI-complete problems, which they call, you know, AGI. But there was nothing else that was promising enough. So what do we do?

So we basically tried a creative approach. If you look at the BERT language model, they were encoding each token with the position of the word in the sentence. And that's how the attention mechanism would work and would do the fill-mask problem. So we were like: what if, in addition to the position of the word in the sentence, we also start encoding the X and Y coordinates too? So we basically took 110 million documents,

took every single word, or token, and encoded it with its position in the sentence but, more importantly, its X and Y coordinates, and then tried to solve the fill-mask problem by masking a box and seeing if that box could be filled in by the model. We trained a model similar to BERT; we call this InstaLM.

And that produced great results, because the attention is now not just looking at the sequence of tokens but also the X, Y coordinates in two-dimensional space, which is really, really cool from the perspective of document layout understanding. Yeah.
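
As a minimal sketch of what that kind of layout-aware encoding might look like, assuming a BERT-style sum of embeddings (this is in the spirit of InstaLM and later layout-aware models, not Instabase's actual code):

    import torch
    import torch.nn as nn

    class LayoutAwareEmbedding(nn.Module):
        """Token embedding that adds 2-D page coordinates to the usual
        BERT-style sequence position (a sketch, not InstaLM itself)."""
        def __init__(self, vocab=30522, hidden=768, max_seq=512, grid=1024):
            super().__init__()
            self.tok = nn.Embedding(vocab, hidden)    # word identity
            self.pos = nn.Embedding(max_seq, hidden)  # position in the sentence
            self.x = nn.Embedding(grid, hidden)       # quantized X coordinate
            self.y = nn.Embedding(grid, hidden)       # quantized Y coordinate

        def forward(self, token_ids, x_coords, y_coords):
            # All inputs: (batch, seq_len) integer tensors.
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            # Summing lets attention see reading order AND page layout;
            # training stays the same masked-token (fill-mask) objective.
            return (self.tok(token_ids) + self.pos(positions)
                    + self.x(x_coords) + self.y(y_coords))

    emb = LayoutAwareEmbedding()
    ids = torch.randint(0, 30522, (1, 6))
    xs, ys = torch.randint(0, 1024, (1, 6)), torch.randint(0, 1024, (1, 6))
    print(emb(ids, xs, ys).shape)  # torch.Size([1, 6, 768])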

And I think it's fair to say that has become, I mean, this is much later, today, but today it's become almost a standard technique, right? If you're looking at two-dimensional data, you have some rotary encoding of X and Y or something like that. Yeah, yeah, yeah. At that time, that was not the case. So actually, from Rafal, who was one of our ML engineers, you will see those two or three papers at the top of the arena during that time.

So we were very happy. We started winning a lot of deals. We tripled our revenue that year, 2021 to 2022. But then OpenAI launched ChatGPT, in November 2022. It turns out the bitter lesson held: size matters. And we were like, oh man. Basically, you could actually pass in the documents. At that time they didn't support documents in the first release, but you could take the text with the positions preserved and pass that in. And it did a reasonably good job.

And we were like, is this the end of Instabase? It seems like you can now solve this whole problem. And then we realized that there is just a ton of things (and I think there's a paper by Databricks on compound AI systems) that

LLMs are very good, but you need a bunch of systems before and after them for it to be reliable, and we can get into the details. But that is the history of how we got to where we are today. Yeah, amazing. A very small personal anecdote: I have a lot of PDF files. Every piece of paper I get, I just scan and dump into a folder. And I recently wrote myself a little tool where, first, I asked an LLM to come up with a hierarchy of documents. You know, we're a family of five, here are some things about our family, now give me a document hierarchy.

And then, basically, by taking a document, just taking the summary of the document, giving it to an LLM, and asking which folder this should go into? That's an amazingly efficient sorting algorithm. It's really impressive what LLMs can do. So today you have a solution that basically allows enterprises or companies to work with unstructured data. Can you talk a little bit about what this does and what some of the use cases are? So the use case is pretty simple. Let's say, I'll take a simple example of

a bank that wants to do lending, or an insurance company that wants to process your claims. So let's take one of these two use cases. When people apply for, let's say, a home loan, it's literally a hundred-page-long packet, and you don't even know where is what. It could be that the first 10 pages are their bank statement. Here's a shoebox of documents. Yeah, yeah. And in between there might be a cat's picture. In between there might be some random letter from somebody. And so the issue, I think, is:

There is no one structure. What the bank says is: I need something that can verify your income, I need something that verifies your identity. So it's not that they tell you, here is my passport and here is my driver's license. It's: here is the application packet, go and process it.

You have to do this reliably, because you cannot make a single error. You can just think about, like, how do you solve this? So there are two techniques. And that's why one of the papers we wrote is LLMs Are Not All You Need. Because one thing you can do is put everything into the prompt and ask the LLM the question. But the problem is that if it goes beyond the context window, that's a problem. You can do RAG, because RAG is a technique where you put the data into some vector database and figure out, for a given question, which are the relevant chunks that could be useful, and then produce that. But how do you know you did not miss something? You might get precision, but if you miss something, then that's a problem. And LLMs are

great, but they make surprising errors. For example, let's say you have a 10-page-long bank statement with tables. Somehow they will get a lot of things right but miss, like, four random cells with values. And you don't even know that they missed them. And that just changes the whole thing. So these are very surprising kinds of errors. Yeah.

So we looked at: how do you solve this reliably? The reliability part is important, because these are complex decisions that banks or insurance companies or immigration make. So the right way to solve this is: how do you know how to split this particular packet into the bunch of things we care about? You have to analyze every single thing in detail. Once you have done this,

then how do you get all of these structures that we care about? For example, we run a separate table detection algorithm rather than passing the whole thing to the LLM, because how do you know it didn't miss four things? How do you make sure all the cells are correct? Similar thing with checkboxes and signatures and other things that matter. Once you have classified, then: what are the relevant schemas that we need? Then you go and extract those things. How do you validate that each of those things is correct? Then write validations,

and then do cross-validation, because: is the pay stub saying the same thing that the W2 does? Because if not, then that's a problem. So basically what we provide is this interface where people can build all of those things without writing a single line of code. Then you build this application, and now you can run this application as part of a deployment, which will integrate with your upstream and downstream systems. So now you can do lending in less than five seconds, rather than the several weeks it would have taken earlier.
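
A tiny, self-contained sketch of what those validation and cross-validation steps amount to (the field names, tolerances, and rules here are illustrative, not Instabase's product API):

    # Per-document check: the bank statement must reconcile internally.
    def validate_bank_statement(fields):
        drift = fields["opening"] + sum(fields["transactions"]) - fields["closing"]
        assert abs(drift) < 0.01, "balance does not reconcile"

    # Cross-document check: is the pay stub saying the same thing the W2 does?
    def cross_validate(pay_stub, w2):
        annualized = pay_stub["gross_per_period"] * pay_stub["periods_per_year"]
        if abs(annualized - w2["wages"]) / w2["wages"] > 0.05:
            return "route_to_human"  # predictable failure: flag for review
        return "ok"

    validate_bank_statement(
        {"opening": 1000.0, "transactions": [250.0, -100.0], "closing": 1150.0})
    print(cross_validate({"gross_per_period": 5000.0, "periods_per_year": 24},
                         {"wages": 120000.0}))  # -> ok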

There is one very interesting use case, an intelligence use case, for example. And that's where I talk about why the approaches are critically important. So let's say you are a country, and you want to collect a bunch of intelligence data, and you want to answer whether there is any threat to the country. And you receive, let's say, millions of documents per day. One way is to dump it all into some RAG system and ask a question.

But how do you know you didn't miss anything? Because they care about that. And maybe the right way to answer that question is not putting all the documents into a search, but rather looking at every single page of every document. Look for the things that you care about, like a

terrorism threat or money laundering or whatever, and then extract that, put it into a database, and run a SQL query. On the things that match, go and do the deeper analysis, because now you guarantee completeness. So I think what we have seen is that while RAG is good for casual search, you need a complex workflow under the hood that is explainable, auditable, and guaranteed to be accurate and correct.
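
Here is a minimal sketch of that "extract first, then query" pattern, using SQLite (the schema and categories are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE findings
                    (doc_id TEXT, page INTEGER, category TEXT, snippet TEXT)""")

    # Imagine every page of every incoming document gets analyzed and its
    # extractions land here, so the first pass is complete by construction.
    conn.executemany("INSERT INTO findings VALUES (?, ?, ?, ?)", [
        ("doc-001", 3, "money_laundering", "wire routed through shell entity"),
        ("doc-002", 7, "benign", "routine invoice"),
    ])

    # Guaranteed-complete filter; deep (and expensive) analysis only on matches.
    hits = conn.execute(
        "SELECT doc_id, page FROM findings WHERE category = 'money_laundering'"
    ).fetchall()
    print(hits)  # -> [('doc-001', 3)]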

That kind of workflow is important for solving many of these enterprise problems. So that's what we do. We help enterprises take any kind of unstructured data and make decisions from it, for reliable, 100%-complete-and-accurate use cases. There are cases where we can make an error, and in that case we have to pass it to humans: hey, it seems like something is wrong, can you go and look at it? Totally. And look, I mean, I think this is the trend with current AI systems, right? Yeah.

I've not encountered an AI system yet that is perfect. And by some metric, I think we never will, right? I think what you need is something with reasonable error rates and then a good escalation path to humans to deal with those, right? Exactly. And even humans are not 100% correct, right? So you have to build the right processes to catch it. That's why I think sometimes when people say this AI didn't work, it's just that

AI is not supposed to work reliably 100% of the time. You have to build a system around it. That's right, yeah. And there is going to be a lot of investment across the board in how we build the right systems around AI and LLMs that solve the problem. Is there a shift in how enterprises, or consumers of AI in general, think about reliability? I mean, look, classically, if I'm a...

chief compliance officer in a bank or so, I have a new piece of software, and my take is: this software can never do X, because that puts us out of compliance. I recently spoke to a bank that said, well, we tried that; it doesn't work with AI. So now we're saying the well-trained human gets us out of compliance about X times every X hours or so, and so the AI has to be 10x better, and then we're going to sign off on it. So you

cannot have absolute perfection. So we have to sort of change the acceptance criteria. Is that something you're seeing as well? I think more important is predictability. I think people are fine with errors as long as the errors are predictable.

When errors are not predictable, that's where the problem is. So when somebody makes an error and you don't even know the error was made, that's when... Because humans, you know, will make 3%, 4% errors. But if you put a second human on it, by default, the chance of that is low. It compounds. Yeah, yeah. But with AI, the issue is that they're pretty accurate. They're very good. But they make mistakes in a surprisingly unpredictable way.

And that is a bigger problem. And that's where I think the tooling and systems around it, to detect the errors, to be able to explain when an error was made, to be able to figure out how to catch them, or building systems that allow you to minimize that effect, that is the critical part. So in general, what we have seen is:

Enterprises are fine using AI as long as we show them predictability. They don't care about 99% accuracy. You can be 90% accurate or even 80% accurate; just tell us which 20% needs to be reviewed or which 20% needs to go somewhere else. And that requires a lot of systems around these tools to get there.
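
In code, that predictability-first posture is essentially a routing rule; a toy sketch (thresholds and field names are illustrative):

    # Accept lower accuracy as long as we can say which items need review.
    def route(extraction):
        if extraction["confidence"] >= 0.90 and not extraction["flags"]:
            return "auto_approve"
        return "human_review"  # the predictable 20%

    items = [
        {"id": 1, "confidence": 0.97, "flags": []},
        {"id": 2, "confidence": 0.71, "flags": ["unreadable_table_cell"]},
    ]
    for item in items:
        print(item["id"], route(item))  # 1 auto_approve / 2 human_review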

So I think we sometimes misunderstand what enterprises want. They don't want 100% accuracy. They want predictability. Is this the future, essentially: if an organization receives a document, a human will typically no longer see the document; AI will primarily look at it, generate a summary, pre-parse it, so one can reason about it at a higher layer? When unstructured data like documents comes in,

humans will still see some kind of dashboard with whatever the stuff is, and only on the thing of interest will they go and double-click. And AI will do a lot of things to minimize their time to get to that thing of interest very, very quickly. Google is a great example. When you search,

you don't read every single thing. Google gives you, here are maybe three or four things of interest that you want to double-click on and research. And I think AI will play a similarly important role in many cases. It gets rid of the boilerplate and reduces the thing to the essential. Are we looking at a world where my system will take my couple of key points or key phrases and generate a PDF document, and then your system will take the PDF document and reduce it back to the couple of key points and phrases? Exactly. That's a good question.

That's, I guess, not a bad way to operate in the future. What is the most interesting use case you've seen for your technology? Anything sort of out of the ordinary? I think what we are seeing is customers being a lot more creative than we had ever imagined. For example, I was working with a bank in India. And now, given that AI has become reasonably reliable, they are offering their entire lending process over WhatsApp. Oh, that's amazing. So you go to WhatsApp, you say, hey, I'm a business, and...

I want a loan. And then on WhatsApp, you get a response back saying, hey, can you upload these things: your last 30 days of, you know, your P&L statement and whatever those things are. And piecemeal, you submit these reports: oh, this looks good, can you also send this? I've never seen lending being done conversationally over WhatsApp. This is insane. The customer experience is fundamentally different. And I do believe that over the coming years,

it is going to change the user experience in a very, very significant way. Currently, I think a lot of people think of AI as a technology and ask how we can use it inside software. I think the biggest impact will be that, with the degree of affordance it gives you, you can build a completely new class of interactions with your customers that would never have been possible. And we are seeing more and more of those.

Currently, all of these processes, like insurance claims and all, are pretty painful, right? And I think America is slightly more conservative in these things. But if you go to the developing world, where digitization is more of a new thing and people are already doing everything on their phones,

things are just moving, because AI makes you feel like you're talking to a human. Nobody loved chatbots before, but now you feel good, because they are conversing with you in a pretty human-like way. And that interface, coupled with all the

interactions that they have. Of course, one of the big use cases everybody's trying to go after is the call center. But just think of all the other things too. How do you open an account? How do you do lending? How do you do processing? It will change the user experience in a very, very significant way. And I think...

There's even, I think, an opportunity here to take some processes which currently are very much, I take a lot of documents, I throw them over the wall, and back comes a response, and really turn them into something more interactive, right? Where it's like, hey, Guido, tell me more about your specific use case. Okay, then I need these documents, and I send them. Well, that document is missing something. And you can do this interactively with very, very short latency. Everything, even immigration, right? You send the stuff and you don't even know; two months later, you hear your stuff is rejected, or we need something else.

I mean, all of those things can fundamentally be changed. I just got a letter back from the IRS. I submitted a long application with lots of supporting documents. I got a form letter saying the documentation is not complete without any mention of what is not complete. Exactly. I was like, what does this mean? This can be just much more interactive. Yeah. And because now you can do things in real time. And so I'm pretty optimistic on...

the impact of this on every single business, on how they interact with their customers. That makes a lot of sense. What do you see as the main barriers for companies to adopt this? I mean, I've seen many classic enterprises adopting AI. There are discussions around compliance and legal and where does my data go, and a long list of...

sort of concerns that are being expressed. What are the top items that you've seen? These enterprises are not historically known for moving very quickly. So that's number one. So I think expecting that, like... I would say they're moving a little quicker in the AI revolution than they did previously. Exactly. In general, each of these large enterprises has to get approval from their compliance committee and their regulations committee, and they all basically...

And none of them really understand these things. And sometimes you get regulations, or questions, that might not even be applicable. Like, for example: tell me, every time you change a feature, how did it... But with LLMs, we, the developers, don't change features, right? When you get all of those things, it basically is a massive time sink. But I think the two key things that they care about are:

first, how do you guarantee that my data is safe and secure? And second, how do you give me auditability and predictability? If you boil down all their questions, they eventually boil down to those two things. Nobody wants AI making a decision, even if it is correct, if they cannot explain: here is the set of steps that it took.

Because if something wrong happened, they have to explain it. In the human world, you can explain: something came in, it went to these...

five different teams, where they did this part, and this particular error was made, and that's why, and we will correct it in the future so that this kind of mistake does not happen. If AI becomes a black box with no instrumentation of how things get done internally, it has a hard time, especially for customer-centric use cases. For simple casual search and those kinds of things, it's fine. But the runtime has to be something that is auditable,

and you should be able to find, if something went wrong, where it went wrong. They don't tell you this directly; they ask questions that eventually boil down to this. But that's what we have seen as the major requirement. Makes sense. Let me switch tack here a little bit. One of the hottest buzzwords at the moment is agents, right? It's an overused term. It's sometimes used as a marketing term for a glorified set of prompts, basically. But we're also seeing it

as an essentially different user interface paradigm, right? Where I no longer walk through a transaction step by step, but basically give a high-level instruction to an agent, and the agent acts autonomously. We even see it as a software design paradigm, where you now have multiple agents that work together and make decisions more autonomously. How do you think

this will change how enterprises process data, how we work with unstructured data, and this entire space? So let's look at what we already know has worked well. What we already know has worked well is this:

enterprises already know how to run a workflow that is created by some developer, who defines a bunch of steps using some workflow management tool, and you can run it. People already know how to run this. The argument you can make is: can we just tell the agent, give me the answer, and it does it? The problem with agents currently is that if you give them the same goal and the same set of tools, they might choose different paths two different times.

So they are not guaranteed to always deterministically follow one path. And in general, people don't like runtime inconsistencies. The runtime has to be consistent. So where I have seen things work well within enterprises is during build time, when somebody has to define the control path and the logic and all of those kinds of things.

You can maybe have the agent produce the first draft: hey, this is how I plan to execute, this is what it looks like. Because otherwise a human might have taken a long period of time. Pretty similar to Cursor, right? If you want to build something, it can write the first draft of the code. The human can look, make some minor edits. But then you run that code deterministically. Yeah, exactly. So my point is, I think it's the same way. I do not believe that autonomous agents will be a runtime phenomenon.

However, they will be a build-time or compile-time phenomenon, which basically means that during the build phase, they can do 90% of the work. Humans make some changes. And that's huge, huge value, because the reason things don't scale at the enterprise is that

there is either a lack of enough developers or skills or drive or whatever. If AI agents can do things and make it so easy that you can build those, then once it is approved, we know what is running. Then it is auditable, and you can also add steps and checkpoints, whatever is needed. For example, Cursor generated the code, but you want more logging, so you add logs in between, whatever those things could be. So then you have a deterministic artifact that can run in production.
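
A minimal sketch of that compile-then-run split, where the agent's reviewed plan is frozen into a plain artifact and the runtime just replays it (step names and handlers are illustrative):

    import json
    import logging

    logging.basicConfig(level=logging.INFO)

    # The agent's draft plan, reviewed and approved by a human at build time,
    # then frozen as a plain artifact.
    APPROVED_PLAN = json.loads('["classify_document", "extract_fields", "validate_totals"]')

    HANDLERS = {
        "classify_document": lambda doc: {**doc, "type": "invoice"},
        "extract_fields": lambda doc: {**doc, "total": 42.0},
        "validate_totals": lambda doc: doc,
    }

    def run(doc):
        for step in APPROVED_PLAN:  # same path every time: auditable, debuggable
            logging.info("running %s", step)
            doc = HANDLERS[step](doc)
        return doc

    print(run({"id": "doc-001"}))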

That's where I think the world is going to move: a compile-time phenomenon and a runtime phenomenon. The runtime has to be deterministic, something that is auditable, debuggable. You know exactly what is happening, and you should be able to see the logs and all that kind of stuff. At compile time, agents can play an important role, because they can help with the reasoning and create the first draft, where the human participates with the agent to produce the artifact that is going to run. That makes a lot of sense. I mean, this is a super hot debate at the moment, right? And I think we've seen everything from this

AGI vision, where it's like, no, this is going to be a fully agentic loop, and it decides when it wants to terminate, decides what tools to use, and you just give it your credit card and let it run, right? And I personally agree: I don't think we're there yet. The most freeform agentic systems that we've seen typically don't work yet. This approach of saying,

let the LLM generate the flow, but then freeze the flow once it works: I think, at least in the short term, it's a much more pragmatic vision. Also, basically, I think we can take our lessons from what works in the human world. Let's assume every human is an agent. You don't allow every single employee in your company to make an autonomous decision. No, some person at the top says, here is the set of things that we are going to do; you can only do this set of things. And then, basically, the runtime is pretty deterministic.

Most of the reasoning and agency and all that cool stuff is used during the build. So LLM process re-engineering is a thing now. Yeah. I guess. That's fantastic. So looking forward, what are you excited about in your space? I mean, with AI at the moment, it's hard to predict what's happening in six months, right? But if you try to stretch your crystal ball to the absolute limits here, what things do you think we'll see 12 months out, two years out? Yeah.

in your space? So we've been debating and reasoning about this for quite a period of time. And maybe my answer will be slightly controversial, because different people have different views of what the future will be. I do believe that AI will continue to improve in its capabilities, and I think it will play an important role at compile time, building things, reasoning, and all that, although the runtime is going to be much more deterministic, predictable, and controllable.

Now, the question is: what is the execution pattern going to be? There are two different views of the world. One is: does it make my data management problem easier if, at compile time, you move all the things into one place and are able to answer and do things? Or do you keep the tooling and the world the way it is, siloed everywhere, and AI becomes smart enough

to have multi-agent communication, where each agent can do things and figure out, if one makes an error, how it affects the others and how to do the communication. So we have been working on this idea of federated AI execution, where, as an organization, you can define these thousands of agents in a very federated way, but

they are dynamically able to discover other agents, through some platform or whatever that could be, and then able to communicate. So if you give a bigger goal, you don't need one central entity to decide everything. Dynamically, all the agents can discover each other, they can all share their capabilities, then you can figure out the control path, then you can figure out how to run. So we are trying to build a federated, decentralized automation framework, which basically means:

can I take any process in any organization, figure out the federated, decentralized execution plan, and run it? That's where I believe the automation world will move. There are still a lot of open questions, a lot of unknowns. A lot of work to do, yeah. But the bet that we are taking is that AI will drive automation in a significant way.
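
To make the discovery idea concrete, here is a toy sketch of federated capability registration and lookup, with no central decision-maker hard-coding the plan (all names and structure are illustrative, not Instabase's framework):

    from collections import defaultdict

    REGISTRY = defaultdict(list)  # capability -> agents that offer it

    class Agent:
        def __init__(self, name, capabilities):
            self.name = name
            for cap in capabilities:
                REGISTRY[cap].append(self)  # announce what this agent can do

        def plan(self, goal_capabilities):
            # Discover, for each capability the goal needs, a peer that offers it.
            return {cap: REGISTRY[cap][0].name
                    for cap in goal_capabilities if REGISTRY[cap]}

    Agent("doc-extractor", ["extract_fields"])
    Agent("fraud-checker", ["score_fraud"])
    coordinator = Agent("loan-flow", [])
    print(coordinator.plan(["extract_fields", "score_fraud"]))
    # -> {'extract_fields': 'doc-extractor', 'score_fraud': 'fraud-checker'}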

RPA would be fully eaten by AI automation, and the future is likely going to be more of a decentralized, federated execution. Yeah. And so that's where we are. That's one hell of a vision there. I'm excited about it. So, AI is progressing very rapidly. How have some of the technical advances in AI impacted

what you can deliver to your end customers? I mean, they must be changing basically constantly, is that right? Yeah. So earlier, we focused primarily on the unstructured data problem as part of the automation, because that's the long pole in the tent: how do you even understand the data? Because once you get data in a structured format, you know how to do the next steps. So until now, we primarily focused on: if you get a bunch of unstructured data, how do we get you the things that you need to make the next step of the decision?

We did not touch the next step of the decision. Let's say you are a lending company or an insurance company. Once you get all the data, you might have to trigger some other tool, like their lending system or some sort of fraud system or whatever, the risk system and things like that. That requires knowing about those systems, how to interpret the results, and all that kind of stuff. So we were like:

Data in, we will do everything, give you valuable data out, and after that, you are responsible for all the other integrations. The way these companies solved those problems

was by using this technology called RPA; you might have heard of robotic process automation. Robotic process automation is literally: if a human had to do something, you basically open some browser or whatever, take some data, put it into some other system, click some button, and all that stuff. So it records that human's clicks on the desktop and tries to keep repeating them. So you kind of get that automated. And the hard part that they had is,

you can't do robotic process automation for unstructured data, because it's not fixed; it changes, so anything will be very, very brittle. But if things are exactly the same after that, you can actually record the screen and replay it. Otherwise, it has been very, very brittle. That is the problem with RPA, even though they add value; there are some big players there, UiPath, Automation Anywhere, and many of them have reasonably massive market caps. Now with AI...

The argument that we make, and we might be wrong, is: once the data comes out, which we are very, very good at up until that point, can we also start operating those other systems? Now, this makes a massive assumption, which is that AI will help us operate those systems. And there are some interesting protocols that have come out, like the Model Context Protocol, which allows you to dynamically discover capabilities and call those functions. It still has a ton of problems, which is:

do all the systems even support MCP? What if they don't? They sort of punted on authentication, but we'll figure that out over time. Then authentication. Then, how do you know if something breaks? One of the arguments that we're making is that maybe in the future, as we go broader, we can do the entire end-to-end workflow. So once data comes out, do we have a way to plan and reason during compile time, which an AI agent can do:

how to operate those systems, how to call them, how to get the data, then call some other system; if something goes wrong, how to involve humans. So create all that stuff with the AI agent during compile time, and then extend our offering to do this entire thing end to end. Why can RPA be fully replaced with AI automation?

RPA had some stuff that is easier to solve, because some user logs in, so it always runs in the context of the user. If you're clicking on the desktop and all that, one of the hacks that we believe might work is called identity pass-through: the agent can assume the user identity provided at runtime, and then that user identity gets passed through to all of the MCP tools. Do I always want an agent to have the same capabilities that I have? Yeah.

An agent today is like a good intern, right? So I trust the intern up to a point. I don't necessarily want it just spending on my credit card; maybe I want to cap that at $50 or something like that. And you can decide that during compile time. So basically you can say, hey, even if this user context comes in with this, as soon as it gets to this operating tool, maybe we create some derived

identity that will have fewer permissions, or things like that. And that's why I said the AI agent should only be used during compile time, so that it gives humans all the control over what the runtime behavior should be. Yeah, makes sense. The problems come when the AI agent is making runtime decisions, because then you have no control over where things are going. So the separation is critically important. During the initial build, you can choose whether you want to curb what agency they have and what limits and constraints they have. And that's what it will go and do during runtime. Right.
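
A small sketch of that scoped-down, pass-through identity idea, fixed at compile time (the types, fields, and caps here are hypothetical, not a real auth system):

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Identity:
        user: str
        scopes: frozenset  # what the caller may do
        spend_cap: float   # e.g. cap the intern's credit card at $50

    def derive_for_tool(identity, allowed_scopes, cap):
        # Decided at compile time: the runtime tool call only ever sees the
        # intersection of the user's scopes and the approved ones, capped.
        return replace(identity,
                       scopes=identity.scopes & frozenset(allowed_scopes),
                       spend_cap=min(identity.spend_cap, cap))

    me = Identity("guido", frozenset({"read", "write", "purchase"}), 10_000.0)
    print(derive_for_tool(me, {"read", "purchase"}, cap=50.0))
    # -> same user, scopes reduced to {read, purchase}, spend_cap=50.0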

Anant, thanks for being here today. That was absolutely amazing. I think we're on a very exciting journey with AI. And looking back, I think the last big wave that I was a part of was probably the dot-com boom. And I think if there's one lesson learned for enterprises in general back in those days, it's that

these big technological shifts happen, and you have to jump on the wave early. It may be complicated. Maybe it's still a little weird; your compliance and legal folks don't know how to deal with it. But if you don't, you may end up like Barnes & Noble, right? The downside is substantial. And I think it is clear today that there's a huge opportunity for enterprises here to both

have more efficient workflows for themselves, but also to have a much, much better end-customer experience and partner experience. And in addition, it does three things: it saves you a lot of cost.

It does. It allows you to do things much, much faster. And the third one is, it fundamentally changes the customer experience in a very significant way. So there are all the business reasons for enterprises to adopt these things. Now it's just about how to make it work. I don't think I have any question about whether this will work or whether it's the right decision; it's about how to make it work. That is the bigger question, I think. Fantastic. Well, thank you so much.

And that's it for this episode. Maybe it's the years I spent covering the data space and thinking about big data, but I thought that was a great discussion. If you agree, please do share the podcast and rate it wherever you listen. And keep listening for more great stuff in the weeks to come.