
RAG Inventor Talks Agents, Grounded AI, and Enterprise Impact

2025/3/27

Founded & Funded

People
Douwe Kiela
Jon Turow
Topics
Douwe Kiela: I am the co-founder and CEO of Contextual AI and a co-inventor of RAG (retrieval-augmented generation). Our company builds next-generation retrieval systems for the enterprise. I want Contextual AI to be ambitious: not just one small piece of the RAG stack, but to own the whole market. RAG originated as a way to give language models the knowledge and context they lack, by combining them with external knowledge sources so they can understand and generate text better. The original inspiration was to ground language models in external text such as Wikipedia. RAG's success owed something to the availability of Facebook AI's image-similarity search technology (an early vector database), which made it possible to feed vector-database results into a generative model. One original motivation for RAG was the staleness of a generative model's knowledge, which retrieving up-to-date information can fix. RAG is not mutually exclusive with fine-tuning or long context windows; they can be combined for best performance. RAG became popular because it offers a simple way to point large language models at internal enterprise data. The next generation of RAG systems will let developers focus on business value and differentiation rather than technical details such as chunking strategies. Although CTOs want RAG to be easy to use, developers still need to attend to those details, chunking included, to optimize performance. Enterprise adoption of RAG is uneven: some companies are still exploring, while others have built their own RAG platforms, sometimes on flawed assumptions. Many enterprises aim too low when choosing RAG use cases; more complex, higher-impact use cases yield a better return on investment. RAG's ROI depends on the complexity of the use case and the number of employees it touches. Generative-AI deployments fall into two broad categories: cost savings and business transformation. Common enterprise misconceptions include underestimating how hard it is to get RAG into production and misunderstanding which problems RAG suits: RAG is good at answering specific questions, but not at summarizing documents. Contextual AI's RAG 2.0, together with other techniques, has changed the conversation with enterprise executives. Contextual AI helps enterprises discover the right RAG use cases and define success metrics and test sets to evaluate effectiveness. User acceptance testing (UAT) is critical to successful deployment, because real users behave differently than testers do. Contextual AI supports enterprises through deployment and adoption. Contextual AI focuses on building systems around large language models rather than training foundation models, because language models will be commoditized. RAG 2.0 is about jointly optimizing the components of the RAG pipeline for better performance. Active versus passive retrieval: active retrieval lets the language model decide whether to retrieve at all and adjust its retrieval strategy as needed. RAG agents use contextual reasoning to pick the best information sources. A major open challenge for RAG is bridging structured and unstructured data. For a specific problem, a specialized model always beats a generic one. Contextual AI's research targets real customer problems rather than pure academic research. In AI, the line between research and product is blurry; research results must become products quickly. AI-native companies differ from SaaS companies in that they must explore and learn while they build. Turning research into products and working with enterprises has been exciting. Today's AI technology already has enormous economic potential, but non-technical problems remain, such as regulation and organizational change. In AI deployment, it is essential to focus on the inaccuracy and how to mitigate its risk. Different kinds of founders (domain experts, technologists, and researchers) all have important roles in AI startups. Deep domain knowledge matters less in a founder than leadership and vision. "Wrapper companies" built on foundational technology can still be enormous business opportunities. The next AI startup opportunities lie in exploiting the technology's growing maturity to solve harder problems cost-effectively. New AI companies must focus on differentiation and avoid being swallowed by large incumbents. Adapt fluidly to the market and stay humble. Don't shy away from hard problems; aim higher. A company's core is its data and its expertise around that data, and that should show up in how it evaluates models. "Hallucination" is an imprecise term; accuracy is the better measure. Hallucination is about groundedness: a model that adheres closely to its context hallucinates less. Building an AI company and solving real problems is much harder than expected. An overlooked problem in AI: evaluation. Evaluation is critical to successful deployment, yet many companies underinvest in it. Enterprises can use Contextual AI's tooling for evaluation, but the evaluation expertise itself has to be built in-house. Jon Turow:
A common criticism of early generative models was the knowledge cutoff: models from 2020 and 2021 were unaware of COVID-19, for example. How RAG relates to the constellation of other techniques (fine-tuning, teaming, in-context learning) and what role it plays among them. RAG's popularity and the misconceptions around it: it has been treated as a silver bullet for every problem. Enterprise executives' enthusiastic reactions to RAG, and the convenience it brings. The next generation of RAG systems will let developers focus on business value and differentiation rather than technical details such as chunking strategies. CTOs want RAG to plug into their existing architecture and work out of the box. Although CTOs want RAG to be easy, developers still need to attend to details like chunking to optimize performance. Enterprise adoption of RAG is uneven: some companies are still exploring, while others have built their own RAG platforms, sometimes on flawed assumptions. The phases of RAG adoption: 2023 was the year of the demo, 2024 the year of productionization, and 2025 the year of chasing return on investment. Enterprises measure RAG ROI in different ways: some emphasize cost savings, others business transformation and revenue growth. Many enterprises aim too low when choosing RAG use cases; more complex, higher-impact use cases yield a better ROI. RAG's ROI depends on the complexity of the use case and the number of employees it touches. Generative-AI deployments fall into two broad categories: cost savings and business transformation. Common enterprise misconceptions include underestimating how hard it is to get RAG into production and misunderstanding which problems RAG suits. Contextual AI's RAG 2.0, together with other techniques, has changed the conversation with enterprise executives. The hallucination debate: does RAG solve hallucinations? "Hallucination" is an imprecise term; accuracy is the better measure. How views of AI adoption and capability have shifted. Building an AI company and solving real problems is much harder than expected. An overlooked problem in AI: evaluation. Evaluation is critical to successful deployment, yet many companies underinvest in it.

Supporting evidence:

Douwe Kiela: 'So the history of the RAG project, so we were at Facebook AI Research, obviously FAIR, and I had been doing a lot of work on grounding already for my PhD thesis. And grounding at the time really meant understanding language with respect to something else.'

Douwe Kiela: 'So it was like, if you want to know the meaning of the word cat, like the embedding, word embedding of the word cat, this was before we had like sentence embeddings, then ideally you would also know what cats look like because then you understand the meaning of cat better. So that type of perceptual grounding was something that a lot of people were looking at at the time.'

Douwe Kiela: 'And then I was talking with one of my PhD students, Ethan Perez, about can we ground it in something else? Maybe we can ground in other text instead of in images. So the obvious source at the time to ground in would be Wikipedia. So we would say this is true, sort of true. And then you can understand language with respect to that ground truth.'

Jon Turow: 'Well, you know, this takes me back to another common critique of these early generative models that for the amazing Q&A that they were capable of, the knowledge cutoff was really striking. You've had models in 2020 and 2021 that were not aware of COVID-19.'

Douwe Kiela: 'Yeah, it was part of the original motivation, right? So that is sort of what grounding is, the vision behind the original RAG project. And so we did a lot of work after that on that question as well, is can I have a very lightweight language model that basically has no knowledge?'

Jon Turow: 'Now we have RAG, and we still have this constellation of other techniques. We have training, and we have teaming, and we have in-context learning. And that was, I'm sure, very hard to navigate for research labs, let alone enterprises. In the conception of RAG, in the early implementations of it, what was in your head about how RAG was going to fit into that constellation? Was it meant to be standalone?'

Douwe Kiela: 'Yeah, it's interesting because the concept of in-context learning didn't really exist at the time. That really became a thing with GPT-3, I think, where they showed that that works. And that's just an amazing paper and an amazing proof point that that actually works. And I think that really unlocked a lot of possibilities. But in the original RAG paper, we have a baseline, what we call the frozen baseline, where we don't do any training and we just give it as context.'

Douwe Kiela: 'So in-context learning is great, but you can probably always beat it through machine learning if you're able to do that. So if you have access to the parameters, which is obviously not the case with a lot of these black box frontier language models, but if you have access to the parameters and you can optimize them for the data you're working on or the problem you're solving, then at least theoretically, you should always be able to do better.'

Jon Turow: 'What has happened since then is that, and we'll talk about how this is all getting combined in more sophisticated ways today, but I think it's fair to say that in the past 18, 24, 36 months, RAG has caught fire and even become misunderstood as the single silver bullet. Why do you think it's been so seductive?'

Douwe Kiela: 'It's seductive because it's easy. I honestly, I think like long context is actually even more seductive if you're lazy, right? Because then you don't even have to worry about the retrieval anymore. You just put it all there and you pay a heavy price for having all of that data in the context. You're like every single time you're answering a question about Harry Potter, you have to read the whole book in order to answer the question, which is not great.'

Jon Turow: 'And we'll get to the part where we're talking about how you need to move beyond a cool demo. But I think the power of a cool demo should not be underestimated. And RAG enables that. What are some of the aha moments that you see with enterprise executives?'

Douwe Kiela: 'Yeah, I mean, there are lots of aha moments. I think like that's part of the joy of my job. I think it's where you get to show what this can do and it's just amazing sometimes what these models can do. But yeah, so basic aha moments for us.'

Douwe Kiela: 'So the next generation of these systems and platforms for building these RAG agents is going to enable developers to think much more about business value and differentiation, essentially. How can I be better than my competitors because I've solved this problem so much better? So your chunking strategy should really not be important for solving that problem.'

Jon Turow: 'Well, so if I now connect what we were just talking about to what you said now, the seduction of long context and RAG are that it's straightforward and it's easy. It plugs into my existing architecture. And as a CTO, if I have finite resources to go implement new pieces of technology, let alone dig into concepts like chunking strategies and how the vector similarity for non-dairy will look similar to the vector similarity for milk, things like this. Is it fair to say that CTOs are wanting something coherent?'

Douwe Kiela: 'But then what we often find is that we talk to these people and then they talk to their architects and their developers. And those developers love thinking about chunking strategies because that's what it means in a modern era to be an AI engineer is to be very good at prompt engineering and evaluation and optimizing all the different parts of the RAG stack.'

Douwe Kiela: 'So I think it's very important to have the flexibility to play with these different strategies. But you need to have very, very good defaults so that these people don't have to do that unless they really want to squeeze like the final percent and then they can do that. So that's what we're trying to offer is like you don't have to worry about all this stuff.'

Douwe Kiela: 'The timeline is basically 2023 was the year of the demo. ChatGPT, it just happened. Everybody was kind of playing with it. There was a lot of experimental budget. Last year has been about trying to productionize it and you could probably get promoted if you were in a large enterprise, if you were the first one to ship GenAI into production. So there's been a lot of kind of kneecapping of those solutions happening in order to be the first one to get it into production.'

Douwe Kiela: 'This year, those first past the post, or sort of past the post, but only in a limited way, because it's actually very hard to get the real thing past the post. Right. So this year, people are really under a lot of pressure to deliver return on investment for all of those investments and all of the experimentation that has been happening. So it turns out that actually getting that ROI is a very different question.'
Douwe Kiela: 'I think my general stance on like use case adoption is that I see a lot of people kind of aiming too low. Where it's like, oh, we have AI running in production. It's like, oh, what do you have? Well, we have something that can tell us who our 401k provider is and how many vacation days I get. And that's nice. Is that where you get the ROI of AI from? Obviously not. You need to move up in terms of complexity.'

Douwe Kiela: 'Yeah, so there's roughly two categories for Gen-AI deployment, right? One is cost savings. So I have lots of people doing one thing. If I make all of them slightly more effective, then I can save myself a lot of money. And the other is more around business transformation and generating new revenue.'

Douwe Kiela: 'I see some confusion around this kind of the gap between demo and production. A lot of people think that the common misconception we see is like, oh, yeah, it's great. I can easily do this myself. And then it turns out that everything breaks down after like 100 documents and they have a million. And so that is the most common one that we see. But I think there are other misconceptions maybe around what RAG is good for and what is not. So what is a RAG problem and what is not a RAG problem? And so people, I think, don't have the same kind of mental model that maybe AI researchers like myself have, where if I give them access to a RAG agent, often the first question they ask is, what's in the data?'

Jon Turow: 'So now we have Contextual, which is an amalgamation of multiple techniques. And you have what you call RAG 2.0, and you have fine-tuning, and there's a lot of things that happen under the covers that customers ideally don't have to worry about until they choose to do so. And I expect that changes radically the conversation you have with an enterprise executive. So how do you describe the kinds of problems that they should go find and apply and prioritize?'

Douwe Kiela: 'Yeah, so we often help people with use case discovery. So really just thinking through, okay, what are the RAG problems? What are maybe not really RAG problems? And then for the RAG problems, how do you prioritize them? How do you define success? How do you come up with a proper test set, so that you can evaluate whether it actually works? What is the process for after that doing what we call UAT, user acceptance testing? So putting it in front of real people, that's really the thing that really matters, right? Sometimes we see production deployments and they're in production and then I ask them how many people use this and the answer is zero.'

Douwe Kiela: 'Yes. So I think it's very tempting to pretend that AI products are mature enough to be fully self-serve and standalone. It's sort of decent if you do that, but in order to get it to be really great, you just need to put in the work.'

Jon Turow: 'I want to talk about two sides of the organization that you've had to build in order to bring all this for customers. One is scaling up the research and engineering function to keep pushing the envelope. And there are a couple of very special things that Contextual has, something you call RAG 2.0, something you call active versus passive retrieval. Can you talk about some of those innovations that you've got inside Contextual and why they're important?'

Douwe Kiela: 'We really want to be a frontier company, but we don't want to train foundation models. I mean, obviously that's a very, very capital intensive business. I think language models are going to get commoditized. The really interesting problems are around how do you build systems around these models that solve the real problem. And so most of the business problems that we encounter, they need to be solved by a system. So then there are a ton of super exciting research problems around how do I get that system to really work well together? So that's what RAG 2.0 is in our case. So like, how do you jointly optimize these components so that they can work well together?'

Jon Turow: 'Can you talk about active versus passive retrieval?'

Douwe Kiela: 'Yeah. So passive retrieval is basically old school RAG. It's like I get a query and I always retrieve. And then I take the results of that retrieval and I give them to the language model and it generates. So that doesn't really work. Very often you need the language model to think, first of all, where am I going to retrieve it from? And like, how am I going to retrieve it? Are there maybe better ways to search for the thing I'm looking for?'

Jon Turow: 'This implies two relationships of Contextual and RAG to the agent. There is the supplying of information to the agent so that it can be performant. But if I probe into what you said, active retrieval implies a certain kind of reasoning.'

Douwe Kiela: 'Yeah, exactly. So it's like I enjoy saying, everything is contextual. That's very true for an enterprise, right? So the context that the data exists in, that really matters for the reasoning that the agent does in terms of finding the right information that all comes together in these RAG agents.'

Jon Turow: 'What is a really thorny problem that you'd like your team and the industry to try and attack in the coming years?'

Douwe Kiela: 'The most interesting problems that I see everywhere in enterprises are at the intersection of structured and unstructured. And so we have great companies working on unstructured data. There are great companies working on structured data.'

Douwe Kiela: 'So they are different components, right? Despite what some people maybe like to pretend, I can always train up a better text-to-SQL model if I specialize it for text-to-SQL than taking a generic off-the-shelf language model and telling it, like, generate some SQL query. So specialization is always going to be better than generalization.'

Douwe Kiela: 'Yeah. First of all, I think our researchers are really special in that we're not focused on like publishing papers or like being too far out on the frontier. As a company, I don't think you can afford that until you're much bigger and if you're like Zuck and you can afford to have FAIR. And the stuff I was working on at FAIR at the time, I was doing like Wittgensteinian language games and like all kinds of crazy stuff that I would never let people do here, honestly. But there's a place for that. And that's not a startup. So the way we do research is we're very much looking at what the customer problems are that we think we can solve better than anybody else. And then really just focusing again, like thinking from the system's perspective about all of these problems. How can we make sure that we have the best system and then make that system jointly optimized and really specialized or specializable for different use cases?'

Douwe Kiela: 'For like AI companies or AI native companies like us, if you compare this generation of companies with like SaaS companies, there is like, okay, all like the LAMP stack, everything was already there. You just have to basically go and like implement it. That's not the case here is that we're very much just figuring out what we're doing, like flying the airplane as we're building it sort of thing, which is exciting, I think.'

Douwe Kiela: 'Yeah. So, I mean, that's kind of my personal journey as well, right? I started off like I did a PhD. I was very much like a pure research person.'

Jon Turow: 'I think there's going to be people problems and organizational problems and regulatory and domain constraints that fall outside the bounds of the paper?'

Douwe Kiela: 'I would maybe argue that those are the main problems to still be overcome. I don't care about AGI and all of those discussions. I think the core technology is already here for huge economic disruption. So all the building blocks are here. The questions are more around how do we get lawyers to understand that? How do we get the MRM people to figure out what is an acceptable risk? One thing that we are very big on is not thinking about the accuracy, but thinking about the inaccuracy. And what do you do with the, like, if you have 98% accuracy, what do you do with the remaining 2% to make sure that you can mitigate that risk?'

Jon Turow: 'What do new founders ask you? What kind of advice do they ask you?'

Douwe Kiela: 'They ask me a lot about this like wrapper company thing and moats and differentiation. I think there's some fear that like incumbents are just going to eat everything. And so they obviously have amazing distribution. But yeah, I think there are just massive opportunities for companies to be AI native companies.'

Douwe Kiela: 'Yeah, I think that's a very interesting question. I would argue like how many PhDs does Zuck have working for him? That's a lot, right? It's a lot. I don't think it really matters like how deep your expertise in a specific domain is. As long as you are a good leader and a good visionary, then you can recruit the PhDs to go and work for you.'

Douwe Kiela: 'It's fine to be a wrapper company as long as you have an amazing business. People should have a lot more respect for companies building on top of fundamental new technology and then just discovering whole new business problems that didn't really exist before, and then solving them much better than anything else.'

Douwe Kiela: 'I think so. I mean, I am also learning a lot of this myself, like about how to be a good founder basically. But I think it's always good to sort of plan for what's going to come and not for what is here right now. And that's how you really get to ride that wave in the right way. And so what's going to come is that a lot of this stuff is going to become much more mature.'

Jon Turow: 'What is some advice that you've gotten? And I'll actually ask you to break it into two. What is advice that you've gotten that you disagree with? And what do you think about that? And then what is advice that you've gotten that you take a lot from?'

Douwe Kiela: 'Maybe we can start with the advice I really like, which is one observation around why Facebook is so successful. It's like be fluid like water. It's like whatever the market is telling you or your users are telling you, like fit into that. Don't be too rigorous in like what is right and wrong. Just like be humble, I think, and just like look at what the data tells you and then try to optimize for that. That is advice that when I got it, I didn't really appreciate it fully. And I'm starting to appreciate that much more right now. Honestly, it took me too long to understand that.'

Douwe Kiela: 'In terms of advice that I've gotten that I disagree with, it's very easy for people to say, like, you should do one thing and you should do it well. Sure, maybe, but I'd like to be more ambitious than that. So we could have been like one small part of a RAG stack and we probably would have been the best in the world at that particular thing. But then we're just slotting into this ecosystem where we're just like a small piece.'

Jon Turow: 'So if I page back a little bit and we get back into the technology for a minute, there's a common question, maybe even misunderstanding that I hear about RAG, that, oh, this is the thing that's going to solve hallucinations. And you and I have spoken about this so many times. Where is your head at right now on that? What hallucinations are, what they are not, does RAG solve it? What's the outlook there?'

Douwe Kiela: 'I think like hallucination is not a very technical term. So we used to have a pretty good word for it. It was just accuracy. And so if you were like inaccurate, if you were wrong, then one way I guess to explain that or to anthropomorphize it would be to say, oh, the model hallucinated. I think it's a very ill-defined term, honestly. If I had to try to turn it into a technical definition, I would say the generation of the language model is not grounded in the context that it's given, where it is told that context is true. So basically, hallucination is about groundedness. If you have a model that really adheres to its context, then it will hallucinate less.'

Jon Turow: 'What are some of the things that you might have believed a year ago about AI adoption or AI capabilities that you think very differently about today?'

Douwe Kiela: 'Many things. The main thing I thought that turned out not to be true was that I thought this would be easy.'

Jon Turow: 'What is this?'

Douwe Kiela: 'This, like building the company and solving real problems with AI. I think we were very naive, especially in the beginning of the company. We were like, "Oh yeah, we just get a research cluster, get a bunch of GPUs in there, we train some models, it's going to be great."'

Jon Turow: 'What are we, either you and I, or are we the industry not talking about nearly enough that we should be?'

Douwe Kiela: 'Evaluation. I've been doing a lot of work on evaluation in my research career. Things like DynaBench where it was really about like how do we hopefully maybe get rid of like benchmarks altogether and sort of have a more dynamic way to measure model performance. But evaluation is just very boring. People don't seem to care about it. I care deeply about it. So that always surprises me. Like we did this amazing launch, I thought, around LMUnit. It's natural language unit testing. So you have a response from a language model and now you want to check very specific things about that response. It's like, did it contain this? Did it not make this mistake? Like ideally, you can write unit tests as a person for what a good response looks like.'

Jon Turow: 'So whoever is lucky enough to get that cool JP Morgan head of AI job that you would be doing in another life, is that intellectual property of JP Morgan what the evals really need to look like? Or is this something that they can ultimately ask Contextual to cover for them?'

Douwe Kiela: 'No, so I think the tooling for evaluation they can use us for, but the actual expertise that goes into that evaluation, so the unit tests, they should write that themselves, right? Like in the limit we talked about, like a company is its people, but in the limit that might not even be true, right? Because there might be AI mostly and maybe only a few people. So what makes a company a company is its data and the expertise around that data and sort of the institutional knowledge. And so that is really what defines a company. And so that should be captured in how you evaluate the systems that you deploy in your company.'
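
The passive-versus-active distinction quoted above can be sketched in a few lines of Python. Everything here is illustrative: the toy word-overlap `search`, the `needs_retrieval` decision function standing in for the model's own reasoning, and all function names are assumptions made for this sketch, not Contextual AI's implementation.

```python
# Sketch of passive vs. active retrieval, per the discussion above.
# Passive: always retrieve, then generate from whatever came back.
# Active: the system first decides whether to retrieve at all.

def search(query: str, corpus: dict[str, str]) -> list[str]:
    """Toy retriever: passages sharing at least one word with the query."""
    words = set(query.lower().split())
    return [text for text in corpus.values()
            if words & set(text.lower().split())]

def passive_rag(query, corpus, generate):
    """Old-school RAG: retrieve on every query, unconditionally."""
    passages = search(query, corpus)
    return generate(query, passages)

def active_rag(query, corpus, generate, needs_retrieval):
    """Active retrieval: consult a decision step before searching."""
    passages = search(query, corpus) if needs_retrieval(query) else []
    return generate(query, passages)
```

In a real system the decision step is the language model itself reasoning about whether, where, and how to search; a simple keyword check stands in for it here.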

Deep Dive

Chapters
This chapter details the creation of RAG at Facebook AI Research, highlighting its initial goal of grounding language models in external text, particularly Wikipedia. It emphasizes the collaboration with other researchers and the role of vector databases in enabling the combination of retrieval and generative models.
  • RAG originated from grounding language models in external text.
  • Initial grounding attempts used Wikipedia.
  • Collaboration with researchers at Facebook and elsewhere was crucial.
  • Early RAG models were multimodal, though primarily language-focused in application.
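
The role of vector databases noted above can be made concrete with a toy nearest-neighbor retriever: embed texts, then rank passages by cosine similarity to the query. The bag-of-words "embedding" below is a stand-in for a learned encoder, and all names are illustrative, not the actual system used at Facebook AI.

```python
# Toy illustration of similarity-based retrieval, the vector-database
# idea that made early RAG practical: embed, then rank by cosine.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding'; a real system uses a learned encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query."""
    qv = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(qv, embed(p)), reverse=True)
    return ranked[:k]
```

The retrieved passages would then be handed to the generative model as context, which is the retrieval-plus-generation combination the chapter describes.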

Shownotes

What does it take to invent a foundational AI paradigm — and then build a company to bring it to the enterprise?

In this episode of Founded & Funded, Madrona Partner Jon Turow sits down with Douwe Kiela, co-founder and CEO of Contextual AI and the co-inventor of RAG (Retrieval Augmented Generation). They dive into the origins of RAG, its misunderstood role in the enterprise, and how Contextual is redefining what production-grade AI systems can do. Douwe shares what most companies get wrong about RAG, why chunking shouldn't matter, how to think about hallucinations, and what founders need to know in the era of RAG agents.

Transcript: https://madrona.com/rag-inventor-talks-agents-grounded-ai-and-enterprise-impact

Chapters:

(00:00) Introduction
(01:27) The Origin of RAG
(04:00) Challenges and Innovations in RAG
(09:49) Enterprise Adoption and Use Cases
(20:46) Scaling and Innovations at Contextual AI
(23:39) The Future of RAG Agents
(24:43) Challenges in Enterprise Data
(26:34) Building a Research-Driven Company
(27:55) The Intersection of Research and Product
(32:10) Advice for Founders and AI Companies
(38:14) Understanding and Addressing Hallucinations
(40:50) Company Building is Harder Than You'd Think
(42:00) The Importance of Evaluation in AI
(44:14) Concluding Thoughts