Codeium created Windsurf to build the most powerful AI system for developers, recognizing the limitations of the VS Code ecosystem in providing a seamless, agentic experience. They wanted to offer a tool that could reason about large code bases, execute code, and provide a more intuitive and magical user experience.
Codeium's extensions are popular because they are platform-agnostic and work across various source code management tools, including GitLab, Bitbucket, and others. This flexibility allows developers to use Codeium regardless of their preferred development environment, which is crucial for enterprise customers with diverse tech stacks.
Codeium kept their autocomplete free to build a strong user base and differentiate their product. They focused on the enterprise market to drive durable revenue and ensure their technology can solve complex, high-value problems for large organizations. Individual developers can switch products quickly, while enterprise customers provide a more stable and significant business opportunity.
Future plans for Cascade and Windsurf include improving knowledge retrieval, suggesting and executing terminal commands, and providing more detailed insights into developer actions. The team aims to make the AI more intuitive and powerful, with features like Clippy-style suggestions and better handling of multi-step tasks, enhancing the overall developer experience.
Codeium believes that developers loving their tools is crucial for enterprise adoption. Unlike many enterprise software products that developers dislike, Codeium ensures its tools are intuitive and valuable. This approach builds developer loyalty, which is essential for long-term success and minimizing churn.
Codeium avoids waitlists to remain accessible and focused on delivering a polished product. They prefer being seen as a reliable, boring company rather than one that launches with a lot of hype and uncertainty. This approach helps them build trust and ensure that their products are ready for immediate use.
Codeium hires both AI believers and skeptics to balance innovation and realism. Believers push the boundaries of what's possible, while skeptics, who often have high standards and a background in complex technology, ensure that the product meets rigorous quality and security requirements. This mix helps the company avoid unrealistic expectations and focus on practical, effective solutions.
Codeium built their own inference runtime to maintain control over core competencies and ensure high-quality, low-latency experiences. They believe that certain aspects of their technology, such as autocomplete, require proprietary solutions to achieve the best results. This approach allows them to optimize performance and user experience, which is essential for their product's success.
Hiring a sales team for AI products is challenging because it requires individuals with intellectual curiosity and a deep understanding of the technology. Codeium ensures their sales team can scale and maintain a high level of product knowledge, which is critical for explaining and demonstrating the value of their AI solutions to customers.
Codeium supports multiple IDEs to meet developers where they are, ensuring their AI tools are accessible regardless of the development environment. This strategy is particularly important for enterprise customers who use a variety of IDEs, such as JetBrains for Java and Eclipse for older projects. By supporting multiple IDEs, Codeium maximizes the value of their AI for a broader developer audience.
Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. Hey, and today we are delighted to be, I think, the first podcast in the new Codeium office. So thanks for having us, and welcome Varun and Anshul.
Thanks for having us. Yeah, thanks for having us. This is the Silicon Valley office? Yeah. So what's the story behind this? The story is that the office was previously... So we used to be on Castro Street, so this is in Mountain View. And I think a lot of the people at the company previously were in SF or still are in SF. And actually, one thing you'll notice about the office is it's actually like a two-minute walk from the Caltrain. And I think we didn't want to move the office very far away from the Caltrain. That would probably...
piss off a lot of the people that lived in San Francisco, this guy included. So we were scouting a lot of spaces in the nearby area and this area popped up. It previously was being leased by I think Facebook/WhatsApp and then immediately after that Ghost Autonomy and then
And now here we are. And we also, you know, I guess one of the things that the landlord told us was this was the place that they shot all the scenes for Silicon Valley, at least like externally and stuff like that. So that just became a meme. Trust me, that wasn't like the main reason why we did it. But we've leaned into it. It doesn't hurt. Yeah. And obviously that played a little bit into your launch with Windsurf as well. So let's get caught up. You were guest number four? I think it was two. Maybe it was two. Two. Yeah.
So a lot has happened since then. You've raised a huge round and also just launched your IDE. What's been the progress over the last year or so since the Latent Space audience last saw you? Yeah, so I think the biggest things that have happened are Codeium's extensions have continued to gain a lot of
popularity. We have over 800,000 developers that use that product. Lots of large enterprises also use the product. We were recently awarded JPMorgan Chase's Hall of Innovation Award, which is usually not something a company gets within a year of deploying an enterprise product. And then large companies like Dell and stuff use the product. So I think we've seen a lot of traction on the enterprise space
But I think one of the most exciting things we've launched recently is actually this IDE called Windsurf. And I think for us, one of the things that we've always thought about is how do we build the most powerful AI system for developers everywhere? The reason why we started out with the extension system was we felt that there were lots of developers that were not going to be on one platform. And that still is true, by the way. Outside of Silicon Valley, a lot of people don't use GitHub.
This is like a very surprising finding, but most people use GitLab, Bitbucket, Gerrit, Perforce, CVS, Harvest, Mercurial. I could keep going down a list, but there's probably 10 of them. GitHub might have less than 10% full penetration of the Fortune 500. It's very small. And then also on top of that, GitHub has very high switching costs for source code management tools, right? Because you actually need to switch over all the dependent systems on this workflow software. It's much harder than even switching off of a database, right?
So because of that, we actually found ways in which we could be better partners to our customers regardless of where they stored their source code. And then more specifically on the IDE category, a lot of developers, surprise, surprise, don't just write TypeScript and Python. They write Java. They write Golang. They write a lot of different languages. And then high-quality language servers and debuggers matter. Very honestly, JetBrains has the best debugger for Java.
It's not even close, right? These are extremely complex pieces of software. We have customers where over 70% of their developers use JetBrains. And because of that, we wanted to provide a great experience wherever the developer was. But one thing that we found was lacking was, you know, we were running into the limitations of building within the VS Code ecosystem on the VS Code platform. And I think we felt that there was an opportunity for us to build a premier sort of experience, right?
And that was within the reach of the team, right? The team has done all the work, all the infrastructure work, to build the best possible experience, right? And plug it into every IDE. Why don't we just build our own IDE that is by far the best experience? And as these agentic products started to become more and more possible, all the research we've done on retrieval and reasoning about code bases came to life more and more. We were like, hey, if we launch this agentic product on top of a system that we didn't have a lot of control over, right?
It's just going to limit the value of the product, and we're just not going to be able to get the best tool. That's why we were super excited to launch Windsurf. I do think it is the most powerful IDE system out there right now in terms of capability, right? And this is just the beginning. I think we suspect that there's much, much more we can do, more than just the autocomplete side, right? When we originally talked, autocomplete was probably the only piece of functionality the product actually had. And we've come a long way since then, right? These systems can now reason about large code bases without you @-ing everything, right?
Like when you use Google, do you say, like, "@ New York Times" and then ask it a question? No, no. We want it to be a magical experience where you don't need to do that. We want it to actually go out and execute code. We think code execution is a really, really important piece.
And when you write software, you don't just come up with an idea. The way software gets created is that software is originally this amorphous blob. And as time goes on and you have an idea, the blob and the cloud sort of disappear and you see this mountain. And we want it to be the case that as soon as you see the mountain, the AI helps you get to the mountain, and eventually just creates the mountain for you.
And that's why we don't believe in this sort of modality where you just write a task and it just goes out and does it. It's good for zero-to-one apps, and I think people have been seeing Windsurf as capable of doing that, and I'll let Anshul talk about that a little bit. But we've been seeing real value in real software development. This is not to say that current tools can't,
But I think more in the process of actually evolving code from a very basic idea. Code is not really built as: you have a PRD and then you get some output out. It's more like you have a general vision. And as you write the code, you get more and more clarity on approaches that don't work and do work. You're killing ideas and creating ideas constantly. And we think Windsurf is the right paradigm for that. Can you spell out what you couldn't do in VS Code? Because I think when we did the Cursor episode and they explained, everybody on Hacker News was like,
oh, why did you fork? You could have done it in an extension. Can you maybe just explain more of those limitations? I mean, I think a lot of the limitations around APIs are pretty well documented. I don't know if we need to necessarily go down that rabbit hole. I think it was when we started thinking, okay, what are the pieces that we actually need to give the AI to get to that kind of emergent behavior that Varun talked about?
And yes, we were talking about all the knowledge retrieval systems that we've been building for the enterprise all this time. That's obviously a component of that. We were talking about all the different tools that we could give it access to, so it can go do that kind of terminal execution and things like that.
And the third main category that we realized would be kind of that magical thing, where you're not out there writing out a PRD, you're not scoping the problem for the AI, is if we're actually able to understand the trajectory of what developers are doing within the editor, right? If we're actually able to see, oh, the developer just went and opened up this part of the directory and tried to view it, then they made these kinds of edits, and they tried to do some commands in the terminal. If we actually understand that trajectory, then the AI can just immediately be like,
oh, I understand your intent. This is what you want to do, without you having to spell it all out for it. That is where that kind of magic would really happen. I think that was kind of the intuition. So you have the restrictions of the APIs that are well documented. We have the vision of what we actually need to be able to hook into to really expose this. And I think it was the combination of those two where we were like,
I think it's about time to do the editor. The editor was not necessarily a new idea. I think we've been talking about the editor for a very long time. Of course, we just pulled it all together in the last couple of months, but it was always something in the back of the mind. And it's only when we started realizing, okay, the models are now capable of doing this, we actually can look at this data, we have a really good context awareness system, that we were like, I think now's the time. And we went and executed on it. So here we are now.
It's not like one action you couldn't do, but it's how you brought it all together. It's like the VS Code kind of sandbox, so to speak. Yeah, let me maybe go one step deeper on each of the aspects that Anshul talked about. Let's go with the API aspect. So right now, I'll give you an example. Supercomplete is actually a feature that I think is very exciting about the product. It can suggest refactors
of the code. I think you can do it quickly and very powerfully. On VS Code, actually the problem for us wasn't actually being able to implement the feature. We had the feature for a while. The problem was actually even to show the feature, VS Code would not expose an API for us to do this. So what we actually ended up doing was dynamically generating PNGs to actually go out and showcase this
It was not really aligned properly. We actually ended up doing it ourselves, and it took us a couple hours to actually go out and implement this, right? And that wasn't because we were bad engineers. No, our good engineering time was being spent fighting against the system rather than building a good product. Another example is we needed to go out and find ways to refactor the code. The VS Code API would constantly keep breaking on us.
We constantly needed to ship a worse and worse experience. This actually comes down to the second point which Anshul brought up, which is that we can come up with great work and great research. The research on Cascade is not a couple-month thing. This is a nine-months-to-a-year thing that we've been investing in as a company, including on evals, right? Even the evals for this are a lot of effort, a lot of actual systems work to go out and do. But ultimately, this needs to be a product that developers actually use.
And I think, you know, let's even go for Cascade, for example, and looking at the trajectory. Can you define Cascade? Because that's the first time you brought it up. Yeah, so Cascade is the product that is the actual agentic part of the product, right? That is capable of taking information from both these human trajectories and these AI trajectories, what the human ended up doing, what the AI ended up doing, to actually propose changes and actually execute code to finally get you the final work output, right? I'll even talk about something very basic. Cascade gives you a bunch of code. We want developers to very easily be able to review this code.
Okay, then we can show developers a hideous UI that they don't want to look at, and no one's going to really use this product. And we think that this is like a fundamental building block for us to make the product materially better. If people are not even willing to use the building block, where does this go, right? And we just felt our ceiling was capped on what we could deliver as an experience. Interestingly, JetBrains is a much more configurable paradigm than...
than VS Code is. But we just felt so limited in both of the directions that Anshul mentioned that we were just like, hey, if we actually remove these limitations, we can move substantially faster. And we believe that this was a necessary step for us. I'm curious more about the evals, since you brought them up. And we have to ask about evals anytime anyone brings up evals. How do you evaluate a thing like this that is so multi-step and so...
spanning so much context. So here's what you can imagine we do, and this is one of the beautiful things about code: code can be executed. We can go take a bunch of open source code, we can find a bunch of commits, and we can actually see if some of these commits have tests associated with them. We can start stripping the commits, and the approach of stripping the commits
is good because it tests the fact that the code is in an incomplete state, right? When you're writing the commit, the commit has not already been written for you. You're given it in a state where the entire thing has not been written. And can we go out and actually retrieve the right snippets and actually come up with a cohesive plan and iterative loop that gets you to a state where the code actually passes? So you can actually break down and decompose this complex problem into a planning, retrieval, and multi-step execution problem. And you can see, on every single one of these axes, is it getting better. And if you do this across enough repositories,
You've turned this highly discontinuous and discrete problem of make a PR work versus make it not work into a continuous problem. And now that's a hill you can actually climb. And that's a way that you can actually apply research where it's like, hey, my retrieval got way better. This made my eval get better. Right?
And then notice how the way the eval works is I'm not that interested in the eval where purely it's a commit message and you finish the entire thing. I'm more interested in the code is in an incomplete state and the commit message isn't even given to you because that's another thing about developers. They are not willing to tell you exactly what's in their head. That's the actual important piece of this problem. We believe that developers will never completely pose the problem statement.
Because the problem statement lives in their head. Conversations that you and I have had at the coffee area. Conversations that I've had over Slack. Conversations I've had over Jira. Maybe not Jira, let's say Linear. That's the cool thing nowadays. We're not talking about Jira. Conversations I've had on Linear. And all of these things come together to actually finally propose a solution there, which is why we want to test the incomplete code. What happens if the code is in an incomplete state, and am I actually able to make this pass without the commit? And can I actually guess your commit message as
well. Now you can convert the problem into a masked prediction problem, where you want to guess both the high-level intent as well as the remainder of changes to make the actual test pass. And you can imagine, if you build up all of these, now you can see, hey, my systems are getting better. Retrieval quality is getting better. And you can actually start testing this on larger and larger code bases.
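(For readers who want the shape of that eval concretely, here's a rough sketch of stripping a commit and scoring the masked prediction. The `agent.run` interface and the `intent_similarity` scorer are hypothetical stand-ins for illustration, not Codeium's actual pipeline.)

```python
# Hypothetical sketch of a masked-commit eval: revert part of a commit so the
# repo is in an incomplete state, hide the commit message, and score the
# system under test on (a) guessing the intent and (b) making tests pass.
import random
import subprocess
from dataclasses import dataclass

def git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

@dataclass
class MaskedCommitCase:
    repo: str
    sha: str                  # a commit known to have tests associated with it
    true_message: str         # hidden from the system under test
    masked_files: list[str]   # files reverted to their pre-commit state

def build_case(repo: str, sha: str, mask_ratio: float = 0.5) -> MaskedCommitCase:
    """Strip the commit: check it out, then revert a random subset of the
    files it touched back to the parent revision."""
    git(repo, "checkout", sha)
    changed = git(repo, "diff", "--name-only", f"{sha}~1", sha).splitlines()
    masked = random.sample(changed, max(1, int(len(changed) * mask_ratio)))
    for path in masked:
        git(repo, "checkout", f"{sha}~1", "--", path)
    true_message = git(repo, "log", "-1", "--format=%s", sha).strip()
    return MaskedCommitCase(repo, sha, true_message, masked)

def score(case: MaskedCommitCase, agent) -> dict:
    """`agent` is the system under test (a made-up interface): given an
    incomplete repo and no commit message, it must guess the intent and
    propose file edits. Each axis is scored separately."""
    guessed_message, edits = agent.run(case.repo)
    for path, new_text in edits.items():
        with open(f"{case.repo}/{path}", "w") as f:
            f.write(new_text)
    tests_pass = subprocess.run(["pytest"], cwd=case.repo).returncode == 0
    return {
        # intent_similarity is a placeholder scorer, e.g. embedding cosine
        "intent": intent_similarity(guessed_message, case.true_message),
        # the discrete pass/fail becomes continuous when averaged over repos
        "tests_pass": tests_pass,
    }
```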
And honestly, that's one thing that we could have done a little faster. We had the technology to go out and build these zero-to-one apps very quickly. And I think people are using Windsurf to actually do that. And it's extremely impressive. But the real value, I think, is actually much deeper than that. It's that you take a large code base, and it gives you a really good first pass. And I'm not saying it's perfect, but it's only going to keep getting better. We have deep infrastructure that actually validates that we're getting better on this dimension. Right.
We mentioned the end-to-end evals that we have for the system, which I think are super cool. But I think you can even decompose each of those steps. The idea is: just take retrieval, for example. How can we make the eval for retrieval really good? And I think this is just a general thing that's been true about us as a company. Most evals and benchmarks that exist out there for software development are kind of bogus.
There's not really a better way of putting it. Like, okay, you have SWE-bench, that's cool. No actual professional work looks like SWE-bench. HumanEval, same thing. These things are just a little kind of broken. So when you're trying to optimize against a metric that's a little bit broken, you end up making kind of suboptimal decisions. So something that we're always very keen on is, okay, what is the actual metric that we want to test for this part of the system? And so take retrieval, for example. A lot of the benchmarks for these embedding-based systems are needle-in-the-haystack
problems. Like I want to find this one particular piece of information out of all this potential context.
That's not really what's actually necessary for doing software engineering, because code is a super distributed knowledge store. You actually want to pull in snippets from a lot of different parts of the code base in order to do the work. And so we built systems where, instead of looking at retrieval-at-1, you're looking at retrieval at, like, 50. What are the top 50 things that you can actually retrieve, and are you capturing all of the necessary pieces in that? And what are all the necessary pieces? Well, you can look again back at old commits and see what were all the different files that
were edited to make a commit, because those are semantically similar things that might not actually show up if you try to map out a code graph. And so we can actually build these kinds of golden sets. We can do this evaluation even for sub-problems in the overall task.
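(A sketch of what that could look like: the files co-edited in a historical commit act as the golden set, and the retriever is scored on how much of it lands in the top 50. The `retriever.search` API here is an assumption for illustration, not Codeium's internal system.)

```python
# Illustrative recall@50 eval for code retrieval. Files edited together in a
# commit are semantically related even when no import/call edge connects
# them, so they make a natural golden set.
import subprocess

def golden_set(repo: str, sha: str) -> set[str]:
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{sha}~1", sha],
        capture_output=True, text=True, check=True).stdout
    return set(out.splitlines())

def recall_at_k(retriever, repo: str, query: str, sha: str, k: int = 50) -> float:
    """Fraction of the commit's co-edited files that appear in the top-k
    retrieved results. `retriever.search` is a hypothetical interface."""
    golden = golden_set(repo, sha)
    retrieved = {hit.file for hit in retriever.search(repo, query, top_k=k)}
    return len(golden & retrieved) / len(golden) if golden else 1.0
```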
And so now we have an engineering team that can iterate on all of these things and still make sure that the end goal that we're trying to build to is really, really strong, so that we have confidence in what we're pushing out. And by the way, let's say one more thing about the SWE-bench point, just to talk about these existing metrics. I think benchmarks are not a bad thing. You do want benchmarks. Actually, I would prefer if there were benchmarks versus, let's say, everything just being vibes, right?
But vibes are also very important, by the way, because they showcase where the benchmark is not valuable. Vibes sometimes show you where critical issues exist in the benchmark. But you look at some of the ways in which people have optimized for SWE-bench. It's like, make sure to run pytest every time X happens. And it's like, yeah, sure. You can start prompting it in every single possible way. And if you remove that, suddenly it doesn't do well at it.
It's like, what really matters here? What really matters is that across a broad set of tasks, you're producing high-quality suggestions for people, and people love using the product. And I actually think these benchmarks are valuable up to a certain point. But once you start hitting the peak of these benchmarks,
getting that last 10% is probably counterproductive to the actual goal of what the benchmark was. You probably should find a new hill to climb rather than p-hacking or over-optimizing for how you can get higher on the benchmark.
Yeah, we did an episode with Anthropic about their recent SWE-agent, SWE-bench results. And we talked about HumanEval versus SWE-bench. HumanEval is kind of like a greenfield benchmark. You need to be good at that. SWE-bench is more existing code. But it sounds like your eval creation is similar to SWE-bench as far as using GitHub commits and that history, but then it's more like masking at the commit level versus just testing the output of the thing. That's right.
Cool. We have some listener questions actually about the Windsurf launch and obviously I also want to give you the chance to just respond to Hacker News. Yeah.
Oh, man. Let me tell you something very, very interesting. I love Hacker News as much as the next person, but the moment we launched our product, the first comment (this was the original Codeium launch, like two years ago), the first comment was: this product is a virus. I am analyzing the binary as we speak. We'll report back. And then he was like, it's a virus. And I was like, dude, it's not a virus. Yeah.
We just want to give autocomplete suggestions. That's all we want to do. Yeah. Okay. Wow, I didn't expect that. And then there was the drama. There's enough drama on the launch to cover, but I don't know if we want to just make this a drama piece. But we had a bunch of people in our Discord try out the product, give a lot of feedback. One question people have is,
to them, Cascade already felt pretty agentic. Like, is that something you want to do more of? You know, obviously, since you just launched an IDE, you're kind of focusing on having people write the code. But maybe this is kind of the Trojan horse to just do more full-on, end-to-end code creation. Devin-style. Yeah, I think it's like, how do you get there
in a real principled manner. We obviously have enterprise asking us all the time, oh, when's it going to be end-to-end work? The reality is, okay, well, if we have something in the IDE that, again, can see your entire actions and get a lot of intent that you can't actually get if you're not in the IDE, if
the agent there has to always get human involvement to keep on fixing itself, it's probably not ready to become a full end-to-end automated system, because then we're just going to turn into a linter, where it produces a bunch of things and no one looks at any of it. That's not a great end state. But if we start seeing, oh yeah, there are common patterns that people do that, like,
never require human involvement, that just end-to-end totally work without any intent-based information, sure, that can become fully agentic, and we will learn what those tasks are pretty quickly because we have a lot of data. Maybe to add on to that: if the question is whether fully agentic means Devin, then yes, the answer is this product should become fully agentic,
and limited human interaction is the goal, is 100% the goal. And I think, honestly, of all usable products right now, I think we're the closest right now, of all usable products in an IDE. Now, let me caveat this by saying, I think there are lots of hard problems that have yet to be solved that we need to go out and solve to actually make this happen. Like, for instance, I think one of the most annoying parts about the product is the fact that you need to accept every command that kind of gets run.
It's actually fairly annoying. I would like it to just go out and run it. Unfortunately, me going out and running arbitrary binaries has some problems, in that if it, like, rm -rf's my hard disk, I'm not going to be... It's actually a virus. I'm not going to actually... It's a virus. The Hacker News guy was right. Yeah, it does become a virus. I think this is solvable with complex systems. I think...
we love working on complex systems infrastructure, I think we'll solve it. Now, the simpler way to go about solving this is: don't run it on the user's machine, run it somewhere else, because then if it borks that machine, you're kind of totally fine. Now, I think, though, there's maybe a little bit of a trade-off between running it locally versus remotely. And I think we might change our mind on this. But the goal is not for this to be the final state. The goal is, A, for it to be able to do very complex tasks with limited human interaction, but it needs to know when to actually go back to the humans.
Also, on top of that, compress every cycle that the agent is running. Actually, I even feel like the product is too slow for me sometimes. Even with it running really fast, and it's objectively pretty fast, I would still want it to be faster. Right?
So there is systems work and probably modeling work that needs to happen there to make the product even faster, on both the retrieval side and the generation side. And then finally, I think another key piece here that's really important is: I actually think asking people to do things explicitly is probably going to be more of an anti-pattern if we can actually go and passively suggest the entire change for the user. So almost imagine that as the user is using the product, we're going to suggest the remainder of the PR without the user even asking us for it.
I think this is sort of the beginning of it. But yeah, these are hard problems. I can't give a particular deadline for this. I think this is a big step up from what we had in the past. But I think what Anshul said is 100% true. The goal is for us to get better at this. I mean, the remote execution thing is interesting. You wrote a post about the end of localhost. Yeah.
And then we were kind of like, well, no, maybe people do want to run things locally. But now it's like, okay, no, actually, I don't really care. I want the model to do the thing. And if you were told, you can do a task end-to-end, but it needs to run remotely, not on your computer, I'm sure most people would say, yeah.
No, I agree with that. I actually agree with it running remotely. That's not a security issue. I totally agree with you that it's possible that everything could run remotely. That's how it is at most big companies like Facebook. Nobody runs things locally. No one does. In fact, you connect to a... You're right on that. Maybe the one thing that I do think is kind of important for these systems that is more than just running remotely is
Basically, when you look at these agents, there's kind of like a rollout of a trajectory. And I kind of want to roll this trajectory back. In some ways, I want a snapshot of the system that I can constantly checkpoint and move back and forth. And then also on top of that, I might want to do multiple rollouts of this. So basically, I think there needs to be a way to almost move forward and move backwards and
And whether that's locally or remotely, I think that's necessary. But if every time you move the system forward it destroys, or potentially destroys, your machine, that's just not a workable solution. So local versus remote aside, I think you still need to solve the problem of this thing not destroying your machine on every execution, if that makes sense. Yeah. There is a category of emerging infrastructure providers that are working on time travel VMs.
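(As a toy illustration of that move-forward-and-backward idea: snapshot the workspace before every agent step so a rollout can be rewound or branched. A real system would likely use filesystem or VM snapshots; this sketch just leans on git, and all names are illustrative.)

```python
# Toy trajectory checkpointing: commit the whole workspace before each
# (possibly destructive) agent step, so the rollout can be rolled back or
# branched from any checkpoint. Not a production design.
import subprocess

class TrajectoryCheckpointer:
    def __init__(self, workspace: str):
        self.ws = workspace
        self._git("init")  # assumes we're free to (ab)use git in this workspace

    def _git(self, *args: str) -> str:
        return subprocess.run(
            ["git", "-C", self.ws, "-c", "user.name=agent",
             "-c", "user.email=agent@localhost", *args],
            capture_output=True, text=True, check=True).stdout

    def checkpoint(self, label: str) -> str:
        """Snapshot everything, including untracked files, before a step."""
        self._git("add", "-A")
        self._git("commit", "--allow-empty", "-m", label)
        return self._git("rev-parse", "HEAD").strip()

    def rollback(self, sha: str) -> None:
        """Rewind the workspace to an earlier point in the trajectory."""
        self._git("reset", "--hard", sha)
        self._git("clean", "-fd")  # drop files created after the checkpoint
```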
And if Varun's first episode on this podcast was any indication, you like infrastructure problems. Yeah, okay. Oh, so you're going there. All right. Okay. Well, that's funny, right? When we first had you, you were doing so much on actual model inference, optimization, all these things. And today it's almost like...
It's Claude, it's 4o. It's like, you know, people are forgetting about the model, and now it's all about a higher level of abstraction. Yeah. So maybe I can say a little bit about how our strategy on this has evolved, because it objectively has, right? I think I would be lying if I said it hasn't.
The things like autocomplete and supercomplete that run on every keystroke are entirely like our own models. And by the way, that is still because properties like FIM, fill-in-the-middle capabilities, are still quite bad with the current models. Non-existent. They're very bad. Non-existent. They're not good, actually, at it.
Because FIM is actually about, like, how you order the tokens.
When they make point changes, they're kind of off here and there by a little bit. Because, yeah, when you're doing multi-point kinds of changes, getting exact diffs applied is not even a perfect science yet. So we care about that. The second piece where we've actually trained our own models is on the retrieval system. And this is not even for embeddings, but actually being able to use high-powered LLMs to do much higher quality retrieval
across the code base, right? So this is actually what Anshul said. For a lot of the systems, we do believe embeddings work, but for complex questions, we don't believe embeddings can encapsulate all the granularity of a particular query. Imagine I have a question on a code base like: find me all quadratic-time algorithms in this code base. Do we genuinely believe the embedding can encapsulate the fact that this function is a quadratic-time function? No. No.
I don't think it does. So you are going to get extremely poor precision and recall at this task. So we need to apply something a little more high-powered to actually go out and do that. So we've built large distributed systems to run these at scale, run custom models at scale across large code bases. So I think it's more a question of that. The planning models right now, undoubtedly, I think the Claudes and the OpenAIs
have the best products. I think Llama 4, depending on where it goes, could be materially better. It's very clear that they're willing to invest a similar amount of compute as the OpenAIs and the Anthropics. So we'll see. I would be very happy if they got really good, but it's unclear so far. Don't forget Grok. Hey, dude, I think Grok is also possible.
I think don't doubt Elon. Okay, so I didn't actually know. It's not obvious when I use Cascade. I should also mention that I was part of the preview. Thanks for letting me in. I've been maining Windsurf for a long time. It's not actually obvious. You don't make it obvious that you are running your own models. I feel like you should. So that I feel like it...
has more differentiation? Like, I'd rather have exclusive access to your models via your IDE than have the drop-down that's just Claude and 4o, because I actually thought that was what you did. No, so actually the way it works is the high-level planning is actually getting done with models like Claude. But
extremely fast retrieval, as well as the ability to take the high-level plan and actually apply it to the code base, is proprietary systems that are running internally. And then the stuff that you said about embeddings not being enough, are you familiar with the concept of late interaction? No, I actually have never heard of it. Yeah, so this is ColBERT. The guy Omar Khattab from, I think, Stanford has been promoting this a lot. It is basically what you've done. This is sort of
embedding on retrieval rather than pre-embedding. Okay. In a very loose sense. I think that sounds like a very good idea that is very similar to what we're doing. Sounds like a very good idea. I think we'd say that. That's like the meme of Obama giving himself a medal right there.
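(For the curious, the core of late interaction is tiny. This is the textbook ColBERT MaxSim operator, shown for illustration rather than as a description of Codeium's system: keep per-token embeddings around and let the query interact with them at search time, instead of pooling each document into a single vector up front.)

```python
# ColBERT-style late interaction scoring (textbook MaxSim, for illustration).
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """query_emb: (Q, d) L2-normalized token embeddings for the query.
    doc_emb:   (D, d) L2-normalized token embeddings for the document.
    Each query token picks its best-matching document token; scores sum."""
    sim = query_emb @ doc_emb.T          # (Q, D) cosine similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token

# Contrast with single-vector retrieval, where score = q . d and all the
# granularity of a query like "find quadratic-time algorithms" has to
# survive being pooled into one vector before the query is even seen.
```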
There might be something to learn from contrasting the ideas and seeing where the opinions differ. It's also been applied very effectively to vision understanding, because vision models tend to just consume the whole image. If you are able to focus on parts of images based on the query, I think that can get you a lot of extra performance. The basic idea of using compute...
in a distributed manner to do operations over a whole set of raw data, rather than a materialized view, is not anything new. I think it's just, what does that look like for LLMs? When I hear you say you build large distributed systems: you have a very strange product strategy of going down to the individual developer but also to the large enterprise. Is it the same infra for everything?
I think the answer to that is yes. And to be honest, because of that, our company is a lot more complex than it would be if we just wanted to serve the individual. And I'll tell you why: we don't pay other providers to do our indexing. We don't pay other providers to do the serving of our own custom models. Right.
And I think that's a core competency within our company that we have decided to build, but that's also enabled us to go and make sure that when we're serving these products in an environment that works for these large enterprises, we're not going out and being like, we need to build this custom system for you guys. This is the same system that serves our entire user base. So that is a very unique decision we've taken as a company, and
we admit that there are probably faster ways that we could have done this. I was thinking, you know, when I was working with you on your enterprise piece, about this philosophy of go slow to go fast: build deliberately at the right level of abstraction that can serve the market that you really are going after. Yeah, I mean, I would say, writing that piece and reading it back, it sounds almost obvious in hindsight. Not all of those were really conscious decisions we made. I'll be the first to admit that. But it does help, right? When we go to an enterprise that has tens of thousands of developers working,
And they're like, oh, wow, we have tens of thousands of developers, does your infrastructure work for tens of thousands of developers? We can turn around and be like, well, we have hundreds of thousands of developers on the individual plan that we're serving. I think we'll be able to support you. So being able to do those things... We started off by just, let's give it to individuals, let's see what people like and what they don't like and learn. But then those become value propositions when we go to the enterprise. And to recap, when you first came on the pod, it was like, autocomplete is free. And Copilot was $10 a month.
And you said, look, what we care about is building things on top of code completion. How did you decide to just not focus on short-term growth monetization of the individual developer and build some of this? Because the alternative would have been, hey, all these people are using it. It's like, we're going to make this other $5 a month plan, monetize. I think...
I think this might be a little bit of commercial instinct that the company has, and it's unclear if the commercial instinct is right. I think that right now, optimizing for making money off of individual developers is probably actually the wrong strategy. Largely because individual developers can switch off of products very quickly.
And unless we have a very large lead, trying to optimize for making a lot of profit off of individual developers is probably something that someone else could just vaporize very quickly, and then they move to another product. And I'm going to say this very honestly, right? When you use a product like Codeium on the individual side, there's not much that prevents you from switching
onto another product. I think that will change with time as the products get better and better and deeper and deeper. I constantly say this: there's a book in business called Seven Powers. And I think one of the powers that a business like ours needs to have is real switching costs. But you first need something in the product that makes people switch on and stay on before you think about what keeps people from switching off. And I think
I think for us, we believe that there's probably much more differentiation we can derive in the enterprise by working with these large companies in a way that is interesting and scalable for them. I'll be maybe more concrete here. Individual developers are much more sensitive to small price changes.
They care a lot more, right? Like if our product is 10, 20 bucks a month instead of 50 or 100 bucks a month, that matters to them a lot. And for a large company where they're already spending billions of dollars on software, this is much less important. So you can actually solve maybe deeper problems for them and you can actually kind of provide more differentiation on that angle. Whereas I think...
I think individual developers will churn if we don't have the best product. So focus on being the best product, not trying to take price and make a lot of money off of people. And I don't think we will, for the foreseeable future, try to be a company that makes a lot of money off individual developers. I mean, that makes sense. So why $10 a month for Windsurf? $10 a month was actually the pro plan. So we launched our individual pro plan before Windsurf existed. Because
we also have to be financially responsible. Yeah, we can't run out of money. We can do a lot of things because of our infrastructure background. We can give, essentially for free, unlimited autocomplete and unlimited chat on our faster models. We have a lot of things out for free.
But yeah, when we started doing things like the supercompletes and really large amounts of indexing and all these things, there are real COGS here. We can't ignore that. And so we just created a $10-a-month pro plan, mostly just to cover the costs. We're not really operating on much of a margin there either, but okay, it just covers us there. So for Windsurf, it just ended up being the same thing. And everyone who downloads Windsurf in the first, I
forget, like a couple of weeks, two weeks, gets it for free. Let's just have people try it out. Let us know what they like, what they don't like. And that's how we've always operated. I've talked to a lot of CTOs in the Fortune 100 where most of the engineers they have don't really do much anyway. The problem is not that the developer costs 200K and you're saving 8K. It's that that developer should not be paid 200K. But that's kind of the base price.
But then you have developers getting paid $200K that should be paid $500K. So it's almost like you're averaging out the price because most people are actually not that productive anyway. So if you make them 20% more productive, they're still not very productive. And I don't know in the future, is it that the junior developer salaries...
50k, you know, and it's like the bottom end gets kind of squeezed out and then the top end gets squeezed up. Yeah, maybe, Alessio, one thing that I think about a lot, because I do think about this per-seat pricing stuff a good deal. Let's take a product like Office 365. I will say a lawyer at Codeium uses Microsoft Word way more than I do. I'm still footing the same bill. But the amount of value that he's deriving from Office 365 is probably, you know, tens of thousands of dollars.
By the way, everyone, you know, Google Docs, great product. Microsoft Word is a crazy product. It made it so that the moment you review anything in Microsoft Word, the only way you can review it is with other people in Microsoft Word. So it's like this virus that penetrates everything. And it not only penetrates within the company, it penetrates across companies too. The amount of value it's driving is way higher for him. So for these kinds of products, there's always going to be this variance between who gets value from these products.
And you're right, it's almost like a blended price. Because you're actually totally right: probably this company should be paying that one developer maybe four times as much. But in a weird way, software is enough of a team activity that there's a bunch of blended outcomes. But hey, 20% of the 4x value, when there are four people, is still going to cover the cost across the four individuals, right? And that's roughly how these products get priced out.
I mean, more than about pricing, this is about the future of the software engineer. We could be very wrong also. Yeah. I think nobody knows. Reserve the right to be incredibly off. Yeah. I mean, business model does impact the product. Product does impact the user experience. So it's all of a kind. I don't mind. We are as concerned about the business of tech as the tech itself. That's cool.
Speaking of which, there's other listener questions. Shout out to Daniel Enfeld, who's pretty active in our Discord, just asking all these things. Multi-agent. Very, very hot and popular, especially from the Microsoft Research point of view. Have you made any explorations there? I think we
have. I don't think we've called it multi-agent, but this notion of having many trajectories that you can spawn off, that validate different hypotheses, and you can pick the most interesting one: this is stuff that we've actually analyzed internally at the company. By the way, the reason why we have not put these things in is partially because we can't go out and execute some random stuff in parallel
on the side. Because of the side effects. Because of the side effects, right? So there are some things that are a little bit dependent on us unlocking more and more functionality internally. And then the other thing is, in the short term, there is also a latency component. And I think all of these things can be solved. I actually believe all of these things are solvable problems. They're not unsolvable problems.
And if you want to run all of them in parallel, you probably don't want N machines to go out and do it. I think that's unnecessary, especially if most of them are IO-bound kind of operations where all you're doing is reading a little bit of data and writing out a little bit of data. It's not extremely compute-intensive. I think that it's a good idea and probably something we will pursue and is going to be in the product.
I'm still processing what you just said about things being IO-bound. So for a certain class of concurrency, you can actually just run it all on one machine. Yeah, why not? Because if you look at the changes that are made for some of these, it's spreading out a couple thousand bytes, maybe tens of thousands of bytes, on every tree. That's not a lot. Very small.
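(A minimal sketch of why that works: if each trajectory step mostly waits on IO and only moves a few kilobytes, an event loop on one machine multiplexes many rollouts fine. `run_step` is a stand-in for a real agent step, not anyone's actual implementation.)

```python
# Dozens of parallel rollouts on one machine: the work is IO-bound
# (read a little data, write a little data), so asyncio is enough.
import asyncio

async def run_step(rollout_id: int, step: int) -> bytes:
    await asyncio.sleep(0.05)  # stands in for an IO wait (API call, file read)
    return f"rollout {rollout_id} step {step}".encode()  # a few bytes of state

async def rollout(rollout_id: int, n_steps: int = 10) -> list[bytes]:
    return [await run_step(rollout_id, s) for s in range(n_steps)]

async def main() -> None:
    results = await asyncio.gather(*(rollout(i) for i in range(32)))
    print(f"{len(results)} rollouts finished")

asyncio.run(main())
```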
What's next for Cascade or Windsurf? Oh, there's a lot. I don't know. We did an internal poll and we were just like, are you more excited about this launch or what we're going to come out with in a month? And it was almost uniformly what's coming in a month. I think there's some obvious ones. I don't know how much, Varun, you want to say. I don't want to...
speak out of turn, but I think you'd look at all the same axes of the system, right? Like, how can we improve the knowledge retrieval? We'll always keep figuring out how to improve knowledge retrieval. In our launch video, we even showed some of the early explorations we have of looking at other data sources. That might not be the coolest thing to the individual developer building a zero-to-one app, but you can really believe that the enterprise customers think that's very cool, right? I think on the tool side, there's a whole lot more that we can do. I mean, of course, Varun's talked about
not just suggesting the terminal commands, but actually executing them. I think that's going to be huge. And then you look at the actions that people are taking, the human actions, the trajectories that we can build: how can we make that even more detailed?
And I think all of those things, and you make some even cleaner UI. Like the idea of looking at future trajectories, trying a few different things, and suggesting potential next actions to be taken. That doesn't really exist yet, but it's pretty obvious, I think, what that would look like. You open up Cascade, and instead of you starting to type, it's just like, here's a bunch of things that you might want to do. We kind of joke that Clippy's coming back, but maybe now's the time for Clippy to really shine. So I think there's a lot of ways that we can take this,
which I think is like the very exciting part. We're calling each of our launches waves, I believe, because we want to really double down on the aquatic themes. Oh yeah, does someone actually windsurf at the company? Is that... I don't think so.
We're living out our dream of being cool enough to windsurf through the rocks. I don't think we can. Yeah, all right. That was actually something we learned, because I don't think any of us are windsurfers. In our launch video, we have someone using Windsurf on a windsurf board. You saw that, right? You saw that in the beginning of the video. Someone has a computer. And we didn't realize that now is apparently the time of year when there's not enough wind to windsurf. So we were trying to figure out how to do this launch video with Windsurf on the windsurf board. Yeah.
Every windsurfer we talked to was like, yeah, it's not possible. And there was, like, one crazy guy who was like, yeah, I think we can do this. And we made it happen. Oh, okay. Is there anything that you want feedback on? Like, maybe there's a fork in the road. You want feedback. You want people to...
respond to this podcast and tell you what they want. Yeah, I think there's a lot of things that could be more polished about the product that we'd like to improve. Lots of different environments that we're going to improve performance on. And I think we would love to hear from folks across the gamut. Like, hey, if you have this environment, you use Windows in X version, it didn't work, or this language, oh yeah, it was very poor. I think we would like to hear it.
Yeah, I gave Prev and Kevin a lot of shit for my Python issues. Yeah, yeah, yeah. And I think there's a lot to improve on the environment side. For instance, even just a dumb example, and Swyx, I think this was a common one for you: the virtual environment, where is the terminal running, what is all this stuff? These are all basic things that, to be honest, are not rocket science, but we need to just fix it, right? We need to fix it. So...
We would love to hear all the feedback on the product. Was it too slow? Where was it too slow? What kinds of environments could it work better in? There's a lot of things that we don't know. Luckily, we're daily users of the product internally, so we're getting a lot of feedback inside. But I will say, there's a little bit of Silicon Valley-ism in that a lot of us develop on Mac. A lot of people, once again, over 80% of developers, are on Windows. So yeah, there's a lot to learn and probably a lot of improvements down the line. Have you personally attempted, as the CEO of the company, to switch to Windows just to feel...
something. You know what? Maybe I should. Actually, I think I will. Your customers, everyone says, are like 89% on Windows. If you live in Windows, you will never not see something that's missed. So I think in the beginning, part of the reason why we were hesitant to do that
is that, to work across every IDE, we made architectural decisions to build a platform-agnostic way of running the system on the user's local machine, and that was only easily buildable with dev containers that lived on a particular type of platform, so Mac was nice for that. But now there's not really an excuse, especially if I can also make changes to the UI and stuff like that. And yeah, WSL also exists. That's actually something that we need to add to the product. That's how early it is, that we have not actually added that.
We don't have, like, remote... Anything else about Codeium at large, right? Like, you still have your core business of the enterprise Codeium. Yeah. Anything moving there or anything that people should know about? I think a lot is still moving there, right? I think it would be a little bit, like, you know, very egotistical of us to be like, oh, we have Windsurf now, all of our enterprise customers are going to switch to Windsurf. Like, no, we still support the others... I was going to say, you just talked about your Java guys loving JetBrains. They're never going to leave JetBrains. They're not. Like, I mean...
forget JetBrains, there's still tons and tons of enterprise people on Eclipse. We're still the only code assistant that has an extension for Eclipse. That's still true years in, right? But that's because those are our enterprise customers. And the way that we always think about it is: how do we still maximize the value of AI for every developer? I don't think that part
of who we are has changed since the beginning, right? And there's a lot of like meeting the developers where they are. So I think on the enterprise side, we're still pretty invested in doing that. We have like a team of engineers dedicated just to making enterprise successful and thinking about the enterprise problems. But really, if we think about it from the really macro perspective, it's like, if we can solve all the enterprise problems for an enterprise, and we have products that developers themselves just truly, truly love, then we're solving the problem from both sides.
And I think it's one of those things where I think when we started working with enterprise and we started building dev tools, right? We started as an infrastructure company. Now we're building dev tools for developers. You really quickly understand and realize just how
much developers loving the tool makes us successful in an enterprise. There's a lot of enterprise software that developers hate. I want to draw this flywheel. But we're giving a tool to people for where they're doing their most important work. They have to love it. And it's not like we have to convince anyone: the executives at these companies also ask their developers a lot, do you love this? That is almost always a key aspect of whether or not Codeium is accepted
into the organization. I don't think we go from zero to 10 million ARR in less than a year in an enterprise product if we don't have a product that developers love. So I think that's why we're just, the IDE is more of a developer love kind of play. It will eventually make it to the enterprise. We still solve the enterprise problems. And again, we could be completely wrong about this, but we hope we're solving the right problems.
It's interesting, I asked you this before we started rolling, but it's the same team, the same eng team. In any normal company, or my normal mental model of company construction, if you were to have effectively two products like this, you would have two different teams serving two different needs, but it's the same team.
Yeah, I think one of the things that's maybe unique about our company is that this has not been one company the whole time, right? We were first this GPU virtualization company that pivoted to this. And then after that, we kept making changes. I think there's a versatility to the company, this ability to move. We have this instinct where, and by the way, the instinct could be wrong, but if we smell something, we're going to move fast.
And I think it's more a testament to the engineering team than to any one of us. So, on December 19, 2022, you wrote one of our first guest posts: What Building Copilot for X Really Takes. Estimate inference to figure out the latency-quality tradeoff. Build first party instead of using third-party APIs. Figure out real time, because the ChatGPT and DALL-E APIs are too slow. Okay.
Optimize the prompt because the context window is limited, which is maybe not that true anymore. And then merge model outputs with the UX to make the product more intuitive. Is there anything you would add? I'd give myself a B- on that. Some parts of that are accurate. Even the context one that you called out. Yeah, models have larger context lengths now. That's absolutely true. It's grown a lot. But look at an enterprise code base. They have tens of millions of lines of code. That's hundreds of billions of tokens of code.
That's never going to change. Still being really good at piecing together this distributed knowledge is important. So I think there are parts of that that are still pretty accurate. There's probably some that are less so. First party versus third party. First party versus third party, I think we were wrong there. I would nuance that: there are certain things that it's really important to do first party, like autocomplete. You have a really specific application that you can't just prompt engineer your way out of, or maybe even fine-tune afterwards. You just can't do that. I think there's truth there, but
let's also be realistic. The stuff that's coming out from the third-party model providers: Cascade and Windsurf would not have been possible if it wasn't for the rapid improvements of GPT-4o and Claude 3.5 Sonnet. That just wouldn't have been possible. So I'll give myself a B-. I'll say I passed, but yeah, it's two years later. Just to be clear, we're not grading. It's more of a, what would you have added? Yeah, I mean, that first post was literally, I think, a few weeks after we had launched Codeium. swyx and I were talking, like, maybe we can write this, because we were one of the first products that people could actually use with AI. That's cool. I specifically liked the "Copilot for X" framing because it was so hot; everyone was doing Copilot for X at the time. But we didn't have an enterprise product then. I don't even think we were necessarily thinking of an enterprise product at that point, right? So all of the learnings we've had from the enterprise perspective, which is why I loved coming back for a third time now on the blog: some of those we figured out, and some we honestly just walked backwards into. We had to get lucky a lot of the way. We just did a lot. There are so many opportunities and deals that we lost for a variety of reasons that we had to learn from. There's just so much more to add that there's no way I would have gotten that right in 2022. Can I mention one thing that I think is, hopefully, not very controversial,
but it's true about our engineering team as a whole. I don't think most of us got much value from ChatGPT. Largely because, and this is maybe a little bit of a different thing, a lot of the engineers at the company have been writing software for over eight years. This is not to say they know everything that ChatGPT knows; they don't. But they'd already gotten good enough at searching Stack Overflow. They'd invested a lot in searching a code base, right? They can grep through code incredibly fast, they know every tool, and they've spent eight years mastering that skill. And with ChatGPT being this thing on the side that you need to provide a lot of context to, we were not able to actually get much value. My co-founder basically never used ChatGPT at all. Literally never did. And because of that, at the time, one of our incorrect assumptions was probably that, hey, a lot of these passive systems need to get good because they're always there, and these active systems are going to be behind. Cascade was actually the big thing. We're a company where everyone is now using it. Literally everyone. The biggest skeptics. And we have a lot of people at the company that are skeptical of AI.
I think this is actually important. Why do you hire them? No, here's the important thing. Those people that were skeptical about AI previously worked in autonomous vehicles. These are not crypto people. These are people that care about technology and want to work on the future. Their bar for good is just very high. They will not form a cult of, this is awesome, this is going to change the world. They were not going to be the kind of people on Twitter saying, yeah, this changes everything, software as we know it is dead. No, these are people that are going to be incredibly honest. And we knew that if we hit the bar that is good for them, we had found something special. At that time we probably had a lot of sentiment like that. That has changed a lot now. And I think it's actually important that you have believers that are incredibly future-looking and people that kind of rein it in.
Because otherwise, and this is like autonomous vehicles, you have a very discrete problem, people are just working in a vacuum, and there's no signal to bring you down to reality. You have no good way to kill ideas. And there are a lot of ideas we're going to come up with that are just terrible ideas. But we need to come up with terrible ideas; otherwise, how does anything good come out? And I don't want to call these people skeptics. Skeptic suggests that they don't believe. They're realists. They're the type of people that when they see a waitlist on a product online, they just will not believe it. They will not think about it at all. Kudos for launching without a waitlist. Yeah. By the way, we will never launch with a waitlist. That's a thing at the company. We'd much rather be considered the boring company than a company that launches with a splash once in a while and, hopefully, it's good.
My joke is that generative AI has gotten really good at generating waitlists. Also, just to clarify, both of us used to work in autonomous vehicles, so it doesn't come across as... Oh, yeah, yeah. We love that technology. We love it. I love hard technology problems. That's what I live for.
Amazing. Just to push back on the first-party thing: I accept that the large model labs have done a lot of work for you that you didn't need to duplicate. But you're now sitting on so much proprietary data that it may be worth training on the trajectories you're collecting. So maybe it's a pendulum back to first party. Yeah, I mean, we've been pretty clear from a security posture perspective; there's customer trust to consider. I mean, I kind of want that. Let me opt in. I think there are signals we do get from our users that we can utilize. There's a lot of preference information that we get, for example. Which is effectively what you're saying about trajectories. Right, our trajectories.
I will say this: the Supercomplete product that we have has gotten materially better because of us not only using synthetic data, but also getting preference data from our users, of, hey, given this set of trajectories, here's what a good outcome actually is. And in fact, one of the really beautiful parts about our product, and it's very different from a ChatGPT, is that we can see not only whether an acceptance happened, but also what happened after the acceptance. Right? Let's say you accepted a suggestion, but then after accepting it, you deleted three or four items in there. We can see that. So that actually gets us to an even better metric than acceptance, because we're in the ultimate work output of the developer. It's the difference between the acceptance and what actually happened. If you can get ground truth of what actually happened, and this is the beauty of being an IDE, then yeah, you get a lot of information there. Did you have this with the extension, or is this pure Windsurf? We had this with the extension. Yeah, okay, great. Yes.
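As a rough illustration of that idea, here is a minimal sketch of grading a completion by how much of it survives the developer's later edits, rather than by the binary accept event alone. The helper name and the scoring method (a simple sequence-similarity ratio) are assumptions for illustration, not Codeium's actual pipeline:

```python
# Sketch: grade an accepted completion by what survives afterwards.
# retention_score is a hypothetical helper, not Codeium's telemetry API.
from difflib import SequenceMatcher

def retention_score(accepted_text: str, final_text: str) -> float:
    """1.0 means the accepted suggestion survived untouched;
    lower values mean the developer deleted or rewrote parts of it."""
    return SequenceMatcher(None, accepted_text, final_text).ratio()

accepted = "for item in items:\n    process(item)\n    log(item)\n"
final = "for item in items:\n    process(item)\n"  # developer deleted the log line

# A binary accept metric would record 1.0 here; retention is more honest.
print(f"retention: {retention_score(accepted, final):.2f}")
```

Signals like this give a graded preference label per trajectory instead of a yes/no click, which is the distinction being drawn from a chat product.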
Windsurf just gives you more of the IDE. Yes. So that means you can also start getting more information. For instance, the basic thing that Anshul said: we can see if a file explorer was opened. That's a piece of information we just could not see previously. Sure. A lot of intent in there. A lot of intent. Second one. Oh, boy. How to Make AI UX Your Moat. Oh, man. Isn't it funny that we've now created the full UX experience with an IDE? I think that one is pretty accurate. That one's an A? I think that one I'll give myself. I think we were doing that within...
I still think that's true within the extensions as well. We got very, very creative with things. Varun mentioned the idea of essentially rendering images to display things. We got creative to figure out what the right UX is. We could have built a really dumb UX with a side panel, whatever. But actually going the extra mile makes that experience as good as it can possibly be. And now look at some of the UX we're able to build in Windsurf; it's just fun. The first time I saw Command in the terminal, where you don't have to search for a bash command, I just started smiling. It's not Cascade, it's not an agentic system or anything like that, but that is just a very, very cool device. We literally couldn't do that in VS Code. Yeah, I understand that. Yeah, I've implemented a 60-line bash command called please, and you can do that. Oh, wow. That's cool. Yeah, so it's please and then English. You know, that's actually really cool, because one of the things I think we believe in is that I like products like autocomplete more than Command, purely because I don't even want to open anything up. So that thing where I can just type, and not have to press some keyboard shortcut to go to a different place, I actually like that too.
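For readers curious what a please-style command might look like, here is a minimal sketch in Python rather than bash. It is an illustrative take under assumed details (the OpenAI client and the model name are assumptions), not the 60-line script mentioned above:

```python
#!/usr/bin/env python3
# Sketch of a "please" command: turn English into a shell command,
# show it, and run it only after confirmation.
import subprocess
import sys

from openai import OpenAI  # assumes the openai package and an API key are set up

def main() -> None:
    request = " ".join(sys.argv[1:])  # e.g. `please find files over 100MB`
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; this choice is an assumption
        messages=[
            {"role": "system",
             "content": "Reply with a single POSIX shell command and no prose."},
            {"role": "user", "content": request},
        ],
    )
    command = resp.choices[0].message.content.strip()
    print(f"$ {command}")
    # Never execute a generated command sight unseen.
    if input("run it? [y/N] ").strip().lower() == "y":
        subprocess.run(command, shell=True)

if __name__ == "__main__":
    main()
```

The confirmation prompt is the important design choice: the model proposes, the developer disposes.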
And I actually adopted Warp, the terminal, initially for that, because they gave that away for free. But now it's everywhere, so I can turn off Warp and not give Sequoia my bash commands. I'm with you. No, I use Warp. No, no, look, I use Warp. Okay, I'm going to go on a rant; hopefully somebody will listen. This is Warp product feedback. They basically had this thing where you can type a pound sign and then write in natural language. Yeah, you said that. But then they also auto-infer when what you're typing is natural language, and those are different. When you do the pound sign, it only gives you a predetermined command; when you talk to it, it generates a flow. Okay. It's a bit of a confusing UX. But going back to your post, you had the three Ps of AI UX. What were they again? Present, practical, powerful.
Actually, that was really good. I liked it. And I think in the beginning, being present was enough. Even when you launched, it was like, oh, you have AI. That's cool. Other people don't have it. Do you think we're still in the practical stage, where the model doesn't even need to be that powerful and just having a better experience is enough? Or do you think it's really about being able to do the whole thing? Because your point was that you're powerful when you generate a lot of value for the customer. Yeah. And you're practical when you're just wrapping it in a nicer way. Where are we in the market today? I think there's always going to be room for practical UX. Command in the terminal, that's a very practical UX. I do think with things like Cascade and these agentic systems, we are starting to get into powerful. There are so many pieces from a UX perspective that make Cascade really good. It's really micro things that are all over the place. As we're streaming in, we're showing the changes. We're letting you jump into open diffs and see it. We can run background terminal commands; you can see what's running, which background processes are running. There are all these small UX things that together add up to a really powerful and intuitive UX. I think we're starting to get there. It's definitely just the start, and that's why we're so excited about where all this is going to go. We're starting to see glimpses of it. I'm excited. It's going to be a whole new ballgame. Yeah.
Awesome. First of all, it's just been really nice to work with you. I work with a number of guest posters, and not everyone makes it through to the end, and nobody else has done it three times. So kudos on the hat trick. This one was more like the money one, which is funny, because I think developers are quite uninterested in money. Isn't it weird?
Yeah, I mean, I don't know if this is just the nature of our company. There are all these San Francisco AI companies where everyone's hyping each other on the tech and everything, which is great; the tech's really important. We're here in Mountain View, beautiful office. We just really care about actually driving value and making money. Which is kind of a core part of the company. And maybe the less selfish way of saying that is: yeah, we could be this VC-funded company forever. But ultimately speaking, if we actually want to transform the way software happens, we need a part of the business that's cash-generative, which enables us to actually invest tremendously in the software. And that needs to be durable cash, not cash that churns the next year.
And we want to set ourselves up to be a company that is durable and can actually solve these problems. Yeah, excellent. So, obviously we're going to link it in the show notes, but for people who are listening to this for the first time: I had a lot of trouble naming this piece. You originally called it "how to make money" something. I apologize. I was being cheeky; I think I was on a plane flight. So I apologize. He had, like, three dollar signs in the title. Oh, I absolutely had three dollar signs. I was like, I can't do that. So it was either Building AI for the Enterprise, or, as I also said, the most dangerous thing an AI startup can do is build for other AI startups, which I think both of you will co-sign. And I think the main thesis, which I really liked, was go slow to go fast: if you actually build for security, compliance, personalization, usage analytics, latency budgets, and scale from the start, then you're going to pay that cost now, but it's going to pay off in the long run. And this is the actual insight: you cannot do this later. If you build the easy thing first as an MVP, just ship whatever's easy to do, and then tack on the EnterpriseReady.io set of twelve things, you actually end up with a different product, or you end up worse off than if you had started from the beginning. That I had never heard before.
Yeah, I mean, we see that repeatedly. Right now we have a lot of customers in the defense space, for example, and we're going through FedRAMP accreditation. The people we're working with saw that, oh, you already have a containerized system, you can already deploy in these manners, you've already gone through security. They're like, you guys are going to have a much easier time doing this than most companies that are like, okay, we have a big SaaS blob and now we need to do all these things. It might sound like a really deep insight, but I think anyone who's worked at a company on a certain project for an extended period of time has probably seen this happen. The technology just keeps improving, and then you realize that you now have to re-architect your whole system to keep improving. Making that kind of change when you've invested so much effort, when people have put in hours and are emotionally invested, whatever it might be, is really hard.
So I'm sure we're going to hit that too. I think we've done things a little earlier than most companies, but we're going to hit points where we see parts of our systems and go, oh, we really need to re-architect that. Actually, we've definitely hit that already. And is that just at the project level, the product level, or is that your whole company? I think the thesis here is that, to some degree, your company needs to have this DNA from the beginning, and then you'll be able to get through those bumps a lot more smoothly
and be able to drive the value. Can I make two points? The first is something that Douglas, my co-founder, and I talk about a lot: there's this constant question of build versus buy. And I think a lot of the time the answer should be buy. We're not going to go build our own sales tool; we should go buy Salesforce. That's undifferentiated. And the reason you go with buy instead of build is, hey, look, the ROI of what exists out there is good.
From an opportunity cost standpoint, it's better to go out and buy it than to build it and do a shittier job, right? There's a company out there actually focused on that. But here's the hidden thing that I think is really important: when you go out and buy, you're losing a core competency inside the company. And that's a core competency that's very hard to ever get back. Startups are so limited on time. Let's say that as a company we had not invested in, I don't know, model inference. We have a custom inference runtime. If we give that up right now, we will never get it back.
It's going to be very hard to get it back.
You can't just use vLLM and TensorRT; that would be our only option. If we used vLLM, we would not be talking with you right now. But the point is, I try to think about it from first principles. Google's a great company, makes a lot of money. What happens if the search index were something that someone else built for them? They could do that. Maybe someone else could have done a good job. Maybe that's a bad example, particularly because Google is a search index, but
tough luck getting that core competency back. You've lost it, right? And for us, it's more a question of what core competencies we need inside the business. And yeah, sometimes it's painful. Some of these core competencies are annoying; sometimes we'll be behind what exists out there. And we just need to be very honest. That's where the truth-seekingness of the company matters. Are we really honest about this core competency? Can we actually keep up? If the answer is we truly can't keep up, then why keep up the charade? We should just buy; let's not build. If the answer is we can, and we think this will differentially make our company better in the long term, then the answer is we need to build it.
We need to, because the race is not won in the next year. The race is won over the next five, ten years, maybe even longer. So that's one thing. The second thing, from the enterprise standpoint: I think one of the unique parts of the company now is that we have both this individual side and this enterprise side, and usually companies stick to one or the other. And that needs to be part of the DNA early on in the company, as Anshul said. There are stories of companies like Dropbox that tried. And Dropbox is an amazing, fantastic company, one of the fastest-growing consumer software companies of all time. But when everyone is product-oriented on the consumer side, the enterprise is just checking off a lot of boxes that don't help the consumer at all and don't help your growth metrics. And effectively, if the original group of people didn't care, it's incredibly hard to get them to care down the line. Incredibly hard. Why do it?
And you need to feel like, hey, this is an important part of the company's viability. So I think there's a little bit of the build-versus-buy part and also the cultural DNA of the company, and both are really important. It's something we think about all the time. I have the privilege of being friends with you guys off the air, and I think I know your work histories. You say cultural DNA, but it's not like you've built giant enterprise SaaS before, right? Yeah, I think so.
I think, yeah. So where are you getting this from? Yeah. In fact, when I look at my previous internships, and maybe Anshul can provide some context here, I worked at LinkedIn, then Quora, then Databricks. And to be honest, I was not that interested in B2B ETL software. That's not what drives me when I wake up. So because of that, I went to work at an autonomous vehicle company immediately after. I think part of it comes down to the unique aspect of the company, and the fact that we pivoted as a company, which is that we want to be a durable company. And then the question is, how do you work backwards from that? A lot of it is being very honest about what we're good at and what we're not good at. Surprisingly, enterprise sales is not something I came out of the womb knowing how to do. I didn't really know it. And because of that, a lot of sales happened with folks like Anshul and me helping partner with companies. But pretty soon we hired a VP of sales.
And we've been deeply involved in the process of scaling out a large go-to-market team. It's more a question of what matters to the company and how you actually go out and build it. One of the people I think about a lot is someone like Alex Wang. He dropped out of college; he was a year younger than us at MIT. And he has figured out how to constantly change the direction of the company. It effectively starts out as a human task interface, then an AV labeling company, then a cataloging company, and now a generative AI labeling company. And every time, the revenue of the company goes up by a factor of 10, even though the business is doing something largely different. I mean, now it's all about military contracts. Yeah, now it's probably going to be military, and after that it might be taking over the world. He's just going to keep increasing the stakes. And there's no playbook for how this really works. It's a little bit of solve a hard problem and work backwards from that, and we'll get lucky along the way. We think through everything from first principles to the best of our abilities, but there are just so many unknown variables. We don't know everything that's happening in every company out there, and everyone knows how fast the AI space is moving.
We have to be pretty good at adapting. I want to double-click on one thing, just because you brought it up and it's a rare thing to touch on: the VP of sales. We mostly talk to pretty early-stage founders, and they don't usually have a built-out sales function. Any advice? What kind of sales hiring works for you
in this kind of field? What didn't work? Anything you can share with other founders? I think one of the hard parts about hiring in sales, and Anshul can attest to this too (we have an amazing VP of sales at the company), is that if you're purely a developer, you assume a salesperson's job is to talk really, really well, all prim and proper. It's very obvious if you hear me talk that I'm not a very polished person. You're great, by the way. Or at least compared to most pure salespeople. So judging purely on the way they speak is not that interesting. What matters in a space like ours, which is moving very quickly, is
intellectual curiosity. Intellectual horsepower. Understanding how to build a factory. I'm not trying to minimize it, but in some ways you need to build something incredibly scalable here, right? It's almost like every year you're making this factory twice, maybe three times as big. Because in some ways you have people that are quota-carrying, you need some number of people, and you need to make the math work. And the process of building a factory is not something where you can just take someone who was a great rep at another company and have them build a factory. That's a very different skill. How do you make sure you have hundreds of people that deeply understand the product? Anshul actually works very closely with sales to make sure they're enabled properly.
Make sure that they understand the technology. Our technology is also changing very quickly. Let's take an example of how our company is very different from a company like MongoDB. When you sell a product like MongoDB, nobody buying it is interested in how the data is being stored. It's not that interesting, right? I love databases, I would be interested, but most people are like, solve the application problem I have at hand.
People are curious about how our technology works. The people buying our technology are curious about RAG, right? And imagine we had a sales team that is scaling where no one understands any of this stuff. We would not be great partners for our customers. So how do you create this growing factory that is able to distribute the software in a way that is true to our partners, while at the same time taking on all the new parts of our product? They have to actually be able to expound on new parts of our product. Sorry, that was less an answer and more a statement about building a scalable sales team. But in terms of who you hire, you just need to have a sense of what good looks like. In some ways, this is maybe an example of: talk to enough people, find out what good looks like in your category, and find someone who's good and humble and willing to work with you. Yeah, but that's just generic hiring. It's just generic hiring. I think here, there's sales for AI. Yeah.
Or sales for AI infrastructure. And then there's also sales feeding into product in the way we're talking about here, right? Where they basically tell you what customers need. I imagine a lot of that happened. A lot of that happened. I mean, we still have, as Varun mentioned, Varun, myself, and a number of other people who are engineers by trade pretty involved in the sales process, because there's a lot to learn. If neither of us had ever done a sale for Codeium in our lives and we had gone and tried to find a sales leader, we probably would not have hired the right person. Before we went out and hired one, we had sold the product to 30 or 40 customers. We had done hundreds and hundreds of deal cycles ourselves, personally. We read a lot of books, we just did a lot of stuff, and we learned what messaging worked and what we needed to do. And then I think we found the right person: Graham, who we brought on as our VP of sales, is amazing.
That just has to be part of the nature of the company. And it doesn't stop now. Just because we have a VP of sales and people dedicated to sales doesn't mean that we can't be involved, or that engineering can't be involved. We hire plenty of deployed engineers. I think Palantir kind of made this famous: deployed engineers work very, very closely with the sales team on the very technical aspects, because they can also understand what people are trying to do with AI. As in, they work at Codeium as deployed engineers? Yeah. Okay. And they partner with our account executives to make our customers successful and learn what people are actually getting value from with AI. And that's information we keep on collating. We will both jump into any deal cycle just to learn more, because that's how we're going to keep building the best product. It comes back to the same thing.
We care. I don't know, and hopefully we build the right thing. Cool, guys, thank you for the time. It's great to have you back on the pod. Yeah, thanks a lot for having us. Hopefully in a year we can do another one. Yeah, you'll be at 10 billion by then. Yeah, exactly, at this rate. We try not to think about that. We try to not be a zero-billion company. Well, there's that. All right, cool. That's it. Awesome.