
How To Get The Most Out Of Vibe Coding | Startup School

2025/4/25

Y Combinator Startup Podcast

People
Tom Blomfield
A founder
Topics
Tom Blomfield: Over the past month I've been experimenting with vibe coding on a couple of side projects, and I've found that it's not only remarkably good, but also something you get noticeably better at if you're willing to tinker and pick up best practices. It's a new way of programming, and the point is to get the best results, not to treat it as identical to traditional software engineering. I recommend starting with a comprehensive plan, kept in a Markdown file in the project folder, and then implementing it section by section, testing and committing after each section so that every step ends with a working implementation. Use version control (Git) so you can roll back to a known working version when things go wrong. Avoid prompting the AI over and over to get something working; if you do eventually get a solution, do a git reset and feed that solution into a clean codebase. Write tests, especially high-level integration tests, to catch LLM-introduced regressions early. LLMs aren't just for coding; they can handle non-coding work too, such as configuring servers and creating images. When you hit a bug, start by copy-pasting the error message straight into the LLM; that's often enough for the AI to identify and fix the problem. For complex bugs, ask the LLM to consider several possible causes, and do a git reset after each failed fix attempt. Add logging, and switch to a different model when necessary. Write instructions for the LLM and keep them in an instructions file; this makes the LLM much more effective. Download documentation locally and tell the LLM in your instructions to read it, which improves accuracy. You can use the LLM as a learning tool by having it explain code line by line when you're learning a new technology. For complex features, build a standalone version in a clean codebase first, then point the LLM at that implementation and have it re-implement it inside the larger codebase. Keep code files small and modular, which is easier for both humans and LLMs to understand. Choose the right tech stack, such as Ruby on Rails, because it has a large amount of training data. Use screenshots to demonstrate UI bugs or pull in design inspiration. Voice input speeds up entering instructions. Refactor frequently, and use the LLM to identify parts of the code that need refactoring. Keep trying new models and techniques to find the tools that best fit your needs.

A founder: If the LLM in your IDE can't implement or debug something, try going directly to the LLM's website and pasting in your code to look for a solution. You can use multiple LLM tools at once, such as Cursor and Windsurf, handling frontend and backend work in parallel to boost efficiency. Treat AI as a new kind of programming language: you need to provide detailed context and information to get good results. Do vibe coding in reverse, starting from the test cases: write the test cases first, then have the LLM generate code that satisfies them. Before reaching for a coding tool, spend enough time in a pure LLM to scope out the project and its architecture.

Another founder: Watch closely for the LLM getting stuck in a loop. If the code looks odd, or you find yourself repeatedly copy-pasting error messages, step back and work out why it's failing.

Chapters
This chapter explores the use of AI tools like LLMs in software development, focusing on techniques to improve efficiency and overcome challenges. It highlights the importance of best practices and professional software development processes when using AI for coding.
  • Using AI for coding is similar to prompt engineering; continuous improvement is key.
  • Switching between different LLMs and IDEs can solve problems.
  • Professional software development practices, such as testing and version control, remain crucial when using AI tools.

Transcript


Hi, I'm Tom and I'm a partner here at YC. For the last month I've been experimenting with vibe coding on a couple of side projects, and I've found not only is it remarkably good, but it's also a practice you can get measurably better at if you're open to tinkering and picking up best practices. In this video I want to share some ways you can get great results when vibe coding. It's kind of like

prompt engineering from a year or two ago, when people were discovering new stuff every week and posting about it on social media. The best techniques are the same techniques that a professional software engineer might use. And some people are like, well, that's not vibe coding, is it? You're now just software engineering.

I kind of think that's beside the point. We're trying to use these tools to get the best results. And the YC Spring batch just kicked off a couple of weeks ago. And before I give you my advice for vibe coding, let's hear from the founders on the tips they're using to get the best out of the AI tools today. If you get stuck in a place where the AI IDE can't implement or can't debug something and it's just stuck in a loop,

Sometimes going to the LLM's website, like literally to the UI and just pasting in your code and asking the same question can get you a result that for whatever reason the IDE couldn't get to. And you can solve your problem that way. So I'd say just load up both Cursor and Windsurf on the same project. Cursor, it's a bit faster. So you can do a lot of the front end, a little more full stacky, link the front end to the back end. Windsurf thinks for a bit longer. I used to just be scrolling on my phone while I type, build this agent or like, you know, like...

like modify this prompt, and I'll just scroll, fix, scroll, or, you know, paste an error. Now, while I'm waiting for Windsurf to think, I can go on Cursor and, you know, just start updating the front end. Sometimes I'll load up both at the same time and have the same context. Maybe if I'm trying to update the front end, I'll tell it to style it in the style of that file. And then I'll just press enter for both. And then they'll

both basically give me like slightly different iterations of the same front end and I'll just pick which one I like better. My advice would be to think of the AI as a different kind of programming language and vibe coding as being a different, a new type of programming language. And so instead of programming with code, you're programming with language. And, uh,

Because of that, you kind of have to provide a lot of the necessary context and information in a very detailed way if you want to get good results. I usually start vibe coding in the reverse direction, that is, first starting from the test cases. I handcraft my test cases. I don't use any LLMs to write my test cases.

And once that is done, I have strong guardrails that my LLMs can follow for generating the code. And then they can freely generate the code that they want to generate. And once I see those green flags on my test cases, the job is done. I don't need to micromanage my codebase. I just take an overview of the modularity of the code. Other than that, it's fine. Yeah, I think it's very important to first spend an unreasonable amount of time

in like a pure LLM to build out the scope and the actual architecture of what you're trying to build before offloading that to Cursor or any other kind of coding tool and letting it just free-run in the code base, randomly making up stuff that doesn't really work. So make sure you understand what the actual goal of what you're building is. - My advice would be to really monitor whether the LLM is falling into a rabbit hole when it's answering your question.

And if you notice that it just keeps regenerating code and it looks kind of funky, it's not really able to figure it out. If you find yourself copying and pasting error messages all the time, it probably means something's gone awry and you should take a step back, even prompt the LLM and say, hey, let's take a step back and try to examine why it's failing. Is it because you haven't provided enough context

for the LLM to be able to figure it out? Or have you just gotten unlucky and it's unable to do your request? The overarching theme here is to make the LLM follow the processes that a good professional software developer would use. So let's dive in and explore some of the best vibe coding advice I've seen.

First, where to start? If you've never written any code before, I would probably go for a tool like Replit or Lovable. They give you an easy to use visual interface and it's a great way to try out new UIs directly in code. Many product managers and designers are actually going straight to implementation of a new idea in code rather than designing mockups in something like Figma.

just because it's so quick. But when I tried this, I was impressed with the UIs, but tools like Lovable started to struggle when I wanted to more precisely modify backend logic rather than just pure UI changes. I'd change a button over here and the backend logic would...

bizarrely change. So if you've written code before, even if you're a little bit rusty like me, you can probably leap straight to tools like Windsurf, Cursor, or Claude Code. Once you've picked the tool you want to use, the first step is not to dive in and write code. Instead, I would work with the LLM to write a comprehensive plan.

put that in a markdown file inside your project folder and keep referring back to it. This is a plan that you develop with the AI and you sort of step through while you're implementing the project rather than trying to one-shot the whole thing.

And so what I'd do, after you've created the first draft of this plan, go through it, delete or remove things that you don't like. You might mark certain features explicitly as won't do, too complicated. And you might also like to keep a section of ideas for later, you know, to tell the LLM, look, I consider this, but it's out of scope for now. Once you've got that plan, work with the LLM to implement it section by section and explicitly say, let's just do section two right now.

Then you check that it works. You run your tests and you git commit. Then have the AI go back to your plan and mark section two as complete. I probably wouldn't expect the models to one shot entire products yet, especially if they're complicated. I prefer to do this piece by piece and make sure I have a working implementation of each step and crucially commit it to git so that you can revert if things go wrong on the next step.
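
To make that concrete, here's a sketch of what such a plan file might look like; the file name plan.md, the feature list, and the status markers are all made up for illustration, not taken from the talk.

```markdown
# plan.md — recipe-sharing side project (hypothetical example)

## Section 1: User accounts (sign-up, login)      — DONE
## Section 2: Recipe CRUD (create, edit, delete)  — IN PROGRESS
## Section 3: Photo uploads                       — TODO
## Section 4: Search                              — TODO

## Won't do (too complicated for now)
- Social login
- Realtime notifications

## Ideas for later
- Import a recipe from a pasted URL
```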

But honestly, this advice might change in the next two or three months. The models are getting better so quickly that it's hard to say where we're going to be in the near future. My next tip is to use version control. Version control is your friend. Use Git religiously. I know the tools have these kind of revert sort of functionality features,

I don't trust them yet. So I always make sure I'm starting with a clean Git slate before I start a new feature, so that I can revert to a known working version if the AI goes off on a vision quest. So don't be afraid to git reset --hard HEAD if it's not working and just roll the dice again. I found I had bad results if I'm prompting the AI multiple times to try and get something working.

it tends to accumulate layers and layers and layers of bad code rather than like really understanding the root cause. You might go and try four, five, six different prompts and you finally get the solution. I'd actually just take that solution

git reset, and then feed that solution into the AI on a clean code base, so you can implement that clean solution without layers and layers of cruft.
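
In plain git terms, the loop I'm describing looks roughly like this (the commit message is just a placeholder):

```bash
# finish a section, confirm the tests pass, then commit so you have a known good state
git add -A && git commit -m "Section 2: recipe CRUD, tests passing"

# if the AI goes off on a vision quest, throw away the uncommitted mess and re-prompt
git reset --hard HEAD

# when a long back-and-forth finally produces a working fix, copy that solution out,
# reset to the last good commit, and ask the AI to apply just that fix to the clean code base
git reset --hard HEAD
```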

The next thing you should do is write tests. Well, get your LLM to write tests for you. They're pretty good at this, although they often default to writing very low-level unit tests. I prefer to keep these tests super high level. Basically, you want to simulate someone clicking through the site or the app and ensure that the features are working end-to-end, rather than testing functions on a unit basis. And so make sure you write high-level integration tests before you move on to the next feature. LLMs have a bad habit

of making unnecessary changes to unrelated logic. So you tell it to fix this thing over there and it just changes some logic over here for really no reason at all. Having these test suites in place to catch these regressions early will tell you when the LLM has gone off and made unnecessary changes, so that you can git reset and start again.
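
For a Rails project, one way to write that kind of high-level test is a system test that drives a browser through an entire user flow. This is only a minimal sketch; the routes, labels, and page copy are invented for illustration:

```ruby
# test/system/signups_test.rb — hypothetical end-to-end test using Rails system tests
require "application_system_test_case"

class SignupsTest < ApplicationSystemTestCase
  test "visitor can sign up and reach their dashboard" do
    visit root_path

    click_on "Sign up"
    fill_in "Email", with: "new.user@example.com"
    fill_in "Password", with: "correct-horse-battery"
    click_on "Create account"

    # assert on what the user actually sees rather than on individual functions,
    # so an unrelated change by the LLM that breaks this flow still gets caught
    assert_text "Welcome"
    assert_current_path dashboard_path
  end
end
```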

And keep in mind, LLMs aren't just for coding. I use them for a lot of non-coding work when I'm building these kinds of side projects. For example, I had Claude Sonnet 3.7 configure my DNS servers, which is always a task I hated, and set up Heroku hosting via a command line tool. It was like a DevOps engineer for me and accelerated my progress like 10x. I also used ChatGPT to create an image for my site's favicon, that little icon that appears at the top of the browser window.

And then Claude took that image and wrote a quick throwaway script to resize it into the six different sizes and formats I needed for favicons across all the different platforms. So the AI is now my designer as well.
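
I'm not showing Claude's actual script here, but a throwaway version might look something like this, assuming ImageMagick's magick command is installed and you have a high-resolution favicon_source.png to start from:

```ruby
# resize_favicons.rb — hypothetical throwaway script
# common favicon sizes; adjust to whatever your platforms actually need
sizes = [16, 32, 48, 180, 192, 512]

sizes.each do |size|
  ok = system("magick", "favicon_source.png",
              "-resize", "#{size}x#{size}",
              "favicon-#{size}x#{size}.png")
  abort("resize to #{size}px failed") unless ok
end

# bundle the small sizes into a single multi-resolution .ico for older browsers
system("magick", "favicon-16x16.png", "favicon-32x32.png", "favicon-48x48.png", "favicon.ico")
```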

OK, so now let's look at bug fixes. The first thing I do when I encounter any bug is just copy-paste the error message straight back into the LLM. It might be from your server log files or the JavaScript console in the browser. Often this error message is enough for the AI to identify and fix the problem. You don't even need to explain what's going wrong or what you think is going wrong; simply the error message alone is enough.

It's so powerful that pretty soon I actually expect all the major coding tools to be able to ingest these errors without humans having to copy-paste. If you think about it, our value being the copy-paste machine is kind of weird, right? We're leaving the thinking to the LLM. But I think that copy-pasting is going to go out the window, and these LLM tools are going to be able to tail logs or, you know,

spin up a headless browser and inspect the kind of JavaScript errors. With more complex bugs, you can ask the LLM to think through three or four possible causes before writing any code. After each failed attempt at fixing the bug, I would git reset and start again. Again, so you're not accumulating layers and layers of cruft. Don't make multiple attempts at bug fixes without resetting because the LLM just adds more layers of crap. Git reset, start again. And add logging. Logging is your friend.

If in doubt, if it's not working, switch models. Maybe it's Claude Sonnet 3.7, maybe it's one of the OpenAI models, maybe it's Gemini. I often find that different models succeed where the others fail. And if you do eventually find the source of a gnarly bug, I would just reset all of the changes and then give the LLM very specific instructions on how to fix that precise bug on a clean code base, to avoid these layers and layers of

junk code accumulating. The next tip is to write instructions for the LLM. Put these instructions wherever your tool expects them, whether that's Cursor rules, Windsurf rules, or a Claude markdown file; each tool has a slightly different naming convention.

But I know founders who've written hundreds of lines of instructions for their AI coding agent. It makes them way, way, way more effective. There's tons of advice online about what to put in these instruction files, so I'll let you go and find that on your own.
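
Just to give a flavour, though, a hypothetical excerpt from such a file (a .cursorrules or CLAUDE.md, say) might read:

```markdown
- Read plan.md before you start; only implement the section I name, nothing else.
- Run the test suite after every change and never report a section as done with failing tests.
- Keep files small and modular; prefer extracting a new module over growing an existing file.
- Stick to plain Rails conventions; do not add new gems without asking first.
- Local copies of third-party API docs live in ./docs — read them before calling those APIs.
```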

Okay, let's talk about documentation. I still find that pointing these agents at online web documentation is a little bit patchy. Some people suggest using an MCP server to access this documentation, which works for some people but seems like overkill to me. So I'll often just download all of the documentation for a given set of APIs and put it in a subdirectory of my working folder so the LLM can access it locally. And then in my instructions, I'll say, go and read the docs before you implement this thing,

and it's often much more accurate. A side note to remember: you can use the LLM as a teacher, especially if you're less familiar with the coding language. You might implement something and then get the AI to walk through that implementation line by line and explain it to you. It's a great way to learn new technologies, and much better than scrolling Stack Overflow like we all used to do. Now let's look at more complex functionality.

If you're working on a new piece of functionality, a new feature that's more complex than you'd normally trust the AI to implement, I would do it as a standalone project in a totally clean code base. Get a small reference implementation working without the complication of your existing project, or even download a reference implementation if someone's written one and posted it on GitHub. Then you point your LLM at the implementation,

and tell it to follow that while re-implementing it inside your larger code base. It actually works surprisingly well. Remember, small files and modularity are your friend. This is true for human coders as well. I think we might see a shift towards more modular or service-based architecture

where the LLM has clear API boundaries that it can work within while maintaining a consistent external interface, rather than these huge monorepos with massive interdependencies. These are hard for both humans and LLMs. It's just not clear if a change in one place is going to impact another part of the codebase.

And so having this modular architecture with a consistent external API means you can change the internals freely: as long as the external interface stays the same and the tests still pass, you're probably good.
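
As a small sketch of what that kind of boundary can look like in Ruby (the class and the pricing rules are invented for illustration):

```ruby
# a small service object with one public entry point; callers and tests only touch #call,
# so the internals can be rewritten freely as long as #call keeps behaving the same
class InvoiceTotaler
  def initialize(line_items)
    @line_items = line_items
  end

  # the stable external interface
  def call
    subtotal + tax
  end

  private

  # internal details — safe for a human or an LLM to refactor
  def subtotal
    @line_items.sum { |item| item[:unit_price] * item[:quantity] }
  end

  def tax
    (subtotal * 0.2).round(2)
  end
end
```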

Now, a note on choosing the right tech stack. I chose to build my project partially in Ruby on Rails, mostly because I was familiar with it from when I used to be a professional developer. But I was blown away by the AI's performance, especially when it was writing Ruby on Rails code. And I think this is because Rails is a 20-year-old framework with a ton of well-established conventions. A lot of Rails codebases look very, very similar, and it's obvious to an experienced Ruby on Rails developer where a specific piece of functionality should live, or what the right Rails way of achieving a certain outcome is.

That means there's a ton of pretty consistent, high-quality training data for Rails codebases online. I've had other friends have less success with languages like Rust or Elixir, where there's just not as much training data online. But who knows? That might change very soon.

Okay, next bit of advice. Use screenshots. You can copy and paste screenshots into most coding agents these days, and it's very useful either to demonstrate a bug in the UI implementation that you can see, or to pull in design inspiration from another site that you might want to use in your project.

Voice is another really cool way to interact with these tools. I use Aqua, a YC company, and basically I can just talk at my computer and Aqua transcribes whatever I'm saying into the tool I'm using. I'm switching a lot between Windsurf and Claude Code at the moment, but with Aqua, I

I can input instructions at 140 words per minute, which is about double what I can type. And the AI is so tolerant of minor grammar and punctuation mistakes that it honestly doesn't matter if the transcription is not perfect. I actually wrote this entire talk with Aqua. Next.

Make sure to refactor frequently. When you've got the code working and crucially the tests implemented, you can refactor at will, knowing that your tests are going to catch any regressions. You can even ask the LLM to identify parts of your code base that seem repetitive or might be good candidates for refactoring.

And again, this is just a tip that any professional software developer would follow. You don't have files that are thousands of lines long. You keep them small and modular. It makes it much easier for both humans and LLMs to understand what's going on. Finally, keep experimenting.

It seems like the state of the art of this stuff changes week by week. I try every new model release to see which performs better in each different scenario. Some are better at debugging or long-term planning or implementing features or refactoring. For example, at the moment Gemini seems best for whole code base indexing and coming up with implementation plans.

while Sonnet 3.7, to me at least, seems like the leading contender to actually implement the code changes. I tried GPT-4.1 just a couple of days ago and honestly I wasn't yet as impressed. It just came back to me with too many questions and actually got the implementation wrong too many times. But I'll try it again next week, and I'm sure things will have changed again. Thanks for watching, and I'd love it if you have tips or tricks for getting the most out of these models. Please share them in the comments below.