
Now Anyone Can Code: How AI Agents Can Build Your Whole App

2024/10/18

Lightcone Podcast

People
Amjad Masad
Francesc Campoy
Gary
Not enough information to create a detailed profile.
Topics
Gary: Replit Agent, an AI-driven software development platform, lowers the barrier to software development and lets more people take part; even without programming experience, you can build complex applications in a short time.
Amjad Masad: From a simple natural-language prompt, Replit Agent automatically generates code and builds a complete web application, including the front end, back end, and database, greatly simplifying the development workflow. It uses a multi-agent system combining several models (including Claude 3.5 Sonnet and GPT-4o) with an in-house embedding model and retrieval system, so it can generate and edit code efficiently and overcome the limitations of traditional RAG. It codes the way a human programmer does, making mistakes and debugging them, and users can step in and modify the code themselves. Learning to program is still very important: it gives developers more capability and control, the return on learning to code keeps rising as AI advances, and having some programming skill will be increasingly valuable. In the future, Replit Agent will support more codebases and tech stacks, gain more autonomy, run in the background, support team collaboration, and integrate help from human experts.
Mark Mandel: Replit Agent can dramatically shorten development time and save large labor costs; it shows the potential of AGI, autonomously designing and building applications from user requirements with some capacity for reasoning and learning.
Yiu-Jing Li: Replit Agent automatically handles the tedious dependency installation and configuration work in software development, greatly improving development efficiency.
Francesc Campoy: Replit Agent not only generates code from user requirements but also interacts like a development partner, asking questions and adjusting based on feedback; it can help no-code users gradually learn programming; and Replit organized the work as a task force, with different teams collaborating, which improved development efficiency.

Deep Dive

Key Insights

What is Repl.it Agent and how does it change the software development landscape?

Repl.it Agent is an AI-powered platform that allows users to generate and deploy functional custom software through simple prompts. It democratizes coding by enabling anyone, regardless of technical expertise, to build apps quickly. This shifts the landscape from requiring extensive coding knowledge to leveraging AI agents for rapid development, making software creation accessible to a broader audience.

What was the first app built during the live demo of Repl.it Agent?

The first app built during the live demo was a personal mood tracker. It logged the user's morning mood, coffee and alcohol consumption, and exercise habits. The app was created using Flask, Vanilla.js, and Postgres, and it included features like visualization and reminders.
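The data model behind an app like this is small. Here is a minimal sketch of the logging and history logic, using Python's built-in sqlite3 in place of Postgres; the table and column names are illustrative guesses, not Replit Agent's actual output:

```python
import sqlite3

def init_db(conn):
    # One row per morning check-in: mood plus previous-day habits.
    conn.execute("""CREATE TABLE IF NOT EXISTS mood_log (
        day TEXT PRIMARY KEY,
        mood INTEGER,          -- 1 (bad) .. 5 (great)
        had_coffee INTEGER,    -- 0/1
        had_alcohol INTEGER,   -- 0/1
        exercised INTEGER)""")

def log_mood(conn, day, mood, coffee, alcohol, exercised):
    conn.execute(
        "INSERT OR REPLACE INTO mood_log VALUES (?, ?, ?, ?, ?)",
        (day, mood, int(coffee), int(alcohol), int(exercised)))

def history(conn):
    return conn.execute(
        "SELECT day, mood FROM mood_log ORDER BY day").fetchall()

conn = sqlite3.connect(":memory:")
init_db(conn)
log_mood(conn, "2024-10-18", 4, coffee=True, alcohol=False, exercised=False)
print(history(conn))  # [('2024-10-18', 4)]
```

A Flask front end would simply call `log_mood` and `history` from its route handlers; the visualization feature is a chart over the `history` rows.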

What models and technologies power Repl.it Agent?

Repl.it Agent uses a multi-agent system powered by models like Claude 3.5 Sonnet for code generation and GPT-4o for specific tasks. It also incorporates in-house models for embeddings, retrieval, and indexing. The platform leverages LangGraph (from LangChain) for agent DAGs and LangSmith for debugging traces, ensuring efficient code generation and editing.

How does Repl.it Agent handle debugging and testing?

Repl.it Agent includes a language server that provides real-time feedback on coding errors, similar to human coding. It also performs automated testing, such as taking screenshots and using computer vision to verify app functionality. Users can manually test and debug the code, making it a collaborative process between the AI and the user.

What are some examples of apps built using Repl.it Agent?

Users have built a variety of apps, including a personal memory map app that attaches files and audio to locations, a Stripe coupon tool for course creators, and a Hacker News clone. These apps were created in minutes, showcasing the platform's ability to quickly turn ideas into functional software.

What is the future vision for Repl.it Agent?

The future of Repl.it Agent includes improving reliability, expanding support for any tech stack, and adding more interactive features like drawing and voice commands. The platform also aims to introduce single-step agents for advanced users, allowing them to preview and approve changes before implementation.

How does Repl.it Agent compare to no-code tools?

Repl.it Agent offers more flexibility than no-code tools by generating actual code, which users can edit and customize. While no-code tools often hit limits in functionality, Repl.it Agent allows users to push beyond those constraints, making it a more powerful solution for complex projects.

What organizational changes did Replit undergo to develop Repl.it Agent?

Replit formed an agent task force, bringing together teams from IDE, DevX, UX, and AI to collaborate on the project. The organization flattened its structure, focusing on rapid progress through weekly meetings where priorities were set and issues were addressed. This approach allowed for quick iterations and significant advancements in the agent's development.

Chapters
The episode starts by comparing the impact of personal computing in 1984 to the potential of personal software in 2024, facilitated by AI agents. A live demo showcases Replit Agent building a mood tracking app from a simple prompt, highlighting its ability to generate code, manage dependencies, and even suggest features. The discussion then delves into the technical aspects of the agent, including its multi-agent system, model choices (Claude 3.5 Sonnet, GPT-4o), and the crucial role of the retrieval system.
  • Comparison of 1984's personal computing revolution to 2024's personal software era
  • Live demo of Replit Agent building a functional web app from a simple prompt
  • Replit Agent's multi-agent system, utilizing Claude 3.5 Sonnet and GPT-4o
  • Importance of the retrieval system in editing and code generation

Transcript

1984, the Mac brought personal computing to the masses.

2024, we have personal software. You actually are going to be able to orchestrate this giant army of agents, and I think of Mickey Mouse in Fantasia, learning this new magical sort of ability, and suddenly all the brooms are walking and talking and dancing, and it's this incredible menagerie of being able to build whatever the heck you want whenever you want. Someone who had an idea for 15 years but didn't have

the tools to build it and was able to build it in 15 minutes. And he recorded his reaction. I almost shed a tear on that. Welcome back to another episode of The Light Cone. I'm Gary. This is Jared, Harj, and Diana. And collectively, we funded companies worth hundreds of billions of dollars right at the beginning, just a few people with an idea. And today, we have...

one of our best alumni to show off what he just launched, Repl.it Agent. Amjad, thanks so much for joining us today. My pleasure. Thank you for having me. Yeah, so we just launched this product. It is in early access, meaning it's barely beta software.

but people got really excited about it. It works some of the time. So there's a lot of bugs, but we're going to do a live demo here. And I wanted to like build an app, like a personal app that could track my morning mood correlated with like what I've done the previous day. So I want an app to log my mood in the morning and also things like,

I've done the previous day, such as the last time I had coffee or if I had alcohol and if I exercised that day.

That'll send it to the agent now. You have this chat interface. So you can see the agent just read the message, and it's now thinking. MARK MANDEL: So what we're looking at here is actually how you might chat with another user? Or is this specifically-- FRANCESC CAMPOY: Yeah, I mean, it's similar. It's very similar to a multiplayer experience on Repl.it. MARK MANDEL: Got it.

So here it's saying I created a plan for you to log your daily mood. The app will show your mood, coffee, alcohol consumption and exercise. And it also suggests other features. So for example, it's suggesting visualization and that sounds good. Reminders, I don't know, I'll remember. So let's just go with these two steps.

YIU-JING LI: I think what was also cool, it picked the tech stack that's very quick to get started. So Flask, Vanilla.js, Postgres, very, very good. MARK MANDEL: So now we're looking at what we're calling the progress pane. So the progress pane is-- you can see what the AI is doing. Right now it's installing packages. It actually wrote a lot of code. And it looks like it built a database connection and all of that. And it's now installing packages. And we should be able to see a result pretty soon. YIU-JING LI: This is really cool because I think a lot of times

for new software engineers, one of the annoying parts is just getting all the packages and dependencies and picking the right stuff. And this just does it for you, the agent. So here we have our mood app. I can kind of put that I'm feeling pretty good today. I did have coffee yesterday, but I didn't exercise. I log my mood, go to history,

So it's a complete web app with just a prompt, like no further instruction from you. Yes. And it has a backend. It has Postgres. And I can just deploy this. So this is already pretty useful. You have this rating and you have the history. And it's asking me if it did the right thing. It actually is asking you to test it for them.

Yeah, it actually did some testing on its own. So it took a screenshot here. And so it knows that at least something is presented, but it wants someone to actually go in and do a little bit of QA. Is it using computer vision to look at the screenshot? Yeah. OK. Yeah. And now all the models are multimodal, and so it's fairly straightforward. What's on the back end right now?

We actually have a few models, because it's a multi-agent system and we found different models work for different types of agents. The main CodeGen one is Claude 3.5 Sonnet, which is just unbeatable on CodeGen. It is the best thing. But we use GPT-4o in some cases. There's also some enhanced

in-house models like we built the embedding model. It's a super fast binary embedding model. And the retrieval system and indexing, this is all built in-house. And a big part of what makes this work is the sort of retrieval system.
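One way to picture a "super fast binary embedding": sign-quantize each float vector into a bitmask, then rank candidates by Hamming distance, which needs only XOR and popcount instead of float math. This is a toy sketch with hand-written vectors standing in for a real embedding model's output, not Replit's actual system:

```python
def binarize(vec):
    # Sign-quantize a float embedding into a compact integer bitmask.
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    # Popcount of the XOR = number of differing bits.
    return bin(a ^ b).count("1")

def nearest(query_vec, index):
    q = binarize(query_vec)
    return min(index, key=lambda item: hamming(q, item[1]))[0]

# Toy index: file name -> binarized embedding.
docs = {
    "db.py":     [0.9, -0.2, 0.4, 0.1],
    "routes.py": [-0.5, 0.8, -0.1, 0.3],
}
index = [(name, binarize(v)) for name, v in docs.items()]
print(nearest([0.7, -0.1, 0.2, 0.05], index))  # db.py
```

Real systems use much longer vectors and SIMD popcount, but the payoff is the same: each comparison is a couple of integer instructions.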

Because figuring out what to edit, it turns out, is the most important thing for making these agents work. You're going a step beyond just RAG, because RAG hits its limit for this. And you basically have to find a new way to search and find the right places to edit in the code. Yes. Which is actually something that I don't think has happened yet, but I think...

It's going to happen that for all these multi-agent systems, people are going to move away from RAG and start building custom orchestration like this.

So this is very notable. This is a very cool thing that you figure out. MARK MANDEL: Yeah, just throwing the code base in RAG is not going to work. You actually have several different representations that allow the agents to do better work. MARK MANDEL: That's right. And we have the trends thing working right now. MARK MANDEL: Ooh. Nice. MARK MANDEL: So we have a couple graphs. We don't have a lot of entries here. I can actually ask it to create data. MARK MANDEL: Oh, really? You can have it create data as well? MARK MANDEL: Yes. Now it's asking me to deploy because it's done. It's like, it's not going to deploy.

And here we have the activity trends, like what am I doing by day? MARK MANDEL: There you have it. It's going directly from just an idea to a deployed web app that anyone in the world can access right now. FRANCESC CAMPOY: Exactly. And one of the things I'm really excited about is this idea of personal software. 1984, the Mac brought personal computing to the masses. 2024--

We have personal software. I think we just experienced this. You know, Karpathy just tweeted about Repl.it Agent. He said, this is a feel the AGI moment. Did you just feel the AGI? I definitely did. And I did last night. I spent a few hours last night using Repl.it Agent to make a Hacker News clone. Nice. There were a couple of moments where, like, I really felt the AGI. Yeah.

The first was it actually had really good intuition about what UI to make and how to design it. Like, we saw that you didn't give it the idea to make the slider bar emojis. It just came up with that on its own. And then the second thing was,

when I was using it, it really felt like I had a development partner that would ask me questions, ask me to change things. At one point it got stuck. It wasn't sure how to do something, and so it asked me how to do the thing. And then I told it, and it was like, cool, got it, and just kept going. Yeah.

Yeah, it feels great. And sometimes you want to give it some help, right? You want to go debug, if you know how to debug yourself, or you go ask ChatGPT about something and come back to it. Just give it more information. You'll be able to kind of react to it. It definitely feels like talking to a developer. You should do the...

Grok thing and have different modes. You could have, like, a grumpy programming mode where it just tells you your ideas are bad and it wants to build something else anyway. Oh, that would be cool. Just have a toggle, for example, like an over-engineer mode. Yeah. Just over-engineer everything. Exactly. So it added this toggle, but I don't think it works. I don't think it connected it up to the x-axis. Yeah. I think this is interesting about all these AI programmers, which is that it's not like

we created some super intelligence that somehow can just build an entire app perfectly from start to finish without making any mistakes. It actually codes the way a human does, which is it like write some code and this is like, well, I think this is right, but I'm not sure. I guess I'll try it. And then it tries like, oh no, I have a bug. It's like, it's the same thing. Yeah. Yeah. And we, again, our design decision has been always like, this is a co-worker and you can just close this and you can go to the code.

And you can code yourself. Just fix it yourself. Fix it yourself. And again, if you don't know how to code, my hope is as you are reading what the agent is doing is that you've learned a little bit of coding along the way. And by the way, this is how I think our generation learned how to code, not through agents, but almost by doing these incremental small things like editing your MySpace page or doing a GeoCities...

thing. And I feel like we sort of lost that incremental learning ladder, where now you need to get a computer science degree or go to a coding boot camp to figure this out. But if we made this fun thing that people can build side projects in and get exposed to what code is, I think that would be perfect. And again, my view is that we're still far from fully automated software engineering agents, and people should still learn how to code. You have to do way less coding.

But you will have to read the code. You will have to debug it in some cases. The agent will get you fairly far, but sometimes it will get stuck and you need to go into the code and figure it out. Yeah, I think that that's actually pretty important. I've been meeting a lot of 18, 19-year-olds who are freshmen

And they're like, well, the code will write itself, right? Like, I don't have to study this stuff anymore. And I'm like, no, that's not true at all. Like, I actually think that now it is actually more leverage. It is far more leverage to know how to code than ever before. And it's actually even more important. And it will make you way more powerful. Like, you don't have to be all the way in the weeds on everything. You actually are going to be able to, like, orchestrate this giant army of

agents. And I think of Mickey Mouse in Fantasia, learning this new magical sort of ability, and suddenly all the brooms are walking and talking and dancing. And it's this incredible menagerie of being able to build whatever the heck you want, whenever you want, literally from any computer, from any web browser. Yeah. I try to come up with a Moore's law type of thing, where it's like,

the return on learning code is like doubling every six months or something like that. So learning code a little bit in 2020 was not that useful because you would still get blocked. You wouldn't know how to deploy something. You wouldn't know how to configure something. Let's go to 2023 with ChatGPT. Learn to code just a little bit. We'll get you fairly far because ChatGPT can help you.

And then 2024, learning to code a little bit is a massive leverage because we have agents like this and others. And there's a lot of really cool tools out there like Cursor and others that will get you super far by just like having a little bit of coding and just extend that forward. Like six months later, you're going to have even more power.

So programmers are just on this massive trajectory of increased power. Tell us more about the tech behind this. It's kind of fascinating. At the heart of it, as I described before, it's a multi-agent system. You have this core ReAct-like loop. So ReAct is an agent chain-of-thought style of

prompting that's been around for a couple of years now. And most agents are built on that. But ours is also a multi-agent system. We give it a ton of tools using tool calling. And those tools are the same tools, again, that are exposed to people. And by the way, you need to be really careful about how to expose these tools and how to

Does the agent see them? So for example, our edit tool returns errors from the language server. So we have a language server here, a Python language server. It's like a human coding, if I make a mistake.

anywhere here, it will show me. Similarly, when the agent is coding, it gets feedback from the language server. So again, you want to treat it as much as you can like a real user. And so for any action, it gets feedback, and then it can react to that feedback. And so these are the tools. Again, this is package management, editing, deployment, the database. All those are tools.
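A minimal sketch of that loop: the agent acts through tools, and the edit tool feeds diagnostics back the way a language server would, so the agent can react to its own mistakes. The "model" here is a scripted stub and the diagnostics come from `ast.parse` rather than a real language server; only the act-observe-react shape is the point:

```python
import ast

def edit_tool(files, path, new_source):
    # Apply the edit, then return diagnostics the way a language
    # server would, so the agent can react to its own mistakes.
    files[path] = new_source
    try:
        ast.parse(new_source)
        return "ok"
    except SyntaxError as e:
        return f"SyntaxError line {e.lineno}: {e.msg}"

def run_agent(model_step, files, max_steps=5):
    feedback = None
    for _ in range(max_steps):
        action = model_step(feedback)  # reason over feedback, then act
        if action["tool"] == "done":
            return files
        feedback = edit_tool(files, action["path"], action["source"])
    return files

# Stub model: writes buggy code, sees the diagnostic, fixes it, stops.
script = iter([
    {"tool": "edit", "path": "app.py", "source": "def f(:\n    pass"},
    {"tool": "edit", "path": "app.py", "source": "def f():\n    pass"},
    {"tool": "done"},
])
files = run_agent(lambda fb: next(script), {})
print(files["app.py"])  # the corrected source survives
```

The real agent exposes many more tools (package management, deployment, the database) through the same act-and-observe interface.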

And then there are a lot of things that make sure that it doesn't go totally off the rails because it's very easy. We've all used agents that go off the rails and go into endless loops. This still sometimes does it, but we have another loop that is doing a reflection that's always thinking,

Am I doing the right thing? We use a lot of LangChain tools. So LangGraph is an interesting new tool from LangChain that allows you to build agent DAGs very nicely. And they have a logging mechanism and a tool called LangSmith where you can look at the traces.

Looking at the traces for DAGs is very difficult, so debugging these things has been fairly hard, because you want a tool to actually visualize that graph, and there aren't a lot of tools that do that right now. And so there's this reflection tool, a reflection agent.
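The reflection loop that keeps the agent from spinning can be sketched as a guard over the action history; the repeated-action heuristic here is an illustrative stand-in for whatever checks the real reflection agent runs:

```python
def reflect(history, window=3):
    # A crude "am I doing the right thing?" check: if the agent keeps
    # emitting the same action, it is probably stuck in a loop.
    recent = history[-window:]
    return len(recent) == window and len(set(recent)) == 1

def run_with_reflection(next_action, max_steps=10):
    history = []
    for _ in range(max_steps):
        action = next_action()
        history.append(action)
        if reflect(history):
            return "halted: repeating action " + action
        if action == "done":
            return "finished"
    return "step budget exhausted"

# A stuck agent that retries the same failing edit forever.
print(run_with_reflection(lambda: "edit app.py"))
```

Running the reflection check outside the main loop means even a confused inner agent gets interrupted after a bounded number of wasted steps.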

And the other thing that we talked about earlier is retrieval is crucial. And again, this has to be kind of neuro-symbolic. It has to be able to do RAG style embeddings retrieval, but it has to be able to look up functions and symbols inside the code.
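The symbolic half of that retrieval, looking up functions and classes by name rather than by embedding similarity, can be sketched with Python's ast module; the toy sources are invented for illustration:

```python
import ast

def index_symbols(sources):
    # Map every function/class name to (file, line), the way a
    # symbol table or AST index lets an agent jump straight to a definition.
    table = {}
    for path, src in sources.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                 ast.ClassDef)):
                table[node.name] = (path, node.lineno)
    return table

sources = {
    "db.py": "def connect():\n    pass\n",
    "app.py": "class MoodTracker:\n    def log(self):\n        pass\n",
}
symbols = index_symbols(sources)
print(symbols["connect"])  # ('db.py', 1)
```

Pairing an exact index like this with embedding search is the "neuro-symbolic" combination described above: embeddings find roughly relevant regions, symbols pin down the exact definition to edit.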

This is why I do think I may be extrapolating a bit more, even if we get into the world of foundation models that have really, really large context windows. I mean, Gemini already is in the millions of tokens. You will still need very specialized things that do lookups like this because applied to different contexts,

knowing the functions and treating it more like how it compiles at the end, like an AST graph. Large context windows, you can totally shoot yourself in the foot with them. Yes. Because the model will bias a lot more toward whatever's at the end. Kind of like a human. Yes, exactly. And so you still need to do context management.

And you need to figure out how to rank memories. So this agent, every time it does a step, it goes into a memory bank. And then every time we go into the next step, we'll be able to pick the right memories and figure out how to put them in context. If you pick the wrong memories, for example, if you pick a memory that had a bug or there was an error in it, whatever,

it might still think that there's a bug. But if you already recovered from that, you want to make sure that memory of having created a bug is either kind of augmented by another memory of fixing it or entirely removed from the context. And so memory management is crucial here. You don't want to put the entire memory in context.
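The memory rule described here, where an error memory must be removed (or superseded by the memory of its fix) before it re-enters context, can be sketched like this; the memory format is invented for illustration:

```python
def select_memories(memories, budget=3):
    # Drop any error memory that a later "fixes" memory supersedes,
    # then keep the most recent survivors that fit the context budget.
    fixed = {m["fixes"] for m in memories if m.get("fixes")}
    live = [m for m in memories if m["id"] not in fixed]
    return [m["text"] for m in live[-budget:]]

memories = [
    {"id": 1, "text": "created mood_log table"},
    {"id": 2, "text": "ERROR: missing import sqlite3"},
    {"id": 3, "text": "added import, error resolved", "fixes": 2},
    {"id": 4, "text": "wired up /history route"},
]
print(select_memories(memories))
```

The stale error (id 2) never reaches the next prompt, so the agent does not keep "remembering" a bug it already fixed.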

You want to be able to pick the right memories for the right tasks. I feel like this is a really concrete rebuttal to situational awareness and that whole like sort of sci-fi, you know, AGI is going to kill us tomorrow kind of argument simply because that all is predicated on larger context window, more parameters, throw GPUs at it and it's going to work. Like you can't just scale it up.

You're not going to get what you want from just scaling it up. There is actually a lot of utility in having these agents work with one another, with being actually smart about what is the intermediate representation and being able to pull back, sort of model what a human would do. I mean, this is sort of like the...

the case study and like, oh yeah, you can't just scale up everything by 50X and have it work the way that they think it will. Yeah. In many ways, building a system like that sort of humbles you, sets your expectations about AI and the progress in AI in sort of a different way because yeah, the systems are very fragile. They're really still not great at following instructions. People talk a lot about the hallucination problem,

I think the bigger problem is just following orders. It's so hard to get them to actually do the right thing. What do you think is the path to AGI? So my view in AGI is that maybe we'll get to something we can call functional AGI, which is...

We automate all those sort of economically useful tasks. I think that's fairly within reach. I think it's almost like a brute force problem. It's sort of the bitter lesson, right? Do you think it involves doing a lot of work like what you guys did, like basically building...

like carefully fine tuning orchestrations of groups of agents for each task. So doing what you did for programming, but doing it for customer support and for sales for every accounting, every function. Yeah, I think so. And maybe you can eventually put it all into one model. The history of machine learning has been we create the systems, we

grow these systems around these models, and eventually the model will eat the systems. So hopefully, someday, everything that we did becomes an end-to-end machine learning system that could do it. Tesla famously had all this logic and whatever, and now, I think after V13, it's just end-to-end training.

And so eventually we'll get there. But I wouldn't consider it true AGI because you throw something out of distribution at it and it wouldn't be able to handle it. I think true AGI would require efficient learning. Being able to be thrown in an environment with no information at all, being able to

Understand the environment by examining it and learning a skill required to navigate that environment. And LLMs are not that. Maybe they're a component of that, but they're not efficient learners at all. You actually demonstrated this because the way you describe LLMs are intuition machines. And in order to get them to work in programming tasks, you had to add this layer of

symbolic representation: ASTs, a lot of concepts from programming and how computation works, Turing completeness, DAGs and all that, right? Yes, exactly. Those are very explicit classical computer science. Classical AI, yeah. We do backtracking and all that, yes. That's not generalized, that's specialized. I mean, incredibly useful, but specialized. Yes.

So it's only been live for four days. Yeah. But already people have done a bunch of really interesting and impressive stuff with it. Do you want to talk about some of the things you've seen people do with it that are most surprising and interesting? Yeah. One of my favorite things that I saw was someone who had an idea for 15 years but didn't have the tools to build it, and was able to build it in 15 minutes. And he recorded his reaction. And it's like a personal app. He built an app where he can...

put memories on a map and attach files and audio files to it. Memories about his life. I went to school here and like add a picture, whatever. When the app showed up and he tested it and he was like, he was so surprised. I almost shed a tear on that. I was like,

Being able to unlock people's creativity is so rewarding. And then I want an integration with Apple Photos, or to use it to actually build an export tool. Yes. And another user, Mekke, built sort of a Stripe coupon tool.

So he has a course, he runs it on Stripe, and he wants to be able to send people coupons. And so he built it in like five, 10 minutes. And actually, I don't think you would be able to build something like that in no code. You would struggle really hard. You would probably use two or three no code tools. People use like Bubble on the front end and Zapier in the back end and what have you. Sometimes I'm surprised the no code people are actually quite

smart and quite hardworking, because they figure out how to create these systems using no code. But it's actually a lot easier to just generate the code for it. It's a coding tool for the no-coders. Yes, yes. And so, yeah, we're seeing a lot of traction there. Which is actually a challenge I think the no-code tools have in general: straddling this line where they

start very much no code and then they find that people keep pushing their limits of what they want to build in these tools. And then the frustrating part with no code tools is that if you hit the limits, you're just stuck. You just can't solve it. And the cool thing is, as you were saying earlier, if you can get the no code people to switch to Replit, maybe initially they don't program at all. All they know how to do is prompt it. But then at some point they're going to like

look at the code and realize that they can just edit it. It isn't that hard. And that's how they gradually become programmers. Yeah, that's interesting. I played around with it to build a simple recruiting CRM, which is actually the kind of thing you would have used Airtable for. And when it told me the plan, one of the "would you like this feature?" suggestions was exactly that: role-based permissions and all. I was like, oh, that's a pretty sophisticated

prompt or suggestion off the bat. MARK MANDEL: Yeah, that's a $10,000 a month enterprise feature right there that you could just prompt and have it work. It's crazy. I mean, this is like the definition of low bar, high ceiling. All of the biggest software companies in the world capture that idea really powerfully.

My favorite thing is these multiple order magnitude sort of time difference of building something. Someone said they spent 18 months building a startup. They were able to generate the same app in 10 minutes using Replit. Someone said they spent a year building a certain app that they were able to build it in an hour with Replit agent. But yeah, I think it will save time.

you know, millions of dollars of human hours. What a time to be alive, guys. Can I take a Replit agent and apply it to my existing coding stack yet? Not yet. Got it. So again, it's sort of super early. We built the, again, the retrieval system that we built is to be able to do this.

We should be able to throw it into any code base, index the code base really quickly and be able to give it intelligence about the code base. The system also has like summaries of files and summaries of projects. So we use LLMs to kind of as we're indexing the system to create these like small summaries for the agent to understand what a project is. So we have the infrastructure for it. But that's that's the next step.
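The index-time summarization can be sketched with a stub in place of the LLM call; in the real system a model writes each per-file summary, and here a trivial function stands in:

```python
def summarize(src):
    # Stand-in for the LLM summarization call: the real system has a
    # model produce a short natural-language description per file.
    first = src.strip().splitlines()[0]
    return f"{len(src.splitlines())} lines, starts with: {first}"

def index_codebase(files):
    # Build the per-file summaries an agent reads to understand an
    # unfamiliar project before choosing what to edit.
    return {path: summarize(src) for path, src in files.items()}

files = {"app.py": "from flask import Flask\napp = Flask(__name__)\n"}
index = index_codebase(files)
print(index["app.py"])  # 2 lines, starts with: from flask import Flask
```

Because summaries are computed once at indexing time, the agent can skim a whole unfamiliar codebase cheaply instead of re-reading every file on every step.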

And we also want to add more autonomy for people who want it. So for the team version of this, we want to be able to send it to the background: give it a prompt, and it forks the project and works as autonomously as it can. And then when it's done, it sends you a pull request back. Or if it runs into a problem, it comes back to you with the problem.

The other thing I want to do is, you know, the vision for this has been, you know, we have this bounties program and bounties, people submit things they want to build or problems they have and people in our community users help them fix it for a certain price. And I was thinking, you know, agents are not perfect. And so perhaps the agents can also summon a human.

So another tool that it has is being able to summon a bounty hunter. And so it will go to the market and ask the creator working with it, "Hey, I'm running into a problem. Do you want to put some money on it? And we can go grab an expert."

And so it's like, yeah, cool. Yeah, put $50 on it and we'll go to this market, hopefully a real-time market. We'll say, for $50, we have this problem. Can you come in? A human expert comes in as another multiplayer into the system, either helps you by prompting the agent or by going and editing the code themselves. That's so clever. I mean, this whole thing of getting the human to be another agent in this greater intelligence orchestration system you have. Yes. I'm a big fan of Licklider's sort of...

human machine symbiosis. That's always been the thing. I like to talk about AGI and all of that, but I just feel like computers are fundamentally better by being extensions of us and by joining with us as opposed to being this competitor. 100% with you. Team human. We need to print t-shirts. Yeah.

You had a, I guess, sort of mini-Chesky moment earlier this year then. We're all blown away by this demo and sort of, you know, you've been working hard on sort of remaking the way all software is deployed and written for some time. I mean...

What did it take to get to this moment? You did have to do a layoff and reset your org. What happened? Yeah, so last year we raised a big round. We felt we were making fast progress and there was a lot of energy. And I felt like I needed to, okay, grow the company. For a long time, Jared knows, Replit was

tiny. It was actually run out of your apartment? Yes. For how many years? For many years, so like three or four years. And we're like four or five people for many years. So we started growing in 2021. Even when you had a lot of users? Yes. Like you were four or five employees when you had millions of users. Yes, that's right. And so we were always kind of lean, but I thought last year, okay, we have really big ambitions. We got to go hire people. I got to hire executives. I got to create a management structure. I got to grow up

Is this what investors were telling you? It's like, oh, you got to hire people. No, actually, I was dumb on my own. But it definitely was the prevalent advice. I mean, you were absorbing this advice from sort of like the world that ordinarily advises startups to do exactly that. That's right. That's right. And it just got really miserable.

We had, you know, multiple layers. We had all these different meetings I was trying to run the company from. We had an executive meeting, a staff meeting, whatever. We had roadmaps. We had planning sessions. And I just couldn't shake the feeling that it was all LARPing. It was not work. It was LARPing. But right now we don't have a roadmap.

Right now, literally, we work on like three or four things. I'm involved in all of them and I know what's going on there. I know what people are working on. And I think we got a lot more productive

by getting smaller, by flattening the organization. That's a story I think we've heard from many founders. One thing I'm curious to see is how this plays out: I feel like what actually sparked off a lot of manager mode was feeling that people had more ideas to run with and they had resources to execute on. And then you realize that bureaucracy creeps in and you actually just can't get ideas done as quickly as you want. And so I feel like everyone's getting rid of middle management.

And I'm curious to see if the same temptation will happen again. Something I've thought about a bit personally is that when you make it easy to go from, like, zero to one, it actually helps you create more good ideas, because you're like, oh yeah, I can just get things off the ground really, really quickly.

And so then it'll be interesting to see how people stay disciplined. Now you have the smaller, flatter org structure. You'll get more ideas for things you want to do. And then staying disciplined, to not go back into the "oh yeah, we should actually do the 10 things we could possibly be doing" versus the five or six you can keep in your head, I think is actually a challenge. MARK MANDEL: I guess there's a warring idea here, because there's Parker Conrad's compound startup.

But the interesting thing about the compound startup is I think they're trying to explicitly make the other product lines feel like a startup and govern like a startup unto itself, which is like sort of the opposite of having like divisional responsibility.

I also think, with Rippling, Parker's known for having this hiring tactic where he only hires, or tries to hire, a lot of former founders and then puts them in charge of a product line, which has obviously worked really well for Rippling. I think it's hard for most people to pull that off, because you can't hire that quality of former founder unless, I think, the company's already proven successful or you're just, like, a

top-tier recruiter. Parker's pretty top 0.1% in ability to recruit really great people. But Parker's totally founder-moding, though, because he gave a talk at YC Growth when we did that a couple of years ago

And he was still doing support tickets. Oh, yeah. He told us that. Harj hosted him a couple of months ago, actually, right over there. And he said that. He said, basically, he loves answering customer support tickets and he will never let it go, because it's his direct line of information to know what's really going on with the customer. Yeah.

Yeah, I mean, that's total founder mode. I think maybe he's doing the compound startup. He's giving them a lot of autonomy, but he's in the details. How did this play out for this AI agent? We've talked about how you built it technically. How did you build it organizationally?

Which is a whole big, like, a big bet. It was totally new technology that the Replit team wasn't used to working on. How did you pull it off organizationally? Yeah, great question. We tried building agents multiple times in the past and just the technology wasn't there. And finally, when we felt it was there, actually one of our employees, Zen Lee, who kind of started this new incarnation of it, made a demo and he showed me the demo.

And it was so simple. It was just, like, the agent calling a couple of tools and doing things in the IDE. But I could see that it was finally almost here. Like, I could almost taste it. And on that feeling, I was just like, okay, we're going to make this big bet. And so we created something called the agent task force. In the task force there are people from a lot of different teams. So you have the IDE team

present in the task force. You have the DevX team that works on package management and things like that. You have UX and design people, and you have the AI team. So you have the AI team at the center. So it's almost similar to the Karpathy diagram. We organized it in the same way that the diagram works: the kernel of the OS is sort of the AI team, and then it's connecting out to all these tools that are created by the tool teams.
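That kernel-plus-tools layout can be sketched in a few lines. This is a hypothetical illustration, not Replit's actual code: the names (`AgentKernel`, `install_package`, `take_screenshot`) are invented, and each tool stands in for something a tool team would own.

```python
from typing import Callable, Dict

class AgentKernel:
    """Central AI loop: picks a registered tool by name and runs it."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def dispatch(self, name: str, **kwargs) -> str:
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](**kwargs)

# Tools contributed by the "tool teams" in the analogy.
def install_package(package: str) -> str:   # DevX / package management team
    return f"installed {package}"

def take_screenshot(url: str) -> str:       # IDE team
    return f"screenshot of {url}"

kernel = AgentKernel()
kernel.register("install_package", install_package)
kernel.register("take_screenshot", take_screenshot)

print(kernel.dispatch("install_package", package="flask"))  # → installed flask
```

The point of the shape is that the AI team only maintains the dispatch loop; each surrounding team ships and owns its own tool behind a stable name.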

And then on top of all of that, you have the product and UX team that is working on the entry points and how you structure this, which was very tough as well. The design was tough. And we had, like, two meetings every week. On Monday, we had this forum meeting where Michele, our head of AI, would do, like, a run, and we'd see what's broken, what's wrong with it, and come up with the priorities for the week.

And then on Friday, we have the agent salon where I do a run and I look at what's working, what's broken. I ask them about their priorities. We might reprioritize some things. I might change some things in the product. We make big changes like rapidly. And so every week we made a ton of progress. What does doing a run mean?

Doing an agent run. Literally actually going through and using the product and seeing where it broke. Seeing where it breaks and figuring out what the priority is in order to fix where it broke. Brilliant. Yeah. Did each of the team basically build their own agent as well?

Some of them did, because for some of them you had to... The screenshot tool was an agent, because you had to have an AI look at the screenshot, come up with thoughts, and then return them to the main manager agent. So the IDE team wrote the screenshot agent, and then the package management team built, probably, the tech-stack setup type of configuration, which is really cool. Yeah.
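As a rough sketch of that sub-agent pattern, with invented names (`ScreenshotAgent`, `ManagerAgent`) rather than Replit's real internals: a specialized agent inspects a screenshot and returns a text observation, which the manager agent turns into its next action.

```python
class ScreenshotAgent:
    """Sub-agent: turns raw screenshot data into a text observation."""

    def inspect(self, screenshot: dict) -> str:
        hidden = [w for w in screenshot.get("widgets", []) if not w.get("visible")]
        if hidden:
            return f"{len(hidden)} widget(s) missing from the page"
        return "page renders as expected"

class ManagerAgent:
    """Main agent: delegates inspection, then decides what to do next."""

    def __init__(self) -> None:
        self.sub_agents = {"screenshot": ScreenshotAgent()}

    def review_ui(self, screenshot: dict) -> str:
        observation = self.sub_agents["screenshot"].inspect(screenshot)
        # In a real system this observation would go back into the model's
        # context; here we just surface it as the chosen next step.
        if "missing" in observation:
            return f"fix UI: {observation}"
        return "proceed to next task"

manager = ManagerAgent()
print(manager.review_ui({"widgets": [{"name": "submit", "visible": False}]}))
```

The key property is the one described in the conversation: the sub-agent's output is plain text "thoughts" that the manager can reason over, not raw pixels.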

The org structure worked out really well. I mean, surprisingly well, because I think it's similar to how we worked when the user was at the center.

And now the user is the AI. FRANCESC CAMPOY: What's coming next with the agent? What do you want to add to it? What do you think are going to be the big next leap forwards for it? MARK MANDEL: Reliability. I think the most important thing right now is reliability and making sure it's not spinning, making sure it's not breaking, and then expanding it to support any stack you would want. So right now, we don't really listen to the user when they give us a stack. We push back. The agent pushes back. It's like, ah, I'm just going to do it in Python or whatever.

But if you really want... Crafty engineer mode. So we want to be able to accept user requirements with regard to the stack. You should have the Paul Graham mode, where I only write it in Lisp. You know, this modes thing would really make a good April Fools' thing. Paul Graham mode: over-engineered. Yeah.

Bad UI. Doesn't care about UI. Everything's literally correct, but very confusing. How about just the interaction? I mean, you mentioned Licklider and the whole human-computer symbiosis theory. Like, is text as far as it goes? Are there other ways you think people will want to interact with their AI agent? You should be able to, like, draw on...

the UI and communicate with the AI by drawing, right? You should be able to say, hey, this button's not working, maybe move this here, or this file, you know, refactor this file, whatever. So if the whole thing is a canvas that you can draw on, you can communicate a lot more expressively with the agent. And of course you're talking, you know, as opposed to typing; being able to talk and draw. Imagine you're on the iPad, too. We have an iPad app.

It could get really, really fun and creative. Kind of like a full UI mockup that you would do in Figma. You could kind of hand-sketch it and get it to do it. Like how running a real engineering product team would feel. That's right. And then we're going to add more, simpler agentic tools. So right now, the agent kind of takes over and it's writing everything. But a lot of people just want more agency.

More advanced users, especially. So we want to be able to do, like, single-step or single-action agents. So say I want to add this feature: show me what you're going to do. It'll do a dry run, show you all the diff, show you all the packages it's going to install, and then you'll be able to accept it or reject it. And that way, more advanced users will have more control over the code they're writing.
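That dry-run idea can be sketched with the standard library's `difflib`: compute the diff a proposed edit would make, show it for review, and only apply it on approval. The function names here are illustrative, not the agent's real API.

```python
import difflib

def propose_edit(original: str, edited: str) -> str:
    """Return a unified diff of the proposed change without applying it."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile="current",
        tofile="proposed",
    )
    return "".join(diff)

def apply_if_approved(original: str, edited: str, approved: bool) -> str:
    """Apply the edit only when the user accepts it; otherwise keep the original."""
    return edited if approved else original

before = "def greet():\n    print('hi')\n"
after = "def greet(name):\n    print(f'hi {name}')\n"

print(propose_edit(before, after))  # user reviews this diff first
result = apply_if_approved(before, after, approved=True)
```

A real single-action agent would also list side effects, like packages to install, alongside the diff, but the accept-or-reject gate is the core of the control it gives advanced users.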

Amjad, thank you so much for coming and showing us the future in such a profound way. If I wanted to do this all myself, what would I do? Well, first of all, I want to say it's again, barely beta software. If you're brave and you want to test it and give us feedback, go to Replit, sign up for our core plan because this thing is expensive. We can't give it away for free. And you'll be able to see that module on the homepage that says, what do you want to build today?

And then you can go through that and start working with the agent. Just have an idea in your mind. Just write a couple sentences. Don't make it too complicated or too technical and get started. You'll get a feel of how to work with the agent pretty quickly. It should be pretty intuitive.

And share with us what you're building. Happy to kind of reshare, retweet whatever people are building with the agent. Amazing. Well, it's time to feel the AGI. We'll see you guys next week.