People
Host
Podcast host and content creator focused on the electric vehicle and energy space.
Topics
Host: I think the software field changed little over the past 70 years, but it has now undergone two major shifts in a very short time. This shows up in discussions across the engineering field: when we talk about AI engineers now, we no longer mean just machine learning researchers and data scientists, but people who build on top of a new ecosystem, using foundation models, agents, and new tools to redesign how people interact with software. Quoting Andrej Karpathy, the number of AI engineers may come to greatly exceed the number of ML engineers. Evaluating, applying, and productizing AI poses countless challenges, and it is a full-time job. Software engineering will spawn a new sub-discipline specializing in applications of AI, just as site reliability engineers, DevOps engineers, data engineers, and analytics engineers emerged. Context engineering is building dynamic systems to provide the right information and tools in the right format so that the LLM can plausibly accomplish the task.

Deep Dive

Chapters
This chapter introduces Software 3.0, a new era of software development characterized by the use of large language models (LLMs). It explores the evolution of software from Software 1.0 (human-written code) to Software 2.0 (neural networks) and Software 3.0, where LLMs are programmed using natural language. The chapter also discusses the rise of the AI engineer and the challenges of evaluating and applying AI in product development.
  • Software 1.0: Human-written code
  • Software 2.0: Neural networks learned from data
  • Software 3.0: LLMs programmed with natural language prompts
  • Rise of the AI engineer as a new sub-discipline in software engineering
  • Challenges in evaluating, applying, and productizing AI

Shownotes Transcript

Translations:
Chinese

Today on the AI Daily Brief, software in the era of AI or software 3.0. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends. Quick announcements. First of all, thank you to our sponsors for today's show, KPMG, Blitzy, Plum, and Vanta. To get an ad-free version of the show, you can go to patreon.com slash AI Daily Brief.

Now, today is one of our weekend episodes, classically our long reads episodes, which really in some ways aren't about long reads as much as big ideas. And we have a very big idea to explore today. This one comes courtesy of former OpenAI co-founder Andrej Karpathy. A couple of weeks ago, Andrej gave the keynote at the YC Startup School.

Subsequent to that speech, it was published to YouTube. You should watch the whole thing. I will include a link. We're going to discuss it and try to contextualize it, so this is not meant to just repeat what Andrej said. But the funny thing about it is that the video wasn't posted right away, and people were going so crazy for the speech, and there were so many tweets and X posts about it, that the fine folks at Latent Space actually were able to hack together the slide deck out of clips and pictures from X.

The speech is in many ways about redesigning the world of software for LLM-native operations and LLMs being a new type of computing. And one of the really interesting things that Andrej notes is that while software stayed largely the same, at least from a paradigm perspective, for about 70 years, we have now had two big shifts in a very short period of time. And we'll get into in a moment his articulation of those shifts, but you can also see this just in the discourse around the engineering field.

A couple of years ago, Latent Space wrote an incredibly important post called "The Rise of the AI Engineer." And the distinction that swyx was trying to draw here was that when we were talking about AI engineers now, we were no longer just talking about machine learning researchers and data scientists.

We weren't just talking about people who were dealing with training and evaluation and inference and data. We were dealing with people who were building on top of this new ecosystem focused on product and taking advantage of foundation models, agents, new tooling and infrastructure to redesign how people interact with software. That piece actually quoted Andrej Karpathy back then as well, when he said, in numbers, there's probably going to be significantly more AI engineers than there are ML engineers.

And at the time, swyx was trying to put some context around what this actually meant. He said that there are no end of challenges in successfully evaluating, applying, and productizing AI. He talked about model selection, tool selection, and just keeping on top of research, progress, and new opportunities. The conclusion, which seems so obvious now, is that this was a full-time job.

Quote, I think software engineering will spawn a new sub-discipline specializing in applications of AI and wielding the emerging stack effectively, just as site reliability engineer, DevOps engineer, data engineer, and analytics engineer emerged.

The emerging and least cringe version of this role seems to be AI engineer. Now, even what AI engineering means has continued to evolve over the last couple of years. If you were listening earlier this week, we talked all about context engineering. The definition that LangChain's Harrison Chase gives is this: context engineering is building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task.

In other words, it's about giving AI models the context they need to accomplish their goals, something that's become even more important in the architecture of agents, which are dealing with much more context and much more complexity. The point is that the very field of software and engineering is continuing to evolve, and that's basically the context for Andrej's speech.
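As an illustrative sketch (not from the episode), Chase's definition can be rendered as code: a dynamic system that assembles instructions, retrieved information, and tool descriptions into one prompt. Everything here is hypothetical — `build_context`, `search_notes`, and the prompt layout are invented stand-ins for a real retrieval-and-tools pipeline.

```python
def build_context(task, retrieve, tools):
    """Assemble the right information and tools, in the right format,
    so the LLM can plausibly accomplish the task (Chase's definition)."""
    documents = retrieve(task)  # dynamically fetched, not hard-coded
    tool_specs = "\n".join(f"- {t.__name__}: {t.__doc__}" for t in tools)
    return (
        "You are an assistant with access to these tools:\n"
        f"{tool_specs}\n\n"
        "Relevant information:\n"
        + "\n".join(documents)
        + f"\n\nTask: {task}"
    )

def search_notes(query):
    """Look up notes matching the query."""
    return [f"note about {query}"]

prompt = build_context("summarize Q2 results", search_notes, [search_notes])
```

The point of the sketch is that the context is assembled dynamically per task, rather than hard-coded into a static prompt.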

In software 1.0, it was computer code being written by humans to program a computer. Software 2.0, which Andrej wrote about a number of years ago, shifted computer code to neural network weights learned from data, with the output being the neural net itself. In software 3.0, large language models can themselves be programmed with natural language prompts. To quote Andrej himself from back in January 2023, the hottest new programming language is English.
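To make the contrast concrete, here is a minimal hypothetical sketch, not from the talk: a Software 1.0 classifier written by hand versus a Software 3.0 "program" that is just an English prompt (the actual model call is omitted; the word lists and prompt are invented for illustration).

```python
# Software 1.0: a human writes the explicit instructions.
def sentiment_1_0(text):
    positive = {"great", "love", "excellent"}
    negative = {"bad", "hate", "terrible"}
    words = set(text.lower().split())
    return "positive" if len(words & positive) >= len(words & negative) else "negative"

# Software 3.0: the "program" is an English prompt, to be sent to any
# foundation-model API in place of the hand-written logic above.
SENTIMENT_3_0_PROMPT = (
    "Classify the sentiment of the following text as 'positive' "
    "or 'negative'. Reply with one word.\n\nText: {text}"
)

print(sentiment_1_0("I love this show"))  # positive
```

The first version encodes the logic in code; the second encodes it in English, which is exactly the shift the episode is describing.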

Discussing the transition from software 1 to software 2, Karpathy drew on his time at Tesla. As the company built out Autopilot, the code base was largely written in C++, but most of the visual data was handled by the neural network. Over time, as the Autopilot improved, the neural network component grew while C++ code was deleted. Karpathy said the software 2.0 stack quite literally ate through the software stack of the Autopilot. He believes we're seeing the same thing again with the proliferation of LLMs. Karpathy described LLMs as functionally a type of programmable neural network.

Rather than a set path, the user can program the LLMs to produce a variety of different outcomes. Now, this is not about vibe coding or getting an LLM to spit out lines of traditional code. This is about shifting our thinking to consider the use of LLMs themselves as an entirely new type of software. By way of example, if you're prompting an LLM to produce a deep research report, that's akin to writing a Python script that could search the web and summarize data. Of course, there are a huge number of differences.
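A rough sketch of that deep-research analogy, with invented stub functions standing in for real web search and summarization: the script and the prompt are two expressions of the same program.

```python
# The episode's analogy: a deep-research prompt plays the role this
# Python script would. fetch() and summarize() are hypothetical stubs.
def research_report(topic, fetch, summarize):
    """Software 1.0 version: search the web, then summarize findings."""
    pages = fetch(topic)
    return summarize(pages)

# Software 3.0 version: the prompt *is* the program.
RESEARCH_PROMPT = "Research {topic} on the web and write a summarized report."

report = research_report(
    "LLM operating systems",
    fetch=lambda topic: [f"page about {topic}"],
    summarize=lambda pages: " | ".join(pages),
)
```

As the episode notes, there are a huge number of differences in practice; the sketch only shows the structural equivalence.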

But the key point is that we're talking about using an LLM to achieve a particular outcome in the same way you would use a traditional program. And because of all this, he argued that we need to think about LLMs in a slightly different way. He argued effectively that AI is the new electricity, and pointed out that LLMs feel like they have the properties of utilities right now. Karpathy drew links to how infrastructure is built, how tokens are metered, and even how brownouts in AI when a major service goes down can be similar to the electricity shutting off.

He also argued that LLMs are like computer chip fabs, that they require massive capex and have deeply held secrets in their construction, naturally trending towards a small number of powerful players. Finally though, he settled on the analogy of LLMs as operating systems. Rather than thinking of LLMs as similar to electricity, where every electron is the same, he argued that LLMs are now complex ecosystems, where there's differentiated functionality, tool use, and performance.

Giving a direct example, he noted that Cursor can be run using models from OpenAI, Google, or Anthropic, each with different outcomes. Looking towards the future, he noted that we're still in the 1970s era for the LLM computer, with large centralized players serving a very finite amount of compute. But Karpathy anticipates something similar to the PC revolution coming to LLMs, with users eventually able to run them on their own hardware.

Taking the analogy further, he suggested that current LLMs are still very similar to using an operating system directly through the terminal, arguing, I think a GUI hasn't yet been invented in a general way. Shouldn't ChatGPT have a graphical user interface different from the text bubbles?

Today's episode is brought to you by KPMG. In today's fiercely competitive market, unlocking AI's potential could help give you a competitive edge, foster growth, and drive new value. But here's the key. You don't need an AI strategy. You need to embed AI into your overall business strategy to truly power it up.

KPMG can show you how to integrate AI and AI agents into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.us slash AI. Again, that's www.kpmg.us slash AI.

This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI, but instead of building competitive moats, their best engineers are stuck modernizing ancient code bases or updating frameworks just to keep the lights on. These projects, like migrating Java 17 to Java 21, often mean staffing a team for a year or more.

And sure, co-pilots help, but we all know they hit context limits fast, especially on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting, processing millions of lines of code and making 80% of the required changes automatically. One major financial firm used Blitzy to modernize a 20 million line Java code base in just three and a half months, cutting 30,000 engineering hours and accelerating their entire roadmap.

Email jack at blitzy.com with modernize in the subject line for prioritized onboarding. Visit blitzy.com today before your competitors do. Today's episode is brought to you by Plumb. You put in the hours, testing the prompts, refining JSON, and wrangling nodes on the canvas. Now, it's time to get paid for it.

Plumb is the only platform designed for technical creators who want to productize their AI workflows. With Plumb, you can build, share, and monetize your flows without giving away your prompts or configuration. When you're ready to make improvements, you can push updates to your subscribers with a single click.

Launch your first paid workflow at useplumb.com. That's Plumb with a "b," and start scaling your impact. Today's episode is brought to you by Vanta. In today's business landscape, companies can't just claim security, they have to prove it. Achieving compliance with frameworks like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices.

The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35+ frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC whitepaper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months.

The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. Now, when it comes to a new era of software, what's most interesting is how it differs from previous eras.

One example he points out is that during software 1.0, the early adopters were governments and massive corporations because they were the only ones that could afford to operate mainframes. A similar thing was true in software 2.0, with neural networks largely the domain of research labs and tech companies. This time, however, the everyday user was the first adopter of LLMs and able to access this powerful new way to program a computer. He said, "It's really fascinating to me that we have a new magical computer, and it's helping me boil an egg rather than helping the government with military ballistics."

Indeed, corporations and governments are lagging behind all of us in adoption. His point is that this is completely unprecedented. He continued, "We all have a computer, it's all just software, and ChatGPT was beamed down to billions of people instantly and overnight. It's kind of insane to me that this is the case and now it's our time to program these computers." Which is not to say that they are perfect computers.

Indeed, with a new era of software, we're finding new problems as well. There are, of course, the problems of hallucination and just more generally, jagged intelligence. In other words, while LLMs have perfect knowledge in some areas, they can also then fail to be able to see how many R's are in the word strawberry.

Less discussed, though, is the idea that LLMs don't natively learn new things. While a human working in an organization will learn how to perform specific tasks, an LLM will forget everything as soon as the context window is closed. This presents some very real limitations and breaks the analogy of human thought. Karpathy said, "...you have to simultaneously think through this superhuman thing that has a bunch of cognitive deficits and issues."

Yet Karpathy also believes that there's an entire category of computing tasks that are unlocked by LLMs that we're only starting to scratch the surface of. One of these ideas he called partial autonomy apps, or Copilot or Cursor for X.

The idea is an app like Cursor which acts as an overlay to LLMs and allows users to move faster. Rather than talking to the LLM operating system directly, Cursor can orchestrate many actions with the human overseeing the process. He noted that these kinds of apps often have a feature he referred to as an autonomy slider, where the user can select how much autonomy the LLM has to take actions and make changes depending on how sensitive the task is.
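A minimal sketch of what an autonomy slider might look like in code; the levels and gating logic here are invented for illustration, not taken from Cursor or any real product.

```python
from enum import Enum

class Autonomy(Enum):
    """Hypothetical 'autonomy slider' levels for a partial-autonomy app."""
    SUGGEST = 1     # model proposes; human applies every change
    EDIT_FILE = 2   # model edits one file; human reviews the diff
    EDIT_REPO = 3   # model edits across the repo; human spot-checks
    AUTONOMOUS = 4  # model acts end-to-end; human audits afterwards

def apply_change(change, level, human_approves):
    """Gate a model-proposed change behind the selected autonomy level."""
    if level is Autonomy.AUTONOMOUS:
        return change  # no human in the loop
    return change if human_approves(change) else None

# For a sensitive task, the user slides autonomy down and reviews each change.
result = apply_change("rename foo -> bar", Autonomy.SUGGEST, lambda c: True)
```

The design point is that the human verification step is explicit in the code path, which is what lets the slider move without rearchitecting the app.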

Karpathy in fact suggested that most software will become partially autonomous, with some big implications for the software industry, which needs to figure out how to integrate the new modality. He said, "Traditional software right now has all these switches designed for humans, but that has to change to be accessible to LLMs."

One of the conclusions is that software should seek to make the feedback loop between LLM generation and human verification as tight as possible. Karpathy is apparently fond of MCU references, as he used the Iron Man suit as a way to explain this autonomy slider idea. On one end of the spectrum, there is Tony Stark wearing the suit, versus when, a little bit down the line, he actually built autonomous versions of the suit that could operate themselves.

Karpathy said, "We can build augmentations or we can build agents, but we kind of want to do a bit of both. At this stage, working with fallible LLMs, it's less building flashy demonstrations of autonomous agents and more building partial autonomy products."

And in one more example of the need for interfaces that connect the dots more fluidly between what semi-autonomous software is producing and humans, he gave the example of vibe coding. As it stands at the moment, Karpathy said, vibe coding is super great when you want to build something custom that doesn't appear to exist and you just want to wing it. But he also walked through an app he has in production that transforms restaurant menus into pictures for easy selection. He said, the code was actually the easy part. Most of it was actually adding authentication and payments and a domain name. All of this was really hard. It was me in a browser clicking stuff.

I had the app working in a few hours, and then it took me a week because I was trying to make it real.

Bringing it all together, Karpathy argued that there's a new category of consumer that needs infrastructure, saying, It used to be just humans through GUIs or computers through APIs. Agents are computers, but they're human-like. There's people-spirits on the internet and they need to interact with our software infrastructure. One example he gave of what it's going to look like to design for this audience is Vercel and Stripe, which allow LLMs to access their documentation via Markdown. Karpathy said, If we can make docs accessible to LLMs, it's going to unlock a huge amount of use.

And while accessibility is a big deal, the docs also need to fundamentally change to reflect how an LLM will take actions. Vercel, for example, has already done this, replacing the word click with agent-friendly API commands. Anthropic's MCP is built on a similar concept. Karpathy said, The big takeaway is that there is still an absolute ton of code to be written to re-architect the world of software for agents.
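As a toy illustration of that docs change, here's a sketch of rewriting human-oriented instructions into agent-executable ones. The substitution table is invented; `vercel deploy` and `vercel link` are assumed here to be valid Vercel CLI commands.

```python
# Toy illustration of rewriting human-oriented docs into agent-friendly
# form, in the spirit of Vercel's change; the mapping below is invented.
REWRITES = {
    "Click the Deploy button": "Run `vercel deploy`",
    "Open the dashboard and select your project": "Run `vercel link`",
}

def agentify(doc: str) -> str:
    """Replace GUI instructions with commands an agent can execute."""
    for human, agent in REWRITES.items():
        doc = doc.replace(human, agent)
    return doc

print(agentify("Click the Deploy button to ship your site."))
# Run `vercel deploy` to ship your site.
```

A "click" only makes sense to a human with a mouse; a shell command is something an agent can actually run, which is the whole point of the rewrite.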

The revolution in practice is about slowly and incrementally moving the slider from augmentation to full automation. But the architecture build-out, which Karpathy views as at least a decade long, has only just begun. And so that is our long reads episode for this week. Like I said, guys, I have barely scratched the surface on this and would highly encourage you to go watch the whole video. For now, though, that is going to do it for today's AI Daily Brief. I appreciate you listening or watching, as always. And until next time, peace.