This podcast is supported by Google. Hey, everyone. David here, one of the product leads for Google Gemini. If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video. Now with incredible sound effects, background noise, and even dialogue. Try it with a Google AI Pro plan or get the highest access with the Ultra plan. Sign up at Gemini.Google to get started and show us what you create.
Today on the AI Daily Brief, what is context engineering and why does it matter? Before that in the headlines, a big victory for Anthropic when it comes to fair use. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello friends. Quick announcements today. Thank you first to our sponsors for today's show. That would be Blitzy, Plumb, Vanta, and Google Gemini. And of course, if you are interested in getting an ad-free version of the show, go to patreon.com slash AI Daily Brief.
Announcements are the same as they've been for a while. We are deep in fall sponsorship discussions, so if you are interested, hit me up at nlw at breakdown.network. We also will have some more Superintelligent news soon, including some hiring, so keep an ear out for that. But there is a lot to talk about today, including a new term which you are going to hear a lot more. So with that, let's dive in.
Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We kick off today with a fairly big victory for Anthropic in their copyright case as a federal judge rules that AI training is fair use.
Now, Anthropic is one of the many AI labs that are fighting with authors and publishers over their use of copyrighted works as training data. Each lab is running effectively the same argument: that AI training is analogous to reading and is therefore not a breach of copyright under fair use provisions. A federal judge has now accepted that argument, handing Anthropic an early victory in the case.
Christina Frohach, a professor of legal writing at the University of Miami School of Law, explained that the court came to the same conclusion about AI models.
In handing down the ruling, the judge commented on both the transformative nature of the technology and the purpose of copyright law.
Now all that said, this is only a partial victory for Anthropic. The order was decided only on this extremely narrow piece of law, with a further dispute in the case going to a separate trial. That trial will deal with what Anthropic refers to as their central library, a corpus of all the books in the world, used to create training datasets. Plaintiffs claim that, as well as scanning in physical books, Anthropic pirated 7 million digital copies to create this repository.
In a prelude of what's to come, under copyright law, willful infringement can carry a maximum penalty of $150,000 per work. If the court rules that Anthropic breached copyright millions of times in pirating books, the fines could easily bankrupt the startup.
The judge noted that the fact that, quote, Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft, but it may affect the extent of statutory damages.
One of the interesting points to note is that the fair use ruling was based on Anthropic's AI outputs being transformative. That is, the AI model wasn't capable of directly reproducing copyrighted works; it was trained to create something new out of its training data. In fact, the judge referred to Claude's outputs as exceedingly transformative, noting, "...like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different."
So ultimately, Anthropic is far from off the hook, and the law is far from settled, but it is still a landmark ruling for the AI copyright question. Importantly, this is only a district court ruling, so it isn't binding in other cases and could still be appealed. However, it can be used to persuade other judges to follow this interpretation of the law for the time being. Obviously, this is and remains a contentious area, one that I fully anticipate will need to end up before the Supreme Court before we actually know how it will be handled.
Next up, staying on the law train for a moment, Sam Altman is fighting the IO lawsuit on X. The legal battle surrounding Jony Ive's AI device startup and the identically named Google spinoff is starting to get a little bit nasty. Lawsuit filings are now circulating, but Sam Altman decided to take his version of the story direct. Yesterday, he posted, "Jason Rugolo had been hoping we would invest in or acquire his company IO, I-Y-O, and was quite persistent in his efforts. We passed and were clear along the way."
Now he is suing OpenAI over the name. This is silly, disappointing, and wrong. I made a lot of time to talk to Jason on his repeated outreaches because I like helping founders. A few days before the lawsuit, he asked again for us to acquire his company, even after we tried to pass just before.
It is cool to try super hard to raise money or get acquired and to do whatever you can to make your company succeed. It is not cool to turn to a lawsuit when you don't get what you want. It sets a terrible precedent for trying to help the ecosystem. All that said, I wish Jason and his team the best building great products. The world certainly needs more of that and less lawsuits.
OpenAI's legal filings, meanwhile, tell basically the same story: that technical staff at io, the new OpenAI division, met with Rugolo out of a sense of professional courtesy, were unimpressed with a broken demo, and moved on to build something other than what they had seen. In his responses on Twitter, Rugolo basically said it was about the name. In one post, he wrote, "...there are 675 other two-letter names that they can choose that aren't ours."
And basically, if you want to read how the community thinks about this: on the one hand, after Sam shared these emails, the OpenAI side of the story, that they just weren't all that impressed, looks pretty credible, or at least consistent with what they were discussing internally. But at the same time, people also kind of feel like, hey, did you have to choose a name that close? Ultimately, my guess is that it's not worth the trouble and OpenAI just changes the name. But what do I know?
One more interesting thing on OpenAI: the company has quietly designed a productivity suite for ChatGPT, which could put them in direct competition with big backers like Microsoft. The features would allow users to collaborate on documents and communicate with each other, similar to the functionality of Microsoft Office or, maybe even more directly, Google Workspace. The Information reports that no decision has been made about launching the feature, but a release could drive a further wedge in OpenAI's relationship with their backer Microsoft.
In some ways, though, this is just a natural extension of the Canvas feature, which gives users a separate document window inside ChatGPT, making the Assistant more useful in work settings. It would also allow OpenAI to compete to be an everything app.
Coincidentally, we got news earlier this week that xAI is working on a productivity suite as well. So is this some big competitive change? Or is it just that all of these products are trending in the same direction and are going to have some similar features? I tend to think it's more that than any sort of big new competition between these two frenemies.
Last up, a couple of product updates before we get to the main episode. First, Airtable has made a major move, relaunching as an AI-native app platform. CEO Howie Liu posted: Instead of just adding more AI capabilities to our existing platform, we treated this as a re-founding moment for the company. We started with a clean-slate imagining of the ideal form factor for building apps in the agentic era.
The no-code database platform is now a fully functional vibe coding app as well. Users can now use natural language to prompt apps into existence while integrating them into Airtable's production-ready components. Liu gave the examples of creating a VC deal tracker that does automated company research or a marketing campaign manager that monitors all relevant competitors.
The AI integration also means you can easily run queries across your database. For example, you can get the assistant to crunch thousands of support tickets to find common pain points quickly. The rebuild also has agentic functionality built in to help you manage large data workflows. Liu wrote: When the cost of making and continually evolving apps drops to zero, everything changes.
Companies will build exactly what they need rather than settling for rigid off-the-shelf software. The new default is AI-generated apps plus built-in AI agents working 24-7. What's needed in this new era is a new form factor and paradigm for software, the AI-native app platform. This is the new Airtable. And what's launching today is just the beginning. We're excited to release a slew of new AI-powered capabilities in the months ahead, sneak peek, generate any visualization, agents leveraging MCP, agentically sourced datasets, and much, much more.
Another company getting all agentic is Eleven Labs, which has launched a new voice AI assistant called 11ai.
The pitch is that this voice assistant has full MCP integration, so it can pull data from services including Perplexity, Slack, Gmail, and Google Calendar. You can even connect your own MCP server so the assistant can theoretically access anything you want it to. Functionally, this is pretty similar to the voice assistant that Anthropic announced last month alongside the launch of voice mode. It's designed as a voice interface to access all sorts of AI functionality. The advertising is even similar.
Anthropic advertised their product as being able to help power a young professional through their morning, while the Eleven Labs ad followed a similar story, but featuring a young man rolling out of bed with five minutes to spare until a web conference with his boss. The assistant helped him delay the meeting over email, order a greasy breakfast, and remember what kind of pet his boss has for small talk. The release came alongside the long-awaited mobile app, so you can chat to Eleven Labs' assistant on the go.
Now, I'm not sure about the positioning of these assistants. I am a little more skeptical than most about these sorts of generalist consumer assistants, but I could be very wrong. Still, if part of the theme of the OpenAI productivity suite story is the convergence of all of these platforms toward one common set of features, this is yet another example of that.
Anyways, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI. But instead of building competitive moats, their best engineers are stuck modernizing ancient code bases or updating frameworks just to keep the lights on. These projects, like migrating Java 17 to Java 21, often mean staffing a team for a year or more.
And sure, co-pilots help, but we all know they hit context limits fast, especially on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting, processing millions of lines of code and making 80% of the required changes automatically. One major financial firm used Blitzy to modernize a 20 million line Java code base in just three and a half months, cutting 30,000 engineering hours and accelerating their entire roadmap.
Email jack at blitzy.com with modernize in the subject line for prioritized onboarding. Visit blitzy.com today before your competitors do. Today's episode is brought to you by Plumb. You put in the hours, testing the prompts, refining JSON, and wrangling nodes on the canvas. Now, it's time to get paid for it.
Plumb is the only platform designed for technical creators who want to productize their AI workflows. With Plumb, you can build, share, and monetize your flows without giving away your prompts or configuration. When you're ready to make improvements, you can push updates to your subscribers with a single click.
Launch your first paid workflow at useplumb.com. That's Plumb with a B. Start scaling your impact.
Today's episode is brought to you by Vanta. The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months.
The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. Welcome back to the AI Daily Brief.
Today we are talking about a term that you might have heard a little bit here and there on this show. Maybe you've seen it start to appear more on X or in articles. We're going to talk about what it means, why it's coming up more and more right now, and why it matters for the industry as a whole. To kick us off, let's turn to a recent tweet from Tobi Lütke, the CEO of Shopify. Last week he wrote, I really like the term context engineering over prompt engineering.
It describes the core skill better. The art of providing all the context for the task to be plausibly solved by the LLM. Now, a lot of folks jumped into the conversation to agree. McKay Wrigley wrote, totally agree. These days you get way less performance bonus out of dumb tricks like I'll pay you $100 if you get it right, which is how it should be.
All of the alpha is in assembling context well to reduce the fog of war for the model. It's converging to human-ish info needs. Nick Dobos says soon it, context engineering, will include providing the tools, agent environment, and guardrails so the LLM can find the context on its own. So basically what we have here is a different way to think about how to get the most out of LLMs.
Since the beginning of ChatGPT, there has been this new field of prompt engineering, which has spawned innumerable courses and online tutorials, and many tricks and tips and quirks of how to ask in the right way to get the things you need out of LLMs. Now, along the way, prompt engineering has become more and more, let's say, diffuse, if not at this stage, less important. And what I mean by that
is that the smarter that models get, the more that tips from six or 12 months ago cease to work. And in many cases, there's also UI-related or interface-related abstraction of prompting where some amount of prompt engineering is being taken over by the tools themselves. To take one example, when I was designing a cover for a recent episode,
my prompt to Ideogram was: fun retro-futuristic cover for The Quest for the Solopreneur Unicorn, 1950s mid-century modern. However, what Ideogram turned that into was: a retro-futuristic book cover in the style of 1950s mid-century modern design, featuring a determined, sharply dressed solopreneur riding a majestic glowing white unicorn through a swirling nebula. The solopreneur wears a tailored gray suit, a confident smile, and pilots the unicorn with a futuristic joystick, while the unicorn's horn emits a beam of light illuminating the path ahead.
The background consists of stylized geometric planets and stars rendered in a vibrant palette of teal, orange, and yellow, with the title, The Quest for the Solopreneur Unicorn, boldly displayed in a classic chrome-accented font. So you're seeing this type of thing happen a lot more in different tools, where the tools themselves are trying to take the essence of what you were asking for and write a better prompt than you could.
Context is something different, and it refers to another part of the value chain of these LLMs. Context is all of the information you give an LLM that helps it answer the question more correctly. So for example, if you are using ChatGPT's O3 or O3 Pro, which is particularly optimized to be better at context, when you add a bunch of files to your prompt,
That is the context that you're giving it. Context engineering then becomes about, are you giving the LLM the right information that it needs to give you the output that you're looking for? And it turns out this isn't just about which documents to share with it. It's also literally an engineering task around how to carry context across more complex systems. You might remember we recently talked about a post from Cognition, who creates Devin, called Don't Build Multi-Agents.
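Before we get into that post, just to make the basic idea concrete, here is a minimal sketch of what assembling context for a model can look like in code. This is purely illustrative, under my own assumptions; call_llm is a made-up stand-in rather than any particular provider's API.

```python
# A minimal, purely illustrative sketch of user-level context engineering:
# gather the documents the model actually needs and put them in front of
# the question. call_llm is a made-up stand-in for a real chat API.

def call_llm(prompt: str) -> str:
    return "(model output would appear here)"  # placeholder, not a real API call

def build_prompt(question: str, documents: dict[str, str]) -> str:
    # Label each piece of context so the model knows what it is grounding on,
    # then put the actual question last.
    sections = [f"--- {name} ---\n{text}" for name, text in documents.items()]
    return "\n\n".join(sections) + f"\n\nUsing only the material above, answer:\n{question}"

docs = {
    "q2_meeting_notes.md": "...",   # the files you attach are the context
    "pricing_research.md": "...",
}
print(call_llm(build_prompt("What should our Q3 pricing strategy be?", docs)))
```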
And this was all anchored around context and context engineering. Basically, the argument in this piece, for those who don't remember, was that the multi-agent workflow, where an agent breaks down a task and hands it to multiple different sub-agents with an agent that then combines the results on the other side, is one that is doomed to be fairly brittle because the transmission of context from agent to sub-agent and then sub-agent back to agent can be really difficult.
The example he gave was this: Suppose your task is build a Flappy Bird clone. This gets divided into Subtask 1: Build a moving game background with green pipes and hitboxes. And Subtask 2: Build a bird that you can move up and down. It turns out Subagent 1 accidentally mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn't look like a game asset, and it moves nothing like the one in Flappy Bird.
Now the final agent is left with the undesirable task of combining these two miscommunications. He goes through some potential solves, but still finds them unreliable, and ultimately comes to the idea of instead building a single-threaded linear agent.
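Before we get to that, here is a rough sketch of the fan-out, fan-in pattern being criticized, just to make the failure mode concrete. This is not Cognition's code; call_llm is a placeholder, and the point is simply that each sub-agent only ever sees its own one-line subtask, not the original goal or the other sub-agent's decisions.

```python
# A rough illustration of the brittle multi-agent pattern: context is cut
# down to one sentence per sub-agent, so assumptions drift and the combiner
# has to reconcile outputs that were never aware of each other.

def call_llm(prompt: str) -> str:
    return f"(result for: {prompt[:45]}...)"  # placeholder model call

def orchestrator(task: str) -> str:
    # In a real system the orchestrator would derive these from `task`;
    # they are hardcoded here to mirror the Flappy Bird example.
    subtasks = [
        "Build a moving game background with green pipes and hitboxes.",
        "Build a bird sprite the player can move up and down.",
    ]
    # Fan out: each sub-agent gets only its own subtask, with no shared context.
    results = [call_llm(subtask) for subtask in subtasks]
    # Fan in: the final agent must combine results produced under
    # different, possibly contradictory, assumptions.
    return call_llm("Combine these parts into one game:\n" + "\n".join(results))

print(orchestrator("Build a Flappy Bird clone"))
```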
In the Cognition model, the agent breaks the task into subtasks rather than sub-agents. So the same agent does the breaking down of the task, then the doing of subtask 1 and subtask 2, and then combines the results, with the idea being largely that this carries context between the different tasks better than the multi-agent system does. Here the context is continuous.
At the same time, they recognize that as very large tasks start to have so many subparts that context windows begin to overflow, there may be a need for a new approach. One architecture they share is the idea of a context compression LLM that sits alongside the main agent and, at each stage, compresses the conversation and actions so far, i.e. the context, into a set of key moments and decisions, with that compressed context being what informs the next subtask's work.
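To picture what that single-threaded shape with a compression step might look like, here is a simplified sketch under my own assumptions. It is not Cognition's actual implementation; call_llm is again a placeholder, and the prompts and history-size threshold are arbitrary.

```python
# A simplified sketch of a single-threaded linear agent: one agent plans the
# subtasks, does them in order, and carries the full history forward, with a
# side compression pass once that history grows too large. Not Cognition's
# actual implementation; prompts and thresholds here are arbitrary.

def call_llm(prompt: str) -> str:
    return f"(model output for: {prompt[:45]}...)"  # placeholder model call

def compress(history: list[str]) -> list[str]:
    # Side LLM pass: boil the conversation and actions so far down to the
    # key moments and decisions, so later subtasks still see what matters.
    summary = call_llm("Summarize the key decisions so far:\n" + "\n".join(history))
    return [f"[compressed context] {summary}"]

def run_agent(task: str, max_history_chars: int = 4000) -> str:
    history = [f"Goal: {task}"]
    plan = call_llm(f"Break this task into ordered subtasks: {task}")
    for subtask in plan.splitlines():
        if sum(len(h) for h in history) > max_history_chars:
            history = compress(history)           # keep context continuous but small
        result = call_llm("\n".join(history) + f"\nNow do: {subtask}")
        history.append(f"{subtask} -> {result}")  # same thread, same context
    return call_llm("\n".join(history) + "\nCombine everything into the final result.")

print(run_agent("Build a Flappy Bird clone"))
```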
Now, whether you agree with this strategy or not is not the point of this piece. The point is to show how context engineering is starting to become part of some of the most important questions in AI, which have to do with how to build agents that are actually highly functional. And if you look around for about five minutes, we are seeing a ton of discussion of context engineering pop up. Just a couple of days ago, Lance Martin published on his blog a post called Context Engineering for Agents.
Lance writes,
Just like RAM, the LLM context window has limited communication bandwidth to handle these various sources of context. And just like an operating system curates what fits into a CPU's RAM, we can think about context engineering as packaging and managing the context needed for an LLM to perform a task. So once again, this is coming at that same issue that we saw in the Cognition blog of having to engineer systems that get the right context, but don't just dump everything in willy-nilly.
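To make the RAM analogy a bit more tangible, here is a toy example of packing a limited context window: rank the candidate pieces of context and keep only what fits a budget. The relevance scores, the word-count token estimate, and the budget are all made up for illustration.

```python
# Toy illustration of packing a limited context window: rank candidate
# context by relevance and include only what fits the budget.
# Scores, the word-count token estimate, and the budget are made up.

def pack_context(candidates: list[tuple[float, str]], budget_tokens: int) -> str:
    packed, used = [], 0
    for score, text in sorted(candidates, reverse=True):  # most relevant first
        cost = len(text.split())                          # crude token estimate
        if used + cost <= budget_tokens:
            packed.append(text)
            used += cost
    return "\n\n".join(packed)

candidates = [
    (0.9, "Design doc: the bird jumps when the player taps the screen."),
    (0.4, "Old meeting notes about an unrelated marketing campaign."),
    (0.8, "Bug report: pipes currently spawn off-screen on small displays."),
]
print(pack_context(candidates, budget_tokens=20))
```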
What Lance points out is the growing importance of this domain. He points to a quote from Cognition again, who writes, "...context engineering is effectively the number one job of engineers building AI agents," and another quote from Anthropic that read, "...agents often engage in conversations spanning hundreds of turns, requiring careful context management strategies."
Now, the second part of this blog is all about the ways that we can manage that context and new strategies for that sort of context management, which is a little bit more technical and out of scope for this particular show. But I will include this in the show notes so you can go check it out for yourself. Lance talks about curating context, i.e. managing the tokens that an agent sees at each turn, persisting context,
involving systems to store, save, and retrieve context over time, and isolating context, involving approaches to partition context across agents or environments. Lance points out that we are still at the very early, baby-step stage of forming general principles for building agents, and that's why there's such an explosion in this discussion. Another post that was published on the same day comes from the LangChain blog and is called The Rise of Context Engineering.
The piece reads, Context engineering is building dynamic systems to provide the right information and tools in the right format, such that the LLM can plausibly accomplish the task. Most of the time when an agent is not performing reliably, the underlying cause is that the appropriate context, instructions, and tools have not been communicated to the model. LLM applications are evolving from single prompts to more complex dynamic agentic systems. As such, context engineering is becoming the most important skill an AI engineer can develop.
And again, this piece really reiterates that when agentic systems mess up, it's because an LLM messed up, and LLMs tend to mess up either because they're just not good enough or because they didn't have the appropriate context. What's more, the author argues that as models get better, it tends to be more that second reason. The author concludes: context engineering isn't a new idea. Agent builders have been doing it for the past year or two. It's a new term that aptly describes an increasingly important skill.
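To ground that LangChain definition in something concrete, here is a tiny sketch of a "dynamic system" that chooses both the information and the tools handed to the model based on the task. The routing rule and the tool names are invented for illustration and are not from the post.

```python
# A tiny sketch of "dynamic" context engineering: pick which information and
# which tools go to the model based on the task. The routing rule and tool
# names are invented for illustration.

TOOLS = {
    "calendar_lookup": "Look up events on the user's calendar.",
    "doc_search": "Search the user's documents for relevant passages.",
}

def assemble_context(task: str, user_docs: dict[str, str]) -> dict:
    # Naive routing: meeting-related tasks get the calendar tool,
    # everything else gets document search.
    tool = "calendar_lookup" if "meeting" in task.lower() else "doc_search"
    # Only include documents whose names are actually referenced in the task.
    info = {name: text for name, text in user_docs.items() if name in task}
    return {
        "instructions": "Answer using the provided material; call a tool if needed.",
        "tools": {tool: TOOLS[tool]},
        "information": info,
        "task": task,
    }

print(assemble_context("Summarize the pricing_research doc", {"pricing_research": "..."}))
```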
So I think that there are actually two different domains of context engineering that are worth keeping in mind and worth you and me exploring. The first is context engineering in the context of AI engineers and actual agent building.
In other words, for people who are building agentic systems, software engineers that are thinking about how to make agents more performant and work on higher complexity and higher order tasks, these questions of context engineering are about system design. They're about things like the context compression LLM that sits alongside a single agent system and makes it work better.
There is a whole important discourse happening in that domain that will influence the shape of the agents that even non-coders and non-technical people ultimately interact with. However, my strong guess is that we're likely to start seeing context engineering also become a term for consumers and just regular LLM users. In the same way that we have increasingly taught ourselves or tried to teach ourselves
how to prompt LLMs to get the most out of them, my guess is that context engineering in a user environment is going to become a more important field and discipline as well. What's the right amount of information to give any given model? Which models are better at different types of information?
Indeed, one area where we have started to see this is in the release of O3 Pro. You'll remember that the piece from Latent Space that I thought was the best summary of O3 Pro was called God is Hungry for Context, and it basically argued that the big difference between O3 and O3 Pro was that O3 Pro was better at handling lots and lots of context.
When the authors of this piece gave it a huge volume of information about their company, including past meeting notes and recorded audio, it came back with a much better strategy for them than O3 did alone. And so in that, we have context engineering from a user standpoint, both in terms of model selection and which model is going to be better at context, and second, in terms of what type of context to give it.
I think it's an extremely dynamic field. I think it's likely going to be every bit as important as prompt engineering, if not more so, in how we use these tools. And I'm excited to share more about this as it becomes a bigger part of the conversation. For now, though, that is going to do it for today's little baby primer on context engineering. I hope this was useful. Appreciate you guys listening or watching as always. And until next time, peace.