People
Alessio
Nikunj
Romain
Swyx
Topics
Alessio: I'm Alessio, partner and CTO at Decibel. Together with Swyx, founder of Smol AI, and our old friends Romain and Nikunj, I discuss the new APIs OpenAI is releasing today. We cover OpenAI's new agent platform, including the Responses API, the web search tool, the file search tool, and the computer use tool, as well as a new open-source Agents SDK. I ask whether the Chat Completions API is going away, and how to choose which API to use.

Swyx: My focus is on whether the Chat Completions API is being phased out, and on how the Responses API relates to the Assistants API and the Chat Completions API. I also ask about knowledge cutoffs, search depth and breadth, combining file search with web search, and RAG.

Romain: The Chat Completions API is not going away; the Responses API was created to support new agentic workflows. The Responses API combines the strengths of the Assistants API and simplifies how tools are integrated. The right knowledge cutoff depends on the use case and the information sources you need. The web search tool can be combined with other tools and features. The Agents SDK was developed because orchestrating agents in production is hard; together with the tracing UI, it helps developers build better agentic workflows.

Nikunj: OpenAI is releasing three new built-in tools today: a web search tool, an improved file search tool, and a computer use tool. OpenAI is also releasing a new Responses API to support these tools; it is a more flexible foundation for building agentic applications. OpenAI will make it easy for Assistants API users to migrate to the Responses API, which supports everything the Chat Completions API and Assistants API offer and also has a stateless mode. The web search tool ships in two ways: as a tool in the Responses API and as a fine-tuned model in the Chat Completions API. File search and web search can be combined, for example by storing user preferences in a vector store and then searching the web for products related to those preferences. The file search API provides a managed RAG service. The computer use model is still early; it is a separate model that takes screenshots and indicates which actions to take. The preview models will eventually be merged into the mainline. The Agents SDK adds support for types, guardrails, and tracing, and is more flexible. Agents SDK traces will connect to the evals product for reinforcement fine-tuning.

Shownotes Transcript


Hey everyone, welcome back to another Latent Space Lightning episode. This is Alessio, partner and CTO at Decibel, and I'm joined by Swyx, founder of Smol AI. Hi, and today we have a super special episode because we're talking with our old friend Romain. Hi, welcome. Hi.

Thank you. Thank you for having me. And Nikunj, who is most famously, if anyone has ever tried to get any access to anything on the API, Nikunj is the guy. So I know your emails because I look forward to them. Yeah. Nice to meet all of you. I think that we're basically convening today to talk about the new API. So perhaps you guys want to just kick off. What is OpenAI launching today? Yeah, so I can kick it off. We're launching a bunch of new things today. We're going to do three new built-in tools.

So we're launching the web search tool. This is basically ChatGPT for search, but available in the API. We're launching an improved file search tool. So this is you bringing your data to OpenAI. You upload it. We take care of parsing it, chunking it, embedding it, making it searchable, give you this ready vector store that you can use. So that's the file search tool. And then we're also launching our computer use tool. So this is the tool behind the Operator product in ChatGPT. So that's coming to developers today.

And to support all of these tools, we're going to have a new API. So, you know, we launched chat completions like I think March '23 or so. It's been a while. So we're looking for an update over here to support all the new things that the models can do. And so we're launching this new API called the Responses API. It is, you know,

It works with tools. We think it'll be like a great option for all the future agentic products that we build. And so that is also launching today. Actually, the last thing we're launching is the Agents SDK. We launched this thing called Swarm last year where it was an experimental SDK for people to do multi-agent orchestration and stuff like that. It was supposed to be like educational, experimental, but like people really loved it. They like ate it up.

And so we were like, all right, let's upgrade this thing. Let's give it a new name. And so we're calling it the Agents SDK. It's going to have built-in tracing in the OpenAI dashboard. So lots of cool stuff going out. So yeah, excited about it. That's a lot, but we said 2025 was the year of Agents. So there you have it, like a lot of new tools to build these agents for developers.

Okay, I guess we'll just kind of go one by one and we'll leave the Agents SDK towards the end. So responses API, I think the sort of primary concern that people have and something I think I voiced to you guys when I was talking with you in the planning process was, is chat completions going away? So I just wanted to let you guys respond to the concerns that people might have.

Chat completion is definitely here to stay. It's a bare-metal API we've had for quite some time, lots of tools built around it. So we want to make sure that it's maintained and people can confidently keep on building on it. At the same time, it was optimized for a different world. It was optimized for a pre-multi-modality world. We also optimized for single turn,

text prompt in, text response out. And now with these agentic workflows, we noticed that developers and companies want to build longer horizon tasks, things that require multiple turns to get the task accomplished. And computer use is one of those, for instance. And so that's why the responses API came to life to support these new agentic workflows. But chat completion is definitely here to stay.

For the Assistants API, we have a target sunset date of the first half of 2026. So this is kind of like, in my mind, there was a kind of very poetic mirroring of the API with the models. I kind of view this as like kind of the merging of the Assistants API and chat completions, right? Into one unified Responses API. So it's kind of like how GPT and the o-series models are also unifying.

Yeah, that's exactly the right framing. I think we took the best of what we learned from the Assistants API, especially being able to access tools very conveniently. But at the same time, simplifying the way you have to integrate-- you no longer have to think about six different objects to get access to these tools.

With the Responses API, you just get one API request and suddenly you can sweep in those tools, right? Yeah, absolutely. And I think we're going to make it really easy and straightforward for Assistants API users to migrate over to the Responses API without any loss of functionality or data. So our plan is absolutely to add, you know, assistant-like objects and thread-like objects

that work really well with the Responses API. We'll also add the Code Interpreter tool, which is not launching today, but will come soon. And we'll add async mode to the Responses API because that's another difference with Assistants.

It will have webhooks and stuff like that. But I think it's going to be like a pretty smooth transition once we have all of that in place and we'll give folks like a full year to migrate and help them through any issues they face. So I feel like folks are going to benefit longer term from this more flexible primitive.

How should people think about when to use each type of API? So I know that in the past, the Assistants API was maybe more stateful, kind of like long-running, many tool use, kind of like file-based things. And chat completions is more stateless, you know, kind of like the traditional completion API. Is that still the mental model that people should have, or should you by default always try and use the Responses API? So the Responses API is, at launch, going to support everything that chat completions supports.

And then over time, it's going to support everything that Assistants supports. So it's going to be a pretty good fit for anyone starting out with OpenAI. They should be able to go to Responses. Responses, by the way, also has a stateless mode. So you can pass in store=false and that'll make the whole API stateless, just like chat completions. We're really trying to get this unification story in so that people don't have to juggle multiple endpoints.
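
Here's a minimal sketch of what that stateless mode looks like with the OpenAI Python SDK; the model name is just an illustrative choice.

```python
# Minimal sketch using the OpenAI Python SDK (pip install openai).
# The model name here is an illustrative placeholder.
from openai import OpenAI

client = OpenAI()

# Default: the response is stored, so it can be inspected or chained later.
stored = client.responses.create(
    model="gpt-4o",
    input="Give me a one-line summary of the Responses API.",
)

# Stateless mode: store=False makes the call behave like chat completions.
stateless = client.responses.create(
    model="gpt-4o",
    input="Give me a one-line summary of the Responses API.",
    store=False,
)

print(stored.id, stateless.output_text)
```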

That being said, ChatCompletion is our most widely adopted API. It's so popular, so we're still going to support it for years with new models and features. But if you're a new user or if you want to tap into some of these built-in tools or something, you should feel totally fine migrating to responses and you'll have more capabilities and performance than ChatCompletion.

I think the messaging that I agree that I think resonated the most when I talked to you was that it is a strict superset, right? Like you should be able to do everything that you could do in chat completions and with Assistants. The thing is, I just assumed that because you're now, you know, stateful by default, you're actually storing the chat logs or the chat state, so I thought you'd be charging me for it. So, you know, to me, it was very surprising that you figured out how to make it free.

Yeah, it's free. We store your state for 30 days. You can turn it off. But yeah, it's free. Interesting thing on state is that it just makes, particularly for me, it makes debugging things and building things so much simpler where I can create a responses object that's pretty complicated and part of this more complex application that I've built. I can just go into my dashboard and see exactly what happened. Did I mess up my prompt? Did it not call one of these tools? Did I misconfigure one of the tools?
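
Since responses are stored (for 30 days by default), you can pull one back up by ID when debugging. A rough sketch, where the response ID is a made-up placeholder from an earlier call:

```python
# Rough sketch: fetch a previously stored response by ID for debugging.
# "resp_abc123" is a made-up placeholder ID.
from openai import OpenAI

client = OpenAI()
resp = client.responses.retrieve("resp_abc123")
print(resp.output_text)
```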

Like the visual observability of everything that you're doing is so, so helpful. So I'm excited about people trying that out and getting benefits from it too. Yeah, it's really, I think, really nice to have. But all I'll say is that my friend Corey Quinn says that anything that can be used as a database will be used as a database. So be prepared for some abuse. Yeah.

All right. Yeah, that's a good one. Try with the metadata. People are very, very creative at stuffing data into an object. We do have metadata with responses. Exactly.

Let's get through all of these. So web search. I think when I first said web search, I thought you were going to just expose an API that then returned kind of like a nice list of things. But the way it's named is like GPT-4o search preview. So I'm guessing you're using basically the same model that is in the ChatGPT search, which is fine-tuned for search, I'm guessing. It's a different model than the base one. And it's impressive, the jump in performance. So just to give an example, in SimpleQA,

GPT-4o is 38% accuracy. 4o search is 90%. We always talk about how tools are like, the model is not everything you need. Like tools around it are just as important. So yeah, maybe give people a quick preview on like the work that went into making this special. Should I do that? Yeah. Go for it. So, yeah,

Firstly, we're launching web search in two ways. One, in responses API, which is our API for tools, it's going to be available as a web search tool itself. So you'll be able to go tools, turn on web search, and you're ready to go. We still wanted to give chat completions people access to real-time information. So in that chat completions API, which does not support built-in tools,

we're launching the direct access to the fine-tuned model that ChatGPT search uses, and we call it GPT-4o Search Preview. How is this model built? Basically, our search research team has been working on this for a while. Their main goal is to get a bunch of information from all of our data sources that we use to gather information for search.

and then pick the right things and then cite them as accurately as possible. And that's what the search team has really focused on. They've done some pretty cool stuff. They use synthetic data techniques. They've done o-series model distillation to make these 4o fine-tunes really good. But yeah, the main thing is, can it remain factual? Can it answer questions based on what it retrieves? And can it cite it accurately? And that's what this fine-tuned model really excels at.
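
To make the two access paths concrete, here's a hedged sketch of both: the built-in tool in the Responses API versus calling the search fine-tune directly in Chat Completions. Tool and model names follow the launch docs as we read them.

```python
# Sketch of the two ways to use web search, per the launch docs as we understand them.
from openai import OpenAI

client = OpenAI()

# 1) Responses API: turn on the built-in web search tool.
resp = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What changed in the OpenAI API today?",
)
print(resp.output_text)

# 2) Chat Completions: call the search fine-tune directly as a model.
chat = client.chat.completions.create(
    model="gpt-4o-search-preview",
    messages=[{"role": "user", "content": "What changed in the OpenAI API today?"}],
)
print(chat.choices[0].message.content)
```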

And so, yeah, I'm super excited that it's going to be directly available in chat completions along with being available as a tool. Yeah, just to clarify, if I'm using the Responses API, this is a tool. But if I'm using chat completions, I have to switch model. I cannot use o1 and call search as a tool. Yeah, that's right. Exactly. I think what's really compelling, at least for me and my own experience,

uses of it so far, is that when you use like web search as a tool, it combines nicely with every other tool and every other feature of the platform. So think about this for a second. For instance, imagine you have like a Responses API call with the web search tool, but suddenly you turn on function calling. You also turn on, let's say, structured outputs. Now you can have like the ability to structure any data from the web in real time

in the JSON schema that you need for your application. So it's quite powerful when you start combining those features and tools together. It's kind of like an API for the internet, almost, you know, like you get like access to the precise schema you need for your app. Yeah. And then just to wrap up on the infrastructure side of it, I read on the...

post that publishers can choose to appear in the web search. So are people in it by default? Like how can we get Latent Space in the web search API? Yeah. Yeah. I think we have some documentation around how

websites, publishers can control what shows up in a web search tool. And I think you should be able to read that. I think we should be able to get Latent Space in for sure. Yeah, I think so. I compare this to a broader trend that I started covering last year of online LLMs. Actually, Perplexity, I think, was the first to offer an API that is connected to search.

And then Gemini had the sort of search grounding API. And I think you guys, I actually didn't, I missed this in the original reading of the docs, but you even give like citations with like the exact sub-paragraph that is matching, which I think is the standard nowadays. I think my question is, how do we think about what a knowledge cutoff is for something like this, right? Because like now basically there's no knowledge cutoff, it's always live.

But then there's a difference between what the model has sort of internalized in its backpropagation and what it's searching up with its RAG.

I think it kind of depends on the use case, right? And what you want to showcase as the source. Like, for instance, you take a company like Hebbia that has used this like web search tool. They can combine, like for credit firms or law firms, they can find, like, you know, public information from the internet with the live sources and citations that sometimes you do want to have access to, as opposed to like the internal knowledge tool.

But if you're building something different where like you just want to have an assistant that relies on the deep knowledge that the model has, you may not need to have these like direct citations. So I think it kind of depends on the use case a little bit. But there are many, many companies like Hebbia that will need that access to these citations to precisely know where the information comes from.

Yeah, yeah, for sure. And then one thing on the breadth, you know, I think a lot of the deep research, open deep research implementations have this sort of hyper parameter about how deep they're searching and how wide they're searching. I don't see that in the docs, but is that something that we can tune? Is that something you recommend?

Thinking about. Super interesting. It's definitely not a parameter today, but we should explore that. It's very interesting. I imagine like how you would do it with the web search tool and responses API is you would have some form of like, you know, agent orchestration over here where you have a planning step and then each like web search call that you do like explicitly goes a layer deeper and deeper and deeper. But it's not a parameter that's available out of the box. But it's a cool, it's a cool thing to think about. Yeah. The only guidance I'll offer there is

A lot of these implementations offer top K, which is like, you know, top 10, top 20, but actually you don't really want that. You want like sort of some kind of similarity cutoff, right? Like some matching score cutoff, because if there's only five documents that match, fine. If there's 500 that match, maybe that's what I want. Right. Yeah.
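
That point in code: a toy sketch that keeps everything above a score threshold instead of a fixed top-k. The corpus and scores here are made up purely for illustration.

```python
# Illustrative only: keep everything above a similarity cutoff instead of a fixed top-k.
# The "corpus" and its match scores are toy stand-ins for a real retriever.
CORPUS = {
    "doc-a": 0.91,
    "doc-b": 0.86,
    "doc-c": 0.42,
}

def retrieve(score_cutoff: float = 0.8, hard_cap: int = 500):
    # Keep every document whose match score clears the bar: if five match, return five;
    # if 500 match, return up to the hard cap so cost stays bounded.
    kept = [(doc, score) for doc, score in CORPUS.items() if score >= score_cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:hard_cap]

print(retrieve())  # [('doc-a', 0.91), ('doc-b', 0.86)]
```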

But also that might make my costs very unpredictable because the costs are something like $30 per 1,000 queries, right? So yeah. Yeah. Yeah. I guess you could have some form of a context budget and then you're like, go as deep as you can and pick the best stuff and put it into X number of tokens.

There could be some creative ways of managing cost. But yeah, that's a super interesting thing to explore. Do you see people using the files and the search API together where you can kind of search and then store everything in the file so the next time I'm not paying for the search again? And like, yeah, how should people balance that?

That's actually a very interesting question. Let me first tell you about a really cool way I've seen people use files and search together: they put their user preferences or memories in the vector store. And so a query comes in, you use the file search tool to get someone's reading preferences or fashion preferences and stuff like that. And then you search the web for information or products that they can buy related to those preferences.

And you then render something beautiful to show them like, here are five things that you might be interested in. So that's how I've seen like file search, web search work together. And by the way, that's like a single Responses API call, which is really cool. So you just like configure these things, go boom and like everything just happens. But yeah, that's how I've seen like files and web work together. But I think that what you're pointing out is like interesting. And I'm sure developers will surprise us as they always do in terms of how they combine these tools and how they might use them.
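
A hedged sketch of that single call: file search over a preferences vector store combined with web search in one Responses API request. The vector store ID is a placeholder.

```python
# Sketch of the pattern described above: file search over stored preferences
# combined with web search in one Responses API call. "vs_123" is a placeholder
# vector store ID.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",
    tools=[
        {"type": "file_search", "vector_store_ids": ["vs_123"]},
        {"type": "web_search_preview"},
    ],
    input="Based on my saved reading preferences, find five new books I might like.",
)
print(resp.output_text)
```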

file search as a way to have memory and preferences, like Nikunj says. But I think zooming out, what I find very compelling and powerful here is when you have these neural nets that have all of the knowledge that they have today, plus real-time access to the internet,

for like any kind of real-time information that you might need for your app and file search where you can have a lot of company, private documents, private details. You combine those three and you have like very, very compelling and precise answers for any kind of use case that your company or your product might want to enable. It's a difference between sort of internal documents versus the open web, right? Like you're going to need both.

Exactly, exactly. I never thought about it doing memory as well. I guess, again, you know, anything that can be used as a database will be used as a database. That sounds awesome. But I think also you've been, you know, expanding the file search. You have more file types. You have query optimization, custom re-ranking. So it really seems like, you know, it's been fleshed out. Obviously, I haven't been paying

a ton of attention to the file search capability. But it sounds like your team has added a lot of features. Yeah. Metadata filtering was like the main thing people were asking us for for a while. And that's the one I'm super excited about. I mean, it's just so critical once you're like, your store size goes over, you know, more than like, you know, five, 10,000 records. You kind of need that. So yeah, metadata filtering is coming too. And for most companies, it's also not like a competency that you want to rebuild in-house necessarily. You know, like,

you know, thinking about embeddings and chunking and, you know, all of that, it sounds very complex for something very, like, obvious to ship for your users. Companies like Navan, for instance, were able to build with the file search tool: you know, take all of the FAQs and travel policies, for instance, that you have, you put that in the file search tool, and then you don't have to think about anything. Now your assistant becomes naturally much more aware of all of these policies from the files.
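
For that FAQ-and-policies case, the flow is roughly: upload the file, attach it to a vector store, and point the file search tool at it. A rough sketch follows; exact method paths can vary slightly between SDK versions, and the filename is a placeholder.

```python
# Rough sketch of the managed file search flow; exact method paths can vary
# slightly between OpenAI SDK versions. "travel_policy.pdf" is a placeholder.
from openai import OpenAI

client = OpenAI()

# 1) Upload the document and create a vector store for it.
uploaded = client.files.create(file=open("travel_policy.pdf", "rb"), purpose="assistants")
store = client.vector_stores.create(name="company-policies")
client.vector_stores.files.create(vector_store_id=store.id, file_id=uploaded.id)

# 2) Ask questions grounded in those files via the file search tool.
resp = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
    input="What is our policy on booking business-class flights?",
)
print(resp.output_text)
```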

The question is, there's a very, very vibrant RAG industry already, as you well know. So there's many other vector databases, many other frameworks. Probably if it's an open source stack, I'll say a lot of the AI engineers that I talk to want to own this part of the stack. And it feels like, when should we DIY and when should we just use whatever OpenAI offers?

Yeah. I mean, if you're doing something completely from scratch, you're going to have more control, right? So super supportive of people trying to roll up their sleeves, build their super custom chunking strategy and super custom retrieval strategy and all of that. And those are things that will be harder to do with OpenAI's tools. With OpenAI, we have an out-of-the-box solution. We give you some knobs to customize things, but it's more of like a managed RAG service.

So my recommendation would be like start with the OpenAI thing, see if it like meets your needs. And over time, we're going to be adding more and more knobs to make it even more customizable. But, you know, if you want like the completely custom thing, you want control over every single thing,

then you'd probably want to go and hand roll it using other solutions. So we're supportive of both, like engineers should pick. And then we got computer use, which I think Operator was obviously one of the hot releases of the year and we're only...

Let's talk about that. And that's also, it seems like a separate model that has been fine-tuned for Operator that has browser access. Yeah, absolutely. I mean, the computer use models are exciting. The cool thing about computer use is that we're just so, so early. It's like the GPT-2 of computer use or maybe GPT-1 of computer use right now. But it is a separate model that, you

know, the computer use team has been working on. You send it screenshots and it tells you what action to take. So the outputs of it are almost always tool calls and you're inputting screenshots based on whatever computer you're trying to operate. Maybe zooming out for a second, because like I'm sure your audience is like super, super like native, obviously. But like what is computer use as a tool, right? And what's operator? So the idea for computer use is like, how do we let developers also build agents

that can complete tasks for the users, but using a computer or a browser instead. And so how do you get that done? And so that's why we have this custom model, like optimized for computer use that we use like for operator ourselves. But the idea behind like putting it as an API is that imagine like now you want to automate some tasks for your product or your own customers, then now you can have like the ability to spin up

one of these agents that will look at the screen and act on the screen. So that means the ability to click, the ability to scroll, the ability to type and to report back on the action. So that's what we mean by computer use and wrapping it as a tool also in the responses API. So now like that gives a hint also at the multi-turn thing that we were hinting at earlier. The idea that like, yeah, maybe one of these actions can take a couple of minutes to complete because there's maybe like 20 steps to complete that task. But now you can.
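
At a high level, the loop described here looks something like the sketch below: send a screenshot, get back a suggested action, execute it, repeat. The tool and model parameters follow the preview docs as we read them, and `execute` / `take_screenshot` are hypothetical stand-ins for your own browser or VM glue.

```python
# Conceptual sketch of the screenshot-in, action-out loop for the computer use tool.
# Tool/model names follow the preview docs as we understand them; execute() and
# take_screenshot() are hypothetical placeholders for your own environment integration.
import base64
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}]

resp = client.responses.create(
    model="computer-use-preview",
    tools=tools,
    input="Open the pricing page and tell me the cost per 1,000 searches.",
    truncation="auto",
)

while True:
    calls = [item for item in resp.output if item.type == "computer_call"]
    if not calls:
        break  # no more actions requested; the model is done
    call = calls[0]
    execute(call.action)            # hypothetical: click/scroll/type on your machine
    screenshot = take_screenshot()  # hypothetical: returns PNG bytes of the screen
    resp = client.responses.create(
        model="computer-use-preview",
        tools=tools,
        previous_response_id=resp.id,
        input=[{
            "type": "computer_call_output",
            "call_id": call.call_id,
            "output": {
                "type": "input_image",
                "image_url": "data:image/png;base64," + base64.b64encode(screenshot).decode(),
            },
        }],
        truncation="auto",
    )
```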

Do you think computer use can play Pokemon? That would be interesting. I guess we should try it. There's a lot of interest. I think Pokemon really is a good agent benchmark, to be honest. It seems like Claude is running into a lot of trouble.

Sounds like we should make that a new eval, it looks like. Yeah. Oh, and then one more thing before we move on to the Agents SDK. I know you have a hard stop. There's all these, you know, blah, blah, dash preview, right? Like search preview, computer use preview, right? And you see them all like fine-tunes of 4o. I think the question is, are they all going to be merged into the main branch or are we basically always going to have subsets of these models? Yeah.

Yeah, I think in the early days, research teams at OpenAI operate with fine-tuned models. And then once the thing gets more stable, we merge it into the mainline. So that's definitely the vision: going out of preview as we get more comfortable with it

and learn about all the developer use cases and we're doing a good job at them. We'll sort of like make them part of like the core models so that you don't have to like deal with the bifurcation. You should think of it this way as exactly what happened last year when we introduced vision capabilities, you know, vision capabilities were in like a vision preview model based off of GPT-4, and then vision capabilities now are like obviously built into GPT-4o. You can think about it the same way for like the other modalities like audio and those kind of like models like optimized for search and computer use.

Agents SDK, we have a few minutes left. So let's just assume that everyone has looked at Swarm. Sure. I think that Swarm has really popularized the handoff technique, which I thought was really interesting for sort of a multi-agent world. What is new with the SDK? Yeah, for sure. So we've basically added support for types. We've made this a lot more...

We've added support for guardrailing, which is a very common pattern. In the guardrail example, you basically have two things happening in parallel. The guardrail can block the execution. It's a type of optimistic generation that happens.

And I think we've added support for tracing so you can basically look at the traces that the Agents SDK creates in the OpenAI dashboard. We also made this pretty flexible so you can pick any API from any provider that supports the chat completions API format. So it supports Responses by default, but you can easily plug it into any provider that uses the chat completions API. And similarly, on the tracing side, you can support like multiple tracing providers.
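
Here's a small sketch of what the types support looks like with the Agents SDK (the openai-agents package): an agent with a structured output type, run through the runner. The output fields are illustrative.

```python
# Small sketch of typed outputs in the Agents SDK (pip install openai-agents);
# the output fields here are illustrative.
from pydantic import BaseModel
from agents import Agent, Runner

class Answer(BaseModel):
    summary: str
    confidence: float

agent = Agent(
    name="Summarizer",
    instructions="Summarize the user's question in one sentence and rate your confidence.",
    output_type=Answer,
)

result = Runner.run_sync(agent, "What did OpenAI launch today?")
print(result.final_output)  # parsed into an Answer instance
```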

By default, it sort of points to the OpenAI dashboard. But, you know, there's like so many tracing companies out there and we'll announce some partnerships on that front, too. So just like, you know, adding lots of core features and making it more usable, but still centered around like handoffs is like the main, main concept. And by the way, it's interesting, right? Because Swarm

just came to life out of like learning from customers directly that like orchestrating agents in production was pretty hard. You know, simple ideas could quickly turn very complex. Like what are those guardrails? What are those handoffs, et cetera. So that came out of like learning from customers and was initially shipped as a like low key experiment, I'd say.

But we were kind of taken by surprise at how much momentum there was around this concept. And so we decided to learn from that and embrace it. To be like, OK, maybe we should just embrace that as a core primitive of the OpenAI platform. And that's kind of what led to the Agents SDK. And I think now, as Nikunj mentioned, it's like adding all of these new capabilities to it, like leveraging the handoffs that we had, but tracing also.

And I think what's very compelling for developers is like instead of having one agent to rule them all and you stuff like a lot of tool calls in there that can be hard to monitor. Now you have the tools you need to kind of like separate the logic, right? And you can have a triage agent that based on an intent goes to different kind of agents.
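
The triage pattern described here, sketched with the Agents SDK's handoffs; the agent names and instructions are made-up examples.

```python
# Sketch of the triage-plus-handoffs pattern with the Agents SDK;
# agent names and instructions are made-up examples.
from agents import Agent, Runner

billing_agent = Agent(
    name="Billing agent",
    instructions="Help with invoices, refunds, and payment questions.",
)
support_agent = Agent(
    name="Support agent",
    instructions="Help with product issues and troubleshooting.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Read the user's request and hand off to the right specialist.",
    handoffs=[billing_agent, support_agent],
)

result = Runner.run_sync(triage_agent, "I was charged twice for last month.")
print(result.final_output)
# Each run is traced, so the handoff and any tool calls show up in the tracing UI.
```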

And then on the OpenAI dashboard, we're releasing a lot of new user interface logs as well. So you can see all of the tracing UIs. Essentially, you'll be able to troubleshoot like what exactly happened in that workflow when the triage agent did a handoff to a secondary agent and the third and see the tool calls, et cetera. So we think that the agents SDK combined with the tracing UIs will definitely help users and developers build better agentic workflows.

And just before we wrap, are you thinking of connecting this with also the RFT API? Because I know you already have, you kind of store my text completions and then I can do fine-tuning of that. Is that going to be similar for agents where you're storing kind of like my traces and then help me improve the agents? Yeah, absolutely. Like you got to tie the traces to the evals product so that you can generate good evals. Once you have good evals and graders

and tasks, you can use that to do reinforcement fine tuning. And lots of details to be figured out over here, but that's the vision. And I think we're going to go after it pretty hard and hope we can make this whole workflow a lot easier for developers.

Awesome. Thank you so much for the time. I'm sure you'll be busy on Twitter tomorrow with all the developer feedback. Yeah, thank you so much for having us. And as always, we can't wait to see what developers will build with these tools and how we can learn as quickly as we can from them to make them even better over time. Awesome. Thank you, guys. Thank you. Thank you both.