ChatGPT Gets Powerful Update: Smarter Problem Solving

2025/5/28
AI Education

People
Jaeden Schafer
Topics
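Jaeden Schafer: OpenAI released its new GPT-4.1 model but did not initially make it available in ChatGPT, which sparked some controversy. I break down what the model does, why its release was delayed, and the related safety concerns. In my view, GPT-4.1 is designed specifically for math and coding, an area where OpenAI faces fierce competition, with Google's Gemini also actively building out GitHub integration. Coding is so valuable that OpenAI is about to acquire Windsurf, a top AI coding company, for $3 billion. I think the timing of the Windsurf acquisition may have pushed them to release GPT-4.1 in ChatGPT. However, OpenAI shipping newer models that are worse than older ones at specific tasks forces users to mix and match models, which is not ideal from a marketing standpoint. Some people believe OpenAI delayed GPT-4.1 over safety concerns and criticize it for lowering its transparency standards for AI models, but OpenAI responded that GPT-4.1 is not a frontier model and so does not need the same rigorous safety reporting as other models. Personally, I don't advocate heavy safety reviews for code generation models; I would rather get the model sooner. Overall, GPT-4.1 arrives amid intense competition: OpenAI is trying to close its $3 billion acquisition of Windsurf, while Gemini's GitHub integration and the popularity of Claude Code add to the pressure. The coding tools space is fiercely competitive, and it will be interesting to see who comes out on top.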

Chapters
This chapter introduces the new GPT-4.1 model in ChatGPT, highlighting its recent release and the features of the AI Box platform, a multi-model AI access platform.
  • GPT-4.1 is a new model released in April but added to ChatGPT recently.
  • AI Box Playground provides access to various AI models for a monthly fee.
  • AI Box offers features like media storage, model comparison, and prompt testing.

Transcript

OpenAI has added a new model to ChatGPT, and that is GPT-4.1. Now, this looks like a brand new model that they just dropped, but it was actually released back in April; they just never added it to the ChatGPT platform. So today on the podcast, I'm going to be breaking down what the new model does, why they didn't release it in ChatGPT before, and some of the controversy around this release. Some people speculate the delay was due to safety reasons, among other things. It's officially live now, and we're going to be getting into all of that.

Before we do, I wanted to say that my startup, AI Box, has officially launched our very first product, the AI Box Playground. This is essentially a place where you can get access to all of the top AI models on one platform and test all of them for $20 a month. So you don't need subscriptions to 20 to 40 different top AI companies anymore. You can pay $20 a month, get access to everything, and use the models on an as-needed basis. You essentially get tokens every month, and you can use those toward whatever you want.

We have access to audio models like ElevenLabs, access to some of the top image models like OpenAI's, of course, and a lot of other ones that you may not have used or heard of that are actually really impressive. We also have something called media storage, a place where every file you ever create gets stored. You can go back and easily find what conversations you had and what prompts you used to generate different things like images or audio. You can click a little button in the media storage on whatever media you created and view the actual chat that was used to generate it.

There's a ton of cool features in here, like getting multiple AI models to run the same prompt and comparing the results side by side. So a lot of really cool features we've added in here. If you have more ideas, we are rapidly developing and adding new features, so we would love to hear from you on what you'd want to see. You can check it out in the description; it's AIbox.ai. All right, let's get into this new model from OpenAI.

The thing that I found really interesting here, and I'll just break the news: GPT-4.1 is specifically designed for math and for coding. This is an area where OpenAI is, I don't want to say struggling, but it's essentially the one area that's running away from them. Their main competitors are pulling ahead. Claude is kind of smoking them with Claude Code; everyone's using that. Even Google Gemini is gaining some big ground: they just recently announced that the Gemini chatbot can now integrate with and more easily analyze GitHub projects. So it's building directly into GitHub, which is owned by Microsoft, who heavily invested in OpenAI, and yet Gemini is making some big moves in that space.

So this code area is really, really valuable, and a lot of companies are looking at it. So much so that OpenAI is actually about to acquire, for $3 billion, one of the top AI coding companies, which is called Windsurf. It's pretty much the most popular one. Cursor is probably the second most popular and has about a $1 billion valuation based off its last round of funding. But Windsurf is looking to be acquired by OpenAI for $3 billion, and they're making a bunch of moves here.

I think the acquisition of Windsurf, and the timeline for that, is probably what pushed them to make this new GPT-4.1 model live on ChatGPT. If you go over to ChatGPT, you can hit the model dropdown. What's interesting is that it doesn't actually show up among their priority AI models; you've got to click on their "more models" section. That's where you're going to see GPT-4.5, which is a quote-unquote research preview, and then GPT-4.1 and GPT-4.1 mini.

Now, what a lot of people have asked is: why the heck would I be using GPT-4.1 when I could just use GPT-4.5? Isn't 4.5 better than 4.1 or 4.1 mini? And it's actually interesting: OpenAI specifically said that for coding tasks, GPT-4.1 is going to be better than GPT-4.5. This is getting to kind of a weird place where we're coming out with newer, allegedly more advanced models that are worse at certain tasks than older models. The old model could do X, Y, and Z really well, and the new model can do mostly everything better, but not this specific thing. So it gets to a weird place for OpenAI where you're mixing and matching which model you use. That's why they have their dropdown with four different models to choose from, and then in their "more models" section you've got three more. So really, if you're on ChatGPT, you have seven options for what you're going to talk to.

I've talked at length about how this is a terrible marketing thing and how other companies are doing a great job. xAI, for example, with Grok: you can use the old version of Grok or you can use Grok 3. Now they have new features within Grok 3, like a deep research kind of dive, or a "think" button where it gives it more compute and it really thinks, and I've found great results with that. That is more what I would like to see from OpenAI. Even if it is completely switching the model, I just want an easy UI.

Now, they've created some UI inside of the search box, but I think it's a little ridiculous. They have a search button for the internet, which is fine. They have deep research, which is for when you want a really extensive document; I understand that one, and I think they should keep it. Then they have "create image". In my opinion, if you're coming here to create an image and you know you can create images, you should just say what you want an image of, and it should just know and automatically generate it. And it actually does that. But maybe they're just trying to prompt new people, to tell them that they can create an image and can just type it here. So maybe it's kind of a marketing thing. In any case, it's not incredibly useful; it's redundant. You could just talk to the model and tell it to create an image, and you don't need a button that specifically does that. If you click the create image button, it just automatically adds text into the chat that says "create image", and now you're good to go.

And actually, maybe that's not a bad idea, to tell people that they can create images. I might actually steal that for AI Box. So, for all my flaming, don't get mad at me if you go over to AI Box and see me add that to my search bar. All right, so here's what's interesting about this new GPT-4.1 model.

This came out back in April, but it was only released for developers on the API platform, meaning average people on chatgpt.com could not use it. You could only use it if you had a developer account with an API access token for OpenAI and were embedding it into the software or project you were building. So this was only for developers. And you could say, hey, that's fine, it's a code tool, only developers need code tools, and developers know how to get access. But in reality, I think a lot of people, even developers, might be using Claude or other platforms directly, and they maybe don't want to go through the headache of building some special portal just to use it. So it was just getting embedded in software.

Now, why did they do this? Why didn't they just roll it out on chatgpt.com like their other models? That is where the controversy comes in. Some people are saying this was due to safety concerns and that they didn't release a proper safety report. They got a bunch of criticism for this: a number of researchers claimed that OpenAI was lowering its standards around transparency for its AI models. OpenAI argued that, despite GPT-4.1 being faster than GPT-4o, the model was not a frontier model, and because of that it didn't need the same safety reporting as some of the more capable models. So OpenAI's response was like, "Yeah, we didn't release the safety report like you're criticizing us for, but that's just because this isn't really our frontier model. It's kind of just our side model that we're letting developers use. It doesn't really need as much vetting."

If I'm being 100% honest, I'm not actually advocating for more safety reviews on code generation models. I'm not super concerned about that; it's just not my wheelhouse. I'd rather get the model sooner than focus a ton on safety. That's just me personally.

But at the end of the day, it's kind of interesting that that was OpenAI's response. So what exactly can this do? According to Shaokyi Amdo, this new model is going to help software developers who are using ChatGPT to write or debug code. Those are kind of the two specific things. It's also better at instruction following compared to GPT-4o, and it's faster than the o-series reasoning models. So it's not a reasoning model; it's much faster, and it's better at code. It's kind of interesting, because some people like the reasoning models for code, and evidently they've moved away from that in this particular update. That is definitely an interesting fact.

This is what they specifically said: "GPT-4.1 doesn't introduce new modalities or ways of interacting with the model, and doesn't surpass o3 in intelligence. This means that the safety considerations, while substantial, are different than frontier models." That's their head of safety explaining why they didn't do a lot of safety testing on this.

As I mentioned, this is very interesting timing for this model to come out, because there's a ton of competition. We have, of course, OpenAI now trying to get its $3 billion acquisition of Windsurf pushed through. But we also have a ton of other players putting out coding tools. We have Cursor; allegedly, I think OpenAI might have made a bid to acquire them as well, it didn't go through, and so they went with Windsurf. Those are kind of rumors. Then we also have, of course, Gemini connecting more deeply with GitHub. We have Claude Code, which has run away with most developers and is increasing in popularity. So I think there's just a ton of competition, and it's going to be fascinating to see who is the ultimate winner in this space.

All right. Thank you so much for tuning in. Make sure to go check out the AI Box platform if you're interested in a platform that lets you chat with all of the text, image, and audio models inside one chat, switch between models in the same chat, and use different models that are good at different things. Like we've been talking about in this podcast today, some models are great at code; even some older OpenAI models are great at code, and some of them are worse at code. In the AI Box platform, you can go and start a new chat, and we specifically have GPT-4.1, which we've been talking about in this episode, along with 4.1 mini and 4.1 nano. We have all of these here. You can test them all out if you're interested in coding, or you can use all of the other ChatGPT models, and Anthropic and DeepSeek and Google and Meta and Microsoft and Mistral and NVIDIA, all of them. All right, so go check it out: AIbox.ai. Thanks so much for tuning into the podcast today. I will catch you in the next episode.