ChatGPT Gets Powerful Update: CodeMath Capabilities
People
No speakers listed
Topics
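I introduce OpenAI's newly released GPT-4.1 model and analyze the controversy behind its release. The model was actually released back in April but was never added to the ChatGPT platform. I dig into what the new model does, why its release was delayed, and some of the controversy around it, including speculation that safety concerns were the cause. It is now officially live.

My startup AI Box has launched an AI Playground that lets users access all the top AI models on one platform for $20 a month. Users no longer need subscriptions to 20 to 40 different AI companies; they pay $20 to access all the models and use them as needed. They receive tokens each month that can be spent on whichever model they want.

AI Box also offers media storage, where every file a user creates is stored. You can easily find earlier conversations and prompts in order to regenerate images or audio. AI Box has many other cool features, such as comparing different AI models or having multiple models run the same prompt for side-by-side comparison.

GPT-4.1 is designed specifically for math and coding, an area where OpenAI faces fierce competition. Google's Gemini has made significant progress here through its GitHub integration. OpenAI is planning to acquire the top AI coding company Windsurf for $3 billion. I think the Windsurf acquisition and the GPT-4.1 release may be related.

GPT-4.1 is not easy to find in ChatGPT, and OpenAI claims it outperforms GPT-4.5 on coding tasks. This leads to an odd situation where a newer model is worse than an older one at certain tasks. ChatGPT offers many model choices, but that may not be the best marketing strategy. I would prefer that OpenAI offer a simpler user interface, similar to Grok's deep research and think buttons.

GPT-4.1 was initially released only to API developers, and ordinary ChatGPT users could not use it. The delay in releasing GPT-4.1 to the public was due to safety concerns and a lack of transparency. OpenAI considers GPT-4.1 not to be a frontier model, so it does not need the same rigorous safety reporting as other models. Personally, I don't advocate for heavy safety review of code generation models and would rather get the model sooner.

GPT-4.1 can help software developers write or debug code and is better at instruction following than GPT-4o. GPT-4.1 is faster and better at code, but it may not be a reasoning model. GPT-4.1 arrives at a time of fierce competition among coding tools. OpenAI faces pressure from competitors such as Cursor, Gemini, and Claude. Competition in coding tools will be intense, and the eventual winner remains to be seen.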

Chapters
This chapter introduces the new GPT-4.1 model from OpenAI, its release in April, and its initial absence from the ChatGPT platform. It also highlights the launch of AI Box, a platform offering access to various AI models.
  • GPT-4.1 is a new model released in April but not initially added to ChatGPT.
  • AI Box is a new platform providing access to multiple AI models for a monthly fee.
  • GPT-4.1 is specifically designed for math and coding tasks.

Transcript

OpenAI has added a new model to ChatGPT, and that model is GPT-4.1. Now, this is a brand new model that they just came out with. It was actually released back in April, but they never added it to the ChatGPT platform. So today on the podcast, I'm going to be breaking down what the new model does, why they didn't release it before, and some of the controversy around this release. Some people speculate it was due to safety reasons and other things. It's officially live, and we're going to be getting into all of that.

Before we do, I wanted to say that my startup AI Box has officially launched our very first product, which is the AI Box Playground. This is essentially a place where you can get access to all of the top AI models on one platform, and you're able to test all of them for $20 a month. So you don't need subscriptions to the 20 to 40 different top AI companies anymore. You can pay $20 a month, get access to everything, and use them on an as-needed basis. You essentially get tokens every month, and you can use those toward whatever you want.

We have access to audio models like ElevenLabs, access to some of the top image models like OpenAI's, of course, and a lot of other ones that you may not have used or heard of that are actually really impressive. We also have something called the media storage. This is a place where every file that you ever create gets stored. You can go back and easily find what conversations you had and what prompts you used to generate different things like images or audio. You can click a little button in the media storage on whatever media you created and go view the actual chat that was used to generate it.

There's a ton of cool features in here, like comparing back and forth between different AI models, or getting multiple AI models to run the same prompt and comparing things side by side. So a lot of really cool features we've added in here. If you have more ideas, we are rapidly developing and adding new features, so we would love to hear from you on what you'd want to see. You can check it out in the description. It is AIbox.ai. All right, let's get into this new model from OpenAI.

The thing that I found really interesting here, and I'll just break the news: this GPT-4.1 is specifically designed for math and for coding. This is something that OpenAI seems like they are, I don't want to say struggling with, but it's essentially the one area that's running away from them. Their main competitors are pulling ahead. Claude is kind of smoking them with Claude Code. Everyone's using that. Even Google Gemini is gaining some big ground. They just recently announced that the new Google Gemini chatbot can now integrate with and more easily analyze GitHub projects. So it's really building directly into GitHub, which is owned by Microsoft, who heavily invested in OpenAI, and yet Gemini is making some big moves in that space.

So this code area is really, really valuable. A lot of companies are looking at it. So much so that OpenAI is actually about to acquire, for $3 billion, one of the top AI coding companies, which is called Windsurf. It's pretty much the most popular one. Cursor is probably the second most popular and has about a $1 billion valuation based off its last round of funding. But Windsurf is looking to be acquired by OpenAI for $3 billion. And they're pulling a bunch of moves here.

I think the acquisition of Windsurf and the timeline for that is probably what pushed them to make this new GPT-4.1 model live on ChatGPT. So if you go over to ChatGPT, you can hit the dropdown. What's interesting is that it doesn't actually show up in their priority AI models. You've got to click on their "more models" section. That's where you're going to see GPT-4.5, which is a quote unquote research preview, and then you see GPT-4.1 and GPT-4.1 mini.

Now, what a lot of people have asked is, okay, why the heck would I be using GPT-4.1 when I could just use GPT-4.5? Isn't 4.5 better than 4.1 or 4.1 mini? And it's actually interesting. OpenAI specifically said that for coding tasks, GPT-4.1 is going to be better than GPT-4.5. This is getting to kind of a weird place where we're coming out with newer models, quote unquote, or allegedly newer, more advanced models, that are worse at certain tasks than older models. So this old model could do X, Y, and Z really well, but the new model can do mostly everything better, just not this specific thing. And so it gets to a weird place for OpenAI where you're mixing and matching which model you use.

That's why they have their dropdown with four different models to choose from, and then in their "more models" section you've got three more. So really, if you're on ChatGPT, you have seven options to choose from for what you're going to talk to. I've talked at length about how this is terrible marketing and how other companies are doing a great job. xAI, for example, with Grok: you can use the old version of Grok or you can use Grok 3. Now they have new features within Grok 3, like a deep research kind of dive, or a think button where it gives it more compute and it really thinks, and I've found great results with that. That is more what I would like to see from OpenAI. Even if it is completely switching the model, I just want an easy UI.

Now, they've created some UI inside of the search box, but I think it's a little ridiculous. They have a search button for the internet, which is fine. They have a deep research button, which is for when you want a really extensive document. I understand that one, and I think they should keep it. Then they have a create image button. In my opinion, if you're coming here to create an image and you know you can create images, you should just say what you want an image of and it should just know and automatically generate it. And it actually does that, but maybe they're just trying to prompt new people, to tell them that they can create an image and can just type it here. So maybe it's kind of a marketing thing. But in any case, it's not incredibly useful. I mean, it's redundant. You could just talk to the model and tell it to create an image, and you don't need a button that specifically does that. If you click the create image button, it just adds text into the chat that says "create image", and you're good to go.

Actually, maybe that's not a bad idea, to tell people that they can create images. I might actually steal that for AI Box. So for all my flaming, don't get mad at me if you go over to AI Box and see me add that to my search bar. All right. So here's what's actually interesting about this new GPT-4.1 model.

This came out back in April, but it was only released for developers on the API platform, meaning average people on chatgpt.com could not use it. You could only use it if you had a developer account with an API access token from OpenAI and were embedding it into the software or project that you're building. It was only for developers. And you could say, hey, that's fine. It's a code tool. Only developers need code tools. Developers know how to get access. But in reality, I think a lot of people, even developers, might be using Claude or other platforms directly, and they maybe don't want to go through that headache in order to use it on some special portal that they'd have to make. So it was just getting embedded in software.

Now, why did they do this? Why didn't they just roll it out on chatgpt.com like everyone else? That is where the controversy comes in. Some people are saying this was due to safety concerns and that they didn't release a proper safety report. They got a bunch of criticism for this. A bunch of researchers who were talking about it claimed that OpenAI was lowering its standards around transparency for its AI models. OpenAI argued that despite GPT-4.1 being faster compared to GPT-4o, the model was not a frontier model, and because of that it didn't need the same safety reporting as some of the more capable models. So OpenAI's response was like, "Yeah, we didn't release the safety report like you're criticizing us for, but that's just because this isn't really our frontier model. It's kind of just our side model that we're letting developers use. It doesn't really need as much vetting."

If I'm being 100% honest, I'm not actually advocating for more safety reviews on these code generation models. I'm not super concerned about that. That's just not my wheelhouse. I'd rather get the model sooner than focus a ton on safety. So that's just me personally.

But at the end of the day, it's kind of interesting that that was OpenAI's response. So what exactly can this do? According to Shaokyi Amdo, this new model is going to help software developers who are using ChatGPT to write or debug code. Those are kind of the two specific things. And it is essentially better at instruction following compared to GPT-4o. It's also faster than the o-series reasoning models. So it's not necessarily a reasoning model. It's much faster. It's better at code. It's kind of interesting, because some people like the reasoning models for code, and evidently they've moved away from that in this particular update. So that is, I think, definitely an interesting fact.

This is what they specifically said: "GPT-4.1 doesn't introduce new modalities or ways of interacting with the model and doesn't surpass o3 in intelligence. This means that the safety considerations, while substantial, are different than frontier models." That's their head of safety explaining why they didn't do a lot of safety testing on this.

As I mentioned, this is very interesting timing for this model to come out, because we have a ton of competition. We have, of course, OpenAI now trying to get their $3 billion acquisition of Windsurf pushed through. But we also have a ton of other players putting out coding tools. We have Cursor; allegedly, I think OpenAI might have made a bid to acquire them as well, it didn't go through, and so they went with Windsurf. Those are kind of rumors. Then we also have, of course, Gemini connecting more deeply with GitHub. We have Claude Code, which has run away with most developers and is increasing in popularity. So I think there's just a ton of competition, and it's going to be fascinating to see who is the ultimate winner in this space.

All right. Thank you so much for tuning in. Make sure to go check out the AI Box platform if you are interested in a platform that lets you chat with all of the text, image, and audio models inside one chat, switch between all the models in the same chat, and use different models that are good at different things. Like we've been talking about in this podcast today, some models are great at code. Even some older OpenAI models are great at code, and some of them are worse at code. In the AI Box platform, you can go and start a new chat, and we specifically have GPT-4.1, which we've been talking about in this episode, along with 4.1 mini and 4.1 nano. We have all of these here. You can test them all out if you're interested in coding, or you can use all of the other ChatGPT models and Anthropic and DeepSeek and Google and Meta and Microsoft and Mistral and NVIDIA, all of them. All right. So go check it out: AIbox.ai. Thanks so much for tuning into the podcast today. I will catch you in the next episode.