
OpenAI Boosts Tech Use with Smarter Problem Solving

2025/6/13

Artificial Intelligence: AI News, ChatGPT, OpenAI, LLM, Anthropic, Claude, Google AI


People

Host

A podcast host and content creator focused on electric vehicles and energy.

Topics

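Host: OpenAI has released the new GPT-4.1 model, but it did not initially ship it on the ChatGPT platform, which caused some controversy. I'll analyze what the new model does and why OpenAI chose to delay its release. Some people speculate the delay was due to safety concerns, but OpenAI gives a different explanation. I'll dig into these questions and share my view of GPT-4.1. The model is designed specifically for math and coding, an area where OpenAI appears to face fierce competition. Competitors such as Google's Gemini and Anthropic's Claude keep improving, and that has pushed OpenAI to act. OpenAI is about to acquire the AI coding company Windsurf for $3 billion, which shows how much weight it puts on coding. I think the Windsurf acquisition may be one of the reasons OpenAI released GPT-4.1. Users can find GPT-4.1 in ChatGPT's "More models" section. However, OpenAI's model lineup is somewhat confusing, and newer models perform worse than older ones on certain tasks. I'd prefer OpenAI offer a simpler user interface, even if that means switching models behind the scenes. GPT-4.1 was initially released only to developers on the API platform and was unavailable to regular ChatGPT users, possibly for safety reasons. Personally, though, I don't advocate heavy safety review for code generation models, and I'd rather get access to the model as soon as possible. Overall, the timing of the GPT-4.1 release is interesting. There are many competitors in the market and competition among coding tools is fierce, so it will be exciting to see who ultimately wins.

Host: As the founder of AI Box, I'm excited to introduce the AI Box Playground. It's a platform that brings together many top AI models: for a $20 monthly subscription, users can access and test them all in one place. Beyond offering multiple AI models, the AI Box Playground provides media storage, which makes it easy to find past history. We're developing new features quickly and welcome suggestions from users. The AI Box platform offers a range of text, image, and audio models. Users can switch between them within the same chat and pick the model that is best at each task. The platform includes GPT-4.1 among many other models, and users are welcome to try it out.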


Transcript


OpenAI has added a new model to ChatGPT: GPT-4.1. It's new to ChatGPT, but the model itself actually came out back in April. They just never added it to the ChatGPT platform until now. So today on the podcast, I'm going to break down what the new model does and why they didn't release it in ChatGPT before. I'll also cover some of the controversy around this release, since some people speculate it was held back for safety reasons, among other things. It's officially live, and we're going to get into all of that.

Before we do, I wanted to say that my startup AI Box has officially launched our very first product, the AI Box Playground. This is essentially a place where you can get access to all of the top AI models on one platform and test all of them for $20 a month. So you don't need subscriptions to 20 to 40 different AI companies anymore. You can pay $20 a month, get access to everything, and use each model on an as-needed basis. You essentially get tokens every month, and you can use those toward whatever you want.

We have access to audio models like ElevenLabs and to some of the top image models, like OpenAI's of course. We also have a lot of others you may not have used or heard of that are actually really impressive. We also have something called media storage. This is where every file you ever create gets stored, so you can go back and easily find the conversations you've had and the prompts you used to generate things like images or audio. In the media storage, you can click a little button on any piece of media you created and go view the actual chat that was used to generate it. There are a ton of cool features in here, like comparing different AI models: for example, running the same prompt through multiple AI models and comparing the results side by side. If you have more ideas, we are rapidly developing and adding new features, so we would love to hear what you'd want to see. You can check it out in the description: it's AIbox.ai. All right, let's get into this new model from OpenAI.

The thing I found really interesting here, and I'll just break the news, is that GPT-4.1 is specifically designed for math and coding. This is an area where OpenAI is, I don't want to say struggling, but it's essentially the one area that's running away from them. Their main competitors are getting ahead. Claude is kind of smoking them with Claude Code, and everyone's using that. Even Google Gemini is making some big gains. Google just recently announced that the Gemini chatbot can now integrate with GitHub and more easily analyze GitHub projects. So Gemini is building directly into GitHub, which is owned by Microsoft, a heavy investor in OpenAI. And yet Gemini is making some big moves in that space.

So this coding area is really, really valuable, and a lot of companies are looking at it. So much so that OpenAI is actually about to acquire one of the top AI coding companies, Windsurf, for $3 billion. It's pretty much the most popular one. Cursor is probably the second most popular and has about a $1 billion valuation based on its last round of funding. But it's Windsurf that's looking to be acquired by OpenAI for $3 billion, and they're pulling a bunch of moves here.

Now, I think the Windsurf acquisition and its timeline is probably what pushed them to make GPT-4.1 live on ChatGPT. If you go over to ChatGPT, you can hit the model dropdown. What's interesting is that GPT-4.1 doesn't actually show up among their priority AI models; you have to click on the "More models" section. That's where you'll see GPT-4.5, which is a quote-unquote research preview, and then GPT-4.1 and GPT-4.1 mini.

Now, what a lot of people have asked is: why the heck would I use GPT-4.1 when I could just use GPT-4.5? Isn't 4.5 better than 4.1 or 4.1 mini? And it's actually interesting, because OpenAI specifically said that for coding tasks, GPT-4.1 is going to be better than GPT-4.5. This is getting to kind of a weird place. OpenAI keeps releasing newer models, or allegedly newer, more advanced ones, that are worse at certain tasks than older models. So the old model could do X, Y, and Z really well, and the new model can do mostly everything better, but not that specific thing. That puts OpenAI in a weird place where you're mixing and matching models. That's why their dropdown has four different models to choose from, and then the "More models" section has three more. So if you're on ChatGPT, you really have seven options for which model you're going to talk to.

I've talked at length about how this is terrible marketing and how other companies are doing a great job. Take xAI with Grok: you can use the old version of Grok, or you can use Grok 3. And they have new features within Grok 3, like a deep research kind of dive, or a Think button that gives it more compute so it really thinks. I've gotten great results with that. That is more what I would like to see from OpenAI, even if it means completely switching the model under the hood. I just want an easy UI.

Now, they have created some UI inside the search box, but I think it's a little ridiculous. They have a Search button for the internet, which is fine. They have Deep Research, which is for when you want a really extensive document. I understand that one; I think they should keep it. Then they have Create image. In my opinion, if you're coming here to create an image and you know you can create images, you should just say what you want an image of. The model should know what you mean and generate it automatically. And it actually does that, but maybe they're just trying to let new users know they can create images by typing here. So maybe it's kind of a marketing thing. Either way, it's not incredibly useful; it's redundant. You could just talk to the model and tell it to create an image, so you don't need a button that specifically does that. If you click the Create image button, it just automatically adds text into the chat that says "create image," and you're good to go. And actually, maybe telling people that they can create images isn't a bad idea. I might actually steal that for AI Box. So after all my flaming, don't get mad at me if you go over to AI Box and see me add that to my search bar.

All right, so here's what's actually interesting about this new GPT-4.1 model. It came out back in April, but it was only released to developers on the API platform, meaning average people on chatgpt.com could not use it. You could only use it if you had a developer account with an OpenAI API key and were embedding it into the software or project you were building. And you could say, hey, that's fine: it's a coding tool, only developers need coding tools, and developers know how to get access. But in reality, I think a lot of people, even developers, might be using Claude or other platforms directly. They may not want to go through the headache of building some special portal just to use it. So it was just getting embedded in software.

Now, why did they do this? Why didn't they just roll it out on chatgpt.com like their other models? That is where the controversy comes in. Some people are saying this was due to safety concerns, since OpenAI didn't release a proper safety report. They got a bunch of criticism for this, and a number of researchers claimed that OpenAI was lowering its standards around transparency for its AI models. OpenAI argued that even though GPT-4.1 is faster than GPT-4o, it is not a frontier model. Because of that, it didn't need the same safety reporting as their more capable models. So OpenAI's response was basically, "Yeah, we didn't release the safety report you're criticizing us for, but that's just because this isn't really our frontier model. It's kind of a side model that we're just letting developers use. It doesn't need as much vetting."

Now, if I'm being 100% honest, I'm not actually advocating for more safety reviews on code generation models like this one. I'm not super concerned about that; it's just not my wheelhouse. I'd rather get the model sooner than focus a ton on safety. That's just me personally.

But at the end of the day, it's kind of interesting that that was OpenAI's response. So what exactly can this model do? According to Shaokyi Amdo, GPT-4.1 is going to help software developers who use ChatGPT to write or debug code. Those are the two specific things. It's also better at instruction following than GPT-4o, and it's faster than the o-series reasoning models. So it's not a reasoning model: it's much faster, and it's better at code. That's kind of interesting, because some people like the reasoning models for code, and evidently OpenAI has moved away from them in this particular update. I think that's definitely an interesting fact.

Here's what they specifically said: GPT-4.1 doesn't introduce new modalities or ways of interacting with the model, and doesn't surpass o3 in intelligence. This means that the safety considerations, while substantial, are different than for frontier models. That's their head of safety explaining why they didn't do as much safety testing on this model.

As I mentioned, this is very interesting timing for this model to come out, because there is a ton of competition. We have, of course, OpenAI now trying to push its $3 billion acquisition of Windsurf through. But we also have a ton of other players putting out coding tools. Take Cursor: allegedly, I think, OpenAI might have made a bid to acquire them as well. That didn't go through, so they went with Windsurf, though those are kind of just rumors. Then we also have, of course, Gemini connecting more deeply with GitHub. And we have Claude Code, which has run away with most developers and keeps growing in popularity. So there's just a ton of competition, and it's going to be fascinating to see who is the ultimate winner in this space.

All right, thank you so much for tuning in. Make sure to go check out the AI Box platform if you're interested in one that lets you chat with all of the text, image, and audio models. You can switch between models inside the same chat and use different models that are good at different things. Like we've been talking about in this podcast today, some models are great at code. Even some older OpenAI models are great at code, and some are worse at it. In the AI Box platform, you can start a new chat, and we specifically have GPT-4.1, which we've been talking about in this episode, plus GPT-4.1 mini and GPT-4.1 nano. You can test them all out if you're interested in coding. You can also use all of the other ChatGPT models, plus models from Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral, NVIDIA, all of them. So go check it out at AIbox.ai. Thanks so much for tuning in to the podcast today. I will catch you in the next episode.
