Today on the AI Daily Brief, when to use different AI models and what to use them for. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Thanks to today's sponsors, Blitzy.com, Vertice Labs, and Superintelligent. And to get an ad-free version of the show, go to patreon.com slash AI Daily Brief.
Welcome back to the AI Daily Brief. Today was a slightly slow news day. Doesn't happen very often in this space. And so I decided to use it for a show that I've been thinking about for a little while. One of the big challenges for people with LLMs is which models to select.
If you go into ChatGPT, for example, up in the top left-hand corner there's a model selector where you can choose GPT-4o, GPT-4.5, O3, O4 Mini, O4 Mini High, or even O1 Pro Mode. Now, these terms obviously mean nothing to most people. And while OpenAI gives a little bit of guidance here, for example, it says GPT-4o: great for most tasks; GPT-4.5: good for writing and exploring ideas, these aren't really sufficient to help people understand which use cases are suited to different types of models.
It doesn't take much searching to find examples of people's confusion here. For example, on X, Eduardo Borges writes, I follow AI improvements closely. I'm familiar with most models like Claude, Mistral, Llama, GPT, Gemini, Grok, etc. And I currently have no idea which models to use on OpenAI. It feels like they're pranking us. I tried asking ChatGPT for an answer and it got even more complicated.
Now, I think most people default to something really basic, like Shilmon out here who writes, I use 4o 90% of the time and O3 10% of the time. Do people use the other ones?
And while these questions may seem small, the reality is, use of these tools is becoming ubiquitous in professional settings. ChatGPT recently peaked at over 800 million weekly active users. In a recent KPMG Pulse survey, we saw the number of people who are using tools like ChatGPT daily jump from 22% to 58%. Point is, these tools are becoming a key part of our workstream, and yet we still don't know exactly which use cases belong with which models.
And that's why I was very excited to see about a week ago, OpenAI published a post on their help center about exactly this. It was specifically aimed at enterprise and was called When to Use Each Model.
So what we're going to do today is go through how they frame it, look at some sample use cases for each of those models, both generally speaking as well as organized by specific categories of business users from solopreneurs to SMEs to mid-market companies to enterprises. And then we're going to do a little bit of a summary and my sort of crib notes for how much you really have to care about all these different things. Couple quick caveats before we dig into this model-by-model assessment.
First of all, I'm basing this off of OpenAI's recent post, and so I'm only focusing on the different OpenAI models. This is not an argument that you shouldn't care about Claude or Gemini or Grok, and to the extent that people find it valuable, I'll happily do an episode about when I use those different models for different purposes, but for today, I'm just going to focus on OpenAI's suite.
The second thing to note is that I'm coming at this from an individual user perspective, rather than from the perspective of developers who are building software for an enterprise. So the considerations are likely going to be different as developers think about how to wire together different systems to optimize for both cost and performance. We also aren't going to get that much into what models to use for different types of agentic workflows. Again, just for the purposes of this show, we're really going to be focused on that individual employee kind of use case.
So, specifically, how might you use these things as an individual? Let's start with the daily workhorse, GPT-4o. And daily use is exactly what OpenAI describes it as being good for. They write that GPT-4o excels at everyday tasks:
brainstorming, summarizing, emails, creative content. Fully multimodal, it supports almost all capabilities: GPTs, data analysis, search, image generation, Canvas, advanced voice, and inputs including documents, images, CSV files, audio, and video. Basically, this is the default model. It's the one that you're going to use day in and day out for your most common tasks.
The example prompts that OpenAI provides for GPT-4o include summarizing meeting notes into key action items, drafting a follow-up email after a project kickoff, proofreading a report, and brainstorming a launch plan in real time. Now, as we'll see, I actually do not agree with the last one, brainstorming a launch plan. At this point, I think that pretty much all brainstorming, anything having to do with strategy or planning, should be moved over to O3, but we'll get into that in a minute.
On those other example prompts, summarizing meeting notes, drafting follow-up emails, proofreading reports, that is exactly GPT-4o's bread and butter. Now, in addition to generalist use cases like the ones just mentioned, the other thing to note about 4o is its multimodal capabilities.
So, for example, if you are out in the world and you want to take a photo of something as a potential input, GPT-4o is going to be your model. Likewise, bringing it back to an enterprise use case, let's say that you've done some chicken-scratch drawings of the UI for a new website or application you're designing, and you want to translate that into something more polished. Again, that's going to be a job for GPT-4o. So here, the two operative words are generalist and multimodal.
So, thinking about this from the standpoint of different types of users: as an individual, I do things like feeding it transcripts of my podcast to get summaries that can be shared as part of show notes. And when it comes to employees at SMEs, mid-market companies, or enterprises, honestly, a lot of the use cases look very similar. It's going to be things like ingesting call recordings and slides and creating tailored follow-up decks for prospects.
Basic marketing use cases. Things like creating standard operating procedure documents for internal knowledge management. Ultimately, 4o is your workhorse model for a lot of day-in, day-out knowledge work.
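As a concrete sketch of that kind of rote summarization work, here's roughly what a Chat Completions request targeting GPT-4o could look like if you wired it up through OpenAI's API rather than the ChatGPT app. The helper function and the prompt wording are my own illustration, not anything OpenAI prescribes:

```python
def build_summary_request(meeting_notes: str) -> dict:
    """Assemble a Chat Completions payload asking GPT-4o for action items.

    The system prompt here is just an example; tune it to your own workflow.
    """
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "system",
                "content": "Summarize these meeting notes into key action items.",
            },
            {"role": "user", "content": meeting_notes},
        ],
    }

# With the official SDK installed and an API key configured, you would
# send it along the lines of:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**build_summary_request(notes))
```

Again, for individual users the chat interface does all of this for you; the sketch just shows that "use 4o for everyday summarization" translates directly into picking the model name in the request.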
Then we get over to GPT-4.5. And I think what's complicated about the naming convention here is that in this case, 4.5 doesn't just mean strictly better. It means better at certain things. And the specific things it's better for, in short, are creative writing tasks. OpenAI says that GPT-4.5 is ideal for creative tasks. Emotional intelligence, clear communication, creativity, and a more collaborative, intuitive approach to brainstorming. The example prompts they give...
Create an engaging LinkedIn post about AI trends. Write a product description for a new feature launch. Develop a customer apology letter with an empathetic tone. So the way that I think about this is that effectively any time I need writing to be outward-facing and good, rather than just completely perfunctory, I'm turning to GPT-4.5. If I'm ever doing things like coming up with a set of different possible names for a blog post or an article, I'm using 4.5.
If I'm actually having it try to write in a particular style, once again, 4.5. I think 4.5 should be the default for marketers using it for any sort of external-facing copy, be that email copy, but especially social media posts or any longer-form article. In fact, pretty much the only writing that I don't have GPT-4.5 do, and instead let 4o handle, is stuff that is just complete rote summarization, where the quality of the words doesn't so much matter. All that matters is that it captures the key ideas.
So, for example, taking a transcript of a podcast and doing a summary, 4o does a fine enough job with that. There's no reason not to use 4.5 necessarily; it's just more creative horsepower than that particular task needs.
So, some ways that different types of companies might use this: for solopreneurs, or really anyone who's using it as an individual, if they're trying to do any sort of thought-leadership writing, that's going to be a task for 4.5. For SMEs, 4.5 is going to be really good at tasks that require empathy. So, for example, generating empathetic HR templates: think performance review feedback, new-hire notes, any of that sort of communication, 4.5 is going to do well with.
When you get into the mid-market and enterprise, especially for companies that have global CX teams, 4.5 could be really good at things like taking brand voice guidelines and drafting localization-ready customer service macros. Again, what 4.5 is good for is external-facing writing where the quality of the words actually matters. Today's episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Which, if you don't know exactly what that means yet, do not worry, we're going to explain, and it's awesome. So Blitzy is used alongside your favorite coding copilot as your batch software development platform for the enterprise, and it's meant for those who are seeking dramatic development acceleration on large-scale codebases. Traditional copilots help developers with line-by-line completions and snippets,
but Blitzy works ahead of the IDE, first documenting your entire codebase, then deploying more than 3,000 coordinated AI agents working in parallel to batch-build millions of lines of high-quality code for large-scale software projects. So then, whether it's codebase refactors, modernizations, or bulk development of your product roadmap, the whole idea of Blitzy is to provide enterprises dramatic velocity improvement.
To put it in simpler terms, for every line of code eventually provided to the human engineering team, Blitzy will have written it hundreds of times, validating the output with different agents to get the highest-quality code to the enterprise in batch. Projects that would normally require dozens of developers working for months can now be completed with a fraction of the team in weeks, empowering organizations to dramatically shorten development cycles and bring products to market faster than ever.
If your enterprise is looking to accelerate software development, whether it's large-scale modernization, refactoring, or just increasing the rate of your SDLC, contact Blitzy at blitzy.com, that's B-L-I-T-Z-Y dot com, to book a custom demo, or just press get started and start using the product right away. Today's episode is brought to you by Vertice Labs, the AI-native digital consulting firm specializing in product development and AI agents for small to medium-sized businesses.
Now, guys, this is a market that we have seen so much interest in, so much demand for, and many times great AI dev shops and builders out there just have so much business from the high end of the mid-market and big enterprises that this is a group of buyers that gets neglected. Now, for Vertice, AI-native means that they don't just build AI, they use it in every step of their process. They embed agents in their workflows so that they better know how to help you embed agents in your workflows.
And indeed, what they specialize in is building AI agents and agentic workflows that augment knowledge work, from customer support to internal ops, so that your team can focus on higher-value work. Vertice wants to ensure that this is not just another copilot, but something that works end-to-end, translating business problems into working software in weeks, not quarters.
They have found that their clients typically see a 60% reduction in time and cost, with significantly higher output than traditional technology partners. So if you are a founder, a CTO, a business leader, or you've just got a product idea to launch, check out verticelabs.io. That's V-E-R-T-I-C-E labs dot I-O. Today's episode is brought to you by Superintelligent.
Now, you have heard me talk about agent readiness audits probably numerous times at this point. This is our system that uses voice agents and a hybrid human AI analysis process to benchmark your agent readiness and map your agent opportunities.
and give you some really pointed, actionable next steps to move further down the path in your agentic journey. But we're coming up on the slow time of the year, and if you want to use this time to get out ahead of peers and competitors, we're excited to announce something we're calling Agent Summer. The idea here isn't that complicated. It's basically just an accelerated program to get you agentified and fast.
First of all, it's going to include an agent readiness audit, figuring out where your biggest agent opportunities are. Next, we're going to support both your internal change management process, helping you figure out AI policy, data readiness, things like that, as well as doing action planning around the agent opportunities that are most relevant for you. And finally, we're going to connect you to the right vendors to actually go and deliver this.
Now for this, we want to work with a very small handful of companies that really want to move. We're going to be bundling more than $50,000 of services for something that starts closer to $30,000. And so if you want to use this summer to jump ahead on your company's agent journey, email agent at besuper.ai with summer in the subject line, claim one of these limited spots, and let's go have an agent summer.
Now let's jump ahead to a couple of models that you're probably not using all that much, at least as an individual. First up, we've got O4 Mini. And again, one of the challenges with OpenAI's naming conventions is that you hear O4 and you assume it must be better than O3, right? Well, it's very clear that O4 Mini and O4 Mini High were positioned by OpenAI for a very specific set of tasks that really do slant technical.
OpenAI says that O4 Mini is good for fast, technical tasks: quick STEM-related queries, programming, and visual reasoning. The example prompts they give are extracting key data points from a CSV file, providing a quick summary of a scientific article, or quickly fixing a Python traceback. O4 Mini High is the same domain, except for when you need more detail rather than speed. So they say this is for detailed technical tasks such as advanced coding, math, and scientific explorations.
O4 Mini High is designed to think longer for higher accuracy. So the example prompts they give are solving a complex math equation and explaining the steps, drafting SQL queries for data extraction, or explaining a scientific concept in layman's terms.
So how might this manifest inside of different types of companies? Well, a solopreneur who's managing all of their own processes, including their own tech, might use a model like O4 Mini to help fix bugs as a sort of fix-my-site helper. For example, spotting issues with a WordPress CSS glitch or something like that.
An SME might use O4 Mini as a sort of IT help desk assistant. A mid-market company might have their data ops team use it to churn out ad hoc Python ETL scripts. An enterprise might use it to power a continuous code review bot that flags security issues across thousands of small daily pull requests. For companies, O4 Mini is designed for high-volume usage. In OpenAI's enterprise plan, it has 300 requests per day, as opposed to, for example, O4 Mini High, which has 100 requests per day, or O3, which has 100 requests per week.
I think it's fairly safe to say, though, that if your role isn't particularly technical, you're likely not going to be using O4 Mini very much. So for all intents and purposes, you can kind of ignore it and its partner, O4 Mini High, which indexes even more heavily on technical complexity and is really something that, for example, data scientists are going to use.
Likewise, let's just touch briefly on O1 Pro Mode. It is included in this list because it's available as part of their enterprise plan. And OpenAI says that it is for complex reasoning. So for example, drafting a detailed risk analysis memo for an EU data privacy rollout, generating a multi-page research summary on emerging technologies, creating an algorithm for financial forecasting using theoretical models.
It's a very small number of requests that enterprises get per month for O1 Pro. And for individuals, it's not even included in the main model selector. You have to go click more models and it's framed as a legacy reasoning expert.
Now, it's not impossible that there might be some use cases where O1 Pro Mode is useful for a particular type of output, such as a long-form high-stakes document. So for an SME, things like drafting a long ISO 27001 compliance handbook, for the mid-market, producing a deep patent landscape review for a potential acquisition, or for an enterprise, developing some super extensive impact assessment that has to cite specific rules and regulations from different jurisdictions.
Maybe the most germane question is when to use O1 Pro as opposed to O3. They're both reasoning models. O3 is theoretically a more advanced reasoning model. So where would you want to use the legacy O1 Pro mode? And the short answer is O1 Pro mode is optimized for work that's really long or where accuracy is really important.
O1 is extremely slow, and the whole idea of it is that it sacrifices speed for a more exhaustive internal reasoning pass, meaning that it's tuned for accuracy and depth. So for things like regulatory filings, safety-critical engineering reviews, litigation briefs, risk assessments, anything where accuracy really matters, that's the time to consider O1 Pro.
Now the other side is that O1 devotes extra compute to maintaining a coherent through line over significant outputs. So while O1 Pro and O3 both have the same 200k token context window, O1 is designed to output a much bigger set of tokens in a single go. So if you're talking about something that needs tens of thousands of words of output, for example, O1 Pro might be a consideration.
These projects are going to be fewer and farther between, which is of course why OpenAI only gives even enterprises five queries per user per month. But that's sort of the idea. Now, for our purposes, the other model besides 4o and 4.5 that you're likely to use most often is O3, OpenAI's current state-of-the-art (at least in its full version) reasoning model.
OpenAI says that O3 is good for complex, multi-step tasks. Basically, this is a generalist reasoning model. So O3 is going to be good for things like strategic planning, detailed analyses, extensive coding, advanced math, etc.
The example prompts they give are developing a risk analysis for market expansion, drafting a business strategy outline based on competitive data, running a multi-step analysis on a CSV file, forecasting the next quarter and plotting the trend, or reviewing pipeline metrics and searching for new top-of-funnel strategies. A solopreneur might then use O3 for something like building an investor-ready financial model. SMEs might use it to run a supply chain simulation weighing sourcing options, tariffs, and currency risk for their next product line.
Basically, if 4o is the generalist workhorse, O3 is the more advanced knowledge workhorse. O3 is now actually the model that I spend the most time with, and it has completely revolutionized my interaction with ChatGPT. Before, I was a very frequent user. Obviously, there are lots of summarization tasks that even a base model like 4o just speeds up and makes better.
And when it comes to writing, there are certain types of documents that I care enough about that I'll go with the 4.5 version. But O3 is the first time that I've actually found ChatGPT to be capable of robust enough strategic thinking that I can use it as a real thought partner.
One more note on O3. If you are using the deep research tool, that is by definition O3. The way that OpenAI has that set up is that deep research takes advantage of O3's reasoning and planning capabilities to take your research assignment, strategize on how to do it, go out and find all the sources, and then ultimately consolidate them into whatever type of output you're looking for. I will also say that this is the one area where I have sometimes found O3's writing to be better than 4.5's. Specifically, I have compared giving 4.5 the text from a handful of articles and asking it to write a short summarization essay, versus asking deep research to create a research dossier about the same topic, referencing a couple of the same articles but also letting it find whatever else it's going to find, and then using that dossier to summarize and write a short article. In many cases, I have found that the O3 output is better than the 4.5 one, perhaps just because it has better information to draw from thanks to the deep research process. But that's one thing to consider if you are using it for that sort of output. In any case, given how valuable deep research is, I wanted to make sure it was clear that that is in fact O3.
So let's try to wrap up by homing in on the three models that I think you're going to use most and what I think you're going to use them for. This is your sort of cheat sheet if you just want to do this fast. If you have a boring use case, something like meeting summarization, that's almost certainly going to live inside 4o. OpenAI calls it everyday tasks, but you know it when you see it. This is stuff that's low stakes but takes time, stuff you just want off your plate so you can be focused on other parts of your work. Anything that falls into that category is likely going to be a fit for 4o.
4.5, on the other hand, just like OpenAI says, is for creative tasks, specifically, I think, when the output is writing. So if you are doing any sort of thought leadership supported by AI, you're absolutely most definitely going to want to use GPT-4.5, not GPT-4o. It just does a much better job.
And there are a lot of other types of business use cases where the quality of the output words matters as well. Again, meeting summarization, who cares? You're just trying to make sure that the main ideas are there. But when you are, for example, writing HR documents, even if you're not trying to be creative in a traditional way, the quality of the output really matters. It's going to interact with human emotions. And that's what 4.5 is going to be good for.
And then, when it comes to anything that involves strategic thinking, brainstorming, or planning, in general, your reasoning models are just going to, by nature, do better with that. And I would go even a step further and say that O3 is really the first model that I've seen that is extremely competent at these sorts of use cases. There's a growing conventional wisdom that people who view AI as a collaborator rather than just a tool are finding themselves using it more effectively.
I think that that's true, and I think that O3 actually makes that viable. Now, the one other thing that I want to say about O3 is that, in addition to just being a better thought partner, because it's actually thinking and reasoning in a different way than 4o and 4.5, it also has more coherent structured outputs. If you've used O3, you've probably noticed that it uses a lot more charts and other sorts of visual hierarchies that simply communicate ideas more quickly.
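If it helps to see the cheat sheet in code form, the rules of thumb above can be sketched as a tiny lookup. To be clear, the task categories here are ones I invented for the illustration, and nothing about this is an official OpenAI routing mechanism; it just restates the episode's guidance:

```python
# Rough mapping of the episode's rules of thumb to model names.
# The category labels are invented for illustration only.
ROUTING = {
    "meeting_summary": "gpt-4o",   # low-stakes everyday work
    "email_draft": "gpt-4o",
    "external_copy": "gpt-4.5",    # outward-facing writing where word quality matters
    "hr_communication": "gpt-4.5",
    "strategy": "o3",              # multi-step reasoning, planning, analysis
    "deep_research": "o3",
}

def pick_model(task: str) -> str:
    """Return the suggested model for a task category, defaulting to the 4o workhorse."""
    return ROUTING.get(task, "gpt-4o")
```

So a strategy question routes to O3, external-facing copy routes to 4.5, and anything unrecognized falls back to 4o, mirroring the default-model framing from the top of the episode.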
One interesting question that has come up recently is: if I have a big report where I want the structured outputs of O3, but the creative writing quality of 4.5, which do I use? For the specific use case that I was just talking with someone about on that front, the answer that ended up making sense for them was to use O3 for the overall report, because ultimately what mattered most was the structure of the information coming out of it, not just the poetry of it. They then rewrote the intro section using 4.5 to capture a more engaging voice.
This is, of course, a bit of a moving target. It's going to continue to change as these models evolve. And of course, like I said, I haven't even gotten into Claude and Grok and Gemini and all the other options. But by and large, this is how I'm using OpenAI's models and how I'm seeing them work for other companies as well. That, however, is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.