Today on the AI Daily Brief, which AI you should be using right now. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪
Hello, friends. Welcome back to another Long Reads episode of the AI Daily Brief. The big theme of this week has, of course, been DeepSeek. There were a ton of op-eds about DeepSeek and American competitiveness, but I actually wanted to take this Long Read Sunday in a slightly different direction. I have a feeling some of us might be a little DeepSeeked out. Plus, this conversation is still roughly aligned, just maybe not as directly.
One big part of the DeepSeek discussion has been how the R1 model stacks up to its competitors, specifically OpenAI's O1. A lot of what we've been covering this week is how companies have been shifting to using R1. And so I thought it'd be fun to read Professor Ethan Mollick's Which AI to Use Now: An Updated Opinionated Guide. Ethan calls this the question he gets asked most. What AI should you actually use? Not five years from now, not in some hypothetical future, but today.
About every six months, he updates this list. And so this is the fresh off the press, inclusive of DeepSeek version. So what we're going to do is turn it over to AI.me to read Ethan's piece. And then I'm going to come back and talk about how I think about and use different models. One note on AI.me, I've been getting more feedback of the voice model having some problematic ticks recently. I'm going to actually do another round of training an AI version of my voice and see if it improves things, given that this is the one that I've been using for over a year now.
Still, for this episode at least, we are on that old model, so if there is anything weird, I apologize in advance. But let's turn it over to Ethan's piece, and then I will come back and discuss my answers to this question. Which AI to use now? An updated opinionated guide.
While my last post explored the race for artificial general intelligence, a topic recently thrust into headlines by Apollo program-scale funding commitments to building new AIs, today I'm tackling the one question I get asked most. What AI should you actually use? Not five years from now. Not in some hypothetical future. Today.
Every six months or so, I have written an opinionated guide for individual users of AI, not specializing in any one type of use but as a general overview. Writing this is getting more challenging. AI models are gaining capabilities at an increasingly rapid rate, new companies are releasing new models, and nothing is well documented or well understood. In fact, in the few days I have been working on this draft, I had to add an entirely new model and update the chart below multiple times due to new releases.
As a result, I may get something wrong or you may disagree with my answers, but that is why I consider it an opinionated guide. Though as a reminder, I take no money from AI Labs, so it is my opinion.
To pick an AI model for you, you need to know what they can do. I decided to focus here on the major AI companies that offer easy-to-use apps that you can run on your phone, and which allow you to access their most up-to-date AI models. Right now, to consistently access a frontier model with a good app, you are going to need to pay around $20/month, at least in the US, with a couple exceptions. Yes, there are free tiers, but you'll generally want paid access to get the most capable versions of these models.
We are going to go through things in detail, but for most people, there are three good choices right now. Claude from Anthropic, Google's Gemini, and OpenAI's ChatGPT. There are also a trio of models that might make sense for specialized users. Grok by Elon Musk's X.AI is an excellent model that is most useful if you are a Big X user.
Microsoft's CoPilot offers many of the features of ChatGPT and is accessible to users through Windows. And the new DeepSeek, a Chinese model that is remarkably capable and free. I'll talk about some caveats and other options at the end. For most people starting to use AI, the most important goal is to ensure that you have access to a frontier model with its own app.
Frontier models are the most advanced AIs, and thanks to the scaling law, where bigger models get disproportionately smarter, they're far more capable than older versions. That means they make fewer mistakes, and they often can provide more useful features. The problem is that most of the AI companies push you towards their smaller AI models if you don't pay for access, and sometimes even if you do.
Generally, smaller models are much faster to run, slightly less capable, and also much cheaper for the AI companies to operate. For example, GPT-4o mini is the smaller version of GPT-4o, and Gemini Flash is the smaller version of Gemini. Often, you want to use the full models where possible, but there are exceptions when the smaller model is actually more advanced, and everything has terrible names. Right now, for Claude, you want to use Claude 3.5 Sonnet, which consistently outperforms its larger sibling, Claude 3 Opus. For Gemini, you want to use Gemini 2.0 Flash, though full Gemini 2.0 is expected very soon. And for ChatGPT, you want to use GPT-4o, except when tackling complex problems that benefit from O1's reasoning capabilities. While this can be confusing, it is also a side effect of how quickly these companies are updating their AIs and their features.
Imagine an AI that can converse with you in real time, seeing what you see, hearing what you say, and responding naturally. That's live mode, though it goes by various names. This interactive capability represents a powerful way to use AI. To demonstrate, I use ChatGPT's advanced voice mode to discuss my game collection. This entire interaction, which you can hear with sound on, took place on my phone. You are actually seeing three advances in AI working together.
First, multimodal speech lets the AI handle voice natively, unlike most AI models that use separate systems to convert between text and speech. This means it can theoretically generate any sound, though OpenAI limits this for safety. Second, multimodal vision lets the AI see and analyze real-time video. Third, internet connectivity provides access to current information. The system isn't perfect. When pulling the board game ratings from the internet, it got one right but mixed up another with its expansion pack.
Still, the seamless combination of these features creates a remarkably natural interaction, like chatting with a knowledgeable, if not always 100% accurate, friend who can see what you're seeing. Right now, only ChatGPT offers a full multimodal live mode for all paying customers. It's the little icon all the way to the right of the prompt bar. ChatGPT is full of little icons. But Google has already demonstrated a live mode for its Gemini model, and I expect we will see others soon.
For those who are watching the AI space, by far the most important recent advance in the last few months has been the development of reasoning models. As I explained in my post about O1, it turns out that if you let an AI think about a problem before answering, you get better results. The longer the model thinks, generally, the better the outcome. Behind the scenes, it's cranking through a whole thought process you never see, only showing you the final answer. Interestingly, when you peek behind that curtain, you find these AIs think in ways that feel eerily human.
That was the thinking process of DeepSeek's R1, one of only a few reasoning models that have been released to the public. It is also an unusual model in many ways. It is an excellent model from China. It is open source, so anyone can download and modify it. And it is cheap to run and is currently offered for free by its parent company, DeepSeek. Google also offers a reasoning version of its Gemini 2.0 Flash. However, the most capable reasoning models right now are the O1 family from OpenAI.
These are confusingly named, but in order of capability, there are O1 Mini, O1, and O1 Pro. A new series of models, O3 (OpenAI could not get the rights to the O2 name, making things even more baffling), is expected at any moment. And O3 Mini is likely to be a very good model. Reasoning models aren't chatty assistants. They're more like scholars. You'll ask a question, wait while they think, sometimes minutes, and get an answer.
You want to make sure that the question you give them is very clear and has all the context they need. For very hard questions, especially in academic research, math, or computer science, you will want to use a reasoning model. Otherwise, a standard chat model is fine.
Not all AIs can access the web and do searches to learn new information past their original training. Currently, Gemini, Grok, DeepSeek, Copilot, and ChatGPT can search the web actively, while Claude cannot. This capability makes a huge difference when you need current information or fact-checking, but not all models use their internet connections fully, so you will still need to fact-check.
Most of the LLMs that generate images do so by actually using a separate image generation tool. They do not have direct control over what that tool does, they just send a prompt to it and then show you the picture that results. That is changing with multimodal image creation, which lets the AI directly control the images it makes.
For right now, Google's Imagen 3 leads the pack, but honestly, they'll all handle your basic otter holding a sign as it sits on a pink unicorn float in the middle of a pool just fine. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralized security workflows complete questionnaires up to 5x faster and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.
If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US.
All AIs are pretty good at writing code, but only a few models, mostly Claude and ChatGPT, but also Gemini to a lesser extent, have the ability to execute the code directly. Doing so lets you do a lot of exciting things. For example, this is the result of telling O1 using the Canvas feature, which you need to turn on by typing slash Canvas.
Create an interactive tool that visually shows me how correlation works, and why correlation alone is not a great descriptor of the underlying data in many cases. Make it accessible to non-math people and highly interactive and engaging.
Further, when models can code and use external files, they are capable of doing data analysis. Want to analyze a dataset? ChatGPT's Code Interpreter will do the best job on statistical analyses. Claude does less statistics but often is best at interpretation. And Gemini tends to focus on graphing. None of them are great, with Excel files full of formulas and tabs yet, but they do a good job with structured data. It is very useful for your AI to take in data from the outside world.
Almost all of the major AIs include the ability to process images. The models can often infer a huge amount from a picture. Far fewer models do video, which is actually processed as images at one frame every second or two. Right now, that can only be done by Google's Gemini, though ChatGPT can see video in live mode. And while all the AI models can work with documents, they aren't equally good at all formats. Gemini, GPT-4o (but not O1), and Claude can process PDFs with images and charts, while DeepSeek can only read the text.
No model is particularly good at Excel or PowerPoint, though Microsoft Copilot does a bit better here, as you might expect, and that will likely change soon. The different models also have different amounts of memory, called context windows, with Gemini having by far the most: it is capable of holding up to 2 million words at once. A year ago, privacy was a major concern when choosing an AI model. The early versions of these systems would save your chats and use them to improve their models. That's changed dramatically.
Every major provider, except DeepSeek, now offers some form of privacy-focused mode. ChatGPT lets you opt out of training, and Claude says it will not train on your data, as does Gemini. The exception is if you're handling truly sensitive data like medical records. In those cases, you'll still want to look into enterprise versions of these tools that offer additional security guarantees and meet regulatory requirements.
Each platform offers different ways to customize the AI for your use cases. ChatGPT lets you create custom GPTs tailored to specific tasks and includes an optional feature to remember facts from previous conversations. Gemini integrates with your Google Workspace, and Claude has custom styles and projects.
As you can see, there are lots of features to pick from, and on top of that, there is the issue of vibes. Each model has its own personality and way of working, almost like a person. If you happen to like the personality of a particular AI, you may be willing to put up with fewer features or less capabilities. You can try out the free versions of multiple AIs to get a sense for that. That said, for most people, you probably want to pick among the paid versions of ChatGPT, Claude, or Gemini. ChatGPT currently has the best live mode in its advanced voice mode.
The other big advantage of ChatGPT is that it does everything, often in somewhat confusing ways. OpenAI has AI models specialized in hard problems, the O1 series, and models for chat, like GPT-4o. Some models can write and run complex software programs, though it is hard to know which. There are systems that remember past interactions, scheduling systems, movie-making tools, and early software agents.
It can be a lot, but it gives you opportunities to experiment with many different AI capabilities.
It is also worth noting that ChatGPT offers a $200/month tier, whose main advantage is access to very powerful reasoning models. Gemini does not yet have as good a live mode, but that is supposed to be coming soon. For now, Gemini's advantage is a family of powerful models, including reasoners, very good integration with search, and a pretty easy-to-use user interface, as you might expect from Google. It also has top-flight image and video generation. Also excellent is deep research, which I wrote about at length in my last post.
Claude has the smallest number of features of any of these three systems, and really only has one model you care about: Claude 3.5 Sonnet. But Sonnet is very, very good. It often seems to be clever and insightful in ways that the other models are not. A lot of people end up using Claude as their primary model as a result, even though it is not as feature-rich.
While it is very new, you might also consider DeepSeek if you want a very good all-around model with excellent reasoning. If you subscribe to X, you get Grok for free. And the team at X.ai are scaling up capabilities incredibly quickly with a soon-to-be-released new model, Grok 3, promising to be the largest model ever trained. And if you have Copilot, you can use that, as it includes a mix of Microsoft and OpenAI models, though I find the lack of transparency over which models it is using when to be somewhat confusing.
There are also many services, like Poe, that offer access to multiple models at the same time if you want to experiment. In the time it took you to read this guide, a new AI capability probably launched and two others got major upgrades. But don't let that paralyze you. The secret isn't waiting for the perfect AI. It's diving in and discovering what these tools can actually accomplish. Jump in, get your hands dirty, and find what clicks. It will help you understand where AI can help you, where it can't, and what is coming next.
All right. So as always, lots of good thoughts from Ethan. As he points out, some of your choices are kind of made for you. If you're interested in the sort of interaction that advanced voice mode offers, you just use advanced voice mode at this point. That won't be the case forever, but it mostly is the case now. Similarly, there are certain products that just don't have equivalents outside of the company setting the tone.
Notebook LM, specifically the audio overviews, are the biggest example for me. That's not to say that people aren't trying. Eleven Labs, for example, which is a company that I love, has a feature to create this sort of podcast out of other content. But Notebook LM as an all-inclusive experience is really just something different.
But what about for the day-in, day-out stuff? Help with writing, help writing social media, brainstorming, generating images, kind of the bread-and-butter Gen AI uses that at this point you don't even think twice about. First of all, let's talk about R1. I have tried out the DeepSeek app, although only for uses where I don't care about the idea of the Chinese government having access. For me, though, I think the place that I'll run into DeepSeek models most often is when they are transposed and embedded in some other tool that I use, such as in Perplexity.
I haven't had a chance yet to do a full test of how R1 works in Perplexity as compared to other models, but given how often I use Perplexity, I'm sure that's something that I'll do at some point soon.
When it comes to image generators, I continue to prefer the aesthetics of Midjourney. No big surprise there. But in terms of actual practical usage, Ideogram is by far my most used tool. The reason for that is my specific use cases. Most of the images that I generate these days for some part of my work need, or at least strongly benefit from, text. Ideogram handles that incredibly well, and whatever it lacks in terms of stylistic fidelity, it more than makes up for in functionality.
I've also found that Ideogram is really good at listening to your very inelegant prompt and figuring out what you're trying to go for. Midjourney gives you incredibly fine-grained controls, and if you either A, have a vision for exactly what you want, or B, just want to let it be artistic, it thrives in both of those situations. But when I'm blundering around trying to get a comparison cartoon, two corporate workers dressed normally in one pane and as robots in another, so I can put it in a pitch deck, Ideogram just does it better for me.
And so what about the core stuff? When it comes to actual writing tasks, I am still in the mode of throw the prompt through both Claude and ChatGPT and see which I prefer for a particular context. I tend to default first to Claude, in part because I've set up a number of voice personas that I can instantly toggle, taking that out of the prompting process. But for any given prompt, it's sort of six of one, half dozen of the other of which I prefer.
Certainly when it comes to brainstorming, which is my number one use case, ChatGPT has the crown for me. I'm still figuring out the ins and outs of all of the different O-series models, but I do tend to find that even though GPT-4o might sometimes write better than the O-models, when it comes to a creative partner or a brainstorming partner, the reasoning models do, I think, outperform.
I will say that none of these preferences right now are so strong that I wouldn't be open to changing. In fact, quite the opposite, I am constantly jumping between things to see what's going to be best. The good news for those of you who don't have unlimited time to experiment and who don't have the ability to write off these tools as a business expense, at this point, pretty much any of these base models that you choose are going to do a lot of what you want really well.
We're clearly just on the cusp with something new with these reasoning models, and there's a lot to be excited about coming this year on that front. But that's where I am currently. Let me know in the comments what you use, if you found any tricks or tips. For now, though, that is going to do it for the AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.