Today on the AI Daily Brief, which AI you should be using right now. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪
Hello, friends. Welcome back to another Long Reads episode of the AI Daily Brief. The big theme of this week has, of course, been DeepSeek. There were a ton of op-eds about DeepSeek and American competitiveness, but I actually wanted to take this Long Read Sunday in a slightly different direction. I have a feeling some of us might be a little DeepSeeked out. Plus, this conversation is still roughly aligned, just maybe not as directly.
One big part of the DeepSeek discussion has been how the R1 model stacks up to its competitors, specifically OpenAI's O1. A lot of what we've been covering this week is how companies have been shifting to using R1. And so I thought it'd be fun to read Professor Ethan Mollick's Which AI to Use Now: An Updated Opinionated Guide. Ethan calls this the question he gets asked most. What AI should you actually use? Not five years from now, not in some hypothetical future, but today.
About every six months, he updates this list. And so this is the fresh off the press, inclusive of DeepSeek version. So what we're going to do is turn it over to AI.me to read Ethan's piece. And then I'm going to come back and talk about how I think about and use different models. One note on AI.me, I've been getting more feedback of the voice model having some problematic ticks recently. I'm going to actually do another round of training an AI version of my voice and see if it improves things, given that this is the one that I've been using for over a year now.
Still, for this episode at least, we are on that old model, so if there is anything weird, I apologize in advance. But let's turn it over to Ethan's piece, and then I will come back and discuss my answers to this question. Which AI to use now? An updated opinionated guide.
While my last post explored the race for artificial general intelligence, a topic recently thrust into headlines by Apollo program-scale funding commitments to building new AIs, today I'm tackling the one question I get asked most. What AI should you actually use? Not five years from now. Not in some hypothetical future. Today.
Every six months or so, I have written an opinionated guide for individual users of AI, not specializing in any one type of use but as a general overview. Writing this is getting more challenging. AI models are gaining capabilities at an increasingly rapid rate, new companies are releasing new models, and nothing is well documented or well understood. In fact, in the few days I have been working on this draft, I had to add an entirely new model and update the chart below multiple times due to new releases.
As a result, I may get something wrong or you may disagree with my answers, but that is why I consider it an opinionated guide. Though as a reminder, I take no money from AI Labs, so it is my opinion.
To pick an AI model for you, you need to know what they can do. I decided to focus here on the major AI companies that offer easy-to-use apps that you can run on your phone, and which allow you to access their most up-to-date AI models. Right now, to consistently access a frontier model with a good app, you are going to need to pay around $20/month, at least in the US, with a couple exceptions. Yes, there are free tiers, but you'll generally want paid access to get the most capable versions of these models.
We are going to go through things in detail, but for most people, there are three good choices right now. Claude from Anthropic, Google's Gemini, and OpenAI's ChatGPT. There are also a trio of models that might make sense for specialized users. Grok by Elon Musk's X.AI is an excellent model that is most useful if you are a Big X user.
Microsoft's CoPilot offers many of the features of ChatGPT and is accessible to users through Windows. And the new DeepSeek, a Chinese model that is remarkably capable and free. I'll talk about some caveats and other options at the end. For most people starting to use AI, the most important goal is to ensure that you have access to a frontier model with its own app.
Frontier models are the most advanced AIs, and thanks to the scaling law, where bigger models get disproportionately smarter, they're far more capable than older versions. That means they make fewer mistakes, and they often can provide more useful features. The problem is that most of the AI companies push you towards their smaller AI models if you don't pay for access, and sometimes even if you do.
Generally, smaller models are much faster to run, slightly less capable, and also much cheaper for the AI companies to operate. For example, GPT-4o mini is the smaller version of GPT-4o, and Gemini Flash is the smaller version of Gemini. Often, you want to use the full models where possible, but there are exceptions when the smaller model is actually more advanced, and everything has terrible names. Right now, for Claude, you want to use Claude 3.5 Sonnet, which consistently outperforms its larger sibling, Claude 3 Opus. For Gemini, you want to use Gemini 2.0 Flash, though full Gemini 2.0 is expected very soon. And for ChatGPT, you want to use GPT-4o, except when tackling complex problems that benefit from O1's reasoning capabilities. While this can be confusing, it is also a side effect of how quickly these companies are updating their AIs and their features.
Imagine an AI that can converse with you in real time, seeing what you see, hearing what you say, and responding naturally. That's live mode, though it goes by various names. This interactive capability represents a powerful way to use AI. To demonstrate, I use ChatGPT's advanced voice mode to discuss my game collection. This entire interaction, which you can hear with sound on, took place on my phone. You are actually seeing three advances in AI working together.
First, multimodal speech lets the AI handle voice natively, unlike most AI models that use separate systems to convert between text and speech. This means it can theoretically generate any sound, though OpenAI limits this for safety. Second, multimodal vision lets the AI see and analyze real-time video. Third, internet connectivity provides access to current information. The system isn't perfect. When pulling the board game ratings from the internet, it got one right but mixed up another with its expansion pack.
Still, the seamless combination of these features creates a remarkably natural interaction, like chatting with a knowledgeable, if not always 100% accurate, friend who can see what you're seeing. Right now, only ChatGPT offers a full multimodal live mode for all paying customers. It's the little icon all the way to the right of the prompt bar. ChatGPT is full of little icons. But Google has already demonstrated a live mode for its Gemini model, and I expect we will see others soon.
For those who are watching the AI space, by far the most important recent advance in the last few months has been the development of reasoning models. As I explained in my post about O1, it turns out that if you let an AI think about a problem before answering, you get better results. The longer the model thinks, generally, the better the outcome. Behind the scenes, it's cranking through a whole thought process you never see, only showing you the final answer. Interestingly, when you peek behind that curtain, you find these AIs think in ways that feel eerily human.
That was the thinking process of DeepSeek's R1, one of only a few reasoning models that have been released to the public. It is also an unusual model in many ways. It is an excellent model from China. It is open source, so anyone can download and modify it. And it is cheap to run and is currently offered for free by its parent company, DeepSeek. Google also offers a reasoning version of its Gemini 2.0 Flash. However, the most capable reasoning models right now are the O1 family from OpenAI.
These are confusingly named, but in order of capability, there are O1 Mini, O1, and O1 Pro. A new series of models, O3 (OpenAI could not get the rights to the O2 name, making things even more baffling), is expected at any moment. And O3 Mini is likely to be a very good model. Reasoning models aren't chatty assistants. They're more like scholars. You'll ask a question, wait while they think, sometimes minutes, and get an answer.
You want to make sure that the question you give them is very clear and has all the context they need. For very hard questions, especially in academic research, math, or computer science, you will want to use a reasoning model. Otherwise, a standard chat model is fine.
Not all AIs can access the web and do searches to learn new information past their original training. Currently, Gemini, Grok, DeepSeek, Copilot, and ChatGPT can search the web actively, while Claude cannot. This capability makes a huge difference when you need current information or fact-checking, but not all models use their internet connections fully, so you will still need to fact-check.
Most of the LLMs that generate images do so by actually using a separate image generation tool. They do not have direct control over what that tool does, they just send a prompt to it and then show you the picture that results. That is changing with multimodal image creation, which lets the AI directly control the images it makes.
For right now, Google's Imagen 3 leads the pack, but honestly, they'll all handle your basic otter holding a sign as it sits on a pink unicorn float in the middle of a pool just fine. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralized security workflows complete questionnaires up to 5x faster and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.
If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US.
All AIs are pretty good at writing code, but only a few models, mostly Claude and ChatGPT, but also Gemini to a lesser extent, have the ability to execute the code directly. Doing so lets you do a lot of exciting things. For example, this is the result of telling O1 using the Canvas feature, which you need to turn on by typing slash Canvas.
Create an interactive tool that visually shows me how correlation works, and why correlation alone is not a great descriptor of the underlying data in many cases. Make it accessible to non-math people and highly interactive and engaging.
Further, when models can code and use external files, they are capable of doing data analysis. Want to analyze a dataset? ChatGPT's Code Interpreter will do the best job on statistical analyses. Claude does less statistics but often is best at interpretation. And Gemini tends to focus on graphing. None of them are great, with Excel files full of formulas and tabs yet, but they do a good job with structured data. It is very useful for your AI to take in data from the outside world.
Almost all of the major AIs include the ability to process images. The models can often infer a huge amount from a picture. Far fewer models do video, which is actually processed as images at one frame every second or two. Right now, that can only be done by Google's Gemini, though ChatGPT can see video in live mode. And while all the AI models can work with documents, they aren't equally good at all formats. Gemini, GPT-4o (but not O1), and Claude can process PDFs with images and charts, while DeepSeek can only read the text.
No model is particularly good at Excel or PowerPoint, though Microsoft Copilot does a bit better here, as you might expect, and that will likely change soon. The different models also have different amounts of memory, called context windows, with Gemini having by far the most: it is capable of holding up to 2 million words at once. A year ago, privacy was a major concern when choosing an AI model. The early versions of these systems would save your chats and use them to improve their models. That's changed dramatically.
Every major provider, except DeepSeek, now offers some form of privacy-focused mode. ChatGPT lets you opt out of training, and Claude says it will not train on your data, as does Gemini. The exception is if you're handling truly sensitive data like medical records. In those cases, you'll still want to look into enterprise versions of these tools that offer additional security guarantees and meet regulatory requirements.
Each platform offers different ways to customize the AI for your use cases. ChatGPT lets you create custom GPTs tailored to specific tasks and includes an optional feature to remember facts from previous conversations. Gemini integrates with your Google Workspace, and Claude has custom styles and projects.
As you can see, there are lots of features to pick from, and on top of that, there is the issue of vibes. Each model has its own personality and way of working, almost like a person. If you happen to like the personality of a particular AI, you may be willing to put up with fewer features or less capabilities. You can try out the free versions of multiple AIs to get a sense for that. That said, for most people, you probably want to pick among the paid versions of ChatGPT, Claude, or Gemini. ChatGPT currently has the best live mode in its advanced voice mode.
The other big advantage of ChatGPT is that it does everything, often in somewhat confusing ways. OpenAI has AI models specialized in hard problems, the O1 series, and models for chat, like GPT-4o. Some models can write and run complex software programs, though it is hard to know which. There are systems that remember past interactions, scheduling systems, movie-making tools, and early software agents.
It can be a lot, but it gives you opportunities to experiment with many different AI capabilities.
It is also worth noting that ChatGPT offers a $200/month tier, whose main advantage is access to very powerful reasoning models. Gemini does not yet have as good a live mode, but that is supposed to be coming soon. For now, Gemini's advantage is a family of powerful models, including reasoners, very good integration with search, and a pretty easy-to-use user interface, as you might expect from Google. It also has top-flight image and video generation. Also excellent is deep research, which I wrote about at length in my last post.
Claude has the smallest number of features of any of these three systems, and really only has one model you care about: Claude 3.5 Sonnet. But Sonnet is very, very good. It often seems to be clever and insightful in ways that the other models are not. A lot of people end up using Claude as their primary model as a result, even though it is not as feature-rich.
While it is very new, you might also consider DeepSeek if you want a very good all-around model with excellent reasoning. If you subscribe to X, you get Grok for free. And the team at X.ai are scaling up capabilities incredibly quickly with a soon-to-be-released new model, Grok 3, promising to be the largest model ever trained. And if you have Copilot, you can use that, as it includes a mix of Microsoft and OpenAI models, though I find the lack of transparency over which models it is using when to be somewhat confusing.
There are also many services, like Poe, that offer access to multiple models at the same time if you want to experiment. In the time it took you to read this guide, a new AI capability probably launched and two others got major upgrades. But don't let that paralyze you. The secret isn't waiting for the perfect AI. It's diving in and discovering what these tools can actually accomplish. Jump in, get your hands dirty, and find what clicks. It will help you understand where AI can help you, where it can't, and what is coming next.
All right. So as always, lots of good thoughts from Ethan. As he points out, some of your choices are kind of made for you. If you're interested in the sort of interaction that advanced voice mode offers, you just use advanced voice mode at this point. That won't be the case forever, but it mostly is the case now. Similarly, there are certain products that just don't have equivalents outside of the company setting the tone.
Notebook LM, specifically the audio overviews, are the biggest example for me. That's not to say that people aren't trying. Eleven Labs, for example, which is a company that I love, has a feature to create this sort of podcast out of other content. But Notebook LM as an all-inclusive experience is really just something different.
But what about for the day-in, day-out stuff? Help with writing, help writing social media, brainstorming, generating images, kind of the bread-and-butter Gen AI uses that at this point you don't even think twice about. First of all, let's talk about R1. I have tried out the DeepSeek app, although only for uses where I don't care about the idea of the Chinese government having access. For me, though, I think the place that I'll run into DeepSeek models most often is when they are transposed and embedded in some other tool that I use, such as in Perplexity.
I haven't had a chance yet to do a full test of how R1 works in Perplexity as compared to other models, but given how often I use Perplexity, I'm sure that's something that I'll do at some point soon.
When it comes to image generators, I continue to prefer the aesthetics of Midjourney. No big surprise there. But in terms of actual practical usage, Ideogram is by far my most used tool. The reason for that is my specific use cases. Most of the images that I generate these days for some part of my work need, or at least strongly benefit from, text. Ideogram handles that incredibly well, and whatever it lacks in terms of stylistic fidelity, it more than makes up for in functionality.
I've also found that Ideogram is really good at listening to your very inelegant prompt and figuring out what you're trying to go for. Midjourney gives you incredibly fine-grained controls, and if you either A, have a vision for exactly what you want, or B, just want to let it be artistic, it thrives in both of those situations. But when I'm blundering around trying to get a comparison cartoon, two corporate workers dressed normally in one pane and as robots in another, so I can put it in a pitch deck, Ideogram just does it better for me.
And so what about the core stuff? When it comes to actual writing tasks, I am still in the mode of throw the prompt through both Claude and ChatGPT and see which I prefer for a particular context. I tend to default first to Claude, in part because I've set up a number of voice personas that I can instantly toggle, taking that out of the prompting process. But for any given prompt, it's sort of six of one, half dozen of the other of which I prefer.
Certainly when it comes to brainstorming, which is my number one use case, ChatGPT has the crown for me. I'm still figuring out the ins and outs of all of the different O-series models, but I do tend to find that even though GPT-4o might sometimes write better than the O-models, when it comes to a creative partner or a brainstorming partner, the reasoning models do, I think, outperform.
I will say that none of these preferences right now are so strong that I wouldn't be open to changing. In fact, quite the opposite, I am constantly jumping between things to see what's going to be best. The good news for those of you who don't have unlimited time to experiment and who don't have the ability to write off these tools as a business expense, at this point, pretty much any of these base models that you choose are going to do a lot of what you want really well.
We're clearly just on the cusp with something new with these reasoning models, and there's a lot to be excited about coming this year on that front. But that's where I am currently. Let me know in the comments what you use, if you found any tricks or tips. For now, though, that is going to do it for the AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.