As we prepare for a big week with events from Anthropic, Google, and Microsoft, we get into five different ways to think about broader AI competition. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Thanks to today's sponsors, Blitzy.com, Vertice Labs, and Super Intelligent. And to get an ad-free version of the show, go to patreon.com slash ai daily brief.
Hello, friends. Quick note. Like I said, we have a very big week coming up. There is going to be a lot of news. Today is a preview episode of what's going on, plus a potentially interesting framework for how to think about AI competition. This crowded out some of the headlines, but we will be back with a normal episode with both headlines and a main segment tomorrow.
I anticipate we'll probably be talking about what was announced at Microsoft Build for the main. But in any case, that is the story with today's episode. So without any further ado, let's dive in. Welcome back to the AI Daily Brief. Well, friends, we have a very big week of AI news coming up, or at least I should say, the context and setting where we would presumably have a big week of AI news.
So what's going on? Well, first of all, we have Microsoft Build, which kicks off today in Seattle and runs throughout the week. Then starting tomorrow, we have Google I.O., which is their annual developer conference. Then at the end of the week on May 22nd, we have Anthropic's first developer conference, Code with Claude.
All of this, by the way, is why OpenAI tried to front-run some of this by launching Codex last week so that they didn't miss an entire cycle of new model announcements. In any case, what we're going to do today is take a look at the state of where these companies are, see what we might glean about what might be coming, and the context that I want to use for that is thinking about the competition for AI across five different vectors.
Those vectors: consumer, enterprise, benchmarks, coding, and agents. This is far from scientific or even comprehensive; it's just how I kind of think about things. I will note that I think competition for developers is a really important thing, but it's kind of endemic across all of these and especially concentrated in coding and agents.
And in any case, we're going to come back to that. But to start, let's talk about where each of these companies are. And let's actually do a quick summary of the Codex announcement, again, since it was chosen as a way to front run the rest of this news.
So what is Codex? We talked about it a little bit on Friday, but basically it is a coding agent. It's powered by a new model called codex-1, a version of the o3 reasoning model that is optimized specifically for software engineering. OpenAI claims that codex-1 produces cleaner code than o3, is better at following instructions, and will run iterative tests on its code until it gets the results it wants. Codex is also built for parallel work: it can handle multiple tasks simultaneously, and people can use their computers or browsers even while it's running.
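To make that "iterate until the tests pass" idea concrete, here is a minimal sketch of the loop being described. To be clear, this is a hypothetical illustration, not OpenAI's actual implementation; generate_patch is a made-up stand-in for the model's edit step.

```python
# Hypothetical sketch of a "run tests until they pass" coding-agent loop.
# generate_patch() is a stand-in for the model call that edits files in
# the repo; it is not an OpenAI API.
import subprocess

def generate_patch(task: str, feedback: list[str]) -> None:
    """Placeholder: given the task and prior test failures, edit the code."""
    raise NotImplementedError("replace with a real model call")

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite; return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def coding_agent_loop(task: str, max_attempts: int = 5) -> bool:
    feedback: list[str] = []  # accumulated failure output to condition on
    for _ in range(max_attempts):
        generate_patch(task, feedback)  # model edits the code
        passed, output = run_tests()    # agent checks its own work
        if passed:
            return True                 # stop once the tests are green
        feedback.append(output)         # otherwise, learn from the failure
    return False
```

The shape is the whole point: the model's own test runs become the feedback signal, which is what lets it grind away in the background without a human checking every step.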
Codex is built into ChatGPT, which is different than Claude Code or Cursor or other things like that, and which immediately gives it really wide distribution. For example, Box's Aaron Levie points out, OpenAI Codex works on the mobile app. We're entering a wild world where you can have AI agents coding anything while on your phone. The ability to just have unlimited AI agents executing tasks on your behalf in the background is going to utterly change knowledge work.
Now, I wasn't using Codex, but I will say that I had a moment this weekend where I was walking around a nearby hiking trail, and I had Lovable vibe coding something just from Chrome on my phone, and I had OpenAI's deep research working on a different thing. So even though it's nascent and the UI is not exactly optimized, I think Aaron's exactly right here.
The reaction to Codex has actually been a little muted. Santiago writes, Literally everyone is freaking out over Codex like they didn't do the exact same thing for Devin, Cursor, DeepSeek, and every GPT drop since 2.0. The hype cycle resets every three weeks and we all start everything all over again. This is what we'll see over the next few days: OpenAI employees will claim they've been using Codex for a while and it's writing all their code. A few people will tell the story of how they casually asked Codex to finish an old project and it did it all and it was perfect. AI influencers will litter our feeds with 90% of people don't know this Codex trick and Cursor is dead threads.
Etc., etc., etc. Now, interestingly, I actually don't think that this has been what's going on. I do think that you should mentally filter out every AI thinkboi post along the lines of the threads he's talking about. But I don't think people are really freaking out over Codex. I think they're excited. I think they're trying to figure it out. Riley Brown, for example, who's building VibeCode, wrote a long post and shared a video called How to Test Codex as a Vibe Coder (Non-Technical). He says with Codex, you can spin up AI coding agents that edit your code.
This is like Devin, not Cursor. You can run these AI agents in parallel. He then goes through how he used it with other vibe coding apps, including V0, and shared a video of his work. At the end, he said, P.S. This is probably not an efficient way to do it. This is just how I tested it. This is the type of post that I've seen a lot surrounding Codex. I think it's just too early to know how it's going to fit into this whole ecosystem, although it is certainly validation that this ecosystem is incredibly important.
Another interesting thought came from Josh Tobin, who works on agents at OpenAI, who said, My hot take is that Codex increases the value of being technical. If you can describe precisely what you want to build, you can get a massive amount done in parallel. That's fundamentally a technical skill. Professor Ethan Mollick writes, Codex is neat, but I really wish that OpenAI had gone the extra step of making it accessible to non-coders. Not that non-coders should expect to make complex or high-quality applications with today's software engineering agents, but democratizing the making of small tools can make a big difference.
Anyways, Codex is out. It's now a part of this ecosystem, and I'm sure we'll see it start to integrate and interact with all these other tools soon.
But what about the companies that OpenAI was trying to front-run? What can we expect from them? Let's start with Microsoft. Now, obviously, this is a big company that has a lot to talk about even beyond Copilot and AI, but Copilot is obviously expected to be a core part of the story. Right now, it's not exactly clear, though, if there's some big thing to announce or if we're just seeing the continued deepening of Copilot into all of Microsoft's products. Business Standard writes, Microsoft's Copilot AI assistant is expected to take center stage at the upcoming event.
The company has been steadily embedding Copilot across its key platforms, Windows, Office, and Azure, and further updates are anticipated this week. New features such as semantic search abilities in Settings, File Explorer, and the Windows search bar are likely on the way. Additionally, Microsoft may announce enhancements to Copilot Agents, a feature introduced in April designed to streamline complex multi-step tasks using AI.
Now, Business Standard also expects Windows 11 and Azure to get some airtime, particularly around their AI dimensions, such as the Recall feature in Windows 11. But fascinatingly, they also call out Model Context Protocol as a major potential part of this. If we see any sort of emphasis on MCP, like if it makes it into Satya Nadella's keynote, that would certainly suggest that Microsoft is really interested in competing around agents.
I think for me, what I'm watching with Microsoft is just how they position themselves in general in this AI battle. They're very clearly not competing, at least right now, to push the boundaries of what's possible from a model standpoint, but they're still the default for enterprises. And so what they do potentially carries more heft in terms of what's available.
One of the infuriating things for people who work inside companies that are Microsoft shops is the disparity between the tools they can use in their personal life and what they have available at work. So to the extent that Microsoft can close those gaps, that would be a very powerful thing. Remember, though, Microsoft is thinking in a big, zoomed-out way. We've talked a lot recently about their Work Trend Index for 2025, where they declared that this was the year that the frontier firm is born. The frontier firm, you might remember, is where every employee becomes an agent boss, managing swarms of AI agents.
And so I'm going to be watching closely to see how Microsoft is painting a vision of how we get from where we are now to that.
And then there's Google. Unlike Microsoft, Google is still absolutely competing to be front and center and pushing the boundaries when it comes to actual AI models. And even if one argues that they are still behind where one might have imagined Google would be relative to these upstarts, given how much AI talent they've had for so long, it's hard to argue that they've had anything but an excellent year since last year's I.O.
Last year at this time, and I know this sounds forever ago, but it was just one year ago, Gemini was doing things like suggesting glue as a pizza topping. But since then, Google has staged an enormous comeback. Gemini 2.5 Pro is a benchmark leader, with many people discussing how it, for the first time, pushed Anthropic's Claude off the top of the heap when it came to coding use cases. Gemini's product range is competitive at every price point. Their agent previews have been impressive. The question is one of users. The company touts 1.5 billion users for AI Overviews, but that's just embedded in Google Search, not really a telling statistic.
They say they have 150 million subscribers through their Google One service, which is a 50% jump from last February, but that's also a shared product with their data services. They claim 350 million monthly active Gemini users, but that could include a fair number of pre-installed handsets. The double-edged sword of a company having big existing distribution is that there's some skepticism of how rich and deep the use actually is.
This is why people don't really pay attention when Zuckerberg touts how many people are using Meta's AI, because it's just in your face inside Instagram and Messenger and WhatsApp. And even at that 350 million number, that's still way behind ChatGPT. Now, it's clear that Google is not just going to concede the battle for consumer to OpenAI.
In fact, in the lead-up to I.O., we've had a host of big releases. The company launched an updated version of Gemini 2.5 Pro that significantly improves its coding ability. We got a fascinating next-generation tease from DeepMind about a coding agent that can optimize algorithms; they claim that the agent has cut Google's global compute usage by 0.7% through code optimization. We've also seen some big updates to fan favorite NotebookLM, including the launch of a standalone app.
Now, all of those things could have been fodder for a major unveiling at the conference, but Google decided to roll them out early. Meanwhile, the pre-show coverage is really dabbling around the edges. It's sort of focused on new gadgets and features for Chrome and Android. TechRadar, for example, bundles everything into, quote, a ton of Gemini AI news. But it doesn't really seem like they're clear on what that might be.
Look, as I said, Google has done significant work over the last year to improve their place in the AI fight, and I'm very excited to see what they push out at I.O. and beyond. I do think that they find themselves sort of uncomfortably between pure consumer and pure enterprise. On one end of the spectrum, we have OpenAI, who's racing up to 800 million users thanks to Ghibli images,
and is just super focused on consumer, although honestly making progress on enterprise in a way that we'll talk about in a minute. And on the other end of the spectrum, we have Microsoft, which just feels like it has total lock-in among enterprise users. Google sits somewhere in between. They're the enterprise choice for smaller, more consumer-like companies, SMEs and mid-markets. But I wonder if that middle space is actually making it harder for them to prioritize which AI stuff to care about.
And then there's Anthropic. Back at the beginning of April, Anthropic announced that they were hosting their first-ever developer conference, called Code with Claude. Aside from that announcement, they've given out almost no information.
And what we know for sure, certainly if you've listened to this show, is that Anthropic has really cemented itself as the core choice for models to power coding tools and coding agents. The company continues to grow, and if it weren't for the utter juggernaut that is OpenAI, their numbers would be getting way, way more attention. In terms of expectations for what's coming this week, it's all about new and updated models.
The Information reported last week that, according to their sources, who were people who had used these new models, Anthropic was going to announce new versions of its two largest models, Claude Sonnet and Claude Opus, and that these models were supposed to be able to go back and forth between reasoning and tool use. Writes The Information: "The key point: if one of these models is using a tool to try and solve a problem but gets stuck, it can go back to reasoning mode to think about what's going wrong and self-correct."
Also from The Information: for people who use the new models to generate code, the models will automatically test the code they create to make sure it's running correctly. If there's a mistake, the models can stop and think about what might have gone wrong and then correct it. Continuing, they write, the new Anthropic models are thus supposed to handle more complex tasks with less input and correction from their human customers. That's useful in domains like software engineering, where you might want to provide a model with a high-level instruction like "make this app faster" and let it run on its own to test out various ways of achieving that goal without a lot of hand-holding.
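Mechanically, what The Information is describing sounds like an agent loop that can drop out of tool use and back into extended reasoning when something fails. Here is a purely hypothetical sketch of that control flow; none of these names or structures are Anthropic's.

```python
# Hypothetical control flow for a model that interleaves reasoning and
# tool use, returning to reasoning mode when a tool call gets stuck.
# reason() and call_tool() are stand-ins, not Anthropic APIs.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "answer", or a tool name like "run_code"
    content: str = ""

@dataclass
class ToolResult:
    failed: bool
    output: str = ""
    error: str = ""

def reason(task: str, notes: list[str]) -> Action:
    """Placeholder for the model's extended-thinking step."""
    raise NotImplementedError("replace with a real model call")

def call_tool(action: Action) -> ToolResult:
    """Placeholder for executing a tool call (run code, search, etc.)."""
    raise NotImplementedError("replace with real tool execution")

def solve(task: str, max_steps: int = 20) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        action = reason(task, notes)   # think about what to do next
        if action.kind == "answer":
            return action.content      # the model decided it's done
        result = call_tool(action)
        if result.failed:
            # The reported behavior: rather than blindly retrying the tool,
            # feed the failure back into the next reasoning step.
            notes.append(f"{action.kind} failed: {result.error}")
        else:
            notes.append(result.output)
    return "no answer within budget"
```

The interesting claim is that failure branch: the tool error informs the next round of reasoning rather than triggering a blind retry, which is what "self-correct" would mean in practice.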
Now, if we get all of that from Anthropic, I think people will be extremely excited. And I think it shows just how important right now the battle around coding is as a core part of the larger AI competition. Today's episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Which, if you don't know exactly what that means yet, do not worry, we're going to explain, and it's awesome. So Blitzy is used alongside your favorite coding copilot as your batch software development platform for the enterprise, and it's meant for those who are seeking dramatic development acceleration on large-scale codebases. Traditional copilots help developers with line-by-line completions and snippets,
but Blitzy works ahead of the IDE, first documenting your entire codebase, then deploying more than 3,000 coordinated AI agents working in parallel to batch build millions of lines of high-quality code for large-scale software projects. So whether it's codebase refactors, modernizations, or bulk development of your product roadmap, the whole idea of Blitzy is to provide enterprises dramatic velocity improvement.
To put it in simpler terms, for every line of code eventually provided to the human engineering team, Blitzy will have written it hundreds of times, validating the output with different agents to get the highest-quality code to the enterprise in batch. Projects that would normally require dozens of developers working for months can now be completed with a fraction of the team in weeks, empowering organizations to dramatically shorten development cycles and bring products to market faster than ever.
If your enterprise is looking to accelerate software development, whether it's large-scale modernization, refactoring, or just increasing the pace of your SDLC, contact Blitzy at blitzy.com, that's B-L-I-T-Z-Y dot com, to book a custom demo, or just press get started and start using the product right away. Today's episode is brought to you by Vertice Labs, the AI-native digital consulting firm specializing in product development and AI agents for small to medium-sized businesses.
Now guys, this is a market where we have seen so much interest, so much demand, and many times the great AI dev shops and builders out there just have so much business from the high end of the mid-market and big enterprises that this group of buyers gets neglected. Now, for Vertice, AI-native means that they don't just build AI, they use it in every step of their process. They embed agents in their workflows so that they better know how to help you embed agents in your workflows.
And indeed, what they specialize in is building AI agents and agentic workflows that augment knowledge work, from customer support to internal ops, so that your team can focus on higher-value work. Vertice wants to ensure that this is not just another copilot, but something that works end-to-end, translating business problems into working software in weeks, not quarters.
They have found that their clients typically see a 60% reduction in time and cost, with significantly higher output than traditional technology partners. So if you are a founder, a CTO, a business leader, or you've just got a product idea to launch, check out verticelabs.io. That's V-E-R-T-I-C-E labs dot I-O.
Today's episode is brought to you by Superintelligent. Now, you have heard me talk about agent readiness audits probably numerous times at this point. This is our system that uses voice agents and a hybrid human-AI analysis process to benchmark your agent readiness, map your agent opportunities,
and give you some really pointed, actionable next steps to move further down the path in your agentic journey. But we're coming up on the slow time of the year, and if you want to use this time to get out ahead of peers and competitors, we're excited to announce something we're calling Agent Summer. The idea here isn't that complicated. It's basically just an accelerated program to get you agentified, and fast.
First of all, it's going to include an agent readiness audit, figuring out where your biggest agent opportunities are. Next, we're going to support both your internal change management process, helping you figure out AI policy, data readiness, things like that, as well as doing action planning around the agent opportunities that are most relevant for you. And finally, we're going to connect you to the right vendors to actually go and deliver this.
Now for this, we want to work with a very small handful of companies that really want to move. We're going to be bundling more than $50,000 of services for something that starts closer to $30,000. And so if you want to use this summer to jump ahead on your company's agent journey, email agent at besuper.ai with summer in the subject line, claim one of these limited spots, and let's go have an agent summer.
Well, let's actually talk just briefly about each of these different areas of competition. Like I said, coding is a huge one. This is one of the most essential and breakout use cases.
It's a use case that enables other use cases. It has dimensions for technical people and developers because it's accelerating what they're able to do, and the tools they use to build things tend to find their way into the tools everyone else uses to interact with what they've built. But there's also the whole vibe coding piece of this, where we're simultaneously seeing a massive expansion of who can participate in that sort of creation.
And so all of a sudden, it's not just the developers, or at least the traditional developers who have a stake in which of these models is best for coding, but also this new legion of vibe coders and solopreneurs.
Right now, we'll have to see if this new Codex model starts to knock Anthropic's models off the top of the heap. Interestingly, a couple of weeks ago, when Google announced the I.O. edition of Gemini 2.5 Pro, there was a lot of chatter about how the benchmarks were better than things like Claude, and there was this whole question about whether we had a new king of AI coding. And while there's certainly been some positive buzz since then, by and large, I don't think that we've seen habits really shift.
Now, again, it's only been a couple of weeks, but given that we are about to get another update from Anthropic, it seems likely to me that that company retains its top dog status, at least for developers. But who knows? This is going to be one of the most, if not the most, dynamic areas of this competition. It's also closely related to another area of competition: agents. And when I'm talking about agents, I'm actually talking about two different things simultaneously, or I should say probably at least two different things.
One of them is, of course, the end agents themselves, and the other is the platforms for building agents. Now, on the platform side, this is the other area where Anthropic has really cemented its lead status. We did a whole show built off of Latent Space's post "Why MCP Won," which is a good primer on how essential Model Context Protocol has become to the emerging field of agent building. But there are other areas of the agent infrastructure stack that other people are trying to compete for as well.
For example, at the beginning of April, Google announced the Agent2Agent (A2A) protocol, which is basically an agent communication protocol. You can tell how far MCP had come because Google, when they announced it, wrote, A2A is an open protocol that complements Anthropic's Model Context Protocol, which provides helpful tools and context to agents. So they are basically trying to build a different part of the agent infrastructure stack, one that MCP is not addressing. As we watch this week's announcements,
I would say watch to see what Anthropic says about MCP, watch to see what Google says about A2A, as well as any other agent infrastructure plays, and see if and how Microsoft talks about bringing any of these things into their ecosystem for enterprise builders, especially through Azure.
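For a sense of how lightweight the tool side of that stack is, here is roughly what exposing a single tool over MCP looks like, following the quickstart pattern from the official Python SDK. The word_count tool is a made-up example, and note that A2A sits at a different layer entirely, handling messages between agents rather than an agent's access to tools.

```python
# A minimal MCP server sketch, following the quickstart pattern of the
# official MCP Python SDK (pip install mcp). The tool is a made-up example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # any MCP-capable client or agent can now call word_count
```

That simplicity is a big part of why MCP spread so fast, and it clarifies the division of labor: MCP standardizes how one agent reaches tools and context, while A2A standardizes how agents talk to each other.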
Enterprise and consumer actually make up another part of this competition. I talked before about how Microsoft sort of has a default pole position, and of course, they used their partnership with OpenAI to anchor that position in the early days of generative AI. Interestingly, it does appear that OpenAI is making up some major ground with the enterprise. Now, these stats are recent, but they do represent a particular slice of the market. This comes from Ramp's AI Index, which basically estimates business adoption of AI products
by using Ramp's card and bill pay data. Ramp is not necessarily used by the biggest enterprises in the country, so this is going to represent more SMEs, startups, and some smaller mid-markets. But at least in this cohort, OpenAI is flourishing. There has been a massive increase in the percentage of U.S. companies that are using OpenAI's business subscription, from a little over 15% at the end of last year all the way up to 32.4% now.
Anthropic has also jumped from around 4% at the end of the year, doubling to 8% now, but obviously still very far behind OpenAI. And Google has absolutely fallen off a cliff. Now, again, I want to caution that this is a very specific slice of the market. It doesn't represent everything. But the point for our purposes is that as we are thinking about AI competition, enterprise is a very particular subset of that competition.
Now, on consumer, we touched on it before, but here OpenAI just continues to be the absolute total leader. It continues to be the case that for many normies, ChatGPT and AI are synonymous. And OpenAI has recently had a burst of new users thanks to things like their new image model and the Ghibli-style image generation meme, which exploded all over X and other platforms. Basically, it sounds like OpenAI is somewhere around 800 million weekly active users right now. We don't know exactly where that number stands
and how much it's peeled off since the Ghibli trend ended, but it's still so much bigger than anything else out there. What's more, OpenAI is very clearly doubling down on their consumer lead, announcing that they were bringing in Instacart CEO Fidji Simo as their new CEO of Applications, basically their CEO for the actual business stuff.
Now, interestingly, coming back to agents, I said that there were two aspects of agent competition. One was the infrastructure, things like MCP and A2A, but the other side is the end agents themselves. And of the big labs, so far, OpenAI and Google seem like the two that really want to compete with end agent products, and OpenAI even more strongly than Google.
I think that these companies understand that the moat is in owning the customer relationship and that there's going to be a huge amount of commoditization, volatility, and switching when it comes to models. And so I think that when OpenAI is thinking about agents, they're not just trying to be the models that power agents. They're also thinking about actually owning the agents themselves.
Having the best deep research agent, having the best computer use agent in Operator, and now having the best coding agent. My sense is that that's a battle they're trying to have, and it is actually directly related to their leadership in consumer.
Now, lastly, for the sake of completeness: as we think about AI competition, if you were just going by news articles, you might think that it was all about benchmarks. However, as I record this, I don't think that benchmarks have ever had a lower place in the consideration of users. Back a few months ago, in a Reddit thread in the Llama community, a poster wrote a thread titled, I'm starting to think AI benchmarks are useless. Across every possible task I can think of, they wrote, Claude beats all other models by a wide margin. I have three AI agents that I've built that are tasked with researching, writing, and doing outreach to clients.
Claude absolutely wipes the floor with every other model. Yet Claude is usually beaten on benchmarks by OpenAI and Google models. They then get into speculating on why, but this is definitely, broadly, the perception. Now, interestingly, I think that we might be hitting sort of a floor, a nadir, in how much people care about benchmarks. And I think that part of where we might go is starting to see more specific, discrete benchmarks or evals for particular use cases. For example, very randomly, as I was posting this,
I saw that Tiny founder Andrew Wilkinson had written, I just saved around $5,000 by drafting a legal agreement with Gemini 2.5 Pro, which ranks number one on LegalBench. Now, the rest of his tweet is about the disruption coming for lawyers. But what's interesting is that he very clearly cared about accuracy benchmarks, and
that did influence his choice as a builder. And so I think that to the extent that benchmarks can actually be useful for entrepreneurs and developers, they have some utility. It's just really not going to be general consumers, or even general enterprises, I think, switching between models because they scored higher on a benchmark.
Ultimately, the proof is in the pudding, and it's all about practice. So, summing up, we have a very big week coming: Microsoft Build, Google I.O., Anthropic's Code with Claude, and OpenAI trying to needle in and get their stamp on the conversation beforehand. And if you want to keep a crib sheet of where everyone stands relative to AI competition, I'd encourage you to think about it in these dimensions: again, consumer, coding, agents, enterprise, and benchmarks.
For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.