Today on the AI Daily Brief, Google's latest flagship model. And before that in the headlines, OpenAI updates how it shows O3 Mini's chain of thought. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Apologies for the voice, but we persevere. One of the really interesting things about DeepSeek launching was that it seemed pretty inevitable that it was going to put competitive pressure on the big American labs to do things in slightly different ways. Obviously, one part of that pressure was going to be financial, with labs having to try to bring the cost down to match DeepSeek, but it also seemed like there was likely to be some UI copying that would go on as well.
We have now officially seen that with OpenAI updating the visible chain of thought of their reasoning models. All free and paid-tier users will have a more complete view of the thinking from models like O1 and O3 Mini as they produce a response. OpenAI told TechCrunch, We're introducing an updated chain of thought for O3 Mini designed to make it easier for people to understand how the model thinks. With this update, you'll be able to follow the model's reasoning, giving you more clarity and confidence in its responses.
This was very notably one of the things that regular people really liked about DeepSeek's R1 when it came to the App Store. Caspian, at SplitByComma, shared a conversation where someone had said, DeepSeek is so cute too, it shares its thought process. To which someone responded, it's super cute the way it talks to itself. But in addition to just being cute, it obviously builds a lot of trust in being able to understand how it gets to the answer that it ultimately produces. OpenAI had previously been displaying only briefer summaries of the chain of thought. This was likely done for competition reasons,
with researchers having found that it's extremely cheap and easy to duplicate reasoning functionality through distillation. Still, full reasoning samples won't be shown for O3 Mini. Officially, OpenAI says this is a safety precaution, commenting, "...to improve clarity and safety, we've added an additional post-processing step where the model reviews the raw chain of thought, removes any unsafe content, and then simplifies any complex ideas."
However, Chief Product Officer Kevin Weil did suggest that it was probably still an anti-distillation measure. During a Reddit AMA last week, he said, We're working on showing a bunch more than we show today, TBD on all. Showing all chain of thought leads to competitive distillation, but we also know people, at least power users, want it, so we'll find the right way to balance it. Staying on the OpenAI theme, Project Stargate is close to selecting more data center sites in Texas and turning to other states.
Bloomberg reports that OpenAI is, quote, far along in the process of picking several locations in Texas for massive data center projects, according to a company spokesperson. The first site is already under construction in Abilene, Texas, and reportedly staff from OpenAI and SoftBank are evaluating potential sites in 13 more states, visiting locations in Pennsylvania, Wisconsin, and Oregon this week.
Each site is expected to operate at around 1 gigawatt of capacity. That's roughly the output of a nuclear reactor and six times larger than the rumored capacity of xAI's Colossus supercluster.
Chris Lehane, OpenAI's Vice President of Global Affairs, said the joint venture had been bombarded with enormous interest from elected officials across the country following their White House announcement. OpenAI has begun openly courting proposals as of last week to accelerate their site selection process. Lehane promoted the project to state governments and municipalities, stating, "You become an AI hub, and that starts to bring in developers, that starts to bring in other companies, that starts to bring in folks who want to be part of that broader ecosystem."
Next up, our oh-my-god-I-can't-believe-it's-taken-this-long story for the day. The long-awaited AI Alexa could be nearing release. According to Reuters sources, Amazon has scheduled a press conference later this month to unveil the new, smarter version of Alexa. They said the press invites have been sent for an event on February 26th.
AI Alexa has been one of the most anticipated AI products since it was first shown off in late 2023. Honestly, it's been one of the most anticipated products since we saw ChatGPT. An LLM powering Alexa is just obviously going to give such a better experience. However, there has been a string of delays and reports on the troubled development process. Last October, Bloomberg ran a feature listing a host of issues from hallucinations to high cost to bureaucratic friction at the company. And the explanation that really stood out was that the prototypes just weren't very good at Alexa's core functions.
Bloomberg wrote, top engineers and testers involved with the effort say the AI-enhanced assistant can still drone on with irrelevant or superfluous information and struggles with humdrum tasks it previously excelled at like turning on and off the lights.
Frankly, this Reuters reporting doesn't instill a lot of confidence that the product is truly ready for primetime. They wrote, executives have scheduled a meeting known as a go-no-go for February 14th. There they will make a final decision on the street readiness of Alexa's generative AI revamp, according to the people and an internal planning document. So who knows, maybe this will actually just be another delay.
Lastly today, about six months after OpenAI co-founder John Schulman left that company to join Anthropic, he has now left Anthropic to join former OpenAI CTO Mira Murati's new startup. We still don't have any idea what the startup, which is called Thinking Machines Lab, is going to do, but they are definitely scooping up some serious talent, and so it is worth keeping a close eye on.
For now, though, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001, centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.
If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year: an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at besuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US.
Welcome back to the AI Daily Brief. Apologies and bear with me for the voice. Hopefully we are back to normal by the beginning of next week. Today we are talking about Google's latest model release, which of course is inescapably going to be compared to DeepSeek as the model du jour. We're going to discuss both how the model stacks up, what people's reactions to it are, and what it all means for this question of pre-training versus scaling at the point of inference.
So, TL;DR, Google has released their latest flagship model, Gemini 2.0 Pro. The model is still labeled as experimental and is available for early testing through Google AI Studio and Vertex AI. In addition, Google has made their leading reasoning model, Gemini 2.0 Flash Thinking, generally available to all app users. And viewing this through the lens of the competition with DeepSeek, putting Flash Thinking into the mainline Gemini app follows suit in making a reasoning model easily available to regular users.
During the DeepSeek news cycle a week or two ago, many pointed out that Google's model was basically on par in both cost and performance, but having the model hidden away in AI Studio obviously hurt distribution and broader awareness. A couple areas where it feels like Google has a meaningful advantage? Neither DeepSeek R1 nor OpenAI's O3 Mini accept image or document uploads as inputs, and Gemini also has native integrations with Google Maps, YouTube, and Search, so it can start to handle some basic agentic functions related to those apps.
When it comes to Gemini 2.0 Pro, the company's flagship model is optimized for coding and complex prompts. Google continues to hammer on their industry-leading token context window, now up to 2 million tokens, which allows it to take about seven books as input, or significant chunks of even the largest codebase. Standard benchmarking ranks the model below OpenAI's O1 or O3 Mini on high and medium settings.
It's also below Gemini 2.0 Flash Thinking. The AI for Success account, however, highlighted the issue with comparing these models, posting, "Why on earth would you compare Gemini 2.0 Pro with O3? Gemini 2.0 Pro isn't a reasoning model." Looks like Google should have just launched a Gemini 2.0 Pro reasoning model instead of this. It's still the best non-reasoning model available. So that is the model, but what's more interesting is what it says about the state of frontier AI more broadly.
This is the first flagship model Google has released since concerns were raised about pre-training hitting a wall last November. Presumably, this was the updated version of Gemini that sources told Bloomberg was not living up to internal expectations. Bindu Reddy was very quick to declare that Google had hit the wall as the model lagged behind advanced reasoning models on the benchmarks. She wrote, "The new Gemini 2.0 Pro underperforms O3, O1, and R1. Pre-training seems to have hit a wall and all the gains are coming from scaling inference. Gemini 2.0 Pro falls behind its much smaller Gemini 2.0 Flash Thinking version. Overall, it doesn't shine on coding either and is still behind Sonnet 3.5."
Logan Kilpatrick, product lead at Google AI Studio, seemed to disagree, commenting, no wall in sight. Then again, it's always risky to talk too much about benchmarks versus real-world use. On that front, Professor Ethan Mollick seemed reasonably impressed, writing, I gave Gemini 2.0 the prompt to create something I can paste into P5.js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future. The result was pretty good, a viewport with stars whizzing past, a few well-animated dials and controls.
Ethan noted that Gemini 1.5 Pro couldn't do it. Others tried the real-world physics test of animating a bouncing ball inside a rotating hexagon. Again, pretty good and better than the results from Gemini 2.0 Flash Thinking. I Rule the World MO, the Strawberry leaker account, posted the results of their Pac-Man game coding test, writing, Okay, I have to be fair. Other than what I suspect was full O3, this is easily the best one-shot Pac-Man game I've seen.
AI scientist Mark Watson wrote, I just tried Gemini 2.0 for the first time with a complex Python coding request. The code was beautiful and worked. Might have been close to the most impressive code generation I have seen. At least on par with O3 Mini High, which also did a perfect job with the same coding problem. That was just one test, but I'm delighted with both models. On the other hand, the model fails the strawberry test, thinking the word has only two R's.
Now, the silly strawberry test kind of highlights one of the issues that we face when it comes to determining how good new models are. Basically, benchmarks at this point are pretty saturated, and they really don't tell us all that much about model performance. When GPT-4 was released, it was a gigantic step up from GPT-3, and O1 brought a similar feeling of progress by adding the new modality of reasoning. But now models are broadly good enough to produce great results for all of the standard use cases. Ultimately, Gemini 2.0 Pro seems to be very good, but not a step change in performance.
Then again, at this point, AI consumers don't seem to be looking for a step change in performance. Indeed, perhaps the main axis of competition recently is speed and cost.
There are some users who want the best reasoning model with long inference times to produce PhD-level reports, but by and large, AI developers are looking for the cheapest and fastest API they can find to power their app ideas. In that world, adding a few points to a benchmark score is far less important than cutting the price in half. Google has been competing hard on that front with their smaller models, so we'll see if this full release helps capture additional market share.
My sense is that at this point, unless we see some major, major capability changes, most all of these incremental improvements are going to be much less impactful from a news cycle standpoint than they might have been a year or two ago. That's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.