Today on the AI Daily Brief, just how fast is AI evolving? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends. Back with another long reads episode of the AI Daily Brief. Today, we turn once again to Professor Ethan Mollick's One Useful Thing blog, reading a piece from a couple of weeks ago called Prophecies of the Flood: What to Make of the Statements of the AI Labs.
The setup and context of the piece is something we've been talking about a lot on this show for the last month or so: the sense that the tide is rising, to continue the water analogy, that AGI is getting closer and closer, that capabilities are increasing, and that something big is on the horizon. As usual, what we're going to do today is turn it over to an ElevenLabs version of myself to read the piece, and then I will come back and give you some thoughts of my own to close it out.
Prophecies of the Flood. Recently, something shifted in the AI industry. Researchers began speaking urgently about the arrival of super-smart AI systems, a flood of intelligence, not in some distant future, but imminently. They often refer to AGI, artificial general intelligence, defined, albeit imprecisely, as machines that can outperform expert humans across most intellectual tasks. This availability of intelligence on demand will, they argue, change society deeply, and will change it soon.
There are plenty of reasons to not believe insiders as they have clear incentives to make bold predictions. They're raising capital, boosting stock valuations, and perhaps convincing themselves of their own historical importance. They're technologists, not prophets, and the track record of technological predictions is littered with confident declarations that turned out to be decades premature. Even setting aside these human biases, the underlying technology itself gives us reason for doubt.
Today's large language models, despite their impressive capabilities, remain fundamentally inconsistent tools, brilliant at some tasks while stumbling over seemingly simpler ones. This jagged frontier is a core characteristic of current AI systems, one that won't be easily smoothed away. Plus, even assuming researchers are right about reaching AGI in the next year or two, they are likely overestimating the speed at which humans can adopt and adjust to a technology.
Changes to organizations take a long time. Changes to systems of work, life, and education are slower still. And technologies need to find specific uses that matter in the world, which is itself a slow process.
We could have AGI right now and most people wouldn't notice. Indeed, some observers have suggested that has already happened, arguing that the latest AI models like Claude 3.5 are effectively AGI. Yet dismissing these predictions as mere hype may not be helpful. Whatever their incentives, the researchers and engineers inside AI labs appear genuinely convinced they're witnessing the emergence of something unprecedented.
Their certainty alone wouldn't matter, except that increasingly public benchmarks and demonstrations are beginning to hint at why they might believe we're approaching a fundamental shift in AI capabilities. The water, as it were, seems to be rising faster than expected. The event that kicked off the most speculation was the reveal of a new model by OpenAI called o3 in late December. No one outside of OpenAI has really used this system yet, but it is the successor to o1, which is itself already very impressive.
The o3 model is one of a new generation of reasoners, AI models that take extra time to think before answering questions, which greatly improves their ability to solve hard problems.
OpenAI provided a number of startling benchmarks for o3 that suggest a large advance over o1, and indeed over where we thought the state of the art in AI was. Three benchmarks in particular deserve a little attention. The first is called the Graduate-Level Google-Proof Q&A test, GPQA, and it is supposed to test high-level knowledge with a series of multiple-choice problems that even Google can't help you with.
PhDs with access to the internet got 34% of the questions right on this test outside their specialty, and 81% right inside their specialty. When tested, o3 achieved 87%, beating human experts for the first time. The second is FrontierMath, a set of private math problems created by mathematicians to be incredibly hard to solve. And indeed, no AI had ever scored higher than 2% until o3, which got 25% right.
The final benchmark is ARC-AGI, a rather famous test of fluid intelligence that was designed to be relatively easy for humans but hard for AIs. Again, o3 beat all previous AIs, as well as the baseline human level on the test, scoring 87.5%. All of these tests come with significant caveats, but they suggest that what we previously considered unpassable barriers to AI performance may actually be beaten quite quickly.
As AIs get smarter, they become more effective agents, another ill-defined term (see a pattern?) that generally means an AI given the ability to act autonomously toward achieving a set of goals. I have demonstrated some of the early agentic systems in previous posts, but I think the past few weeks have also shown us that practical agents, at least for narrow but economically important areas, are now viable.
A nice example of that is Google's Gemini with Deep Research, accessible to everyone who subscribes to Gemini, which is really a specialized research agent. I gave it a topic: "research a comparison of ways of funding startup companies from the perspective of founders for high-growth ventures." The agentic system came up with a plan, read through 173 websites, and compiled a report for me with the answer a few minutes later.
The result was a 17-page paper with 118 references. But is it any good? I've taught the introductory entrepreneurship class at Wharton for over a decade, published on the topic, started companies myself, and even wrote a book on entrepreneurship, and I think this is pretty solid. I didn't spot any obvious errors, and you can read it yourself if you would like. The biggest issue is not accuracy, but that the agent is limited to public, non-paywalled websites rather than scholarly or premium publications.
It also is a bit shallow and does not make strong arguments in the face of conflicting evidence. So not as good as the best humans, but better than a lot of reports that I see.
Still, this is a genuinely disruptive example of an agent with real value. Researching and writing reports is a major part of many jobs. What Deep Research accomplished in three minutes would have taken a human many hours, though they might have added more nuanced analysis. Anyone writing a research report should probably try Deep Research and see how it works as a starting place, even though a good final report will still require a human touch.
I had a chance to speak with the leader of the Deep Research Project, where I learned that it is just a pilot project from a small team. I thus suspect that other groups and companies that were highly incentivized to create narrow but effective agents would be able to do so. Narrow agents are now a real product rather than a future possibility.
There are already many coding agents, and you can use experimental open-source agents that do scientific and financial research. Narrow agents are specialized for a particular task, which means they are somewhat limited. That raises the question of whether we will soon see generalist agents, where you can just ask the AI anything and it will use a computer and the internet to do it.
Simon Willison thinks not, despite what Sam Altman has argued. We will learn more as the year progresses, but if general agentic systems work reliably and safely, that really will change things, as it allows smart AIs to take action in the world. Agents and very smart models are the core elements needed for transformative AI, but there are many other pieces as well that seem to be making rapid progress. This includes advances in how much AIs can remember, context windows, and multimodal capabilities that allow them to see and speak.
It can be helpful to look back a little to get a sense of progress. For example, I have been testing the prompt "otter on a plane using Wi-Fi" for image and video models since before ChatGPT came out. In October 2023, that prompt got you this terrifying monstrosity. Less than 18 months later, multiple image creation tools nail the prompt. The result is that I have had to figure out something more challenging. This is an example of benchmark saturation, where old benchmarks get beaten by the AI.
I decided to take a few minutes and see how far I could get with Google's Veo 2 video model in producing a movie of the otter's journey. The video you see below took less than 15 minutes of active work, although I had to wait a bit for the videos to be created. Take a look at the quality of the shadows and light. I especially appreciate how the otter opens the computer at the end. And to up the ante even further, I decided to turn the saga of the otter into a 1980s-style science fiction anime featuring otters in space and a period-appropriate theme song, thanks to Suno. Again, very little human work was involved.
Given all of this, how seriously should we take the claims of the AI labs that a flood of intelligence is coming? Even if we only consider what we've already seen, the o3 benchmarks shattering previous barriers, narrow agents conducting complex research, and multimodal systems creating increasingly sophisticated content, we're looking at capabilities that could transform many knowledge-based tasks. And yet the labs insist this is merely the start, that far more capable systems and general agents are imminent.
What concerns me most isn't whether the labs are right about this timeline. It's that we're not adequately preparing for what even current levels of AI can do, let alone the chance that they might be correct. While AI researchers are focused on alignment, ensuring AI systems act ethically and responsibly, far fewer voices are trying to envision and articulate what a world awash in artificial intelligence might actually look like. This isn't just about the technology itself. It's about how we choose to shape and deploy it.
These aren't questions that AI developers alone can or should answer. They're questions that demand attention from organizational leaders who will need to navigate this transition, from employees whose work lives may transform, and from stakeholders whose futures may depend on these decisions. The flood of intelligence that may be coming isn't inherently good or bad, but how we prepare for it, how we adapt to it, and most importantly, how we choose to use it, will determine whether it becomes a force for progress or disruption.
The time to start having these conversations isn't after the water starts rising. It's now. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.
If there is one thing that's clear about AI in 2025, it's that the agents are coming: vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com/us. All right, back to the real, non-AI NLW here.
As usual, Ethan does a great job, I think, of summing up a lot of what's going on as well as a lot of the sentiment out there.
It has definitely been the case that the vibe has shifted. The labs seem more and more comfortable, even eager, to talk about how quickly AGI is coming. Reasoning models are the watchword of the moment. And there have also been some advances that make it feel like not only are these things coming, but they're likely to be very widely accessible. The Chinese model DeepSeek, which has everyone here in such a tizzy because of how closely it performs to OpenAI's models at a tiny fraction of the cost, has people thinking even more about what the implications of incredibly cheap and abundant intelligence really are.
Also, since this piece was released, we got the release of OpenAI's Operator, which, while still limited in what it can do, is the sort of generalist agent that Ethan is talking about. There was an interesting interview earlier this week between venture capitalist Chris Sacca and Tim Ferriss, where Sacca became the latest person to articulate just how disruptive this wave of new, cheap, and abundant intelligence could really be when it comes to people's jobs and livelihoods.
As I've said before, I think that this transition, while full of huge potential, will require nothing less than a total re-evaluation of the social contract: a new way of thinking about work, a new way of thinking about expectations, a new way of thinking about how we judge our own value, and so much more.
I don't know if things are moving faster or if it just feels like it. I do think that things that have been theoretical for some number of years are now moving into production. I think that this year we're going to see more and more people actually deploying agents in a way that makes the assistant era of AI look quaint. And I agree with Ethan wholeheartedly that the time to be having these conversations, about what we want out of a society with AI embedded in it, is now. I don't think we're turning back the tide, but that doesn't mean that we have no agency in the world that's being created.
Big ponderous thoughts for your weekend. And with that, we will close the AI Daily Brief. Appreciate you listening as always. And until next time, peace.