Today on the AI Daily Brief, just how fast is AI evolving? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends. Back with another long reads episode of the AI Daily Brief. Today, we turn once again to Professor Ethan Mollick's One Useful Thing blog, reading a piece from a couple of weeks ago called Prophecies of the Flood: What to Make of the Statements of the AI Labs.
The setup and context of the piece is something we've been talking about a lot on this show for the last month or so: the sense that the tide is rising, to continue the water analogy, that AGI is getting closer and closer, that capabilities are increasing, and that something big is on the horizon. As usual, what we're going to do today is turn it over to an ElevenLabs version of myself to read the piece, and then I will come back and give you some thoughts of my own to close it out.
Prophecies of the Flood. Recently, something shifted in the AI industry. Researchers began speaking urgently about the arrival of super-smart AI systems, a flood of intelligence, not in some distant future, but imminently. They often refer to AGI, artificial general intelligence, defined, albeit imprecisely, as machines that can outperform expert humans across most intellectual tasks. This availability of intelligence on demand will, they argue, change society deeply, and will change it soon.
There are plenty of reasons to not believe insiders as they have clear incentives to make bold predictions. They're raising capital, boosting stock valuations, and perhaps convincing themselves of their own historical importance. They're technologists, not prophets, and the track record of technological predictions is littered with confident declarations that turned out to be decades premature. Even setting aside these human biases, the underlying technology itself gives us reason for doubt.
Today's large language models, despite their impressive capabilities, remain fundamentally inconsistent tools, brilliant at some tasks while stumbling over seemingly simpler ones. This jagged frontier is a core characteristic of current AI systems, one that won't be easily smoothed away. Plus, even assuming researchers are right about reaching AGI in the next year or two, they are likely overestimating the speed at which humans can adopt and adjust to a technology.
Changes to organizations take a long time. Changes to systems of work, life, and education are slower still. And technologies need to find specific uses that matter in the world, which is itself a slow process.
We could have AGI right now and most people wouldn't notice. Indeed, some observers have suggested that has already happened, arguing that the latest AI models like Claude 3.5 are effectively AGI. Yet dismissing these predictions as mere hype may not be helpful. Whatever their incentives, the researchers and engineers inside AI labs appear genuinely convinced they're witnessing the emergence of something unprecedented.
Their certainty alone wouldn't matter, except that increasingly public benchmarks and demonstrations are beginning to hint at why they might believe we're approaching a fundamental shift in AI capabilities. The water, as it were, seems to be rising faster than expected. The event that kicked off the most speculation was the reveal of a new model by OpenAI called o3 in late December. No one outside of OpenAI has really used this system yet, but it is the successor to o1, which is itself already very impressive.
The o3 model is one of a new generation of reasoners, AI models that take extra time to think before answering questions, which greatly improves their ability to solve hard problems.
OpenAI provided a number of startling benchmarks for o3 that suggest a large advance over o1, and indeed over where we thought the state of the art in AI was. Three benchmarks in particular deserve a little attention. The first is called the Graduate-Level Google-Proof Q&A test, GPQA, and it is supposed to test high-level knowledge with a series of multiple-choice problems that even Google can't help you with.
PhDs with access to the internet got 34% of the questions right on this test outside their specialty, and 81% right inside their specialty. When tested, o3 achieved 87%, beating human experts for the first time. The second is FrontierMath, a set of private math problems created by mathematicians to be incredibly hard to solve. And indeed, no AI had ever scored higher than 2% until o3, which got 25% right.
The final benchmark is ARC-AGI, a rather famous test of fluid intelligence that was designed to be relatively easy for humans but hard for AIs. Again, o3 beat all previous AIs, as well as the baseline human level on the test, scoring 87.5%. All of these tests come with significant caveats, but they suggest that what we previously considered unpassable barriers to AI performance may actually be beaten quite quickly.
As AIs get smarter, they become more effective agents, another ill-defined term (see a pattern?) that generally means an AI given the ability to act autonomously toward achieving a set of goals. I have demonstrated some of the early agentic systems in previous posts, but I think the past few weeks have also shown us that practical agents, at least for narrow but economically important areas, are now viable.
A nice example of that is Google's Gemini with Deep Research, accessible to everyone who subscribes to Gemini, which is really a specialized research agent. I gave it a topic: "research a comparison of ways of funding startup companies from the perspective of founders for high-growth ventures." The agentic system came up with a plan, read through 173 websites, and compiled a report for me with the answer a few minutes later.
The result was a 17-page paper with 118 references. But is it any good? I've taught the introductory entrepreneurship class at Wharton for over a decade, published on the topic, started companies myself, and even wrote a book on entrepreneurship, and I think this is pretty solid. I didn't spot any obvious errors, and you can read it yourself if you would like. The biggest issue is not accuracy, but that the agent is limited to public, non-paywalled websites rather than scholarly or premium publications.
It also is a bit shallow and does not make strong arguments in the face of conflicting evidence. So not as good as the best humans, but better than a lot of reports that I see.
Still, this is a genuinely disruptive example of an agent with real value. Researching and writing reports is a major part of many jobs. What Deep Research accomplished in three minutes would have taken a human many hours, though they might have added more nuanced analysis. Anyone writing a research report should probably try Deep Research and see how it works as a starting place, even though a good final report will still require a human touch.
I had a chance to speak with the leader of the Deep Research Project, where I learned that it is just a pilot project from a small team. I thus suspect that other groups and companies that were highly incentivized to create narrow but effective agents would be able to do so. Narrow agents are now a real product rather than a future possibility.
There are already many coding agents, and you can use experimental open-source agents that do scientific and financial research. Narrow agents are specialized for a particular task, which means they are somewhat limited. That raises the question of whether we will soon see generalist agents, where you can just ask the AI anything and it will use a computer and the internet to do it.
Simon Willison thinks not, despite what Sam Altman has argued. We will learn more as the year progresses, but if general agentic systems work reliably and safely, that really will change things, as it allows smart AIs to take action in the world. Agents and very smart models are the core elements needed for transformative AI, but there are many other pieces as well that seem to be making rapid progress. This includes advances in how much AIs can remember, context windows, and multimodal capabilities that allow them to see and speak.
It can be helpful to look back a little to get a sense of progress. For example, I have been testing the prompt "otter on a plane using Wi-Fi" for image and video models since before ChatGPT came out. In October 2023, that prompt got you this terrifying monstrosity. Less than 18 months later, multiple image creation tools nail the prompt. The result is that I have had to figure out something more challenging. This is an example of benchmark saturation, where old benchmarks get beaten by the AI.
I decided to take a few minutes and see how far I could get with Google's Veo 2 video model in producing a movie of the otter's journey. The video you see below took less than 15 minutes of active work, although I had to wait a bit for the videos to be created. Take a look at the quality of the shadows and light. I especially appreciate how the otter opens the computer at the end. And to up the ante even further, I decided to turn the saga of the otter into a 1980s-style science fiction anime featuring otters in space and a period-appropriate theme song, thanks to Suno. Again, very little human work was involved.
Given all of this, how seriously should we take the claims of the AI labs that a flood of intelligence is coming? Even if we only consider what we've already seen, the o3 benchmarks shattering previous barriers, narrow agents conducting complex research, and multimodal systems creating increasingly sophisticated content, we're looking at capabilities that could transform many knowledge-based tasks. And yet the labs insist this is merely the start, that far more capable systems and general agents are imminent.
What concerns me most isn't whether the labs are right about this timeline. It's that we're not adequately preparing for what even current levels of AI can do, let alone the chance that they might be correct. While AI researchers are focused on alignment, ensuring AI systems act ethically and responsibly, far fewer voices are trying to envision and articulate what a world awash in artificial intelligence might actually look like. This isn't just about the technology itself. It's about how we choose to shape and deploy it.
These aren't questions that AI developers alone can or should answer. They're questions that demand attention from organizational leaders who will need to navigate this transition, from employees whose work lives may transform, and from stakeholders whose futures may depend on these decisions. The flood of intelligence that may be coming isn't inherently good or bad, but how we prepare for it, how we adapt to it, and most importantly, how we choose to use it, will determine whether it becomes a force for progress or disruption.
The time to start having these conversations isn't after the water starts rising. It's now. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.
If there is one thing that's clear about AI in 2025, it's that the agents are coming: vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com/us. All right, back to the real, non-AI NLW here.
As usual, Ethan does a great job, I think, of summing up a lot of what's going on as well as a lot of the sentiment out there.
It has definitely been the case that the vibe has shifted. The labs seem more and more comfortable, even eager, to talk about how quickly AGI is coming. Reasoning models are the watchword of the moment. And there have also been some advances that make it feel like not only are these things coming, but they're likely to be very widely accessible. The Chinese model DeepSeek, which has everyone here in such a tizzy because of how closely it performs to OpenAI's models at a tiny fraction of the cost, has people thinking even more about what the implications of incredibly cheap and abundant intelligence really are.
Also, since this piece was released, we got the release of OpenAI's Operator, which, while still limited in what it can do, is the sort of generalist agent that Ethan is talking about. There was an interesting interview earlier this week between venture capitalist Chris Sacca and Tim Ferriss, where Sacca became the latest person to articulate just how disruptive this wave of new, cheap, and abundant intelligence could really be when it comes to people's jobs and livelihoods.
As I've said before, I think that this transition, while full of huge potential, will require nothing less than a total re-evaluation of the social contract: a new way of thinking about work, a new way of thinking about expectations, a new way of thinking about how we judge our own value, and so much more.
I don't know if things are moving faster or if it just feels like it. I do think that things that have been theoretical for some number of years are now moving into production. I think that this year we're going to see more and more people actually deploying agents in a way that makes the assistant era of AI look quaint. And I agree with Ethan wholeheartedly that the time to be having these conversations, about what we want out of a society with AI embedded in it, is now. I don't think we're turning back the tide, but that doesn't mean that we have no agency in the world that's being created.
Big ponderous thoughts for your weekend. And with that, we will close the AI Daily Brief. Appreciate you listening as always. And until next time, peace.