Competition is emerging because AI-driven search experiences like Perplexity and ChatGPT are redefining how users interact with search, offering more conversational and answer-focused interfaces. Google is also planning to introduce an AI mode, potentially allowing voice and image inputs, which could shift the paradigm of traditional search.
Google is planning to introduce an AI mode option that mimics its Gemini AI chatbot, allowing users to toggle between traditional search and AI-driven conversational search. This mode may also support voice and photo inputs for mobile users.
A federal judge has ruled that Google's search engine is an illegal monopoly. The Department of Justice wants to prevent Google from leveraging its dominance to outcompete AI chatbot rivals, which could create legal barriers to the introduction of AI mode.
STEM AI aims to develop AI software that aligns with human behavior, preferences, biology, morality, and ethics. The company is still in stealth mode, but its goals suggest a focus on AI safety and human-aligned AI development.
Intel is seeking to sell Altera, a company specializing in low-power programmable chips for AI-enabled devices, as part of its restructuring efforts. Altera is being valued at $9 to $12 billion, a significant discount from the $17 billion Intel paid in 2015.
Reasoning models are AI systems designed to scale using strategies beyond just increasing compute and data. They emphasize logical thinking and problem-solving, with companies like OpenAI, Google, and Meta investing heavily in this approach to improve AI capabilities.
Gemini 2.0 Flash Thinking Experimental is a reasoning model from Google that excels in multimodal understanding, reasoning, and coding. It displays its chain of logic, making its thought process transparent, and is available for free on Google AI Studio.
O3 is OpenAI's second-generation reasoning model, designed to improve upon O1 by potentially scaling reasoning abilities without relying solely on longer inference times. It aims to show whether reasoning models can achieve significant improvements at the model layer.
OpenAI is expanding ChatGPT's capabilities by integrating it with various platforms like Apple Notes, Notion, and coding tools. This move positions ChatGPT to become more agentic, enabling it to perform actions beyond just answering questions.
AgentForce 2.0 is Salesforce's updated agent platform, offering pre-built skills, workflow integrations, and improved reasoning capabilities. It allows companies to deploy customized agents for complex tasks, addressing competition from other AI-driven agent platforms.
World models are AI systems trained by observing real or simulated environments, rather than large text corpora. They aim to understand physics and simulate real-world interactions, potentially leading to breakthroughs in robotics and AI applications.
Decart developed a model capable of generating a playable Minecraft-like game, showcasing the potential of world models. The company raised $32 million at a $500 million valuation, signaling investor interest in this emerging AI approach.
Genesis is a comprehensive physics simulation platform capable of generating 4D dynamic worlds for robotics and physical AI applications. It can produce synthetic datasets for training AI, significantly speeding up robotics training and potentially scaling world models.
Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes.
One of the really fascinating phenomena that has somehow taken a backseat relative to everything else going on in AI is that, for the first time in a very long time, 20 years basically, there's competition around what search means. Perplexity is obviously one of the most beloved AI products right now and continues to add to its war chest as well as its feature set.
OpenAI, as part of their 12 Days of Shipmas, expanded access to ChatGPT search to the wider world. And now The Information is reporting that Google is planning to add an AI mode option to its search as well.
The Information cites someone who's working on the product, who says that Google is planning to, quote, give its billions of search users the option to switch to an AI mode that looks nearly identical to its Gemini AI chatbot. Others have found indications that this shift won't just be about what you type into your computer, but also that there will be a way to talk to search. 9to5Google found indications in the code suggesting you'll be able to use mobile inputs, including voice and photos, as a way to search.
A Google spokesperson was circumspect about all of this, saying, As our state-of-the-art models continue to advance, there's a huge opportunity to bring these new capabilities into search, helping people discover even more of the web.
And to some extent, this is absolutely obvious. It feels very likely that in the future, there will simply be multiple types of search experiences for different types of queries. The Perplexity or ChatGPT Search type of experience, where you're actually trying to get an answer to a question, is going to become the default for lots and lots of types of queries. I don't think that means that Google's traditional search has no place, but being able to toggle between the two could be really valuable.
The challenge, of course, is that a federal judge has already ruled that Google's search engine is an illegal monopoly. As The Information writes, the Department of Justice has suggested that it wants to make it harder for Google to leverage its search engine to beat AI chatbot rivals, which could create a legal barrier to something like AI mode. This is tough. On the one hand, I absolutely want competition, but on the other, this is just the obvious place for Google to take search, and artificially blocking them from doing so is basically just forcing them to lose.
Next up, former Twitch CEO and very briefly OpenAI CEO Emmett Shear is reportedly working on a new AI startup with some intriguing goals. You might remember that Shear was very briefly named as the replacement for Sam Altman as CEO of OpenAI during the leadership controversy in November 2023. Interestingly, he was credited by the Wall Street Journal as clearing the path for Sam Altman's return by threatening, almost immediately, to resign if he wasn't given evidence by the board to support Altman's removal.
TechCrunch is now reporting that Shear has founded a company called STEM AI, with incorporation documents filed in June of last year. The company is still in stealth, so details are very limited, but what TechCrunch uncovered does sound interesting.
According to a trademark filed last year, STEM AI is developing software to create AI that, quote, understands, cooperates with, and aligns with human behavior, human preferences, human biology, human morality, and human ethics. More hints come from the presence of Adam Goldstein as a co-founder. After selling a travel website called Hipmunk in 2016, Goldstein became a visiting partner at Y Combinator. He also founded an incubator called Astonishing Labs to back bio-research startups.
According to his LinkedIn page, Goldstein spent a year at Tufts University's Levin Lab as a visiting scientist, where he, quote, developed new models for biological systems with a focus on cancer.
According to PitchBook, STEM received backing from Andreessen Horowitz back in August. And while that's all we know right now about the actual company, Shear has been growing increasingly vocal about AI safety and regulation over the past month. In December, for example, he posted, almost all currently proposed regulation is a bad idea. He added, though, that ideas around regulating firms rather than AI models and increasing transparency are some of the few reasonable ideas currently being floated.
Back in November, he wrote, not being scared of AGI indicates either pessimism about the rate of future progress synthesizing digital intelligence or severe lack of imagination about the power of intelligence.
In June, around the time that California's SB 1047 legislation was being debated, he said on a podcast appearance that his greatest concern was self-improving models that could grow out of human control. He said at the time, I'm in favor of creating some kind of fire alarm, like maybe no AI is bigger than X. I think there's good options for international collaboration and treaties about some sort of AI test ban treaty. TLDR, Shear is a good operator, and this is likely one to watch.
Lastly today, how the mighty have fallen. Intel is courting bids to buy out its Altera programmable chip arm. Altera specialized in the design of low-power programmable chips for use in AI-enabled devices. The company was spun off as a separate entity in February as Intel attempted to right the ship after a disappointing few years. Bloomberg reports interest from multiple private equity firms, including Francisco Partners, Silver Lake Management, Apollo Global Management, and Bain Capital. Intel is giving potential buyout partners until January to formalize their offers.
Deal terms presented in November range from taking a 20% to 30% stake in the company, all the way up to taking full control. Bloomberg reports that Altera is being valued in the range of $9 to $12 billion, a steep discount from the $17 billion Intel paid back in 2015. The move comes, of course, in the shadow of the departure of CEO Pat Gelsinger. After being brought in three years ago to get Intel back on track, Gelsinger retired from his position earlier this month at the request of the board.
That, however, is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever.
Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI risk management framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center all powered by Vanta AI. Over 8,000 global companies like Langchain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time.
Learn more at vanta.com slash nlw. That's vanta.com slash nlw. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Welcome back to the AI Daily Brief. I love it when a set of stories converge in a way that really tells a bigger story than just an individual piece of news. And boy, is that the case today.
We are looking at what are shaping up to be the 2025 battle lines in AI: reasoning models, agents, and world models, all of which got some interesting news yesterday.
Now, of course, at this point, you probably don't need the background on reasoning models. But effectively, this is a new approach to scaling that uses different strategies than just throwing more compute and data into pre-training. This is clearly where OpenAI is putting its emphasis. It released O1 Preview back in September. Subsequent to that, we've had Amazon announcing Nova and talking about a reasoning model in the lineup, and Meta releasing Llama 3.3, also emphasizing its reasoning capabilities. Several Chinese labs have also released very competent reasoning models.
And now Google has joined the party.
A few days after its initial launch, Google has added a reasoning model to the Gemini 2.0 Flash lineup. The model is called Gemini 2.0 Flash Thinking Experimental. Hopefully this is just a working title, guys. And the way it describes itself is that it's the best in the lineup for, quote, multimodal understanding, reasoning, and coding. In demonstrations, it seems to perform well on puzzles involving both visual and text clues. And as far as novel features that make it stand out from the pack, the model shows its chain of logic, so you can see what's going on under the hood.
OpenAI co-founder Andrej Karpathy wrote, The prominent and pleasant surprise here is that unlike O1, the reasoning traces of the model are shown. As a user, I personally really like this because the reasoning itself is interesting to see and read. The models actively think through different possibilities, ideas, debate themselves, etc. The case against showing these is the concern that someone could collect the reasoning traces and train on them to imitate the reasoning on top of a different base model, to gain reasoning ability possibly and to some extent.
The model is also extremely fast compared to rivals and available for free on Google AI Studio. This in and of itself is pretty surprising as reasoning models so far have been extremely expensive to operate compared to their non-reasoning counterparts. One interesting thing here is that Google's naming convention implies that this is just a fine-tuned version of 2.0 Flash, or perhaps simply the base model with some system prompts to ask the model to think for longer and check its work before answering.
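To make that speculation concrete, here is a minimal sketch of what "base model plus a think-longer system prompt" could look like, assuming the google-generativeai Python SDK. The model name and the prompt wording are illustrative assumptions; nothing here reflects how Google actually built Flash Thinking.

```python
# Hypothetical illustration only: wrapping a base model with a "think longer,
# check your work" system prompt. This is NOT Google's actual recipe for
# Gemini 2.0 Flash Thinking; the model name and prompt are assumptions.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

reasoning_prompt = (
    "Before answering, think step by step through the problem, "
    "consider alternative approaches, and double-check your work. "
    "Show your reasoning, then give a final answer."
)

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",  # assumed base-model identifier
    system_instruction=reasoning_prompt,
)

response = model.generate_content(
    "A train leaves at 3:40pm and arrives at 6:05pm. How long is the trip?"
)
print(response.text)  # reasoning followed by the final answer, if the prompt is followed
```

The point of the sketch is simply that a "thinking" variant could, in principle, be little more than extra instructions layered on the same weights, which is exactly what Google's naming convention hints at.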
Compare that to O1, where OpenAI went out of their way to present it as an entirely new model. Sam Altman even framed the release as the beginning of a different branch of LLMs for the company. One of the big questions I think heading into next year is just how different these reasoning models are from their non-reasoning counterparts, and more particularly, whether they really do evolve in different ways going forward.
Now, speaking of OpenAI, the other big news for reasoning models is that OpenAI is preparing to release the second generation of their O1 model. Funny enough, speaking of weird naming conventions, according to The Information, the model is going to be called O3 to avoid intellectual property disputes with British telco O2. Sam Altman all but told us the model would be released today, so it's probably out by the time you're listening to this. The release could answer another big question surrounding reasoning models, which is whether they can show major improvements on the model layer.
At the time O1 came out, it was suspected that OpenAI was pivoting to reasoning because adding training data and compute to training runs was showing diminishing returns. In the following months, it was confirmed that noticeable improvements could be made to reasoning models by getting them to think longer. Assuming O3 is a brand new model and not a tweak of O1, it should reveal whether reasoning models themselves can scale in ability, or instead whether all of the improvements come only from scaling up inference time.
The community is pretty excited to check it out. Chubby on X writes, O3 equals Orion. There is probably no more GPT 4.5 or 5. Everything is summarized in Orion, i.e. O3. Surely, Orion was fed with a lot of synthetic data from O1 and now has evolved into O3. Chubby also got at the competitive dynamics in the field, writing, time to take back the crown from Google. We have one more normal episode coming on Monday before we get into end of the year specials, so I will have a chance to follow up on what exactly came out on Friday.
Now, moving on to the next dimension of competition, the one that's even more obvious than reasoning models in some ways, is the race to deploy agents.
Yesterday, OpenAI announced a long list of new integrations for ChatGPT. The desktop application can now access data from a gigantic list of coding platforms, as well as Apple Notes, Notion, and Quip. For now, ChatGPT can only read these apps in context. It can't take actions within those programs. But Chief Product Officer Kevin Weil made it clear that's where this is all going. He said, we've been putting a lot of effort into our desktop apps. As our models get increasingly powerful, ChatGPT will become more and more agentic.
That means we'll go beyond just questions and answers. ChatGPT will begin doing things for you. A few weeks ago, Wharton professor Ethan Mollick posted, OpenAI has a lot of pieces on the board right now. Multimodal vision and voice, small, large, and reasoning models, image and video creation, code execution, mobile and desktop apps, web search, semi-agentic stuff. Very curious when it will be glued together into a singular thing.
Interestingly, going back to this quote from Kevin Weil, it seems extremely notable to me that he says ChatGPT will begin doing things for you. This indicates pretty clearly to me that we have the singular thing, that it is and has always been ChatGPT. It's just that over time, ChatGPT is going to be a lot more and frankly different from what ChatGPT originally was.
A couple of days ago, we also got an update from Salesforce on their agent platform, AgentForce. Just three months after announcing AgentForce in September, the company announced AgentForce 2.0. They write, this release introduces a new library of pre-built skills and workflow integrations for rapid customization, the ability to deploy AgentForce in Slack, and advancements in agentic reasoning and RAG.
These advancements will enable companies to scale their workforce with customized agents capable of handling complex multi-step tasks with even more precision and accuracy. And if you need a sense of just how important this is to Salesforce, go check out the piece in The Information, AI Is Marc Benioff's Friend and Foe. It talks about how Salesforce is facing increasing competition from companies like Sierra, which are bringing agents to market and, in fact, in some cases, winning business away from Salesforce.
The final vector of competition that I want to discuss today is this new world model approach. These models are trained in a fundamentally different way to LLMs. Where LLMs are trained on a large corpus of text, image, and voice data, world models are trained by observing real or simulated worlds. We've seen a few working prototypes of this style of AI, with one big example coming out of Fei-Fei Li's World Labs and another coming out of Google DeepMind. A third big player is Decart, which released a model in October capable of generating a fully playable Minecraft-like game.
While the demo was buggy and rudimentary, it clearly made investors sit up and pay attention. TechCrunch reports that the company has now raised their Series A. The startup raised $32 million at a $500 million valuation. CEO and co-founder Dean Leitersdorf said the company wants to compete at the highest level, building a, quote, fully vertically integrated AI research lab alongside enterprise and consumer products. He said the aim was to create what he's calling a kilocorn. In other words, a trillion dollar company. You gotta love ambition, man.
Part of the reason that people are so interested in world models, especially recently, is the sense that perhaps their understanding of physics could be something that allows them to make more fundamental breakthroughs. Still, in many ways, this class of models feels closer to where GPT-based LLMs were a few years ago, demonstrating some fascinating emergent properties, but still nowhere near the full scale that they're going to reach.
Along those lines, a group of researchers across 19 different universities have just revealed something they are calling a comprehensive physics simulation platform. Named Genesis, the researchers claim the platform is, quote, capable of simulating a wide range of materials and physical phenomena. Researcher Zhou Xian writes, after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamic worlds, powered by a physical simulation platform designed for general-purpose robotics and physical AI applications.
We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications.
So basically, Genesis, as I understand it, can be used as both a robotic simulation platform and as a photorealistic rendering platform. The platform accepts natural language prompts and can be used as a data engine to produce a range of different modalities of synthetic or simulated data. In the immediate term, it's a massive boost to robotics training in terms of speed and accuracy. It could lead to immediate improvements in that field and possibly even unlock more complex use cases.
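For a sense of what the simulation-as-data-engine workflow looks like in practice, here is a minimal sketch along the lines of the project's published quick-start examples. Treat the package name (genesis) and the specific entry points as assumptions rather than a definitive API reference.

```python
# Minimal sketch of a Genesis-style simulation loop, modeled on the project's
# quick-start examples. Exact API names are assumptions, not a verified reference.
import genesis as gs

gs.init(backend=gs.gpu)  # fall back to gs.cpu if no GPU is available

# Build a simple scene: a ground plane plus a robot arm loaded from an MJCF file.
scene = gs.Scene(show_viewer=False)
scene.add_entity(gs.morphs.Plane())
robot = scene.add_entity(
    gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml")  # example bundled asset
)
scene.build()

# Step the physics forward. In a training setup, an RL library would wrap this
# loop, reading robot state and applying control signals at each step to learn
# a policy, which is where the claimed massive speedups matter.
for _ in range(1000):
    scene.step()
```

The design point is that the same simulated scene can be stepped far faster than real time and rendered photorealistically, which is what lets it double as both a robotics trainer and a synthetic data generator.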
Roboticist Ben Duffy commented, quote, with Genesis, you'll be able to train a locomotion policy that's deployable in real world in less than 26 seconds. That sentence tells us about a future we are not ready for. For reference, that's 430,000 times faster than the previous leading physical simulators to give a sense of how dramatic this change could be.
One of the other potentials is that a platform like this could produce the gigantic datasets required to scale up world models. Currently, they've been trained using either datasets from self-driving cars or by observing video games. There are a few projects strapping camera rigs to hikers in order to gather real-world data, but if this platform is as performant as the researchers claim, we could soon see near-infinite synthetic datasets available to train the next generation of world models.
Effectively, all of the response to this is some version of wow. Viewing their announcement video, AI evangelist Linus writes, this is all generated and simulated in 4D. Mind-blown emoji, mind-blown emoji, mind-blown emoji. Bilawal Sidhu writes, think instant physics-accurate environments, camera paths, and character animations, all from natural language.
Mila writes, what the? This Genesis project is like something out of a sci-fi movie. I mean, generating entire 4D worlds with physics simulations? That's mind-blowing. I'm just sitting here stunned, trying to wrap my head around how this could change everything from robotics to video games. And I think, friends, if we had to sum this up, if you thought that 2025 was going to be any slower than 2024 and 2023 had been, boy, do you need to think again. That's going to do it for today's AI Daily Brief. Appreciate you listening, as always. Until next time, peace.