People

Andrej Karpathy
Andrew Bosworth
Aravind Srinivas
Bret Taylor
Ethan Mollick
Gavin Baker
Jimmy Apples
Siki Chen
Topics
Aravind Srinivas: I believe knowledge should be universally accessible and useful, not locked behind expensive subscription plans. Perplexity's deep research tool aims to deliver this at a much lower price, thanks to open-source technology. We want more people to be able to access and use these tools, rather than letting them become a profit engine for a handful of companies. Siki Chen: I don't think any company can build a better deep research tool than OpenAI's until it has full model reasoning capability; raw model reasoning capability matters a great deal. That said, from a consumer's perspective, more options are always a good thing, and I'm glad to see competition in this space.

Deep Dive

Chapters
Perplexity launched its own version of Deep Research, a tool similar to OpenAI's, but at a significantly lower price point. While some claim Perplexity's version is as good as or better than OpenAI's, others argue that OpenAI's superior model reasoning capability remains unmatched.
  • Perplexity offers Deep Research at a fraction of OpenAI's price.
  • Perplexity's Deep Research uses agentic web search and iterative reasoning.
  • User opinions on which tool is superior are divided.

Shownotes Transcript


Today on the AI Daily Brief, Grok 3 kicks off what appears to be the beginning of model update season. Before that in the headlines, Perplexity launches their own version of deep research. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪

OpenAI's deep research is one of the more exciting products that many people have gotten recently. In fact, if you go on Twitter or X, you can find people saying it's the most impressive product they've seen in years. It is, however, behind an extremely expensive paywall.

Right now, the only people who have access to deep research are those who are paying for OpenAI's $200 pro tier. And then in comes Perplexity with their own version of deep research, with the same name, in fact, suggesting that they're trying to make this just a category of AI usage, like chatbot. And it absolutely obliterates the OpenAI price point.

Free users have five queries per day, while pro users get up to 500 daily queries and have access to faster speeds. When asked how the company could offer this tool at this price, CEO Aravind Srinivas said, "...thankful for open source, we're going to keep making this faster and cheaper. Knowledge should be universally accessible and useful, not kept behind obscenely expensive subscription plans that benefit the corporates, but not in the interest of humanity."

So yes, if you're wondering, Sam Altman is now being shanked from below and from above, given the aggressiveness of that particular positioning. Perplexity's deep research works very similarly to how rival tools work, using a combination of agentic web search and iterative reasoning to generate in-depth research reports. They share a bunch of benchmarks, but honestly, I think for this type of product, everything is about how it actually performs. And for that, you're just going to have to go check it out yourself, which thankfully you can, given that they offer even free users some number of queries each day.

One user asked Perplexity to compare itself to rival deep research features, ultimately producing a multi-page analysis that summarized: "Perplexity AI excels in speed and accessibility for casual researchers. OpenAI dominates in analytical depth for enterprise applications. Google integrates most seamlessly with existing productivity ecosystems," which honestly seems like a fairly decent write-up and summarization.

Now, if you go cruise around the internet, you can find people who are saying that Perplexity's version of the tool is every bit as good or even better than OpenAI's. But you also have a lot of sentiment like this one from Siki Chen, who writes, until you have access to full O3 or Claude 4 or something, you simply are not going to build a better deep research than OpenAI. This is a use case where the raw model reasoning capability matters a lot. Still, from a consumer perspective, obviously more options is a good thing. And so glad to see some competition in this space.

Next up, an update on Ilya Sutskever, the former OpenAI co-founder, who is back out fundraising for his new company, Safe Superintelligence.

Previous reports had Ilya raising about a billion dollars at a $20 billion valuation, and it seems like that is now up to a $30 billion plus valuation. Bloomberg reports that Greenoaks Capital Partners will lead the round and plans to invest about half of it. And we still have no idea whether the valuation update from the original $5 billion reflects something new that Ilya has shown investors, or is just the premium that the market feels it has to pay for any Ilya product.

Now, while the startups like Perplexity race ahead, don't expect the next generation of AI-enabled home assistants anytime soon, as the big tech companies are struggling. Both Alexa and Siri have hit another round of delays. People got excited recently when it was reported that there was an Alexa AI event. But at a last-minute go-no-go meeting last week, apparently Amazon's executives decided that no-go was the answer. And the Washington Post is now reporting that AI Alexa won't be ready until March 31st or later.

The delay is reportedly due to Alexa giving inaccurate answers, which has been the scourge of this development cycle. Apple's AI Siri upgrade is also facing delays after plans were first unveiled all the way back last June at WWDC. Bloomberg reports that the project is facing engineering problems and software bugs, and that while Apple is, quote, racing to the finish line, some features planned for an April rollout may be delayed until May or even later.

One of the things that this highlights is that the margin of error and consumer forgiveness for AI hallucinations and incorrect answers when it comes to these sort of smart home devices is basically zero. And the risk of finding yourself on the wrong end of some viral clip on social media is really high, making these particular product rollouts a real challenge.

Lastly today, Meta is apparently planning a big investment in humanoid robots. The company will establish a new team within their Reality Labs hardware division, which is the group that has released the Meta Ray-Bans and the MetaQuest. The new plan is to develop Meta's hardware for humanoid robots designed to complete household tasks, initially focusing on developing sensors to be used by third-party startups.

In an internal leaked memo, Meta's CTO Andrew Bosworth said, the core technology we've already invested in and built across reality labs and AI are complementary to developing the advancements needed for robotics. We believe that expanding our portfolio to invest in this field will only accrue value to Meta AI and our mixed and augmented reality programs. I think we're still a little premature, but you are going to see a lot more of the intersection of robotics and AI this year and in the years to come.

For now, though, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.

Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001, centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.

Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms,

agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.

That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.

If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hey, listeners, want to supercharge your business with AI?

In our fast-paced world, having a solid AI plan can make all the difference. Enabling organizations to create new value, grow, and stay ahead of the competition is what it's all about. KPMG is here to help you create an AI strategy that really works. Don't wait, now's the time to get ahead. Check out real stories from KPMG of how AI is driving success with its clients at kpmg.us slash AI. Again, that's www.kpmg.us slash AI.

Now, back to the show. Welcome back to the AI Daily Brief. Today, we are digging into the always juicy topic of model competition. Specifically, Elon Musk's XAI has released their long-awaited flagship model, Grok 3. In fact, the launch unveiled a family of models built around the Grok 3 architecture.

The flagship model competes against OpenAI's GPT-4o, but there's also a mini version that's designed for speed. The company will also release reasoning versions of the model in each size shortly. Users, for example, will be able to engage something called Big Brain Mode to add more reasoning time for more difficult queries. And XAI also introduced a mode called DeepSearch.

DeepSearch uses a form of rudimentary agent to search the web and Twitter slash X posts to compile long-form reports, obviously in a similar way to how OpenAI's deep research works. There is also a forthcoming voice mode, which will be rolled out in about a week, according to the announcements. Grok 3 is first available to Premium Plus subscribers on X, but M1 Astra and Apple Insider also claim that XAI will launch a Grok Pro tier at $30 a month or $300 per year.

It seems like that subscription might be required to use advanced features like DeepSearch, voice mode, and Big Brain Mode. Now, as these new models come online, Elon announced that Grok 2 would be open sourced in the coming months. He said, Our general approach is that we will open source the last version when the next version is fully out. When Grok 3 is mature and stable, which is probably within a few months, then we'll open source Grok 2. Sam Altman has flagged that he's considered doing the same with OpenAI's older models as well, so maybe that becomes the new norm.

Now, one of the reasons that Grok 3 has been highly anticipated is that it's the first model that's trained on a larger scale data center. Last month, Elon claimed that the model was trained using 10 times the compute of Grok 2, which was achieved, of course, with the Colossus supercluster, the first training cluster capable of networking 100,000 NVIDIA H100s. Grok 3 was therefore viewed as the first real test of whether pre-training scaling had hit a wall with the last generation of models.

Now, of course, as is the case with every launch, people are poring over the benchmarks. When it comes to math, science, and coding benchmarks, Grok 3 Mini achieved parity with Gemini 2.0 Pro and DeepSeek V3, while the full-size Grok model, and of course this is according to XAI itself, outperformed on each test by a noticeable margin. Important to note, this was only comparing leading non-reasoning models, with Grok 3 not putting up the same performance as OpenAI's O3 Mini on these tests.

For the reasoning models, both sizes of Grok 3 seem fairly competitive with O1 on low inference settings and outperform O3 Mini on high inference settings. This would imply that the reasoning version of Grok 3 isn't on the same level as the full-size O3. Given that we don't have access to either model at this stage, we don't know for sure. XAI noted that Grok 3 reasoning is still in beta and will have further post-training before its full release.

There wasn't a huge boost from scaling pre-training, but the gains were there. Professor Ethan Mollick writes, In essence, Grok 3 doesn't invalidate those scaling laws, but it could also suggest that much, much larger training clusters are needed to see paradigm-changing improvements.

One benchmark that many people took note of was Chatbot Arena, where users vote on which AI output they prefer. While the metric is inherently subjective, it gives a sense of how the models will perform in the market. Investor Gavin Baker writes, Grok 3 is the first model ever to score over 1400 on Chatbot Arena and outperforms the best publicly available reasoning models from OpenAI and Google.

XAI was founded 13 years after DeepMind and 8 years after OpenAI and is now ahead of both. The SR-71 Blackbird of AI labs. Baker did, of course, then note that he is a little biased as an XAI investor. AI Breakfast wrote, For everyday users, the chatbot arena is the only benchmark that matters. Grok 3 is officially the best LLM. Given the speed at which XAI achieved this, they will only widen the gap over time. A more complete review comes from Andrej Karpathy.

And although Karpathy was a co-founder at OpenAI, most people view his take as inherently unbiased given his lack of affiliation today and the general credibility that he has. He wrote a long review on X saying, I was given early access to Grok 3 earlier today, making me, I think, one of the first few who could run a quick vibe check. He goes into a long review, sharing some of his tests around thinking, exploring the deep search feature, and trying a bunch of random LLM gotchas.

And ultimately, here's the conclusion he came to. He writes, Grok 3 plus thinking feels somewhere around the state-of-the-art territory of OpenAI's strongest model, so O1 Pro at $200 a month, and slightly better than DeepSeek R1 and Gemini 2.0 Flash Thinking, which is quite incredible considering that the team started from scratch around a year ago. This timescale to state-of-the-art territory is unprecedented.

Do also keep in mind the caveats. The models are stochastic and may give slightly different answers each time, and it is very early. So we'll have to wait for a lot more evaluations over a period of the next few days to weeks. The early LLM arena results look quite encouraging indeed. For now, big congrats to the XAI team. They clearly have huge velocity and momentum. Now, the larger context around the Grok 3 launch is the ongoing feud between Elon and Sam Altman.

And indeed, it's very difficult to cover this. Elon especially is more divisive than he's ever been, and it is enormously difficult to find people who can separate whatever they think about Elon in general from their reviews of anything that he touches. Here's how Gary Marcus summed up where this leaves the competition, which I think is fairly reflective of what others think as well. He writes,

OpenAI leaker Jimmy Apples writes, Strong model, the main thing is the speed with which they caught up. I think it lives up to expectations, strong offering, good dollar value. He then prodded Sam Altman to release 4.5, which we know is coming soon. Earlier in the day when someone had told him to release 4.5 the same day to steal the show, Altman wrote, "that wouldn't be very nice..."

To me, one of the things that really stands out is just how absolutely saturated these benchmarks are and how little I find myself compelled by them when a new model comes out.

Ethan Mollick again got at this, writing, Another thing Grok 3 highlights is the urgent need for better batteries of tests and independent testing authorities. Public benchmarks are both meh and saturated, leaving a lot of AI testing to be like food reviews, based on taste. If AI is critical to work, we need more. He continues, GPQA Diamond, MMLU, and ARC-AGI look nothing like actual work. He also adds, and this is something I completely agree with, I'm surprised no large IT consulting firm or even national standards agency has stepped in with large-scale batteries of private tests, especially given the hundreds of billions of dollars being invested.

This is a hugely salient point. Ultimately, it doesn't matter for the vast majority of users how these models do on benchmarks. It matters how they perform in real work settings. And speaking of OpenAI and Elon's fight, the OpenAI board has now formally rejected Elon's $97 billion bid to take over the nonprofit. In a unanimous vote, the board decided that the takeover was, quote, not in the best interest of OpenAI's mission.

A statement from Chairman Bret Taylor said, OpenAI is not for sale and the board has unanimously rejected Mr. Musk's latest attempt to disrupt his competition. Any potential reorganization of OpenAI will strengthen our nonprofit and its mission to ensure AGI benefits all of humanity. OpenAI lawyers have insisted that Musk's bid doesn't set the price for the nonprofit, which will need to be paid during the conversion to a for-profit company.

Separately, the Financial Times reports that the company is considering granting special voting rights to the nonprofit board in an attempt to ensure that they aren't a target for a hostile takeover from Musk following the for-profit conversion.

Meanwhile, XAI itself is heading back to the well for another funding round. Bloomberg reports that the company is seeking to raise $10 billion at a $75 billion valuation, with sources saying that existing investors including Sequoia, Andreessen Horowitz, and Valor Equity Partners are all participating in the talks, which are still at an early stage. A significant portion of the new funding seems as though it would pay for upgraded chips at XAI's data centers. On Friday, Bloomberg reported that the company was close to closing a $5 billion deal with Dell to provide servers powered by NVIDIA's Blackwell GB200 chips.

So ultimately, friends, where we are is that the proof is going to be in the pudding. A lot of folks over the next few weeks will be testing Grok 3 and seeing how it compares to the latest ChatGPT and Claude models. But it also feels to me like this is the beginning of model update season, not the end, with both Anthropic and OpenAI promising new models coming soon. So we could have a lot of new developments in the near future, which is obviously nothing but good for all of us users. For now, though, that is going to do it for today's AI Daily Brief. Until next time, peace.