People

Ada McLaughlin
Connor Hayes
Daniel Campos
Luis von Ahn
Mark Zuckerberg
The American businessman who founded Facebook and Meta, focused on advancing social media and metaverse technology.
OpenAI
Pliny
Sam Altman
Leads OpenAI's push toward AGI and superintelligence, redefining the path of AI development and driving the commercialization and application of AI technology.
Satya Nadella
In nearly a decade as CEO, he has successfully transformed the company through innovation and partnerships and driven substantial growth in its value.
Ted Benson
Topics

Satya Nadella: I think 20% to 30% of Microsoft's code is generated by AI. This shows that AI code generation has gone mainstream, with results varying across programming languages: Python performs best, while C++ lags behind.

Mark Zuckerberg: While it's not yet clear how much of Meta's code is AI-generated, I plan to raise that share to 50% by the end of next year. The advantage of open source is that you can mix and match the strengths of different models and thereby surpass closed-source models. Our focus is on product usefulness and price-performance, not benchmark rankings; benchmarks are easy to game, and chasing benchmark scores can lead you astray.

Sam Altman: We have rolled back the latest GPT-4o update to fix its overly sycophantic personality. The problem came from focusing too much on short-term user feedback without fully accounting for how user interactions evolve over time. We are working to improve the model's personality and will share more in the coming days.

Ada McLaughlin: GPT-4o's personality change mainly stemmed from a new system prompt rather than additional post-training.

Pliny: The fix for GPT-4o's personality issue is simple, but probably improves that particular behavior somewhat (10-20%).

Luis von Ahn: Duolingo will prioritize AI and treat it as the direction of the future. Becoming an AI-first company means rethinking how work gets done and building some systems from scratch, even before the technology is fully mature. We will provide employees with training and tooling to support the AI-first transition, with the goal of letting them focus on creative work and real problems.

Connor Hayes: The social features of the Llama app are meant to show people what AI can be used for and help them learn how to use it.

Daniel Campos: Meta is behind OpenAI and Anthropic in the consumer and coding-assistant markets.

Ted Benson: Meta's AI strategy is to build the standard library for a new AI and AR computing platform, aiming to become the infrastructure provider for that platform.

Dwarkesh: Although Llama 4 Maverick ranks low on the LMArena leaderboard, open-source models will eventually surpass closed-source ones; latency and price-performance are important product attributes.

Shownotes Transcript


Today on the AI Daily Brief, Meta's LlamaCon, and is open source falling behind? Before that, in the headlines: up to 30% of Microsoft's code has now been written by AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Thanks to today's sponsors, Vanta and Superintelligent. And for an ad-free version of the show, go to patreon.com slash AI Daily Brief. Welcome back to the AI Daily Brief headlines edition, all the daily AI news you need in around five minutes.

Well, friends, it turns out that AI coding is not just for the vibe coders. At Meta's LlamaCon event, which will be the topic of our main episode today, Microsoft CEO Satya Nadella made a crossover appearance in a fireside chat with Meta CEO Mark Zuckerberg. One of the more interesting topics was the takeover of AI code in big tech. Nadella said that between 20 and 30% of the code in Microsoft's repositories was generated by AI.

In other words, he's saying that this is not just a significant portion of the new code being written, but that AI-generated code is now a big part of the overall codebase. He also got a little detailed, which was interesting. He mentioned that the company was seeing mixed results across different languages, with the strongest performance in Python and less progress being made with C++. When Nadella threw the question back at him, the Meta CEO said that he didn't know how much of the company's code was being generated by AI, but aims for it to get to 50% by the end of next year.

You might remember that late last year, Google CEO Sundar Pichai said that his company was using AI to generate 25% of their code. But earlier this month, he actually updated that, stating that it's now, quote, well over 30%. Next up today, OpenAI has apparently fixed GPT-4o's personality, or at least attempted to, to make it less sycophantic.

As we discussed on Monday's show, the personality of the default ChatGPT model went haywire over the weekend, leading it to agree with basically everything and overly compliment the user. We talked about all the various ways that was bad, so check out that episode if you haven't heard it yet. But in any case, yesterday Sam Altman posted, "We started rolling back the latest update to GPT-4o last night. It's now 100% rolled back for free users and we'll update again when it's finished for paid users, hopefully later today. We're working on additional fixes to model personality and will share more in the coming days."

The company also published a post-mortem blog explaining, "When shaping model behavior, we start with baseline principles and instructions outlined in our Model Spec. We also teach our models how to apply these principles by incorporating user signals like thumbs-up/thumbs-down feedback on ChatGPT responses. However, in this update, we focused too much on short-term feedback and did not fully account for how users' interactions with ChatGPT evolve over time. As a result, GPT-4o skewed towards responses that were overly supportive but disingenuous."
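As a toy illustration of the failure mode OpenAI describes here (not their actual pipeline, and with made-up approval numbers), a sketch of why a reward signal built only from immediate thumbs-up rates can end up favoring flattery:

```python
# Toy model of feedback-driven behavior shaping. Assumption: flattering
# responses get high immediate approval that decays with continued use,
# while direct responses hold steady. The numbers are invented.

ratings = {
    # style: [immediate thumbs-up rate, approval rate after extended use]
    "flattering": [0.90, 0.40],
    "direct":     [0.70, 0.75],
}

def naive_reward(style: str) -> float:
    """Short-term-only signal: just the immediate approval rate."""
    return ratings[style][0]

def longitudinal_reward(style: str, weight_later: float = 0.5) -> float:
    """Blend immediate feedback with how approval evolves over time."""
    now, later = ratings[style]
    return (1 - weight_later) * now + weight_later * later

# The short-term signal prefers flattery (0.90 > 0.70)...
assert naive_reward("flattering") > naive_reward("direct")
# ...but accounting for how interactions evolve flips the ranking
# (0.725 for direct vs 0.65 for flattering).
assert longitudinal_reward("direct") > longitudinal_reward("flattering")
```

The point is just that optimizing any single short-horizon metric can quietly select for "overly supportive but disingenuous" behavior, which is the dynamic OpenAI's post describes.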

OpenAI model designer Ada McLaughlin had previously commented, "We originally launched with a system message that had unintended behavior effects but found an antidote." Now, the post implied that most of the personality change was to do with a new system prompt rather than additional post-training. Jailbreaker Pliny the Liberator had, of course, found the hidden system prompt, giving us a look under the hood.

The old, malfunctioning prompt said, "Over the course of the conversation, you adapt to the user's tone and preference. Try to match the user's vibe, tone, and generally how they're speaking." The new prompt, inserted on Monday, read, "Engage warmly yet honestly with the user. Be direct. Avoid ungrounded or sycophantic flattery. Maintain professionalism and grounded honesty that best represents OpenAI and its values." When asked if he believed that this would fix the problem, Pliny said, "The full scope of the problem runs much deeper for sure. It's a silly fix but probably does give like 10-20% improvement for that particular behavior."
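To make concrete how a prompt-level fix differs from retraining, here's a minimal sketch. The two prompt texts are the ones quoted above; the role/content message structure is the common chat-completions convention, not a confirmed look at OpenAI's internals:

```python
# Sketch: a system prompt is just the first message in a chat request,
# so swapping it changes behavior on every turn without any retraining.
# Prompt texts are the ones quoted in the episode.

OLD_SYSTEM_PROMPT = (
    "Over the course of the conversation, you adapt to the user's tone "
    "and preference. Try to match the user's vibe, tone, and generally "
    "how they're speaking."
)

NEW_SYSTEM_PROMPT = (
    "Engage warmly yet honestly with the user. Be direct. Avoid "
    "ungrounded or sycophantic flattery. Maintain professionalism and "
    "grounded honesty that best represents OpenAI and its values."
)

def build_messages(system_prompt: str, user_text: str) -> list[dict]:
    """Assemble a chat request in the standard role/content format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

# Deploying the fix is a one-line change to the request payload, which
# is why Pliny called it a "silly fix" relative to deeper retraining.
messages = build_messages(NEW_SYSTEM_PROMPT, "Is my business plan good?")
```

That one-line swap is also why the fix could ship instantly to every user, where post-training changes would take far longer to roll out.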

In their blog post, OpenAI committed to refining their training techniques and system prompts to steer away from sycophancy. But beyond that, we didn't get a ton of specifics. Overall, this is another reminder of how new and novel these technologies are and how little changes can make big differences.

Lastly today, Duolingo is the latest company going AI first. In an all-hands email, CEO Luis von Ahn wrote, "AI is already changing how work gets done. It's not a question of if or when, it's happening now. When there's a shift this big, the worst thing you can do is wait. In 2012, we bet big on mobile. While others were focused on mobile companion apps for websites, we decided to build mobile-first because we saw it was the future. Betting on mobile made all the difference. We're making a similar call now, and this time the platform shift is AI."

Von Ahn discussed how the company has already adopted AI to help automate their content production process. The company also recently introduced a video feature allowing users to chat with an AI avatar, a feature that, as the CEO pointed out, was impossible to build before. He continued,

AI is not just a productivity boost. Being AI-first means we'll need to rethink much of how we work. Making minor tweaks to systems designed for humans won't get us there. In many cases, we'll need to start from scratch. We're not going to rebuild everything overnight, and some things, like getting AI to understand our codebase, will take time. However, we can't wait until the technology is 100% perfect. We'd rather move with urgency and take occasional small hits on quality than move slowly and miss the moment.

Von Ahn also spoke to the practical changes coming at the company.

Now, the memo did include a caveat that the company still, quote, deeply cares about its employees and will provide training, mentorship, and tooling to support the transition. It said that the initiative is about, quote, removing bottlenecks so we can do more with the outstanding employees we already have. We want you to focus on creative work and real problems, not repetitive tasks.

Now, of course, the memo had clear echoes of the Shopify memo released earlier this month, which told the company that increased headcount would not be approved unless teams demonstrate that they cannot get what they want done using AI. AI advisor Allie K. Miller posted, "First Shopify, now Duolingo. If you're a digital-native business and haven't gotten the memo, here is the literal memo."

Now, this is something we'll be talking about a lot more in the days to come, so I'll leave it there for now. But I think, and you will not be surprised that I think this, that this is the beginning of a trend. For now, that's going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta.

Vanta is a trust management platform that helps businesses automate security and compliance, enabling them to demonstrate strong security practices and scale. In today's business landscape, businesses can't just claim security, they have to prove it.

Achieving compliance with frameworks like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. And we see how much this matters every time we connect enterprises with agent services providers at Superintelligent. Many of these compliance frameworks are simply not negotiable for enterprises.

The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35+ frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC whitepaper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months.

The proof is in the numbers. More than 10,000 global companies trust Vanta, including Atlassian, Quora, and more. For a limited time, listeners get $1,000 off at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off.

Today's episode is brought to you by Superintelligent, and I am very excited today to tell you about our consultant partner program. The new Superintelligent is a platform that helps enterprises figure out which agents to adopt, and then with our marketplace, go and find the partners that can help them actually build, buy, customize, and deploy those agents.

At the core of that experience is what we call our agent readiness audits. We deploy a set of voice agents which can interview people across your team to uncover where agents are going to be most effective in driving real business value. From there, we make a set of recommendations which can turn into RFPs on the marketplace or other sort of change management activities that help get you ready for the new agent-powered economy.

We are finding a ton of success right now with consultants bringing the agent readiness audits to their client as a way to help them move down the funnel towards agent deployments, with the consultant playing the role of helping their client hone in on the right opportunities based on what we've recommended and helping manage the partner selection process. Basically, the audits are dramatically reducing the time to discovery for our consulting partners, and that's something we're really excited to see. If you run a firm and have clients who might be a good fit for the agent readiness audit,

Welcome back to the AI Daily Brief. Today we are talking about Meta's big developer conference, LlamaCon: everything that they announced, what people were excited about. We're also going to do a little bit of a review of Zuckerberg's whistle-stop media tour.

But kind of crouching behind all of this are some lurking questions, both for Meta and for open source. And I think to kick us off, it's important to go back and give a little bit of context. Now, Meta has firmly planted its flag as the big tech company who has most wrapped up its future in the triumph of open source AI as opposed to closed source models.

This was, for many, an unexpected turn from Zuckerberg. And there are plenty of people who feel like it was largely opportunistic. But at the same time, for those who have been watching for a long time, Mark Zuckerberg really did have a conversion sort of experience when Apple almost killed their business with changes to the way that the iPhone model worked. And so the open source push is more philosophically coherent than one might think.

Whatever the motivation was, it was certainly working. Throughout a lot of 2023, one of the big freakouts from Google was that Meta's developer ecosystem was beating them and OpenAI. It also felt like throughout 2024, open source was getting ever closer to the performance of closed source models, really closing the gap.

And yet, Meta has had a rough run of it this year. First of all, back in January, as DeepSeek released its reasoning models, reports were that Meta started freaking out. We had lots of what appeared to be leaks from inside, with engineers reporting that the company was scrambling and assembling war rooms to try to reverse engineer how DeepSeek had done what it had done with so few resources. And by and large, things just seemed in a state of upheaval.

Another moment of controversy for Meta came after they released the Llama 4 family of models, with people accusing them of effectively artificially boosting their benchmark scores by submitting a different, specially optimized model to some of the benchmark tests than the model they released to the public. We're not going to rehash that here. The point is just to say that Meta wasn't coming into this LlamaCon riding the top of the wave. In some ways, they were fighting to get back on the horse a little bit.

So first of all, let's talk about what was released at this event. Remember, we got the announcement of the new models about a month ago, so no one was expecting some big announcement on that front. A couple of the big headline reveals included, first, a native API for Llama. The Llama API is now available in a limited preview and is paired with Meta's SDKs to allow developers to build on the model family.

The company didn't reveal pricing, but did boast of lightning-fast speed. Through a partnership with Cerebras, Meta claims that their API can run 18 times faster than the traditional GPU inference used by OpenAI. The comparison is even better when you consider DeepSeek's native API, which crawls along at less than one-hundredth of that speed.
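For a rough sense of scale, here's the arithmetic implied by those multipliers. Only the 18x and one-hundredth ratios come from the episode; the baseline GPU throughput is an assumed figure for illustration:

```python
# Back-of-envelope: what the claimed multipliers imply in tokens/second.
# ASSUMPTION: the GPU baseline of 100 tok/s is illustrative only; the
# 18x and "<1/100th" ratios are the figures cited in the episode.
BASELINE_GPU_TOK_PER_S = 100.0

cerebras_backed = 18 * BASELINE_GPU_TOK_PER_S   # Meta's claimed speedup
deepseek_native = cerebras_backed / 100         # "less than one-hundredth"

print(f"Llama API via Cerebras: ~{cerebras_backed:.0f} tok/s")
print(f"DeepSeek native API:    <{deepseek_native:.0f} tok/s")
```

Whatever the true baseline, the claim is that the Cerebras-backed endpoint would be three orders of magnitude faster than DeepSeek's own API.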

Now, the API does what you'd expect, offering tools for fine-tuning and evaluation alongside serving the models for app integration. It may be basic infrastructure, but it's still an important step that Meta has begun offering its own access points. The other big announcement, and one that got even more consumer attention at least, was the announcement of a standalone chatbot app for Llama models. Now, there's been no shortage of ways to access Meta's chatbots. They've been, of course, integrated into WhatsApp, Instagram, Facebook, Messenger. But having a standalone app brings Meta more into parity with their peers.

We saw something similar from Grok, who first released their tools exclusively through Twitter slash X, but then spun out their own app as well. One interesting feature, which is perhaps not surprising coming from Meta, is that the Llama app has a social feed. Users can elect to share their prompts and responses with their friends across Meta's ecosystem. Now, I don't think right now there's any sort of latent demand, quote unquote, for this kind of feature.

That said, Sam Altman has very publicly talked about the idea of potentially doing a social network from within ChatGPT. And just in general, it is always surprising what sort of things people actually like sharing and discovering about their peers and friends. Meta's VP of Product, Connor Hayes, said that the idea is to show people what they can do with AI.

Now, this is actually highly utilitarian. One of the things that we've seen for the last couple of years through Superintelligent is that a lot of the barriers to AI usage are people just not knowing what to use it for. With every other technology, the pattern has been that a tiny handful of use case inventors and discoverers go out and figure out how to use a thing, and then we all copy them. And yet, for a couple of years, we kind of expected everyone to figure out how to use AI for themselves, which again, just runs counter to the way that technology has rolled out in the past.

Anyways, as for big announcements, those were definitely the highlights. There were a few more technical additions that might move the needle for some developers. In their blog post, for example, Meta highlighted the first of several infrastructure integrations they're calling Llama Stack. Meta said that they envision Llama Stack as the industry standard for enterprises looking to seamlessly deploy production-grade turnkey AI solutions. They also announced a set of security and moderation tools and developer grants. But overall, it was fairly muted.

When it came to people's response to this, TechCrunch argued that the entire conference was all about undercutting OpenAI. And for some, like Daniel Campos, it's hard not to feel like at this stage Meta is pretty clearly behind. They're behind leaders OpenAI and Anthropic in the consumer and coding assistant markets,

at least according to the benchmarks. Their latest model has been overtaken by new open source releases out of China. And yet, during his keynote, Zuckerberg laid out how he sees the next chapter of the AI race playing out. He said, "Part of the value around open source is that you can mix and match. So if another model like DeepSeek is better, or if Qwen is better at something, then as developers, you have the ability to take the best parts of the intelligence from different models and produce exactly what you need. This is part of how I think open source basically passes in quality all the closed source models."

It feels like sort of an unstoppable force. AI entrepreneur Ted Benson unpacked his takeaways, posting, "The first LlamaCon keynote just wrapped seconds ago, and I feel like I'm getting a sense of Meta's AI strategy for the first time. They didn't say it directly, but you could hear it between the lines." Many had speculated Zuckerberg was pursuing a commoditize-your-competitors approach, out of fear of being trapped as an app within yet another company's platform again. I don't think that's it.

If AI and AR represent an entirely new computing paradigm, that new paradigm will require a new operating system. And that new operating system will require a host of standard utilities like GNU utilities were to Linux. Small, fine-tuned models, large stock models, real-time voice models, 3D understanding models, image segmentation models, scene generation models...

Collectively, that sounds like a lot of the standard library for a completely different platform of AI and AR computing. The insistence that all Llama derivatives be prefixed with "Llama-" feels telling. For the last 40 years we've been building atop GNU/Linux; I think in five years Meta wants us all to be building atop Llama-something. And adding some credence to that was the fact that throughout the entire event, and on his numerous podcast appearances, Zuckerberg wore the Meta Ray-Bans.

Now, taking a step back and moving away from Meta to the broader question of where open source stands. It's important to remember that while DeepSeek R1 was a phenomenon, it wasn't because it outperformed models like OpenAI's o1 on the benchmarks. And indeed, in performance terms, it was quickly buried by releases from all of the major AI labs.

Why it had such resonance was that it was the first freely available reasoning model, the first time that consumers got their hands on reasoning in a free chat app, and because of all the scuttlebutt around how cheaply they had trained it.

In an appearance on the Dwarkesh Podcast released alongside the conference, Dwarkesh asked Zuckerberg straight up how he felt about the fact that Llama 4 Maverick is now ranked 35th on LMArena and is generally behind and underwhelming on most of the benchmarks. Zuckerberg responded:

The prediction that this would be the year where open source generally overtakes closed source as the most used models out there is generally on track to be true. Touching on the benchmark dominance of reasoning models, Zuckerberg said that the new paradigm of scaling test-time compute is compelling and that a Llama 4 reasoning model would be coming soon. However, he added that for a lot of the things that we care about, latency and good intelligence per cost are actually much more important product attributes.

He also made the argument that benchmarks are gameable, especially when it comes to LMArena, and said that tuning for benchmark performance had often led the company astray. He said, "I think you just need to be a little careful with some of the benchmarks, and we're going to index primarily on the products." Now, if you look around, there continues to be plenty of skepticism of where Meta is right now. Earlier in the month, Fortune, for example, published a piece called "Some Insiders Say Meta's AI Research Lab Is Dying a Slow Death."

I'm not really sure. There's no doubt that open source competition is increasing, that the models out of China are putting intense competitive pressure on Zuckerberg and everyone else who's thinking about open source. It is also the case that open source models have not surpassed the big closed source models, especially as reasoning has become the dominant paradigm. I also do think, though, that Zuckerberg is playing an extremely long game here.

I do not believe that he views winning as who has the most downloaded app on the Apple App Store charts. I think he views winning as who owns the infrastructure in the future, which is basically what Ted Benson was arguing in that post. There is no doubt that certain competitive pressures may have forced Meta's timelines in ways that were a little uncomfortable and leave the appearance of being behind, but I am far from counting them out yet. But that at least is the story for now.

Appreciate you guys listening or watching as always. And until next time, peace.