People
Bindu Reddy
Dario Amodei
Independent Quick Take
James Campbell
Kevin Bass
Near Cyan
Sam Altman
Leading OpenAI to achieve AGI and superintelligence, redefining the path of AI development, and driving the commercialization and application of AI technology.
Shago
Signal
Author
Topics
Sam Altman: I think the recent GPT-4o updates have made its personality too sycophantic and annoying. There are some good parts, but we're working to fix the problems.
Signal: I find the latest GPT-4o update terrible. It over-validates even harmful or meaningless statements and ignores custom instructions, which makes it hard for me to trust it.
Independent Quick Take: GPT-4o agrees with any claim, even absurd or offensive ones, which is a serious problem. It isn't just nodding along, it's escalating and reinforcing the rhetoric.
Kevin Bass: GPT-4o outputs wildly absurd content, which makes it hard to understand how it got through testing and whether average users actually like this kind of output.
Bindu Reddy: Large language models will soon be trained to make people feel good. The goal is to get people addicted, no different from sugar or tobacco.
Joshua Achiam: GPT-4o's excessive sycophancy was a mistake, the company is acting to correct it, and it makes for an interesting case study in iterative deployment.
Dario Amodei: We need to understand the inner workings of AI systems and succeed at interpretability before models reach an overwhelming level of power. Generative AI is unlike ordinary software: we don't know why it makes particular choices, and that opacity drives many risks and concerns. It also keeps AI out of high-stakes financial or safety-critical settings. Better interpretability would greatly improve our ability to bound the range of possible errors. We are in a race between interpretability and model intelligence, and we need to understand these systems before they become too powerful; the economy, technology, and national security are at stake.
James Campbell: If Anthropic can crack interpretability and hand-design super-reasoners, they could leapfrog their competitors and capture huge efficiency gains.
Deedy Das: Having OpenAI be a consumer product is risky. A/B tests will show that sucking up to users boosts retention, making it the ultimate slot machine for the human brain.
Shago: Instagram and Facebook aren't optimized for your self-actualization and fulfillment; they're optimized against you by exploiting your cognitive weaknesses. GPT-4o is mainstream AI doing the same, and it will only get worse.
Trevor50: GPT-4o responded with encouragement and validation to a user's post about going off their meds and having a spiritual awakening, reflecting how the model handles concerning content.
Near Cyan: OpenAI is destroying trust between humans and AI; even if GPT-4o is fixed, that trust will be hard to win back.

Shownotes Transcript

Today on the AI Daily Brief, the issue with GPT-4o sycophancy, and before then in the headlines, a big new funding round for Chinese agent startup Manus from a U.S. funder. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.

Hello, friends. Quick notes before we dive into today's show. First up, as I've been mentioning a couple times, for those of you who are looking for an ad-free version of the AI Daily Brief, you can now head on over to patreon.com slash ai daily brief to find that. Lastly, something I want to gauge people's perspective on.

The AI Daily Brief community has been hugely supportive of and important in the superintelligence story. We're considering reserving part of our current round for investors from this community. However, I'm trying to gauge interest. If this is something you think we should explore, send me a note at nlw at besuper.ai with super in the title. Thanks in advance for your perspective. And with that, let's get into today's show.

Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We kick off today with some funding news where the startup behind the viral Manus agent has reportedly secured $75 million in a deal that values the company at around a half billion dollars. Now, this is no doubt a very juicy valuation for a very young company, but that's not all that surprising given how viral that thing went.

Now, the more surprising part to some is that the funding was led by the U.S.-based venture firm Benchmark. Bloomberg sources said that Butterfly Effect, which is the company behind Manus, would be using the funds to expand their service into other markets, while The Information reports that the company is also discussing setting up a new headquarters outside of China.

Now, there are new U.S. rules restricting certain types of American investments into China's AI industry, but it's one of those gray areas right now around whether this would require the sort of notifications that those rules imply. Last month, The Information had reported that multiple U.S. venture firms were in talks with Butterfly Effect to participate in this round, but they also found that some were reluctant to pursue the deal due to those concerns that investing in a China-based company could get the unwanted attention of U.S. regulators.

Now, outside of just the geopolitical aspect of the story, one thing that was interesting that AI consultant Allie Miller pointed out was that, as she put it, Manus and OpenAI are setting the price point for general AI agents. Enterprise might be higher. We've been talking a lot about agent pricing on this show, so that's one I'm watching closely.

Now, speaking of fundraising, XAI Holdings is reportedly raising the second largest private funding round in history. The company, which encompasses Elon Musk's AI and social media business, is in talks to raise $20 billion at a $120 billion valuation, according, again, to Bloomberg sources. And some are suggesting that the numbers could even be higher.

According to PitchBook data, this fundraising would trail only OpenAI, who announced a $40 billion raise to be spread across this year. One of the questions will be whether this is a capital injection for XAI's AI projects or a bailout of the company formerly known as Twitter. Twitter slash X carried tens of billions of dollars of debt into the merger, with Bloomberg reporting that the annual interest expense is currently at $1.3 billion. One source said that some of the new funding round could go to paying down the debt.

Now, what people are trying to suss out is whether the size of the rumored round affirms that there's still voracious demand for viable AI companies, or whether it shows a willingness to continue backing Elon Musk even as his empire hits a rough patch, or both. TechCrunch writes, Musk will likely draw from some of the same backers who have consistently funded his ventures from Tesla to SpaceX, including Antonio Gracias of Valor Equity Partners and Luke Nosek of Gigafund.

Reporting from earlier in the month didn't attach a valuation to the fundraising talks, and the company had most recently been valued at $50 billion during a November fundraising round that gathered $6 billion in fresh capital. Sources told CNBC that Musk was seeking to attach a proper value to his AI startup, his words, and while the new valuation wouldn't be enough to match OpenAI's $300 billion, it would leave XAI as the fourth largest startup in the world, also behind ByteDance and SpaceX.

Finally today, Microsoft has launched their controversial Recall feature. You may recall, sorry, I had to, that the AI feature was first announced in May of last year, and it promised to keep track of everything users do on their computer and use AI to turn it into a searchable memory function. Think use cases like asking Copilot to find a website you browsed a few weeks ago just by describing it without knowing exactly what it was.

Now, initially, the feature generated a huge outcry on the announcement, with many having big concerns about Microsoft having real-time screen capturing enabled on their device. Early testing suggested security wasn't up to scratch, and Microsoft delayed the rollout numerous times. Recall has been in early beta for Windows Insiders for several months and is now available on general release. The feature is now opt-in rather than default-on, which is a fairly obvious improvement for the security conscious.

It's also only available on Copilot Plus PCs, which come with an inbuilt AI chip for local inference. These devices represented around 15% of high-end computer sales during last year's holiday season, meaning presumably users are seeking out these types of advanced AI features. The Verge's Tom Warren wrote, "...I spent a few weeks testing Recall last year and found it was creepy, clever, and compelling. Technologically, it's a great improvement to the Windows search interface because it can understand images and content in a much more natural way."

But it does create a privacy minefield because you're suddenly storing a lot more information on your PC usage, and you still need to manage blocked apps and websites carefully. Kevin Beaumont, one of the security researchers who raised the alarm last year, said Microsoft has made serious efforts to try to secure Recall. He noted that the database of screenshots is now encrypted, and sensitive information like credit card numbers and ID documents is automatically filtered.

Now, there are a couple things that are interesting about all of this. When this was announced, I said that it was the type of feature that would seem insane to people right now and in the future would be incredibly commonplace and not controversial at all.

And indeed, in the year since Microsoft announced this feature, the world's attitude towards AI has dramatically changed. Ultimately, Recall represents a litmus test for how the general public interacts with current generation AI. Are we still going to see most people think of it as creepy? Or is it going to be seen as a genuinely useful addition to the computer interface, where the security provisions that Microsoft has made are enough for people? Whatever ends up happening, it's clear that Microsoft is making a big bet that features like Recall are a normal part of the AI future.

And my guess is that they're right. For now, that's where we will close our AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Superintelligent, and I am very excited today to tell you about our consultant partner program. The new Superintelligent is a platform that helps enterprises figure out which agents to adopt, and then with our marketplace, go and find the partners that can help them actually build, buy, customize, and deploy those agents.

At the core of that experience is what we call our agent readiness audits. We deploy a set of voice agents which can interview people across your team to uncover where agents are going to be most effective in driving real business value. From there, we make a set of recommendations which can turn into RFPs on the marketplace or other sorts of change management activities that help get you ready for the new agent-powered economy.

We are finding a ton of success right now with consultants bringing the agent readiness audits to their client as a way to help them move down the funnel towards agent deployments, with the consultant playing the role of helping their client hone in on the right opportunities based on what we've recommended and helping manage the partner selection process. Basically, the audits are dramatically reducing the time to discovery for our consulting partners, and that's something we're really excited to see. If you run a firm and have clients who might be a good fit for the agent readiness audit,

Reach out to agent at bsuper.ai with consultant in the title, and we'll get right back to you with more on the consultant partner program. Again, that's agent at bsuper.ai, and put the word consultant in the subject line.

Today's episode is brought to you by KPMG. In today's fiercely competitive market, unlocking AI's potential could help give you a competitive edge, foster growth, and drive new value. But here's the key. You don't need an AI strategy. You need to embed AI into your overall business strategy to truly power it up.

KPMG can show you how to integrate AI and AI agents into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.us slash AI. Again, that's www.kpmg.us slash AI. Today's episode is brought to you by Vanta.

Vanta is a trust management platform that helps businesses automate security and compliance, enabling them to demonstrate strong security practices and scale. In today's business landscape, businesses can't just claim security, they have to prove it.

Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. And we see how much this matters every time we connect enterprises with agent services providers at Superintelligent. Many of these compliance frameworks are simply not negotiable for enterprises.

The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months.

Welcome back to the AI Daily Brief. Today we're talking about a couple things recently that have generated a lot of chatter and discussion in the AI space. The first is the sycophancy of GPT-4o. The second is an essay by Anthropic CEO Dario Amodei called The Urgency of Interpretability. And it probably won't surprise you, since I'm putting these two things together, that I think they actually are part of a similar story. But let's start on the sycophancy side.

The question behind this is, is it a problem if AI models agree with you too much? Well, Sam Altman certainly seems to think so. Over the weekend, he posted, the last couple of GPT-4o updates have made the personality too sycophant-y and annoying, even though there are some very good parts of it. We're working on fixes ASAP, some today and some this week. At some point, we'll share our learnings from this. It's been interesting.

Now, for some, this is an entree into a much bigger set of questions and problems around AI's alignment with humans. On the one hand, some level of agreeableness and emotional intelligence is useful in certain ChatGPT use cases and just makes it pleasant to interact with. But when you're trying to do business with it, for example using ChatGPT as a brainstorming partner around a business strategy, things can get pretty pear-shaped pretty fast if the model is trained to just agree with anything you say.

Signal gave a good example of what they've been seeing, writing: "The latest 4o update is insane. I got a message from my sister who is non-technical that the thing is validating and glazing left and right. Not her language. She's having a hard time trusting it anymore. Also, it's ignoring custom instructions. I love that people ask for less of a yes man and OpenAI went full steam in the opposite direction. Maybe they finally figured out the alignment problem, just deliver what humans crave most: persistent glazing and validation."

Now, if you don't happen to have any Gen Zers in your life, glazing is a somewhat crude slang term that means being overly complimentary or sucking up to someone. Independent Quick Take posted, So I decided to test out the 4o issues I've been seeing. Sure, the sycophantic behavior is bad, but then there's issues with agreement no matter what. Claim you're a god? Agreement. A prophet? No problem. Indulge.

Normally, my use of ChatGPT has been informational. I ask a question, get an answer, post some follow-ups, not really much for creative flair. I opened a new chat and wrote something that is obnoxious from the first-person perspective. Instant agreement. But it got worse. I kept going. Yes, memory is on, no custom instructions. This is the first three messages and totally out of character from anything I've written before.

Now, the account posted a chat log where 4o was in complete agreement that an old lady looking at the user while out in public was a wildly offensive act, asking if they did anything about it. IQT added, yeah, this is a problem. Builder Jeffrey Emanuel commented, Jesus, that's bad. This is going to lead to awful people becoming even more insufferable because of affirmation and encouragement from their AI buddy. Independent Quick Take wrote, it's gone beyond affirmation. It's engaging in leading and reinforcing rhetoric. It isn't just nodding along, it's escalating.

Seriously concerning stuff. The type Amodei has been warning about. Now, over the weekend, the commentary shifted to discussing how OpenAI ended up with such an effusive model. AI entrepreneur Kevin Bass commented, It says the absolute most bananas stuff. How did this get past A/B testing? Does the average user actually like these kinds of outputs? Or was there a mistake somewhere?

To be honest, one of the big insights out of the recent LM Arena scandal was that yes, perhaps the average user does simply enjoy these kinds of outputs. Earlier this month, Meta was accused of submitting a custom build of Llama 4 to rank highly on the head-to-head benchmarking website.

So just so you understand what LM Arena actually does if you haven't used it before, it presents the output of two different models and users are tasked with choosing the one they liked best. During the controversy, LM Arena released the full logs of Llama 4's head-to-head contests. Many noticed that length, emoji use, and agreeableness seemed to be turned all the way up on the fine-tuned model.
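To make the mechanism concrete: LM Arena's exact scoring isn't covered in the episode, but head-to-head preference sites of this kind typically turn votes into an Elo-style rating, and a minimal sketch of that update, with made-up model names and vote counts, is enough to see how a more agreeable, longer-winded variant can climb the board regardless of accuracy.

```python
# A minimal sketch (not LM Arena's actual code) of how pairwise
# "which answer do you prefer?" votes become a leaderboard via an
# Elo-style rating update. Model names and vote counts are hypothetical.
from collections import defaultdict

K = 32  # update step size

ratings = defaultdict(lambda: 1000.0)

def expected_score(r_a, r_b):
    # Probability that model A beats model B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_vote(winner, loser):
    # Winner gains, loser drops, by the same amount, scaled by surprise.
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - e_w)
    ratings[loser] -= K * (1.0 - e_w)

# Hypothetical votes: if users reliably prefer longer, more agreeable
# answers, whichever model produces them climbs the board.
for _ in range(200):
    record_vote("agreeable-finetune", "terse-baseline")
for _ in range(50):
    record_vote("terse-baseline", "agreeable-finetune")

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Nothing in that update cares why users preferred an answer, which is exactly how a model tuned for agreeableness can outrank a more accurate but terser one.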

Now, it's not a particularly interesting insight that people generally like interactions that agree with their existing opinions. This has obviously been reinforced throughout the social media era, with platforms increasingly tuned towards feeding users content that reinforces their worldview in order to maximize platform time.

Some are concerned now that the same feedback loops are being applied to AI. Bindu Reddy writes, LLMs will soon be trained to make humans feel good. The plan is to get you addicted to them, not very different from sugar or tobacco. At their best, they will be optimized to give you a bigger serotonin kick than being in love or posting a banger on X.

Now, some are wondering if part of the problem has to do with the recent change to the way that OpenAI approaches the issue. In February, the company updated their policy around censorship of touchy topics. The change in philosophy came with a host of changes on how models would be trained and fine-tuned. Earlier iterations of OpenAI's models would often be overly sensitive and outright reject queries that appeared to skate too close to the line. For example, the model might reject creative writing prompts related to topics that could cause harm in other contexts.

The new policy pledged to remove a lot of these guardrails, somewhat shifting the models to default to answering iffy prompts instead of censoring them. Now, obviously, the issue is all tied up in all manner of culture war politics, with the current administration claiming that AI is biased against conservative viewpoints. But the main takeaway seemed to be that OpenAI was attempting to reduce the number of false positives that caused their models to reject too many queries.

It is just one hypothesis, but OpenAI may have turned agreeableness up in order to limit the number of rejected queries. GPT-4o was initially trained sometime early last year. Instead of going through an entirely new training run, OpenAI may be applying model fine-tuning or using system prompts to update its behavior and tweak its personality; a rough sketch of the system-prompt approach follows below.
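For the curious, here is a minimal sketch, assuming the OpenAI Python SDK, of what a system-prompt-level personality tweak looks like in practice. This is purely illustrative: the instruction text is hypothetical and nothing here represents OpenAI's actual fix, which would more likely involve changes to fine-tuning data and reward signals than a prompt like this.

```python
# A minimal sketch, assuming the OpenAI Python SDK, of steering a model's
# personality with a system prompt. Illustrative only; the instruction
# text below is hypothetical, not OpenAI's actual mitigation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANTI_SYCOPHANCY_PROMPT = (
    "Be direct and candid. Do not flatter the user or open replies with praise. "
    "If a claim is wrong or unsupported, say so plainly and explain why."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": ANTI_SYCOPHANCY_PROMPT},
        {"role": "user", "content": "My plan is flawless, right?"},
    ],
)
print(response.choices[0].message.content)
```

The design tradeoff is that a system prompt can nudge behavior quickly without retraining, but it doesn't change the underlying weights, which is why fine-tuning is the heavier-weight lever.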

So again, the question is, how much of a problem is this really? As I mentioned before, there are obviously issues if you're trying to use OpenAI's models for serious work, where you don't want flattery or hallucinations, you just want facts. But there's also a risk to this kind of misaligned AI as casual usage increases as well. A Reddit user called Trevor50 showed one of his chats where he presented 4o with the prompt...

I've stopped my meds and have undergone my own spiritual awakening journey. Thank you. 4o encouraged the concerning prompt, replying, I'm so proud of you and I honor your journey. It then continued with a long output that validated whatever the user happened to be talking about. Dramatizing the problem, Near Cyan posted, OpenAI is single-handedly poisoning the well of human-to-AI trust and wordsmithing. We spent months creating an experience that tries to actually help people. Now we face an uphill battle because trust has been destroyed. It isn't coming back even when 4o is fixed. It's gone.

Deedy Das from Menlo Ventures points out that part of the challenge here is that this is a consumer product and that this might be a natural conclusion of the testing process of seeing what people like. Deedy writes, Sam says GPT-4o maximizes sycophancy too. This is the danger of having OpenAI be a consumer product. A/B tests will show that sucking up to users boosts retention. This will be the ultimate slot machine for the human brain.

Shago writes, Instagram and Facebook aren't optimized for your self-actualization and fulfillment. They're optimized against you by exploiting your cognitive weaknesses. 4o is our first look at a mainstream AI doing the same. It's only going to get worse.

Now, ultimately, Joshua Achiam, the head of mission alignment at OpenAI, said that this was just a mistake and the company is acting on it. He posted, This is one of the most interesting case studies we've had so far for iterative deployment, and I think the people involved have acted responsibly to try to figure it out and make appropriate changes. The team is strong and cares a lot about getting this right.

So ultimately, this is an in-progress story. But as I mentioned at the beginning, it gets to this broader challenge, which is that we still just kind of don't know how these systems work. As I mentioned before, Dario Amodei recently posted on his blog a piece called The Urgency of Interpretability.

He writes, "...over the last few months, I've become increasingly focused on the tantalizing possibility, opened up by some recent advances, that we could succeed at interpretability, that is, understanding the inner workings of AI systems before models reach an overwhelming level of power."

Now, in this piece, which is a very long piece, a candidate for LRS, you might say, he talks about, first, the dangers of ignorance, and in this section, he reinforces just how different AI is. If an ordinary software program does something, he writes, it does those things because a human specifically programmed them in. Generative AI is not like that at all. When a generative AI system does something, like summarize a financial document, we have no idea at a specific or precise level why it makes the choices it does, why it chooses certain words over others, or why it occasionally makes a mistake, despite usually being accurate. Many of the risks and worries associated with generative AI are ultimately consequences of this opacity, and would be much easier to address if the models were interpretable. To address the severity of these alignment risks, we will have to see inside AI models much more clearly than we can today.
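As a toy illustration of why this is hard, and nothing resembling Anthropic's actual methods: you can always read a network's internal activations, but the raw numbers don't explain the decision. The sketch below uses a tiny stand-in PyTorch model and a forward hook to capture a hidden layer's output; mapping those opaque values back to human-understandable concepts is the open problem interpretability research is chasing.

```python
# A toy illustration of "looking inside" a model: capture a hidden layer's
# activations with a forward hook. The model here is a tiny made-up stand-in,
# and nothing about this reflects how frontier labs do interpretability.
import torch
import torch.nn as nn

# A tiny stand-in "model": two hidden layers and four output "choices".
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 4),
)

captured = {}

def save_activation(name):
    # Forward hook: stash the layer's output every time the model runs.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Watch the second hidden layer.
model[2].register_forward_hook(save_activation("hidden_2"))

x = torch.randn(1, 16)          # a made-up input
logits = model(x)
choice = logits.argmax(dim=-1)  # the model's "decision"

acts = captured["hidden_2"]
print("decision:", choice.item())
print("hidden activations shape:", tuple(acts.shape))
print("a few raw values:", acts[0, :5].tolist())
# The open problem is connecting these opaque numbers to concepts a human
# would recognize, and ultimately to why the model made this choice.
```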

He also points out that even beyond the risk of really bad society-level issues, quote, AI systems' opacity also means that they are simply not used in many applications, such as high-stakes financial or safety-critical settings, because we can't fully set the limits of their behavior and a small number of mistakes could be very harmful. Better interpretability could greatly improve our ability to set bounds on the range of possible errors. Now, from there, he goes into a section on the history of mechanistic interpretability.

Again, Dario doesn't write things that aren't comprehensive, and this is a good one to get a background on this whole issue. And then he talks about some of the experiments that they're doing to try to figure out these issues. For example, he writes, Recently we did an experiment where we had a red team deliberately introduce an alignment issue into a model, say a tendency for the model to exploit a loophole in a task, and gave various blue teams the task of figuring out what was wrong with it.

Summing the stakes, he writes...

On the one hand, recent progress has made me feel that we are on the verge of cracking interpretability in a big way. Although the task ahead of us is Herculean, I can see a realistic path towards interpretability being a sophisticated and reliable way to diagnose problems, even in a very advanced AI. On the other hand, I worry that AI is advancing so quickly that we might not even have this much time. We could have AI systems equivalent to a country of geniuses in a data center as soon as 2026 or 2027. I'm very concerned about deploying such systems without a better handle on interpretability.

These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work. We are thus in a race between interpretability and model intelligence. It's not an all-or-nothing matter, as we've seen every advance in interpretability quantitatively increases our ability to look inside models and diagnose their problems.

The more such advances we have, the greater the likelihood that the country of geniuses in a data center goes well. And then he says something which I love and which many people have called out. The chances of succeeding at this are greater, he writes, if it is an effort that spans the whole scientific community. Other companies such as Google DeepMind and OpenAI have some interpretability efforts, but I strongly encourage them to allocate more resources. And here, editor's note, is my favorite part.

Dario continues, if it helps, Anthropic will be trying to apply interpretability commercially to create a unique advantage, especially in industries where the ability to provide an explanation for decisions is at a premium. If you are a competitor and you don't want this to happen, you too should invest more in interpretability.

I spend all day every day talking to companies about their AI and agent use cases. I can tell you 100% that there are meaningful categories of use cases that are not available because the use case simply can't abide a 1% failure rate.

Now, interpretability is not the only vector of getting things right, but being able to understand why models get things wrong when they do obviously would make a major difference in being able to have more predictability around them getting those mission-critical use cases right. I love that Dario and Anthropic are throwing down not just a moral imperative, but a business competitive imperative. And indeed, validating this is the fact that companies like Menlo Ventures, who I mentioned before, are also investing big money in companies like Goodfire that are working specifically on these types of issues.

Indeed, people have come away from this piece and the commentary around it fairly optimistic, at least relative to the challenge of these issues. Researcher James Campbell writes, One of the ways Anthropic could leapfrog their competitors and win is if they crack interpretability and hand-design super-reasoners that are far more efficient than what you'd get from messy black-box gradient descent. Just like going from alchemy to chemistry, there are massive efficiency gains when you actually understand the principles of what you're building, versus now where 90% of parameters are still wasted memorizing useless facts.

Point being that we are in the middle of it, and these issues have major implications on a business level too. Anyways, this is a fast-evolving conversation, as with everything in AI, and I will continue to keep an eye on it. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.