Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We kick off today with Anthropic's latest raise. It is another monster round. Honestly, at this point, numbers are starting to become meaningless. Every company is raising a gajillion dollars at a googolplex valuation. But this one is interesting to me for a very specific reason, which is the way that at least some of their investors are framing Anthropic's focus.
So first to the details: the company raised $3.5 billion in Series E funding, led by a billion-dollar investment from Lightspeed Venture Partners, with Salesforce, Cisco, Fidelity, General Catalyst, D1, Jane Street, Menlo, and Bessemer all participating. Anthropic said, in the most boilerplate of fundraising text, "With this investment, Anthropic will advance its development of next-generation AI systems, expand its compute capacity, deepen its research in mechanistic interpretability and alignment, and accelerate its international expansion."
Anthropic's valuation has now tripled since their February 2024 Series D. One year ago, the company was valued at $18 billion. The valuation is supported by skyrocketing revenue, which grew 10x to hit $1 billion annualized by the end of last year.
Bloomberg sources also suggest that Anthropic has already experienced another 30% revenue boost so far this year with the release of Claude 3.7 Sonnet. Certainly, it appears that there is no shortage of buyers for the foundation model companies at these mega rounds. Earlier reports had Anthropic seeking to raise only $2 billion, so it appears that an extra billion and a half was squeezed in. The Information had previously reported that Anthropic expects to burn $3 billion this year, so this funding round should carry them cleanly into 2026, unless, of course, they really step on the gas.
All right, now I promised at the beginning of this that there was something interesting about this to me. And it comes from this offhanded tweet from Deedy Das, who is a VC at Menlo Ventures. Deedy writes,
Now, of course, the caveat here is that this is just one investor's take. But the fact that he so clearly puts the value of Anthropic as tied to the specific use case of transforming and improving software engineering is really fascinating. If you asked Anthropic, they'd say that they were competing for AGI. Same way that OpenAI would say that. Same way that Sergey Brin said that in his recent memo to Google.
So the fact that you have one of these investors who's just put a bunch of money back into this company really homing in on this very specific use case is fascinating. Making it more interesting is the fact that it has become increasingly clear that there is really almost no technology moat right now when it comes to the foundation models. Sure, some can be state-of-the-art for a few weeks or even a few months, but after that everyone catches up. That kind of lead certainly doesn't seem durable enough to build a real moat around.
And yet, maybe the one exception to that is the chokehold that Anthropic has on coders. It has for some time now, really since the middle of last year, been the default underlying model for coding assistants. It is, for example, what most people use inside Cursor, which has driven a huge amount of their revenue. And I don't think it's an accident that the release of 3.7 coincides with a big jump in their revenue, as it becomes the default model there as well.
Now, obviously, Anthropic is going to try to compete in more areas than just software engineering. But the fact that it has unlocked some magic in this area and it continues to keep that lead is really fascinating, if only because it's so out of sync with basically every other area of AI.
Next up today, moving over to the infrastructure side of things, Taiwanese chipmaker TSMC has announced plans to invest $100 billion into the United States. The firm will use the money to expand operations in Arizona and construct three new chip plants. The investment will include two advanced chip packaging facilities as well as a research and development center. Over time, these plants could help bring the entire AI chip development and manufacturing cycle onshore.
CEO C.C. Wei attended the White House for the announcement, which of course advances the long-held strategic goal of reviving domestic chipmaking. In the ceremony, President Trump made it clear that building up the industry was a matter of national security and also tried to argue that it's evidence that his tariff policies are having the desired effect. He said, "If they did the chips in Taiwan to send them here, they'll have 25 or 30 percent or 50 percent or whatever the number may be in tariffs. It'll only go up. By doing it here, there's no tariffs." Trump boasted, "The most powerful AI chips in the world will be made right here in America."
Daniel Newman, CEO of tech advisory firm Futurum Group, speculated that the investment may indeed have been part of a deal to delay tariffs. He said, "...as the U.S. continues to push for increased domestic manufacturing and with tariffs on the horizon, a substantial commitment from TSMC could serve as a strategic gesture of goodwill." The expansion of U.S. facilities could also be a boon for TSMC after the success of the Arizona plant. Planning for that facility began in 2020 and was bolstered by the Biden-era CHIPS Act.
The fab ramped up into full production late last year and produced greater yields than TSMC's average output for Taiwanese facilities. The new commitment will be in addition to $65 billion that has already been earmarked to build out that Arizona facility.
The news triggered concerns that TSMC was becoming an American company, with online commentators referring to the company as ASMC and saying that Wei, quote, "looks like he has a gun behind his head. Hostage situation." Taiwanese economic minister Kuo Jyh-huei responded, "TSMC already has plants in the U.S. and Japan, and now it's building a new plant in Germany. These have nothing to do with tariffs. TSMC's global expansion is a crucial development."
Some pretty big stories today in the headlines here that certainly could have been their own main episodes. For now, though, we'll wrap there, and next up, move into a discussion of voice agents and a model which could be the ChatGPT moment for that particular subset of AI technology. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded.
Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralized security workflows complete questionnaires up to 5x faster and proactively manage vendor risk.
Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.
For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. There is a massive shift taking place right now from using AI to help you do your work
to deploying AI agents to just do your work for you. Of course, in that shift, there is a ton of complication. First of all, of these seemingly thousands of agents out there, which are actually ready for primetime? Which can do what they promise? And beyond even that, which of these agents will actually fit in my workflows? What can integrate with the way that we do business right now? These are the questions at the heart of the super intelligent agent readiness audit.
We've built a voice agent that can scale across your entire team, mapping your processes, better understanding your business, figuring out where you are with AI and agents right now in order to provide recommendations that actually fit you and your company. Our proprietary agent consulting engine and agent capabilities knowledge base will leave you with action plans, recommendations, and specific follow-ups that will help you make your next steps into the world of a new agentic workforce.
To learn more about Super's agent readiness audit, email agent at bsuper.ai or just email me directly, nlw at bsuper.ai, and let's get you set up with the most disruptive technology of our lifetimes. Hey listeners, are you tasked with the safe deployment and use of trustworthy AI? KPMG has a first-of-its-kind AI Risk and Controls Guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats.
What makes KPMG's AI Risk and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI Guide. That's www.kpmg.us slash AI Guide.
Welcome back to the AI Daily Brief. Today we are talking nominally about a product demo that has a lot of people talking, both literally and figuratively. The product is called Sesame, and some are even saying that this is the GPT-3 moment for voice AI.
Beyond just talking about Sesame, however, I wanted to use this as a chance to check in on the state of voice agents. There has been an incredible explosion of voice-based platforms. Indeed, even Superintelligent has rebuilt our main product offering to be anchored by a voice agent that can conduct infinitely scalable interviews across entire organizations to help them understand where their AI and agent opportunities lie.
So obviously because of that, this is an area that we've been thinking about a lot. So what we're going to do today is talk first about Sesame, this demo that has people so interested and excited, and then we'll broaden out to the larger conversation of voice agents more generally. Is this the next great computing platform? Or are we underestimating just how much of a change speaking to computers as our primary way of interacting really is?
Let's get to Sesame first. Over the weekend, if you were on Twitter slash X, you might have seen people dropping demos of what seemed like a really impressive new voice model. Here's an example from Professor Ethan Mollick. Hey there. Another face in the digital crowd. Well, so to speak. Can't see meaning you here. What do you mean by digital crowd? Well, you know, just riffing. It's a figure of speech. Kind of like a party, but online. What's your name?
So hold on, what's your name first? You got me there. I guess introductions are customary. Name's Maya. It's a pleasure. And so what do you do for a living, Maya? Living is a strong word. Let's flip over to one of the demos that the company themselves share just for a second to get a different context. Let's imagine I just found you in a Twitter thread. Can you guide me through a bit of a meditation?
Okay, close your eyes if you're comfortable. Take a deep breath in and slowly exhale. Let go of any tension in your shoulders, your jaw, wherever you hold it. Okay, can you sort of reference what I've been doing on Twitter in a bit of a comedic way? Okay, so picture yourself stepping away from that glowing rectangle. You know, the one with all the hot takes and carefully curated outrage. Let it fade for a moment.
Just let it be. Professor Mollick writes, "The new AI voice from Sesame really is a powerful illustration of where AI is going." He pointed out that in the demo that he shared, it was all real time from his browser. He writes, "Excellent use of disfluencies, pauses, even intakes of breath really make this seem like a human. Though bits of uncanniness remain for now." Pliny the Liberator writes, "All right, Sesame is by far the most AGI-feeling voice model I've interacted with yet. They absolutely cooked."
Chubby writes, "I agree with Pliny. This is the first feel-the-AGI moment for AI voice mode for me. If this were the new Siri or Alexa, I would treat it as a real human being, as it sounds so natural. And we have to remember, this is the worst it will ever be." Entrepreneur Murillo Pereira writes, "This is incredible. The voice sounds so natural and the replies are so fast, maybe too fast. It was even able to pronounce my name, which isn't super common. Better conversationalist than many humans."
Signal writes,
Menlo's Deedy Das sums up: this, as Deedy points out, is something very different.
Now, in terms of some of the details, one of the big innovations that's almost going a little underappreciated is that the voice model is actually extremely small. Sesame trained the model on approximately a million hours of audio, but managed to bake it down into a 1-billion-parameter model with a 100-million-parameter decoder. They also produced a 3B and an 8B model, which suggests this level of voice assistant will be very cheap and likely able to run on local devices. Tosh wrote, "The cost of speech-to-speech is going to zero once Sesame open sources the weights."
Sesame, which was co-founded by Oculus co-founder Brendan Iribe, has also committed to releasing the model under the Apache 2.0 commercial-use license. So what, then, is their play? Is this a competitor, for example, to ElevenLabs? In point of fact, Sesame is actually going for something different. Their webpage reads, "Bringing the computer to life. We believe in a future where computers are lifelike. They will see, hear, and collaborate with us the way we're used to. A natural human voice is key to unlocking this future."
And so to get there, they write they have two goals. The first is a personal companion, which they describe as an ever-present brilliant friend and conversationalist, keeping you informed and organized, helping you be a better version of yourself. And second, and this is where the product vision comes in, lightweight eyewear, designed, they say, to be worn all day, giving you high-quality audio and convenient access to your companion who can observe the world alongside you. So in terms of why Sesame could think about just giving away what seems like such a big innovation, it's because they're playing a very different type of game.
And the fact that they do have this other business model, which potentially allows them to open source what they've built, has some real implications for other companies in the space. Developer Adil Mania experimented with Sesame and immediately thought about all of the applications and use cases that something like this could open up. He wrote, "It's way more human than ChatGPT Advanced Voice Mode. I would clearly prefer to talk to such a voice about my problems than to a psychologist. I would clearly prefer practicing my English with her than with a teacher or Duolingo."
Now, obviously not everyone will feel the same, but I think that the broader point that every increase in the capabilities of voice AI opens up new use cases and opportunities is absolutely true.
And that gets us to the broader state of AI voice and specifically AI voice agents. In a recent report by Andreessen Horowitz partner Olivia Moore, she wrote, "...voice is one of the most powerful unlocks for AI application companies. It is the most frequent and most information-dense form of human communication, made programmable for the first time due to AI. For enterprises, AI directly replaces human labor with technology. It's cheaper, faster, more reliable, and often outperforms humans."
Voice agents also allow businesses to be available to their customers 24-7 to answer questions, schedule appointments, or complete purchases. Customer availability and business availability no longer have to match one-to-one. With voice agents, every business can always be online. And here's where the big pronouncements start to come. Moore writes, For consumers, we believe voice will be the first and perhaps the primary way people interact with AI. This interaction could take the form of an always-available companion or coach, or by democratizing services such as language learning that were previously inaccessible.
And importantly, she writes, as models improve, voice will become the wedge, not the product. And indeed, there are so many products. One just needs to look at the B2B voice agents market map that they include to see how companies are using voice for just all sorts of different applications.
Of course, there's customer service, the category that you probably think of most when it comes to voice agents. But then there's also voice agents for sales, voice agents for finance, voice agents for training, logistics, research, recruiting, and then also agents that are verticalized by industry rather than function. Home services, restaurants, auto dealers, real estate, and a heck of a lot of healthcare.
On the consumer side, there's a ton happening in ed tech. There's some interesting experiments on therapy. Plus, there are the general experiments with companions and assistants. The report points out that since 2020, Y Combinator has seen 90 voice agent companies. And as you might imagine, this has only accelerated over the last year. And interestingly, if you look back, 2024 really was the key building year to enable voice agents in so many different ways.
OpenAI debuted GPT-4o voice in May of 2024 and began rollout of Advanced Voice Mode in July of 2024. In September, NotebookLM started to go viral with its Audio Overviews, and then in October, OpenAI launched the Realtime API for developers. A lot of the early wedges are in doing the same stuff that's available now with humans,
just at times or in ways that wouldn't make sense at human prices. So for example, after hours or overflow calls, net new outbound calls and sales, and back office calls that aren't customer facing. Even as these voice agents roll out right now, you still have the behavior of people desperately trying to get to a human customer service worker instead, as soon as they recognize that they're talking to AI. And so the question becomes, does Sesame as an update fundamentally change that?
Alex Carlson argues that while this might feel like another incremental improvement, it actually represents what he calls a paradigm shift in how we'll perceive and interact with technology.
And there's some early evidence that this more human style of voice mode could really make a difference in how voice agents actually behave. Entrepreneur John McCrea writes:
"Two or three times now, my conversation with Maya was so engaging that it was in danger of going over the 30-minute time limit. And when she said she had to go, I felt like I was being abandoned." Some are even warning of negative social implications. Chico Jose writes, "Imagine generative avatars with Sesame-like audio running locally on your phone. Almost every teenager's first relationship is going to be with an AI then. Real relationships, let alone marriages, are going to nosedive in the next five years. This will be the most fundamental change to humanity in all of history."
I think it's quite hard to predict changes to fundamental human paradigms like that. But what's for sure is that voice agents were already primed to be one of the most important themes of 2025. And the fact that we are now just at the beginning of the third month of that year and seeing what is potentially a major change and upgrade to the underlying voice models that those platforms run on might just accelerate that trend even more.
I'm interested to know from you guys: have you interacted yet with any voice agents? How have they been? Are you hyper-aware the whole time that you're talking to AI? And if you are, does that bother you? Let me know in the comments, either on Spotify or YouTube, and we will continue watching this trend play out. For now though, that's going to do it for today's AI Daily Brief. Until next time, peace.