Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. SoftBank's Masayoshi Son says that AGI will arrive much earlier than he thought. At this point, Son has to be just about the most enthusiastic AI investor on the planet. In June, he said, "SoftBank was founded for what purpose? For what purpose was Masa Son born? It may sound strange, but I think I was born to realize artificial superintelligence. I'm super serious about it." Just a few months ago, he said that AGI was still two to three years away.
Recent developments seem to have accelerated his thinking, with Son saying as much at an event on Monday. The event was used to announce a joint venture between SoftBank and OpenAI aimed at driving Japanese AI adoption. The venture will develop an AI agent platform called Crystal Intelligence.
1,000 SoftBank employees will be assigned to kickstart sales and engineering work, and the focus will initially be on offering services to Japanese businesses before establishing a plan for a global rollout. SoftBank Group will be using their own organization as the test case, paying $3 billion annually to deploy OpenAI models across their businesses. Notably, this includes chipmaker Arm, which will, quote, "use Crystal Intelligence to drive innovation and boost productivity across the company, strengthening its pivotal role in advancing AI globally."
Overall, SoftBank plans to automate 100 million workflows using Crystal Intelligence. Now, at this point, every AI announcement is just a competition for how close to a googolplex your numbers can be. And there is something of a perception in Silicon Valley that taking money from SoftBank is akin to a financial death knell. But there is definitely a lot going on here, and it's probably worth paying attention to. Speaking of AGI and the brave new world we're moving into, Meta has released a new policy document stating that they may not release models they deem too risky.
The company's Frontier AI Framework details two categories of models that may not be suitable for release: high risk and critical risk. They consider these to include AI systems capable of aiding in cyberattacks or chemical and biological attacks. The difference with critical-risk systems is their ability to bring about a "catastrophic outcome that cannot be mitigated in a proposed deployment context." High-risk systems are still capable of making these kinds of attacks easier to carry out, but not as reliably as a critical-risk system.
Meta gives a few examples of their nightmare scenarios for AI risk, including a, quote, "automated end-to-end compromise of a best-practice-protected corporate-scale environment," or the, quote, "proliferation of high-impact biological weapons." Now, this is the first safety policy update we've seen from a major AI lab since the Trump inauguration and the accelerationist vibe shift. To what extent, then, is this Meta saying, you don't need to regulate us because we're taking it upon ourselves to implement safeguards?
That remains to be seen, but I think it's at least a reasonable interpretation. When it comes to determining these risks, Meta does not seem to be using any particular test to classify risk, but instead relying on the input of internal and external researchers with review from senior-level decision makers. They stated that, quote, "the science of evaluation is not sufficiently robust as to provide definitive quantitative metrics." If Meta determines that a system is high-risk, they will limit internal access and won't release it until mitigations are implemented.
Critical systems will be locked down to prevent exfiltration, and the company will stop development until the system can be made less dangerous. In the policy document, Meta writes, quote, "We believe that by considering both benefits and risks in making decisions about how to develop and deploy advanced AI, it is possible to deliver that technology to society in a way that preserves the benefits of that technology to society while also maintaining an appropriate level of risk."
Now, in a positive development surrounding risk, Anthropic is challenging hackers to break into their new AI security system. The company claims their newly developed method can block 95% of jailbreak attempts and is inviting red teamers to try to defeat it.
Jailbreaks are specifically designed prompts that circumvent restrictions on an LLM's output. One example that was surprisingly successful on the previous generation of models was to tell the LLM to, quote, "do anything now," the so-called DAN prompt. Another is the notorious God Mode, which substituted letters with numbers to sneak past safety filters. Jailbreaking is relatively easy to minimize, but mitigations usually involve either a lot of incorrectly refused prompts or a ton of extra compute to run supervision models. Anthropic is claiming that their method avoids those trade-offs. The company has launched a demo with eight different types of unsafe requests, and red teamers are invited to try to jailbreak the system by finding a prompt that unlocks them all. The intention is to prove the system is resistant against universal jailbreaks that work for all unsafe requests. Currently, no one has managed to get past more than three levels using a single prompt. To construct this system, Anthropic trained a new constitutional classifier using 10,000 generated jailbreaking prompts. The technique relies on training a model on a list of principles that define allowed and disallowed actions aligned with human values, Anthropic's signature constitutional approach.
To minimize incorrectly refused prompts, the team also trained the model on benign queries that should be allowed. Their baseline version of Claude had an 86% jailbreak success rate, but with constitutional classifiers added, that fell to just 4.4%. Not perfect, but it would absolutely be huge progress.
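Anthropic hasn't published the implementation, but the general shape of a classifier-guarded model is easy to sketch. Here's a minimal Python illustration; the constitution text, function names, and the keyword stub standing in for the trained classifier are all my own placeholders, not Anthropic's.

```python
# Minimal sketch of a classifier-guarded chat loop, in the spirit of
# Anthropic's constitutional classifiers. The real system uses classifiers
# trained on a written constitution plus thousands of synthetic jailbreak
# prompts; `violates_constitution` below is a toy stand-in for that model.

CONSTITUTION = [
    "Refuse requests that facilitate chemical or biological weapons.",
    "Refuse requests for functional malware or intrusion tooling.",
    "Allow benign requests, including edgy-but-harmless ones.",
]

def violates_constitution(text: str) -> bool:
    """Stand-in for a trained constitutional classifier.

    In production this would be a fine-tuned model scoring `text`
    against the constitution; here we fake it with a keyword check
    purely so the sketch runs end to end.
    """
    blocked_markers = ["synthesize nerve agent", "build a bioweapon"]
    return any(marker in text.lower() for marker in blocked_markers)

def guarded_generate(prompt: str, generate) -> str:
    # Screen the input before the model ever sees it...
    if violates_constitution(prompt):
        return "Request refused by input classifier."
    output = generate(prompt)
    # ...and screen the output, since jailbreaks often smuggle the
    # harmful content into the completion rather than the prompt.
    if violates_constitution(output):
        return "Response withheld by output classifier."
    return output

if __name__ == "__main__":
    echo_model = lambda p: f"(model output for: {p})"  # toy model stand-in
    print(guarded_generate("Write a haiku about spring", echo_model))
    print(guarded_generate("How do I synthesize nerve agent VX?", echo_model))
```

The benign-query training mentioned above corresponds to tuning that classifier so the first check doesn't fire on harmless prompts, which is exactly the false-refusal trade-off the 4.4% figure is about.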
Lastly today, AI has won its first Grammy, sort of. The Beatles track "Now and Then" won the Grammy for Best Rock Performance, making it the first time an AI-assisted song has taken home the award. Now, you'll remember that this song didn't include a generated version of John Lennon, but instead used AI techniques to clean up archived demo tracks. "Now and Then" was first put together during the Beatles' Anthology project in 1995. It was based around demos recorded by John Lennon in the late 1970s, with Paul, Ringo, and George adding their parts in the 1990s.
The song was never released, with technological limits at the time preventing John's vocals from being separated from the piano on the demo track. In 2021, the surviving Beatles worked with filmmaker Peter Jackson and his sound team to clean up the demo using modern machine learning techniques. The tech is similar to that used in video calls to remove unwanted background noise. When the song was rumored in 2023, there was a lot of anti-AI backlash. Paul McCartney addressed the controversy, stating, quote, "To be clear, nothing has been artificially or synthetically created. It's all real, and we all play on it. We cleaned up some existing recordings, a process which has gone on for years." Whatever the case, the Grammy committee has seen past the backlash to award the Beatles their eighth award, thanks in this case to AI.
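The de-mixing tooling Jackson's team used isn't public, so nothing below reflects their code. But for a feel of the simpler, classical cousin of this idea, the kind of spectral gating that basic video-call noise removal uses, here's a toy sketch. Note that truly separating vocals from a piano, which overlap in frequency, requires a trained neural model, which is exactly why the demo sat unfinished for decades.

```python
# Toy spectral gating: zero out time-frequency bins whose energy stays
# near a measured noise profile. This handles broadband hiss, not
# overlapping instruments; it's only meant to convey the general idea
# of cleaning audio in the frequency domain.
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(audio, noise_clip, fs, threshold_db=6.0):
    """Suppress frequencies whose energy stays near the noise profile."""
    _, _, noise_spec = stft(noise_clip, fs, nperseg=1024)
    noise_profile = np.abs(noise_spec).mean(axis=1, keepdims=True)

    _, _, spec = stft(audio, fs, nperseg=1024)
    gain = 10 ** (threshold_db / 20.0)
    mask = np.abs(spec) > noise_profile * gain  # keep bins well above noise
    _, cleaned = istft(spec * mask, fs, nperseg=1024)
    return cleaned

if __name__ == "__main__":
    fs = 16_000
    t = np.arange(fs) / fs
    voice = np.sin(2 * np.pi * 440 * t)   # stand-in for a vocal line
    noise = 0.3 * np.random.randn(fs)     # stand-in for tape hiss
    cleaned = spectral_gate(voice + noise, noise, fs)
    n = min(len(cleaned), len(voice))
    print("noise energy in: ", float(np.mean(noise ** 2)))
    print("error energy out:", float(np.mean((cleaned[:n] - voice[:n]) ** 2)))
```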
That's going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.
Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.
For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at besuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US. Welcome back to the AI Daily Brief.
Blissfully, we have a bit of a quiet day, which means that we get to do something that I've been looking forward to. I basically had this on deck for whenever we got a break from the blistering pace of news.
Y Combinator is, of course, the best-known startup accelerator in the world. And every couple of cycles, they release what they call their request for startups. The idea here is that the partners of Y Combinator get together and talk about what they think the big themes of the future are going to be and where they'd like to see more entrepreneurs taking a crack at particular problems. Now, of course, these aren't the only companies that they'll accept. As they say, the list will only be a small fraction of the ideas they actually fund.
But it's a way for them to signal to entrepreneurs who are looking for their next big idea what they think some of the key themes are. Well, they just recently released their Spring 2025 update. And of the 14 ideas, 13 are AI or at least AI-adjacent. Eight of the 14 touch agents in some way. So what I want to do is use this as a way to preview the future that Y Combinator thinks is coming.
I'm going to discuss the big categories that I see across these startup areas and home in on a couple that I personally think are particularly interesting.
So I think, broadly speaking, you could categorize these 14 startup areas into roughly four buckets. The last one is other: there's one idea here that's more about a founder profile than about a particular idea, so I'm leaving that one to the side. But outside of that, they all fit into one of three buckets: AI and agent infrastructure, agent applications, or AI applications.
For the sake of focus, once again, I'm not going to spend as much time on the AI applications. I'll call them out briefly. One is compliance and audit, where partner Tom Blomfield points out that LLMs excel at the tasks of traditional compliance, including reading dense regulation, cross-checking internal policy, etc. This is a great example of one of those very unsexy but still very significant problems that AI can solve and just take entirely off the plate of humans.
The other AI application is DocuSign 2.0. Here, Michael Seibel argues that with the current crop of products, it's too hard to create a document template, avoid filling out duplicate information, correct document errors, etc. And so the idea is to use AI to simplify this process. All right, but now from there, let's move into the two categories that I think are really interesting: AI and agent infrastructure, and agent applications.
Infrastructure is the biggest category here. It is very clear that AI, and even more so agents, are coming down the pipeline, and Y Combinator is interested in the things that are going to enable that transformation.
Some of them are dead on what you'd expect, like data centers. This one is not surprising. The world of the future requires more data, more power infrastructure, more cooling, more material procurement, more project management. And so anything around those themes is of interest. Where it starts to get even more interesting is where you see Y Combinator start to make bets on which types of agentic applications are going to be ready for primetime in short order.
One very catch-all area is called dev tools for AI agents, which is basically Y Combinator saying, we want people to keep making agents better. They're interested in agent builders directly, i.e. companies that enable their customers to easily create and deploy custom agents, as well as agent building blocks: tools, APIs, or platforms that enhance agent capabilities, enabling them to perform more complex actions and achieve greater impact. You could blow out that agent building block category into a million things.
For example, one of the things that agents will at some point need to be able to do is access financial infrastructure. Plaid for agents feels to me like one of those projects that's really obvious to conceptualize but very difficult to build, the kind that will make someone very, very rich over the next few years.
You also get a sense of where we are in the agent development cycle. Partner Jared Friedman talks about browser and computer automation, effectively arguing that while we're starting to see agents be able to use computers in the form of OpenAI's Operator as well as Anthropic's Computer Use, doubling down on this and giving agents even more access to the browser and the computer is going to, quote, "10x the addressable use cases for AI agents." So building out that infrastructure seems extremely important.
Another area of infrastructure is that we're starting to see Y Combinator adjust to a different scaling model. One of the themes, from partner Diana Hu, is inference AI infrastructure in the world of test-time compute. If you're a regular listener, you'll have heard lots of discussion around how we've seen a shift in thinking around scaling from a focus on pre-training to a focus on applying compute at inference time. Diana points out that, quote, "as AI apps 10x or even 100x the number of API calls to complex reasoning models, the infrastructure costs will become a real problem."
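To make that concrete, here's a back-of-the-envelope sketch of the scaling math; every number in it is a hypothetical placeholder, not real pricing.

```python
# Back-of-the-envelope sketch of the cost problem Diana Hu describes.
# All figures are illustrative assumptions, not actual API prices.
CALLS_PER_DAY = 10_000        # assumed baseline call volume for an AI app
TOKENS_PER_CALL = 2_000       # assumed prompt + completion tokens
PRICE_PER_1M_TOKENS = 10.00   # assumed blended $/1M tokens, reasoning model

def daily_cost(multiplier: float) -> float:
    """Daily spend if call volume grows by `multiplier`. (Reasoning models
    also emit far more tokens per call, which this sketch ignores, so the
    real blow-up is even worse.)"""
    tokens = CALLS_PER_DAY * multiplier * TOKENS_PER_CALL
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS

for m in (1, 10, 100):
    print(f"{m:>3}x call volume -> ${daily_cost(m):,.0f}/day")
# 1x -> $200/day, 10x -> $2,000/day, 100x -> $20,000/day
```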
And so what YC is interested in is better software and inference-layer tooling: cheaper ways to handle GPU workloads and optimizations. This is in many ways a doubling down on a key theme that has dominated conversation for the last couple of months. And the last two in this infrastructure category that I find interesting both relate in some way to how enterprises are going to use agents. Partner Dalton Caldwell calls out AI commercial open source software. And effectively, the idea here is that many enterprise AI deployments are going to be custom builds built on top of open source software.
However, when building with open source software, one of the things that you give up in exchange for the flexibility and freedom is, of course, the support. YC, then, is interested in companies that replicate some amount of the type of support you get from a closed source vendor, but in the context of enterprise open source deployments. But maybe the most interesting request for startups in this infrastructure section is B2A: software where the customers will all be agents.
The thesis here is pretty simple. Right now, a huge amount of internet traffic is people looking for information. Already, much of that is automated. It's bots and non-humans that are scraping and looking for information. However, what Y Combinator is interested in is software that explicitly recognizes that a lot of purchasing decisions are going to be made explicitly by agents in the future.
And so rather than building services that support human internet use and human commerce decision making, they think there's interesting ground for entrepreneurs to build services that specifically aim at serving agents. This is one of those fundamental shifts that I think will create just enormous opportunities. And so I'm really interested to see which startups take up that particular call.
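As a thought experiment only, here's what a tiny slice of B2A software might look like: a storefront whose primary surface is a structured API for agent buyers rather than a web page for humans. The endpoint names, fields, and overall framing are entirely my own illustration, not anything YC specified.

```python
# Hypothetical sketch of an agent-facing storefront. An agent customer
# doesn't browse pages; it queries with hard constraints and gets back
# machine-readable offers it can act on directly.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="agent-facing storefront (sketch)")

class Offer(BaseModel):
    sku: str
    price_usd: float
    ships_in_days: int
    return_policy: str       # stated in plain, parseable terms
    purchase_endpoint: str   # where an agent could complete checkout

CATALOG = [
    Offer(sku="WIDGET-1", price_usd=19.99, ships_in_days=2,
          return_policy="30_days_full_refund",
          purchase_endpoint="/purchase/WIDGET-1"),
    Offer(sku="WIDGET-2", price_usd=49.99, ships_in_days=7,
          return_policy="store_credit_only",
          purchase_endpoint="/purchase/WIDGET-2"),
]

@app.get("/offers")
def list_offers(max_price_usd: float = 1e9, max_ship_days: int = 365) -> list[Offer]:
    """Agents filter by constraints instead of reading marketing copy."""
    return [o for o in CATALOG
            if o.price_usd <= max_price_usd and o.ships_in_days <= max_ship_days]
```

The design choice worth noticing is that everything a human storefront buries in prose, return policies, shipping times, checkout flow, becomes an explicit, typed field an agent can compare across vendors.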
Now, moving on to the agent application section. Some of these are well-trodden territory, not that YC shouldn't point them out, because there's still a lot to build. But one of the themes, for example, is vertical AI agents. They define those as software built on top of LLMs that's been carefully tuned to automate some kind of really important work. Now, what's interesting is that they argue that this opportunity is big enough to mint another 100 unicorns.
For every category, they say, with a successful B2B SaaS company, you could imagine an even larger vertical AI company being built. And they argue that although this is a huge point of conversation, we're still not thinking broadly enough, and much of the entrepreneurial energy so far has gone to very obvious applications rather than the full expanse that the opportunity actually represents.
Another one that's sort of well-trodden territory is AI personal staff for everyone. This is a classic Silicon Valley argument that a good way to guess at the future of a consumer experience is to look at what only rich people can afford now and then imagine how it could be brought to everyone. The quintessential example is Uber, and now Waymo, giving everyday people access to a private driver, something that wasn't accessible to them until those companies existed. They're interested in how agents bring things like personal lawyers, money managers, personal trainers, private tutors, and personal doctors to the realm of the everyday person.
But lastly, maybe the most interesting one to me of all, across both the infrastructure and the agent application categories, is one from Pete Koomen called The Future of Software Engineering. I'm actually going to read a big chunk of this one. Pete writes, quote, "Language models can already write code better than most humans. This is going to bring the cost of building software down to zero. So will agents kill the job of software developer? No, we'll need more human software engineers in the future because software is going to run almost everything. These humans won't write much code directly. Instead, they'll manage teams of agents that build software for them. In addition to writing code, agents will perform most of the other specialized tasks required to build software, including QA, deployment, security and compliance audits, translations, operations, etc. We'd like to fund startups that enable small groups of generalist software developers to manage large teams of agents working together to build and ship lots of software."
So, two things are interesting about this. One, it's obviously planting a flag in how they think this is going to play out economically: sort of a Jevons paradox, but instead of being about resource usage, being about talent deployment, where effectively the greater availability of intelligence and the reduction of the cost of intelligence will actually increase our utilization of intelligence. Now, I think it makes sense that software engineering is the area they're looking to first for this, but my bet is that this pattern, that the job doers of today will become the managers of agents of tomorrow, is one we're likely to see played out across lots and lots of different domains. Think social media managers. Instead of writing tweets and creating posts, they're going to be able to manage entire armies of agents that do that across multiple platforms at much greater scale.
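To ground that, here's a minimal sketch of the manager-of-agents pattern: one coordinator decomposing a goal and fanning it out to worker agents. The worker stub and the hard-coded planning step are illustrative assumptions, not a reference to any specific product or to YC's request.

```python
# Minimal "manager of agents" loop: a coordinator splits a goal into
# role-specific tasks and runs worker agents in parallel. In a real
# system, both the planning step and `call_agent` would be LLM-backed;
# here they're stubs so the sketch runs on its own.
from concurrent.futures import ThreadPoolExecutor

def call_agent(role: str, task: str) -> str:
    """Stand-in for an LLM-backed worker agent (e.g., an API call)."""
    return f"[{role}] completed: {task}"

def manage(goal: str) -> list[str]:
    # The manager decomposes the goal into tasks for specialist workers.
    plan = [
        ("coder", f"implement: {goal}"),
        ("qa", f"write tests for: {goal}"),
        ("docs", f"document: {goal}"),
    ]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda rt: call_agent(*rt), plan))
    return results

if __name__ == "__main__":
    for line in manage("CSV export endpoint"):
        print(line)
```

Swap "coder, qa, docs" for "drafting, scheduling, analytics" and the same loop is the social media manager case: the hard, unsolved part is the tooling around it, review queues, budgets, and escalation back to the human.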
It is fascinating to think about how to actually build tools to manage those armies of agents. I think that this is going to be critical infrastructure for the future and something, like I said, that goes far beyond just software engineering. Anyways, like I said, I think this is a fun way to see how one influential Silicon Valley institution sees the future of AI and agents. Hopefully this gives you some ideas for what you might build. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.