People
Narrator
A podcast host and content creator focused on electric vehicles and energy.
Topics
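OpenAI plans to release an autonomous AI agent called Operator next January that can carry out a variety of tasks in a web browser, such as coding, shopping, and booking flights. This marks a new stage in the development of AI technology: agents will be able to complete complex tasks more independently and may have a profound impact on business and social structures. Although the technology is still at an early stage, OpenAI's move shows that large tech companies are actively pushing agent technology forward and regard it as the next major breakthrough in AI.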


Chapters
Discussions around whether there is an AI model slowdown, with reports indicating that OpenAI's latest model is not showing the same performance jump as previous versions. Google is also experiencing similar issues, leading both companies to explore new methods for improving performance.
  • OpenAI's latest model shows diminished performance improvements.
  • Google's models are also demonstrating a lack of improvement.
  • Both companies are exploring new avenues, including reasoning and fine-tuning, to boost performance.

Shownotes Transcript


Today on the AI Daily Brief, we are apparently getting an OpenAI agent as early as January, which is a good thing, because in the headlines we talk about how Google is also dealing with the same AI slowdown that we talked about in the context of OpenAI a little earlier. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.

Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. One of the big discussions we've been having this week is whether there is an AI model slowdown. According to reports, OpenAI's Orion model is not showing the same jump in performance that was observed between GPT-3 and GPT-4. This apparently has led OpenAI to double down on reasoning and fine-tuning as a potential way to obtain the performance boosts expected from the next generation of frontier models, and it now appears that Google is joining OpenAI in exploring new avenues to tackle some of these challenges.

According to The Information's sources at Google, their models are demonstrating the same lack of improvement. The Information writes, past versions of Google's flagship Gemini large language model improved at a faster rate when researchers used more data and computing power to train them. Google's experience is another indication that a core assumption about how to improve models, known as scaling laws, is being tested.

Many researchers believed that models would improve at the same rate as long as they processed more data while using more specialized AI chips. But those two factors don't seem to be enough. This is particularly troubling for Google, whose models have failed to see the same level of adoption as OpenAI's.

There was a belief, perhaps, that Google could leapfrog OpenAI in this generation purely due to their advantage in computing resources. That appears less and less likely. So, following in the footsteps of OpenAI, Google is also looking to develop new methods of improving performance. Over recent weeks, Google DeepMind has put together a team to work on the development of reasoning models.

That team is being led by principal research scientist Jack Rae and former Character.AI founder Noam Shazeer, to give a sense of how important they consider the work to be. Other DeepMind researchers are working on making manual improvements to the model, including changing so-called hyperparameters, which are the variables that determine how the model processes information, such as how quickly it draws connections between different concepts. Another problem Google has run into is duplicate copies of information within training data, which could hurt performance.

Google has also experimented with synthetic training data, essentially feeding data generated by an LLM back into the corpus of training data. They've also added audio and video. And while it was believed that these steps would lead to significant improvement, sources at Google say they didn't make a major difference.

Meta's chief AI scientist and Turing Award winner Yann LeCun has been predicting these diminishing returns from model scaling for years. Yesterday he posted on Threads, I don't want to say I told you so, but I told you so. He references a statement from former OpenAI chief scientist Ilya Sutskever from earlier in the week, who said, the 2010s were the age of scaling, now we're back in the age of wonder and discovery again.

Everyone is looking for the next thing. Scaling the right thing matters now, even more than ever. Now, what makes Ilya's comments more significant is that he was basically the chief proponent of the idea that you could just add more compute and data to keep scaling higher and higher.

The fact that he is moving away from that suggests that there is a changing understanding of the technical capabilities here. LeCun, for his part, commented, we've been working on the next thing for a while. LeCun is referring to the Fundamental AI Research team at Meta pursuing new architectures as a path towards AGI.

Their focus is currently on world models, which seek to train an AI on how objects and environments interact, rather than just focusing on the connections between words. Still, not everyone's convinced that this is a real thing. Bindu Reddy writes, the AI slowdown is a non-story.

The biggest reason AI is slowing down is that there's nowhere else to go. If you begin to saturate on benchmarks, nothing is left to do. One hundred out of one hundred is the highest score you can get.

Now, moving over into the world of business models, Perplexity says they will begin experimenting with advertising on their platform. Starting this week, U.S. users will see ads in the format of sponsored follow-up questions.

These ads will be placed to the side of generated answers and labeled sponsored. The initial brands and agency partners for the launch include Indeed, Whole Foods, Universal McCann, and PMG. As an example, Perplexity showed a search for information about looking for a job, with a sponsored follow-up that says, how can I use Indeed to enhance my job search?

In a blog post, Perplexity explained, ad programs like this help us generate revenue to share with our publisher partners. Experience has taught us that subscriptions alone do not generate enough revenue to create a sustainable revenue-sharing program. Advertising is the best way to ensure a steady and scalable revenue stream. Perplexity said the ads themselves will be generated by AI rather than pre-written by sponsors.

Advertisers also won't get access to users' personal information. Regarding the choice of format, Perplexity wrote, we intentionally chose these formats because they integrate advertising in a way that still protects the utility, accuracy and objectivity of answers. These ads will not change our commitment to maintaining a trusted service that provides you with direct, unbiased answers to your questions.

Obviously, right now, how Perplexity-style AI summaries influence the core business model of the web, which is basically search ads on Google, is one of the big open questions. So it will be really interesting to watch these experiments. Basically, I think they are a lot more consequential than just for Perplexity as a company itself.

Lastly today, speaking of business models, Salesforce CEO Marc Benioff thinks it's crazy talk that AI could hurt his company's bottom line. Benioff has very publicly planted his flag on the idea of AI agents over recent months, and we're about to find out whether it will save Salesforce from disruption. During a recent appearance on TechCrunch's Equity podcast, he said, what if your workforce had no limits? As far as being disrupted, Benioff believes that his moat is access to client data.

He said, we manage 230 petabytes of data for our customers. You could say that might be one of the main things we do for them. And we do it with the security and a sharing model.

Anyways, for me, it's interesting to note that Benioff feels like he has to justify the potential disruption to the SaaS business model from agents. He can say it's crazy talk all day long. But there is absolutely no doubt, in my experience and in my conversations with enterprises, that while Benioff may be right that agents are Salesforce's future, and that they are an even bigger deal and the company grows to lofty new heights, there is an incredible pressure being put on the sort of traditional per-seat model of SaaS companies that will not be resolved easily or quickly.

Certainly, it's something that we're really interested in and thinking about a lot as we price Superintelligent.

But that is a conversation for another time and place. For now, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Plumb.

Want to use AI to automate your work, but don't know where to start? Plumb lets you create AI workflows by simply describing what you want. No coding or API keys required.

Imagine typing out, AI, analyze my Zoom meetings and send me the insights in Notion, and watching it come to life before your eyes. Whether you're an operations leader, marketer, or even a non-technical founder, Plumb gives you the power of AI without the technical hassle. Get instant access to top models like GPT-4o and AssemblyAI technology. Check out Plumb, that's Plumb with a B, for early access to the future of workflow automation.

Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and the NIST AI Risk Management Framework, saving you time and money while helping you build customer trust.

Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center, all powered by Vanta AI. Over eight thousand global companies like LangChain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com/nlw. That's vanta.com/nlw. Today's episode is brought to you, as always, by Superintelligent.

Have you ever wanted an AI Daily Brief, but totally focused on how AI relates to your company? Is your company struggling with AI adoption, either because you're getting stuck figuring out what use cases will drive value, or because the AI transformation that is happening is isolated to individual teams, departments and employees and not able to change the company as a whole? Superintelligent has developed a new custom internal podcast product that inspires your teams by sharing the best AI use cases from inside and outside your company.

Think of it as an AI Daily Brief, but just for your company's AI use cases. If you'd like to learn more, go to besuper.ai/partner and fill out the information request form. I am really excited about this product, so I will personally get right back to you.

Again, that's besuper.ai/partner. Welcome back to the AI Daily Brief. A couple of interesting pieces of news out of OpenAI today, starting with an agent story. OpenAI is reportedly planning to release an autonomous agent next year. Now, if you spend any time around the AI space, you'll know that basically since ChatGPT launched, we've been on the verge of the agent era.

The idea of moving from just super-powerful assistants to agents actually doing human-replacement-style work is something with such dramatic implications for the structure of business and society and what we can accomplish that it captures a huge amount of energy, in fact, probably a disproportionate amount of energy relative to how far along the technology actually is. And yet it's clear that the major labs have been making nudges in this direction. And so what have we learned so far about this theoretical agent from OpenAI? The agent, which they've code-named Operator, can control a computer to complete tasks independently, including coding, shopping, and booking flights, according to Bloomberg sources.

Staff were told in a meeting on Wednesday that the tool will be released as a research preview in January. That would mean that by early next year, we could have competing computer-use agents from Anthropic, Google, and OpenAI. There are also already more limited agents available from companies like Microsoft, Salesforce, and a host of startups.

So far, we've seen two different approaches to fully fledged computer use. Google's agent is sandboxed in the browser window, making it more limited but potentially more performant. Anthropic's agent, which is the only one generally available, was trained to control a mouse and a full computer interface, so it can directly carry out a much broader variety of tasks.

In practice, the experience is still rather limited, with the company admitting the agent is slow, cumbersome, and error-prone. Bloomberg sources said that OpenAI is working on several agent-related products, and the one that is nearest to completion is a general-purpose tool that executes tasks in a web browser. Sam Altman has been hyping agents as the next big thing over the past few months.

In October, during a Reddit AMA, he said, we will have better and better models, but I think the thing that will feel like the next giant breakthrough will be agents. And at OpenAI's Dev Day, chief product officer Kevin Weil said,

I think 2025 is going to be the year that agentic systems finally hit the mainstream. The Verge writes, AI labs face mounting pressure to monetize their costly models, especially as incremental improvements may not justify higher prices for users. The hope is that autonomous agents are the next breakthrough product, a ChatGPT-scale innovation that validates the massive investment in AI development.

So what do people think about this? Well, Alice on X writes, computer use is the kind of capability I expected OpenAI to launch first. This time, it looks like they will be following Anthropic.

I'm still hoping for a unique twist and huge improvements. I think there's a lot to learn from custom GPTs and search. One thing is certain: 2025 will be the year of AI agents.

I never bet against OpenAI on these things, especially because o1 for data generation, as was used in ChatGPT search, will play a key role here. They have a huge advantage. My asks: make it easy to launch agents, make it easy to use and provide feedback over the tools and integrations, reduce latency, and reduce costs.

Others shared their skepticism. M. Ali on X writes, I hate how companies always flex their AI agents that can, quote, book a flight for you, as if that was not the worst use case for AI automation ever.

This, by the way, is something that is a personal pet peeve of mine as well. I do not need an agent to book me a flight or to order me food. I know that it is just a demonstration of capabilities, but I do think it shows how early we are that those are the things that people always point to.

Others are thinking about the implications for various domains of the world. Calamo Clark writes, for the learning world, clicking complete means we must shift from click-to-complete courses to gathering learning metrics, a shift that should have happened a long time ago. But now hopefully these agents will be the final nail in the coffin of bad L&D courses.

There are also questions about the business model of agents. One commenter writes, how do we properly price agents, especially when they keep getting more capable? They do days of work in one hour, and they work twenty-four seven.

We have no market reference. Is it compute, per hour, per task? Adam Silverman of AgentOps writes, I think as agents scale over the next five years, pricing will be compute plus ten percent margin for specific use cases. When OpenAI and others release agents, they will only charge compute. There is a huge opportunity for startups to capitalize on charging significantly more in the interim.

Now, going back to this Verge quote, the mounting pressure to monetize costly models and the hope that autonomous agents are the next breakthrough product. I think whereas ChatGPT and Claude and the like have been easily as much a consumer innovation as they have been an enterprise innovation, really transforming and hitting both equally, and by some measurement consumers more, I believe that where we're going to see the value from agents is absolutely in vertical, highly specific enterprise tasks. I think that getting agents very good at very specific repetitive tasks that happen over and over and over again all the time inside specific companies is much easier than getting good at very general-purpose use.

And I think that that's where we're going to see a lot of benefit. Now, it is still very early. What we have available and what's production-ready is still very niche.

But if I were a betting man, that is where I would be placing my chips: that the impact will be in very specific verticals within the enterprise. Second OpenAI story.

The company has outlined a grand policy proposal to bolster artificial intelligence in the U.S. in an effort to stay ahead of China.

Yesterday, at a think tank event in Washington, OpenAI's head of global affairs, Chris Lehane, presented what they are calling their official blueprint for U.S. infrastructure. The company said the plan was, quote, as ambitious as the 1956 National Interstate and Defense Highways Act.

They outlined AI economic zones co-created between state and federal governments, which give the states an incentive to speed up permitting and approval for AI infrastructure. They envision constructing solar and wind power as well as gaining clearance to restart unused nuclear plants. OpenAI wrote, states that provide subsidies or other support for companies launching infrastructure projects could require that a share of the new compute be made available to their public universities to create AI research labs and developer hubs aligned with their key commercial sectors.

OpenAI also wrote about a bill called the National Transmission Highway Act. The legislation would expand power, fiber, and natural gas pipeline connectivity across the nation. The company argues that we need, quote, new authority and funding to unlock the planning, permitting and payment for transmission. And they noted that existing procedures aren't keeping up with AI-driven demand.

The document noted that, quote, the government can encourage private investors to fund high-cost energy infrastructure projects by committing to purchase energy and other means that lessen credit risk. OpenAI notably sees the Midwest and Southwest as key areas for infrastructure expansion, given the plentiful land for construction. Focusing on these areas would also ensure that the jobs and prosperity of this wave of technology aren't just concentrated on the coasts.

Now, the premise of all of this is that without government investment and the removal of red tape, the U.S. will lose its lead in AI to China. The OpenAI proposal stated, given the stakes, we need to think big, act big and build big. These decisions determine whether a nation leads or lags in technological innovation, often with far-reaching consequences for economic competitiveness and national security.

The history of the U.S., they wrote, is one of iconic infrastructure projects that moved the country forward: the auto industry, the Tennessee Valley Authority, the Manhattan Project, the Interstate Highway System.

One of the things that we've been tracking a lot recently is the growing push to nuclear, driven by the AI industry.

And it's notable in that regard, and something that OpenAI points out, that China has built more nuclear power capacity over the last ten years than the U.S. has built over the last forty.

Speaking to that rapid deployment, OpenAI's head of global policy, Chris Lehane, said, we don't have a choice. We do have to compete with that. Now, the big remaining question is whether these kinds of policies would be adopted by the Trump administration. OpenAI says they plan to work with the Trump White House on the agenda.

So far, all we know about Trump AI policy, however, is that the president-elect has pledged to repeal the Biden AI executive order, stating that, quote, in its place, Republicans support AI development rooted in free speech and human flourishing. That's about the extent of the details we have so far. Then again, slashing red tape and building out a ton of energy production seems in line with campaign promises.

During an appearance in July, Trump said, we will be creating so much electricity that you'll be saying, please, please, president, we don't want any more electricity. We can't stand it. You'll be begging me, no more electricity.

So we have enough. We have enough. So who knows? I think it's pretty clear that this is coming into the vacuum of whatever the repeal of the executive order looks like.

As Andrew Curran points out, this blueprint is being presented to influence whatever form the new regulations will take. Still, the earnings not get may have summed it up best when they wrote, this is the new Manhattan Project. Interesting times ahead. But with that, we will wrap today's AI Daily Brief. Appreciate you listening, as always, and until next time, peace.