Google's new team within DeepMind, led by Tim Brooks, is focused on building massive generative models that simulate the physical world. These models aim to understand the physics and appearance of the real world, similar to how large language models (LLMs) understand language structure.
NVIDIA's Cosmos models are a family of world foundation models designed to advance robotics and autonomous vehicle development. Trained on 20 million hours of video, these models focus on human movements and can be fine-tuned for specific tasks. They range from 4 billion to 14 billion parameters and are available as open source for commercial use.
OpenAI is losing money on ChatGPT Pro subscriptions because users are utilizing the service much more than expected. Despite charging $200 per month, the cost of delivering the service exceeds the revenue generated. OpenAI reportedly expected losses of around $5 billion on revenues of $3.7 billion in 2024.
Johnson & Johnson is using AI agents to optimize key points in the drug synthesis process. These agents analyze data from a smaller number of experiments and extrapolate it to determine optimal methods. While employees still review the output, the company is working on systematizing this oversight.
Moody's employs a multi-agent system with 35 different agent designs, each trained for specific subtasks. These agents analyze public company filings and perform industry comparisons, with some agents acting as supervisors to check for hallucinations. The system synthesizes conclusions from agents focused on different aspects, such as industry competition or geopolitical risk.
Deutsche Telekom uses AI agents to answer employee questions about internal policies, benefits, and the company's products and services. These agents, used by about 10,000 employees weekly, streamline HR processes and reduce the need for manual searches. The company plans to expand their capabilities to execute requests, such as processing leave applications.
According to Google's white paper, the core difference between LLMs and AI agents is the ability to access and interact with other systems. Agents can integrate with real-time data feeds, process multiple data sources, and perform multi-step tasks, making them capable of managing uncertainty and complexity in ways traditional models cannot.
The potential ROI of deploying AI agents lies in their ability to reduce human labor costs and increase productivity. By automating tasks, agents can lower operational expenses and free up employees for higher-value work. However, the actual impact depends on whether companies reinvest savings into growth or use them solely for cost-cutting.
Today on the AI Daily Brief, five ways that companies are using AI agents right now. Before that, on the headlines, Google forms a new team to build world models. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief headlines edition, all the daily AI news you need in around five minutes.
Perhaps the biggest theme of Q4 of last year was this question around whether the pre-training model for scaling AI had started to run into serious limits. We obviously got the rise of reasoning models like o1 and o3. We had CEOs like Satya Nadella from Microsoft talking about how new architectures were needed. But we also got some interesting alternatives. One of the approaches that some are interested in is models that can simulate the physical world. Google is forming a new team within DeepMind to work on scaling these types of models.
The team will be led by Tim Brooks, one of the co-leads of OpenAI's Sora video model who left that company back in October. Yesterday, Brooks posted, DeepMind has ambitious plans to make massive generative models that simulate the world. I'm hiring for a new team with this mission. Come build with us.
So far, what we've seen from labs are functional if limited demos. Basically, these are AI models that have a better understanding of the physics and appearance of the real world, understanding it in a similar way to how LLMs understand the structure of language. So far, a lot of what we've seen from world model labs are based on training data from video games or movies, and so are really only a proof of concept.
One of the few projects to move past this stage was Genesis, first shown off last month. That project was able to generate groundbreaking video and extremely accurate robotics training modules using a 4D world simulation. Genesis claimed they were able to train robots 430 times faster than the previous leading physics simulator, cutting the time below a minute.
Now, DeepMind is one of the labs that published a brief demo of a model that understands video game physics last year. That model was called Genie 2, and I actually think that the announcement went a little under the radar. Establishing this new team suggests that they want to push the technology even harder. Job postings for the new team invited applicants to, quote, join an ambitious project to build generative models that simulate the physical world.
We believe scaling pre-training on video and multimodal data is on the critical path to artificial general intelligence. World models will power numerous domains, such as visual reasoning and simulation, planning for embodied agents, and real-time interactive entertainment. The team will collaborate with and build on work from the Gemini, Veo, and Genie teams, and tackle critical new problems to scale world models to the highest level of compute.
One of the people who has talked most explicitly about this view of the importance of these types of models for achieving AGI is Meta Chief AI Scientist Yann LeCun. Indeed, he has gone so far as to hypothesize, loudly on Twitter, that standard GPT architecture has no pathway to AGI. This project sounds as though it will be one of the first to attempt to build a world model using the full scale of the training data and compute that can be mustered by a big tech firm.
NVIDIA, meanwhile, is also pushing the frontier of world models, releasing a family of models called Cosmos. During his keynote address at CES, which we will cover in more depth later in the week, NVIDIA CEO Jensen Huang announced, The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development. Yet not all developers have the expertise and resources to train their own.
He demonstrated the model being used to simulate warehouses and roadways, commenting, it's not about generating creative content, but teaching the AI to understand the physical world. The models were trained on 20 million hours of video, with a particular focus on human movements like walking, hand movements, and manipulating objects. They can be fine-tuned for specific tasks and customized for external data.
The family includes three models ranging from 4 billion to 14 billion parameters. The smallest model is optimized for low latency and real-time applications, while the largest model is intended to deliver high-fidelity outputs. And what's more, the models are available as open source for commercial use, allowing robotics and autonomous vehicle developers to use them in production.
One more quick story before we close out the headlines. One of the big questions surrounding the AI industry is whether it can actually make money. You'll remember that this was a huge point of conversation last summer. We had that Sequoia blog post, AI's $600B Question. And now we've learned that ChatGPT Pro, the $200 per month tier, is not only not a cash grab, but is actually not even paying for itself.
A couple of days ago, Sam Altman tweeted, Insane thing. We are currently losing money on OpenAI Pro subscriptions. People use it much more than we expected. In the replies, he added, I personally chose the price and thought we would make money. Now, of course, OpenAI is making a ton of money but losing more. The company reportedly expected losses of around $5 billion last year on revenues of $3.7 billion.
The pricing of all of this stuff at any point has been pretty arbitrary. In a recent interview, Sam Altman said that when it came to the main ChatGPT subscription, the company was tossing up between $20 and $42. They eventually went with $20 because, quote, people thought $42 was a little too much. They were happy to pay $20. Altman continued, it was not a rigorous "hire someone and do a pricing study" thing. Now, what makes this interesting isn't anything really about OpenAI itself. It's much more about the question of the long-term profitability of AI.
Mojo Flynn writes, OpenAI losing money is no big surprise, but the fact that they're losing money on a $200 monthly subscription should tell you there's no viable at-scale consumer business model. Even Microsoft, with a $30 Copilot subscription, is forced to offer discounted pricing.
I don't think it's an unreasonable concern. However, I do have a very different take. I think that we are extremely early in the life cycle of AI. And the simple reality is the cost of delivering the service hasn't come down as fast as the demand for using the service has increased. That's an unsustainable state. But unsustainable doesn't mean an inevitable failure. It means that there's going to need to be a recalibration.
Already, the cost of AI has come down spectacularly from where it was a few years ago, at least in terms of what you can do with the same amount. I would expect that to continue, and I think that we're going to figure out a lot more, use case by use case, what sort of business models different performance levels of AI can support. Frankly, I think this is exactly what venture capital and risk capital is designed to do.
It's designed to allow incredibly promising innovations the ability to build and get through these complicated early stages before these markets get rationalized. I think the speed of adoption of these tools has taken basically everyone by surprise and puts additional pressure on this, even relative to other industries.
Anyways, still an interesting story to watch, one that we will keep track of here. For now, though, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever.
Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI risk management framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center all powered by Vanta AI.
Over 8,000 global companies like Langchain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash nlw. That's vanta.com slash nlw.
If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Welcome back to the AI Daily Brief. Right now in Las Vegas, the annual CES Consumer Electronics Show is happening, and I anticipate that there will be some interesting AI announcements from that event that we will cover later in the week.
However, for today's episode, as we let those announcements come in a little bit more, I noticed something interesting in the Wall Street Journal. Yesterday, that publication published a piece in their CIO Journal called How Are Companies Using AI Agents? Here's a Look at Five Early Users of the Bots. You can tell the language is a little bit stuck in the past. But what's interesting to me is that in a year where we really are talking about 2025 being the time that companies start experimenting with agents, mainstream media is already picking up that this is a major theme.
Part of why this matters is that most people in big companies, much to my chagrin, are not so up to speed that they're listening to something like the AI Daily Brief. They're getting their news from sources like the Wall Street Journal. And so when this style of publication starts taking this stuff seriously, it can have a pretty big impact. So what we're going to do today is briefly look through these five use cases that the WSJ covered, and I'm going to pair that with an overview of a recent paper from Google that I think might be a pretty useful resource as well.
The Wall Street Journal piece basically points out that this is a big trend. They describe how many different companies have officially announced their own agents, and they point out one of the biggest reasons, frankly, that enterprises are so focused on agents. Quote, if these agents work as promised, they could also provide businesses with the return on investment they've been looking for out of generative AI. According to some corporate technology leaders, that means the ability to tie the technology to a reduction in the number of hours employees work, or even how many new people they need to hire.
Basically, there is a priori ROI built in if agents actually work. Agents necessarily replace certain amounts of human labor and presumably do it at lower cost than the equivalent human time.
Now, it's important to note that how companies use those cost savings and that increased productivity is going to dictate just how disruptive this is. If companies reinvest that human time into growing the business in other areas, I tend to think that this will be a phenomenal development for everyone. If, on the other hand, they just view it as a cost-cutting measure, well, that's a whole different kettle of fish. But the real thrust of this Wall Street Journal piece is to try to figure out how agents are being used right now in reality.
The first example they gave is from pharmaceutical giant Johnson & Johnson, which has been deploying drug discovery agents. Homing in on what agents can and can't do, the article points out that these agents aren't yet up to the task of coming up with new drugs all by themselves. Instead, they're deployed to optimize key points in the drug synthesis process. Traditionally, drug manufacturing is refined by running a multitude of experiments, which often have multiple variables to adjust.
Agents are able to take the data from a smaller number of experiments and extrapolate it out to arrive at an optimal method. At this stage, employees are still reviewing the output of agents, but they write, the company is still figuring out how that oversight can be done more systematically.
Next up, we move over to the world of finance, where financial analysis firm Moody's has developed a team of agents to research public company filings and perform industry comparisons. In total, the firm has 35 different agent designs, all trained for different subtasks and linked up together in a multi-agent system. The system even has agents as supervisors to check for hallucinations. The novel idea here is that each agent has its own set of instructions, personality, and data access. This means the agents within the system can come up with different conclusions in their analysis, which are then synthesized together.
For example, one agent might be building their analysis based on industry competition data, while another might be focused on geopolitical risk. Nick Reed, the company's chief product officer, said, It's almost a bit like your ability as an individual person. What we worked out is that an agent is better at not multitasking.
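As a rough sketch of the pattern described here, not Moody's actual system, here is what a set of narrowly scoped specialist agents with a supervisor pass might look like; all names, instructions, and the supervisor's check are made up for illustration, and the model calls are stubbed out:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """One narrowly scoped analyst in a multi-agent system."""
    name: str
    instructions: str

    def run(self, filing: str) -> str:
        # Stub: a real system would call an LLM with these instructions
        # and this agent's own data access.
        return f"[{self.name}] analysis of {filing} per: {self.instructions}"

def supervise(findings: list[str]) -> list[str]:
    # Supervisor pass: in production this would be another agent checking
    # each finding for hallucinations; here it is a stub keyword filter.
    return [f for f in findings if "UNSUPPORTED" not in f]

def synthesize(findings: list[str]) -> str:
    # Merge the surviving specialist conclusions into a single report.
    return "\n".join(supervise(findings))

specialists = [
    Agent("competition", "assess industry competition"),
    Agent("geopolitical", "assess geopolitical risk"),
]
report = synthesize([a.run("ACME 10-K") for a in specialists])
```

The point of the structure is exactly what the quote suggests: each agent gets one job and its own context, and disagreement between them is resolved at the synthesis step rather than inside a single overloaded prompt.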
This is obviously a highly relevant conclusion, even if this just represents the current state of things in terms of how enterprises think about deploying agents. Rather than trying to have one agent do multiple things, companies might get better results by assigning multiple agents with narrow subtasks and finding ways to coordinate them, once again, possibly with agents. The thinking is not ultimately dissimilar to the way you would construct a team of humans to carry out a multidisciplinary task. eBay is engaged in one of the most popular agent use cases, writing code.
Interestingly, eBay actually built its own agent framework that can take advantage of several different LLMs. In addition to writing code, eBay's agents are also creating marketing campaigns, and they're planning on rolling out another set of agents that can help buyers find items, as well as helping sellers list goods. The journal writes, eBay's agent framework functions as an orchestrator, dictating which AI models will be used for certain tasks like translating code and suggesting code snippets.
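A minimal sketch of that orchestrator idea, assuming nothing about eBay's real framework, might just be a task-to-model routing table; the task and model names here are invented placeholders:

```python
# Hypothetical routing table mapping task types to model identifiers.
# None of these names come from eBay; they are stand-ins.
ROUTES = {
    "translate_code": "code-model-large",
    "suggest_snippet": "code-model-fast",
    "draft_marketing_copy": "general-model",
}

def route(task: str) -> str:
    """Pick which model handles a given task, as an orchestrator would."""
    if task not in ROUTES:
        raise ValueError(f"no model registered for task {task!r}")
    return ROUTES[task]
```

Real orchestrators also handle retries, context passing, and fallbacks between models, but the core decision, which model gets which task, reduces to a lookup like this.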
Next up is Deutsche Telekom. And rather than facing outward, their agents are facing inward. The company employs roughly 80,000 workers across Germany. They've trained agents now to answer employee questions about internal policies and benefits. They also have an agent trained to assist service staff with questions about the company's products and services. In this case, we might be pushing the boundaries of the language of agent. This sort of sounds ultimately like a chatbot that has access to internal databases.
Still, call it what you want, it seems to be getting a lot of traction. The company's chief product and digital officer, Jonathan Abramson, said that about 10,000 employees are using it each week. That is dramatically more efficient than having an HR specialist or having employees search for policies on an internal website.
Still, Deutsche Telekom is figuring out how to go farther. The company's next step is allowing the agent to execute requests on behalf of employees, further automating basic HR. The example given was allowing the agent to complete a request for leave and enter it into the HR system, all fully automated from a natural language text prompt.
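To make that concrete, the step from natural language to a system entry is essentially structured extraction. This toy sketch uses a regex where a real agent would have an LLM emit a structured record and then write it to the HR system via an API; the field names and parsing rule are assumptions, not Deutsche Telekom's design:

```python
import re
from dataclasses import dataclass

@dataclass
class LeaveRequest:
    """A structured record ready to be written into an HR system."""
    employee: str
    days: int

def parse_leave_request(employee: str, prompt: str) -> LeaveRequest:
    # Toy extraction: find a day count in the free-text request.
    match = re.search(r"(\d+)\s*day", prompt)
    if match is None:
        raise ValueError("could not find a day count in the request")
    return LeaveRequest(employee=employee, days=int(match.group(1)))
```

The hard part in production is not the parsing but the write-back: once the agent can commit records, the oversight question from the Johnson & Johnson example comes right back.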
The final example is, I believe at this stage, the most commonly deployed agent example. In this case, it came from Spanish company Cosentino, which manufactures countertops and other stone materials for buildings. The company has brought on a team of agents to fill in gaps for their customer service staff. They refer to the agents as a digital workforce and are thinking about them in a very similar way to human workers. The agents are expected to have basic skills but receive training when they begin work. Agents are given instructions to follow a strict process,
and supervisors are present to ensure they don't go off the rails. The so-called digital staff have replaced the work of three to four team members who were previously involved in clearing customer orders. Those people have now been reassigned to more high-touch areas of customer service, liberated from their data entry tasks. Now, like I said, all of these are fairly basic use cases, but that I think represents where we are. I do believe that 2025 is going to be a huge year for agent pilots, and many of them are going to fall into some of these areas described and articulated in this piece.
Now, one useful resource for figuring out how to implement agents in your workforce is a white paper published by Google last September simply titled Agents. The paper explains what agents are and what they require to function, but more importantly, suggests that companies shouldn't think about agents as an upgrade to existing technology. Instead, they should think about agents as a fundamental shift in the way organizations operate in order to see maximum gains in efficiency and productivity. Basically, the first big idea in the paper is that agents are more than just smarter LLMs.
The core agentic function is being able to access other systems. This could mean simply accessing a database to inform an output, but the possibilities go so much deeper. It's possible, for example, to integrate agents into real-time data feeds to inform autonomous decision-making. Agents have much greater ability to process data than a human. We will likely find agents are able to monitor and take actions based on multiple data sources that would have required an entire team of people to carry out.
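In code, that core agentic move, consulting an external system before answering, can be sketched like this; the policy lookup is a hypothetical stand-in for a real database or data feed, and the routing is hard-coded where a real agent would let the model choose the tool:

```python
# Hypothetical internal data source standing in for a real integration
# (database, ticketing API, live data feed).
POLICIES = {"leave": "Employees accrue 2 days of paid leave per month."}

def lookup_policy(topic: str) -> str:
    return POLICIES.get(topic, "no policy found")

# Tool registry: the set of systems this agent is allowed to touch.
TOOLS = {"lookup_policy": lookup_policy}

def agent_answer(question: str) -> str:
    # A real agent would let the model pick the tool and its arguments;
    # keyword routing keeps this sketch self-contained.
    if "leave" in question.lower():
        return f"Per policy: {TOOLS['lookup_policy']('leave')}"
    return "I don't have a data source for that question."
```

The difference from a plain LLM is visible even in the toy: the answer comes from the live data source, not from whatever the model happened to memorize.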
Google's paper discusses another big difference between LLMs and agents, the ability to reason through multi-step tasks. There are many different architectures that can be used to achieve this. The agent could use chain of thought, an iterative process of reassessing the task as it progresses based on new information revealed at each step. It could use a tree of thoughts where multiple possible solutions are explored at the same time. Ultimately, according to the paper, this makes agents capable of managing uncertainty and complexity in ways that traditional models can't.
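The tree-of-thoughts idea can be illustrated with a beam search over candidate partial solutions. This is a toy, not the paper's implementation: the "thoughts" are digits, and the evaluator is a made-up scoring function that prefers digit sums near 6; in a real agent both would be LLM calls:

```python
def expand(state: str) -> list[str]:
    # Propose candidate next "thoughts"; here each thought is a digit.
    return [state + d for d in "123"]

def score(state: str) -> int:
    # Toy evaluator: prefer partial solutions whose digits sum toward 6.
    return -abs(sum(int(c) for c in state) - 6)

def tree_search(depth: int = 3, beam: int = 2) -> str:
    # Keep the `beam` best candidates at each step, exploring several
    # possible solutions in parallel rather than committing to one chain.
    frontier = [""]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]
```

Chain of thought, by contrast, would be the beam=1 case: one line of reasoning, reassessed step by step, with no parallel alternatives kept alive.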
There's a ton of really interesting information in here. I will link to it in the show notes. And of course, one quick shill here, if you've made it this far, you've probably been hearing this ad, but one of the things that we were doing at Super this year is an agent readiness audit where we are digging in with you to help you understand what parts of your company or your workforce's activities are best suited for exploring agents. And we're also helping scope and even support pilots in that area.
If that's something you're interested in, email me at nlw at bsuper.ai and join us this 2025, the year of agents. For now though, that is going to do it for today's AI Daily Brief. Until next time, peace.