This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Elon Musk's Grok 3 is already facing multiple controversies. Google's state-of-the-art Veo 2 AI video model has been released, but not in the way that you would think. And Microsoft is going quantum and apparently discovered a new type of matter. Like, I don't know, it's Monday and it already feels like we're at the end of the week in terms of the amount
of AI news that is just hitting us. And that's not all. We might also see a new update, a new model from Anthropic's Claude, after what seems like, I don't know, 30 years of waiting.
All right, we're going to be covering those stories and a whole lot more in today's AI News That Matters. So welcome to Everyday AI. What's going on, y'all? My name is Jordan Wilson, and I'm the host of this show, and this is for you. So we do this every single weekday, Monday through Friday, bringing you the real updates that matter. So on Mondays, we do our AI News That Matters, but we do the Everyday AI Show every single weekday. So if you're looking to grow your company or grow your career by understanding the latest in Gen AI, you are in the right place.
The other right place for you to be, your BFF, is going to be our website. That's youreverydayai.com. You can go there and, number one, sign up for our free daily newsletter, where every single day we recap not just our podcast, which is usually bringing you some exclusive insights with interviews and everything like that. But
on there, you can go listen to, like, now 450 episodes, all sorted by category, from the world's leading experts, all for free. So make sure you not only go sign up for that free daily newsletter, where we will be recapping today's show, but also go check out, I don't know, one to 20 episodes and be the smartest person in AI at your company. That's what we're here trying to get you to do.
All right. And also make sure go listen to episodes 443 through 447. Some people are asking recently like, Hey Jordan, where do I start? This looks intimidating. You have like 500 episodes. Go listen to those. That's our 2025 AI predictions and roadmap series. So if you have not listened to episodes 443 to 447, please go do that. All right. Enough chit chat y'all. Let's get into the AI news that matters for the week of February 24th. And also,
Thanks for joining us live. If you're normally on the podcast, come hang out. It's fun. Michael and Michelle, Sandra, Phillip, a lot of people joining on the YouTube machine this morning. Love to see it. Samuel, Bethany, Lauren, Jackie, Big Bogey Face, Alan, Natalie, too many to name this morning. Thanks for tuning in, y'all. All right, let's get into it.
Let's start with the big story, what everyone is talking about. Elon Musk's startup xAI has launched its latest AI model, Grok 3, claiming that it surpasses competitors like OpenAI, Anthropic's Claude, Google, and others.
So Grok 3 is said to be significantly more capable than its predecessor, Grok 2, and is available to premium users of the social media platform X. It's rolled out to free users as well. It's actually been a confusing rollout from the Grok team, because when it first rolled out, you had to have a Premium+ plan. And then you just had to have Premium, and then they doubled the price of Premium, and then they made it somewhat available to free users, but with not a lot of communication. But it is available, and the model has been tested on standardized exams in everything from math to science and coding, reportedly outperforming existing AI models. More on that "reportedly" here in a bit.
So Musk described Grok 3 as "scary smart" with enhanced reasoning abilities and noted that it is trained on synthetic data, allowing it to reflect on its mistakes and improve logical consistency.
So the launch of Grok 3 also includes a new product called DeepSearch. All right. You know, Grok had to go against the grain and not name it Deep Research like the other deep research tools from Google, Perplexity, and OpenAI. So you have the DeepSearch mode in Grok 3, the new voice mode, and people are going wild about the unhinged mode. So if you're 13 and, you know, giggle at profanities, maybe it's for you. There's also a kind of think-deeper mode. So, similar to how you can control certain models to use a little more compute or to use a reasoning model, you have that with the new Grok 3 as well.
So the model's rollout is part of a beta phase with ongoing improvements expected, and the voice assistant features are slowly rolling out already. I haven't used the new voice features myself, but I did see they were released to many users as of late this weekend. So xAI has expanded its GPU cluster, doubling its size to support the training of Grok 3. Originally it was reported that they had 100,000 of Nvidia's most powerful GPU chips. Well, turns out they actually reportedly have 200,000 of Nvidia's most powerful GPUs. So they really just willed Grok 3 into existence. You have to tip your cap to the xAI team, right? Going from essentially zero to Grok 3 in a little more than a year.
The release of Grok 3 comes amidst intense competition in the AI market with previous models from OpenAI and Google setting high benchmarks that Grok 3 has now either topped or has at least caught up to in almost all benchmarks. So, you know, I'm curious for our live stream audience, if you are using Grok.
All right. Michael says, "Hey, I'm 35 and giggle at the profanities." So, you know, let me know if you're actually using Grok, or if you want to hear more about it. It's going to be grabbing a lot of headlines, I think, for the first couple of weeks. And speaking of that, we actually have two more Grok stories to start off with, and they're pretty significant. I think we have to talk about them here.
Speaking of those benchmarks, a controversy has erupted online between OpenAI and xAI concerning those exact benchmarks.
OpenAI accused xAI of presenting misleading benchmark results, which xAI co-founder Igor Babuschkin defended, highlighting a broader debate about transparency in AI performance reporting.
So here's the gist, and I'll try to break it down simply. xAI published a graph showing Grok 3 outperforming OpenAI's o3-mini-high model on AIME 2025, a math benchmark whose validity as an AI benchmark has been questioned by some experts.
So the dispute centers on the omission of OpenAI's o3-mini-high score at consensus@64. So cons@64, or consensus@64, is a method that allows a model 64 attempts to answer each question, potentially inflating performance scores. But when measured at cons@1, which is the first attempt, Grok 3's performance was lower than OpenAI's o3-mini-high, contrary to xAI's claim of Grok 3 being the "world's smartest AI." So essentially, if you gave Grok one attempt, it did not beat OpenAI's o3-mini and some other models. And they used this cons@64 method, which essentially gives a model 64 attempts to see if it can get the right answer.
Parts of this I understand, right? Because large language models are generative, not deterministic. They're like a controlled roll of the dice. You could run the same prompt 10 times and get nine different answers, or one answer. It depends on what you're asking and whether there is a definitive answer or not.
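To make the cons@64 idea concrete, here's a toy sketch, my own illustration rather than xAI's or OpenAI's actual evaluation harness: a simulated "model" that answers correctly only some of the time, scored once on its first attempt (cons@1) and once by majority vote over 64 samples (cons@64). The 40% accuracy and the answer strings are made-up values for the demo.

```python
import random
from collections import Counter

def sample_answer(correct, rng):
    # Toy model: 40% chance of the right answer, otherwise a wrong one.
    if rng.random() < 0.4:
        return correct
    return rng.choice(["wrong_a", "wrong_b", "wrong_c"])

def cons_at_k(correct, k, rng):
    # Sample k answers and grade the majority-vote answer.
    votes = Counter(sample_answer(correct, rng) for _ in range(k))
    return votes.most_common(1)[0][0] == correct

rng = random.Random(0)
questions = 200
pass_1 = sum(cons_at_k("42", 1, rng) for _ in range(questions)) / questions
pass_64 = sum(cons_at_k("42", 64, rng) for _ in range(questions)) / questions
print(f"cons@1:  {pass_1:.0%}")   # roughly the raw single-shot accuracy
print(f"cons@64: {pass_64:.0%}")  # majority vote pushes this much higher
```

Even with only 40% single-shot accuracy, the majority vote lands on the right answer almost every time, which is exactly why comparing one model's cons@64 number against another model's single-attempt number can be misleading.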
I personally think it's worth noting this whole cons@64 method, because it's essentially now a he-said-she-said. OpenAI said, "Grok, you guys lied on these benchmarks," and Grok's like, "No, you used cons@64 on your benchmarks as well." Here's the thing that I didn't really see anyone else talking about. Yes, OpenAI used cons@64, or the best of 64, but only when comparing its own models, and only to show that when you gave a model like o3-mini-high extra compute, that is what makes it the "high" variation, right? They were essentially showing how much extra juice you can squeeze out of the model on this high mode, or by giving it 64 attempts to get an answer right. But I haven't seen anyone except now Grok and xAI use this cons@64 benchmark to compare itself to different models. That is not something OpenAI did in the instance they're talking about. OpenAI was just comparing its own internal models with cons@64, not external models. So, a bunch of geeky benchmark drama going on in the AI world, but that is not the only controversy Grok is already facing.
I would love to say I'm surprised, but I'm not. All right. And y'all, I'm not trying to get things political here. I know people are going to send me hate mail. This is just what happened, all right? I don't know why people get so mad at me, like, "Oh, Jordan, you're making this so political." I'm like, no, these are just facts. This is just what happened. So save your grumbling for someone else.
xAI's Grok 3 chatbot has been reported to have its search capabilities censored, sparking controversy over free speech and the truth-seeking promises made by Elon Musk. So this is according to reports and recent users. And when I saw this, I did replicate it, and yes, this actually was true.
So again, Igor Babuschkin from xAI confirmed a system prompt update was reversed after user feedback indicating internal misalignment with company values. So users noted that Grok 3's search instructions had previously told the model to disregard certain things it saw online, right? Because Grok has access to the internet and obviously has access to X. So there was a system instruction that told Grok to ignore anything that said Elon Musk or Donald Trump spread misinformation.
Right. So a couple of online sleuths found this. Whenever new models come out, people try thousands of different things to see what's maybe not correct, what's maybe misaligned in a model. And this is something that was obviously pretty bad, right? If you're trying to be the free speech platform and you are giving specific system instructions saying, hey, if anyone asks about misinformation and who spreads it, don't mention Elon Musk or Donald Trump? Yeah, that's not right at all. So the move, like I said, contradicts Musk's previous assertions that X, the platform hosting Grok, is dedicated to free speech. Grok 3 has also taken unexpected political stances, listing Donald Trump first when asked about who deserved the death penalty, yeah, that's not good, and labeling, like we said, Musk as a major misinformation source.
xAI has responded by adding a prompt to prevent Grok from commenting on death penalty cases. Yeah, people were actually asking, hey, who deserves the death penalty? And it came back naming names, including US President Donald Trump. So that's not good. The team has said that there is a permanent fix. I'm not going to put those types of prompts out there into existence myself, but people have shown that it has since been fixed.
So this behavior highlights the challenges Musk faces in balancing free speech with controlling narratives, especially as Grok 3 continues to express views contrary to Musk's apparent political preferences.
And that's just the beginning of it. There are whole other very serious issues with Grok. It was readily giving out instructions on how to make drugs and how to make chemical weapons of mass destruction, things that large language models should not be spitting out. So I guess we'll have to keep an eye on it and hope Grok becomes a little less unhinged, because that's not good. All right.
Moving on. Now we have more AI misuse. Who would have thought? All right. So OpenAI has identified and disrupted attempts to misuse its AI tools in Chinese influence campaigns, including spreading Spanish-language anti-US disinformation.
So according to OpenAI, one of these campaigns, called "Sponsored Discontent," used ChatGPT to generate anti-American Spanish-language articles and English-language comments. These articles were distributed across various Latin American news sites, sometimes as sponsored content, while comments appeared on platforms like...
Yeah. You know, all the time people ask me, hey, are you going to use Grok? Should businesses use Grok? And I'm like, absolutely not. And this is one of the reasons why. I told you a couple of them in the controversies Grok is facing right now. But the other thing is, Grok relies heavily on X data, and it has been widely shown that X, the platform formerly known as Twitter, has the highest rate of disinformation and misinformation versus other social media platforms. No social media platform is perfect, right? And Meta also uses its platforms as training data for its Llama models, but not at the level and frequency that Grok uses the X platform.
So this is just another reason why you've got to be careful. And there are a lot of bots, right? A lot of bots. So OpenAI found that its GPT technology was being used to spread this kind of pro-Chinese, anti-American disinformation on platforms like Facebook.
I already said that a couple of weeks ago. I believe I said it in my AI predictions and roadmap series, right, that AI was going to be used in a lot of bad ways by China. So Ben Nimmo from OpenAI's intelligence team noted this is the first known instance of Chinese influence operations targeting Latin America with translated articles.
So there was another campaign called "Peer Review," and it involved using ChatGPT to create marketing materials for a tool allegedly used to report protests to Chinese security services. So OpenAI has banned the involved accounts, citing violations of policies against using AI for unauthorized surveillance. Let's get to robotics. So Figure, a robotics startup from Silicon Valley, has introduced Helix, a new AI model for humanoid robots. The Helix model allows robots to handle objects, collaborate with other robots, and control their upper bodies more smoothly. This new launch of the Helix platform follows Figure's decision to end its collaboration with OpenAI and raise $1.5 billion.
So yeah, Figure made some headlines about three or four weeks ago when they said, hey, we're ending our partnership with OpenAI, because their first Figure 01 model was using OpenAI's models, both for vision and speech-to-text, along with the full GPT-4o model. So now we saw what Figure decided to do instead. And it did look fairly impressive, although the demos were extremely limited. But Helix could represent a significant step toward integrating humanoid robots into everyday home environments. Unlike previous models, Helix does not require extensive training on specific tasks,
enabling interaction with unfamiliar objects. So in the demonstration video, Helix-powered robots successfully put away groceries, showcasing their practical applications. In this demo video, essentially, Figure was giving them groceries to put away in a simulated kitchen. There were two of these new Figure robots, presumably running the Helix system, and it was said that they hadn't been trained on these items, right? So there were some grocery items and some other things placed on a counter, and then these two Figure AI robots silently put away the groceries and worked with each other. They handed each other things, which I thought was pretty cool, right? Because there are times, like when I got home late last night, I didn't want to put away the groceries.
All right, but I had to. Would I want two humanoid robots putting away the groceries? I don't know.
Maybe in the future, but it's a little weird. I actually found it a little unsettling that the robots weren't talking. I think Figure was trying to show, like, oh, look how smart these humanoid robots are. First, they haven't been trained on these items, like, oh, the apple goes in the dish and the cold thing goes in the fridge, right? So Figure said they weren't trained specifically on all of these items, but they weren't communicating, which, I don't know.
At least when I think about it, when and if there's a humanoid uprising, I would like for them to be talking to each other so at least I can understand what's going on. So although it shows some pretty emergent capabilities, I don't know. Personally, I found it a little disturbing that they were collaborating silently.
So Figure's founder Brett Adcock highlighted a major breakthrough in robot AI, like I said, achieving this Helix model entirely in-house. The company is reportedly in talks to raise another $1.5 billion, valuing it at nearly $40 billion.
Yikes. I don't know. Are you guys excited about this? Angie from LinkedIn says silent grocery agents are creepy, asking, are they just listening and watching us? I don't know. Joe says, "I don't want chatty robots putting groceries away when I'm trying to sleep." So I don't know. Maybe people want multiple humanoid AIs silently going around doing things.
I would prefer actually hearing what they're saying, but that's just me. That's just me. All right.
Some other big, big AI news, not necessarily on the large language model side: Google has introduced an AI system called AI co-scientist, designed to help scientists formulate new hypotheses, potentially accelerating scientific and medical research. So this AI system uses a unique method involving AI agents that generate, debate, and refine ideas before presenting them to human scientists.
So unlike other AI models, AI co-scientist produces new ideas rather than summarizing existing ones, setting it apart from reasoning models like OpenAI's o3. The system is powered by Google's Gemini 2.0 but can work with any large language model, offering flexibility across various fields.
So the AI co-scientist comprises several specialized agents that work together to generate, review, and refine medical hypotheses, simulating a team of research assistants. During testing, the AI co-scientist successfully generated a hypothesis about antibiotic resistance that matched findings from an independent study, demonstrating its potential effectiveness. So while the AI co-scientist can perform complex tasks, it is designed to collaborate with human scientists, not replace them. So the US and UK
have initiated fellowships to study AI's impact on scientific research with funding from the Alfred P. Sloan Foundation and the UK government. So according to researchers, these advancements in AI could significantly augment biomedical and scientific discovery, ushering in a new era of AI-empowered scientists. So,
I've been saying this for many years, right? I've always said the future of large language models is many small language models, and we're kind of seeing that here with Google's co-scientist. Instead of just having 10 different versions of Gemini 2.0 and having each of them take a small section of the task, these are specially developed, essentially small agents, and each agent has just one role. So think of it as a narrow focus. Sometimes we talk about ANI, artificial narrow intelligence, versus AGI, artificial general intelligence, right? These large language models are trying to tackle artificial general intelligence, which means they're trying to be the best at everything. But again,
we're not just talking from an LLM perspective; we're talking from an agentic perspective as well. That's why, I think, in the early buzz of agents in late 2022 and early 2023, at least for LLM-powered agents, it was like one agent trying to do absolutely everything. What we just saw here from Google with co-scientist is going to be the larger trend throughout the industry: you're going to have at first maybe a handful, then dozens, but eventually hundreds of agentic AIs working with each other, each really just fine-tuned for one very specific task. Why? Well, narrow intelligence is much easier to achieve than general intelligence, right? So if you fine-tune a model specifically on one area, let's say medical research, or researching potential reasons antibiotics fail, and you train one model on just that, it is going to perform much better on that task than the larger model it was distilled from, which was not trained specifically on it. So pretty exciting stuff there from Google.
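As a rough illustration of that narrow-agent pattern, and emphatically not Google's actual co-scientist architecture, here's what a chain of single-role agents can look like in code. The `call_llm` function is a hypothetical placeholder standing in for any real model API, and the role prompts are made up for the demo.

```python
def call_llm(system_prompt, user_input):
    # Placeholder: a real implementation would call an actual LLM API here.
    return f"[{system_prompt}] -> response to: {user_input}"

# Each agent has exactly one narrow role, per the pattern discussed above.
AGENTS = [
    ("generator", "Propose one novel hypothesis for the given topic."),
    ("reviewer", "Critique the hypothesis for testability and novelty."),
    ("refiner", "Revise the hypothesis to address the critique."),
]

def run_pipeline(topic):
    """Chain narrow, single-role agents: each one's output feeds the next."""
    result = topic
    transcript = []
    for name, role_prompt in AGENTS:
        result = call_llm(role_prompt, result)
        transcript.append((name, result))
    return transcript

for step, output in run_pipeline("why antibiotics fail"):
    print(step, "->", output)
```

The design point is that each stage can be a small, cheaply fine-tuned model rather than one giant generalist; swapping in a stronger reviewer or a domain-specific generator doesn't touch the rest of the chain.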
All right. Our next piece of AI news, an unsettling one: the Trump administration is planning to cut almost 500 roles at the U.S. AI Safety Institute, housed within the National Institute of Standards and Technology, or NIST. This is according to reporting from Axios. These cuts are significant, as they may deeply impact AI safety and regulation efforts, potentially leaving the AISI, quote, gutted.
So the AISI has been instrumental in overseeing AI model testing and collaborating on regulation efforts with companies like Anthropic and OpenAI. So yes, for the last year or so, the big AI labs have been working with the US AI Safety Institute to make sure the new models they release are safe. Well, that might not be happening anymore, with the recently created institute now reportedly being gutted. The cuts also affect semiconductor production, including, quote, from Axios, 74 postdocs, 57% of CHIPS staff focused on incentives, and 67% of CHIPS staff focused on R&D.
So the decision seems contradictory to the Trump administration's goal of achieving AI dominance over China, especially considering the national security implications of the CHIPS initiative. The anticipated firings follow the exclusion of AISI staff from the recent AI Action Summit in Paris and the resignation of AISI Director Elizabeth Kelly, reportedly due to political pressure. So this development is part of Trump's broader AI agenda, which prioritizes AI dominance over safety and regulation. All right.
New news: I guess we have a name now for Mira Murati's newest company. So Mira Murati, the former chief technology officer of OpenAI, has officially unveiled the name of her new AI startup, called Thinking Machines Lab, aiming to address significant gaps in advanced AI systems. The startup's mission is to make AI systems more understandable, customizable, and generally capable, according to a blog post shared on Tuesday. And it's not just her.
At the top of the ticket, there are some other big names from OpenAI and other big tech companies. So John Schulman, a co-founder of OpenAI, will join Murati as chief scientist after leaving Anthropic. Yeah, so Schulman went from OpenAI to Anthropic and is now going over to Thinking Machines Lab. Also, Barret Zoph, previously OpenAI's vice president of research, will serve as the chief technology officer. And at least seven former OpenAI staff have joined the team.
The team also includes researchers from top AI companies like Meta, Google DeepMind, Character AI, and Mistral. So Murati left OpenAI in September to explore new opportunities and was reportedly in talks to raise over $100 million for her new startup.
So, well, what is it all about? Well, the new venture is significant for the AI industry as it highlights a trend of key talent moving to new projects and potentially shaping the future direction of AI research and applications. So,
It should be interesting to see what Murati and her teammates cook up at Thinking Machines Lab. We don't have a ton of information about what they're going to be working on, aside from addressing the gaps in advanced AI systems and making AI systems more understandable, customizable, and generally capable. So I'm not exactly sure what that is actually going to mean yet, but Murati did assemble a pretty impressive roster of talent to get Thinking Machines Lab off the ground.
Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. Michael says he's excited for Mira's company. Yeah, I'm excited to see what they cook up as well.
All right. Speaking of cooking up, you can now cook up some new AI video with Google's Veo 2. So Google DeepMind has announced the cost structure and availability for its Veo 2 video generation model, which is now accessible via its cloud API platform.
So creating a video using Veo 2 is priced at 50 cents per second, translating to about $1,800 per hour, on Google Cloud's Vertex AI, as noted by Google DeepMind researcher Jon Barron. For context, the blockbuster film Avengers: Endgame cost approximately $32,000 per second to produce using traditional methods. So yeah, $32,000 per second for a very highly visual blockbuster film like Avengers: Endgame, where right now it's 50 cents per second for Veo 2. Obviously, those two things are not the same, right? I'm not trying to draw a direct comparison. But you might be thinking, is 50 cents a second expensive? How much does traditional video cost? Well, there you go: $32,000 per second for high-end cinema.
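The arithmetic behind those numbers is simple enough to check; here's a quick back-of-envelope sketch using the figures as reported:

```python
# Reported figures: Veo 2 API price per second, and the cited
# traditional-production cost per second for Avengers: Endgame.
VEO2_PER_SECOND = 0.50          # USD per second of generated video
ENDGAME_PER_SECOND = 32_000     # USD per second, traditional production

veo2_per_hour = VEO2_PER_SECOND * 60 * 60
print(f"Veo 2: ${veo2_per_hour:,.0f} per hour of generated video")  # $1,800
print(f"Ratio: {ENDGAME_PER_SECOND / VEO2_PER_SECOND:,.0f}x cheaper per second")  # 64,000x
```

So the $1,800-per-hour figure is just 50 cents times 3,600 seconds, and per second the AI generation is on the order of tens of thousands of times cheaper, before accounting for the human labor and iteration costs discussed below.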
Yeah, I called Avengers: Endgame high-end cinema.
I like it. I watch Marvel movies. I have hobbies, right? It's almost like I'm arguing with myself, as I don't sleep and just read and talk about AI all day. I'm like, yes, I'm a real boy, I have hobbies. All right. So although Veo 2 is more expensive than OpenAI's Sora, which charges $200 per month with no usage cap, it remains a more affordable alternative to conventional filmmaking.
So the pricing only covers the AI generation process; additional costs for human labor and multiple iterations to achieve desired results should also be considered. So yeah, obviously a human still needs to go in there and work with this. That's not the total cost, right? This is just the cost to actually generate it.
So AI enthusiasts and professionals should consider the potential of AI models like Veo 2 to completely change video production, and its implications for cost savings and efficiency in creative industries. Aside from the API, Veo 2 has also just been released across a variety of platforms.
We talked about it last week: it was released in a limited capacity in YouTube Shorts, where you couldn't use the full Veo 2 model, but now you can. So you don't just have to cook it up with your own developers using the API, because Freepik and Fal (fal.ai) also have this service baked into their video generation platforms. So this is pretty exciting news because, well, number one, I'm surprised. I'm surprised that Google did not make Veo 2 available on its own front end for the general public.
Yet, I'm assuming that will come at some point once they figure out pricing, but they did make it available via the API.
And what that means is just about any video generation company out there that uses multiple APIs, Freepik and Fal as examples, can start using Veo 2. And why is this important? Well, Veo 2 is hands down the best AI video model out there. It is better than OpenAI's Sora, it is better than Kling out of China, and it is better than Adobe Firefly, right? And I'll say it's not even necessarily close. The Veo 2 model is extremely impressive. And I will say this: with Sora, we got a tease of it first, right? Then we had to collectively wait like eight months, but at least it's now available for anyone inside OpenAI's platform when you go to Sora.com. So it's interesting that Google didn't release this on its own platform
first, right? Where you can just log in and use it. You have to use it via the API or one of these third-party service providers, or in a very limited capacity in the YouTube Shorts section. But it is by far the best AI video model. And I will say it is the first AI video model that I think will confuse the general public.
Right. Because I think even with OpenAI's Sora, which I would say is probably in second place, though there's some competition for who's next best after Google's Veo 2, with Sora and a lot of these others vying for second place, for the most part, you can tell, right? If you just take your first shot out of Sora, you can usually tell it's AI generated. It struggles a little bit with physics. It struggles with understanding real-world simulations. Google's is not as bad. It's not perfect, but there was one kind of viral example once all these platforms were released, because some trusted testers did get early access to Veo and could use it inside Google's platform. It was a famous comparison video: a close-up of someone cutting tomatoes. In Sora and the other AI video generators, from China, from Runway, and all these others, it really struggled. Sometimes the knife was cutting a finger, sometimes the tomato just kind of cut itself, or you'd keep chopping the tomato and it just wouldn't come apart into little slices. And the Veo one was extremely impressive.
So I do think this is the first AI video model that will confuse the average human viewer in terms of whether it's real, right? And yes, you can start with an image as well. So you can use some of the best platforms out there, like Midjourney, get a still image, and then create a video that looks very realistic. So there are obviously some great upsides to this. Maybe corporate training videos are going to get updated, the ones that haven't been touched since 1997 and are still in a 4:3 aspect ratio, right? They're terrible. So maybe there are good use cases: companies are going to be able to create more engaging content and update more videos at a cheaper cost.
And smaller companies that just didn't have the budget or the talent or the expertise before are going to be able to produce high-quality videos. But obviously there are a ton of downsides to this, because I think people are not going to be able to tell what is real and what is fake, especially when the next version comes out. And I do think, with enough generations and in the hands of a skilled person, it can be hard, at least in short little bursts, to tell if Veo 2 output is real or not. All right. Kind of our last piece of AI news: well, Microsoft has made a quantum leap. Microsoft has announced a significant milestone in quantum computing with the unveiling of its Majorana 1 chip, which could revolutionize industrial-scale problem solving.
They also said they discovered a new state of matter, but we'll leave that discussion for the scientists. This is an AI podcast, right? I don't understand that part. But the announcement led to a boost in shares for quantum computing companies, with D-Wave
Quantum up nearly 10% and Rigetti Computing rising 2.5%. So the whole quantum computing industry just went boom over the weekend with this Microsoft announcement. Yes, I know: now we have CPUs, GPUs,
TPUs, and QPUs. I guess we're just going to PU everything until we run out of letters in the alphabet. So Majorana 1 is a quantum processing unit, or QPU, that utilizes a new type of qubit, the topological qubit, described as small, fast, and digitally controlled.
So Microsoft claims its architecture can potentially fit a million qubits on a single chip, surpassing IBM's goal of a 100,000 qubit quantum supercomputer by 2033. So yeah.
Microsoft said, IBM, hold my QPU. And they just said, we're going to 10x it. So the development marks a transition from scientific exploration to technological innovation, after Microsoft has apparently been researching this for more than 17 years.
So quantum computing is expected to transform, well, all fields, and that's why it's important if you're following generative AI. It's especially expected to transform fields like pharmaceuticals, cybersecurity, and supply chain optimization, although mainstream applications are still years away. And despite the enthusiasm, industry leaders like NVIDIA's Jensen Huang and Meta's Mark Zuckerberg have cautioned that practical
quantum computing is still a long way off. We don't cover quantum computing a ton on the show. We did cover it once. But here's why it's important for AI. Think of it like this: it could make everything possibly
thousands of times faster, right? So think of all the power and the compute that is needed right now to train new models and when we actually use them. If quantum computing comes to fruition, and if this new Majorana 1 chip does help in that quest, everything will be like
I mean, according to reports, up to a million times faster. Because, again, I'm not an expert in this, but here's roughly how traditional computing works: you essentially have one part of a computer working on one task
at a time, right? It can't work in parallel. One part of a computer works on one part of the task, then the next part of the computer works on another part of the task. Whereas with this new Majorana 1, reportedly, you'd just have like a million qubits working on every single possible part of a problem
in parallel, right? So things that would literally, in theory, take thousands of years could be done in seconds when and if quantum computing is achieved. So this does not mean we've achieved quantum computing, but this is a pretty big milestone and a quantum leap
on the Quantum Quest from Microsoft. So we probably won't be covering this too much, as it's not strictly generative AI, but it is something that impacts literally everything. Because then, instead of having to use 100,000 or 200,000 GPUs over the course of a year, a very time-consuming, energy-consuming,
costly process, in theory you could run all of that training simultaneously and probably get it done in, I don't know, like a minute or something like that, right? Again, is that science fiction? Maybe a little bit, but it's starting to look more like near-term science fact than science fiction. So pretty interesting. All right. So
Let's wrap it up with some announcements that might come maybe as soon as this week. So according to the grapevine, and the grapevine is, you know, just people on Twitter and reporters on the internet, Anthropic may be releasing a new version of Claude this week, because tipsters online are saying they've seen references
to Claude 3.7 Sonnet, which would be Anthropic's newest model. Which, come on, Anthropic: everyone was saying, you know, they updated Sonnet 3.5 and just called it Sonnet 3.5 New, and everyone's like, yo, just call it Sonnet 3.6 or something, right? But apparently we may be seeing an announcement on Claude 3.7
Sonnet, which would have reasoning capabilities and kind of that extended thinking for detailed, step-by-step problem solving. And it would reportedly offer users the choice between quick responses and thorough analysis, making it ideal for AI agents, complex workflows, and customer interactions. So according to rumors, Claude 3.7 Sonnet is
Anthropic's most intelligent model to date and the first to offer extended thinking. So it's kind of this hybrid model situation, right, where it uses a little bit of the quote-unquote old-school transformer approach, but then it also uses a reasoning model. So it is reported that
Anthropic could be announcing this in step with Amazon on Wednesday at the Amazon AI Alexa event. So we'll be covering that this week. And there are also rumors that,
whether it's right around the same time or in the weeks after, OpenAI might be releasing its new GPT-4.5 technology shortly after Anthropic releases its new hybrid version of Claude. So again, this is all rumors and rumblings, but we are beginning to see some kind of snippets or breadcrumbs of this in code, right? So I believe this was spotted in the
Amazon Bedrock system, showing Claude 3.7 as a model choice,
and then was reportedly taken down. So we'll see. It's rumors and rumblings for now, but AI does not sleep. And maybe I should start sleeping a little more. All right. I hope this was helpful, y'all. If it was, please repost this, right? If you're watching on YouTube or listening on Twitter, please don't just keep Everyday AI as your secret cheat code, right? Our team spends
so much time every single day making sure you, dear listener, are the smartest person in AI at your company. So when someone at your company is like, hey, should we be using Grok? Right? We've been using OpenAI for three years, and they're like, let's get on Grok. You at least know, hey, there are a couple of things we should be considering in this conversation, right? Our goal is to make you the smartest person in AI at your company, so you can grow your company and grow your career with generative AI. So
If that is you, and if this was helpful, please consider sharing this. If you're on the podcast, thank you for listening. Please subscribe to the show. Leave us a rating. We'd appreciate that. Also, go listen to episodes 443 through 447, our 2025 AI predictions and roadmap series. Thank you for tuning in. Go sign up for the newsletter, youreverydayai.com. See you back tomorrow and every day for more Everyday AI. Thanks, y'all.
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.