This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Anthropic dropped Claude 3.7 Sonnet. OpenAI responded days later with their much-awaited model, GPT-4.5, and we may be waiting many more years before we actually see full AI from Apple. We might even see AGI before we see full Apple Intelligence.
Yeah, it was one of those kinds of weeks in AI. It's like, wait, this all happened in one week? Yes, it did. And if you missed any of it, and that's just the tip of the iceberg, don't worry, we're going to be going over those stories and a whole lot more today on Everyday AI.
What's going on, y'all? My name is Jordan Wilson, and I'm the host, and this thing, it's for you. This is your daily live stream podcast and free daily newsletter, helping us all not just keep up with AI, but what it all actually means, right? Deciphering all this PR and all these news releases from all the biggest companies so we can actually use that information to grow our companies and our careers.
If that sounds like what you're trying to do, maybe this is the first time you're listening. Welcome. This is your new home. Your other new home is youreverydayai.com. There you can sign up for our free daily newsletter. So each day we recap exclusive insights that we bring you only on this very podcast, as well as we keep you up to date with everything else that you need to know in the world of AI to be the smartest person in AI at your company or your department.
Right. So if that sounds like what you're trying to do, make sure you go sign up for that free daily newsletter at youreverydayai.com. All right, a quick reminder. Oh my gosh, it's like two weeks away: we're going to be broadcasting live with NVIDIA at their GTC conference, starting... what day are we actually going to start there? Probably March 17th, the Monday. So, yeah.
And kind of for that week, at least the first couple of days early in the week, we're going to be partnering with NVIDIA to be bringing you a lot of exclusive insights, some great expert interviews, maybe breaking a little bit of news as well. So really excited for this year's GTC conference. We were lucky enough last year to partner with NVIDIA as well. So hey, let me know. Hit me up if you're going to be at the GTC conference in San Jose. Would love to say what's up. All right.
With that, let's get into what's happening in the world of AI for the week of March 3rd. Let's get after it, y'all. All right. So, Anthropic, yeah. That seems like...
It was a week ago, on Monday, right? Right after this show. Seems like it was a month ago that Anthropic released 3.7 Sonnet, but they did. So Anthropic has introduced Claude 3.7 Sonnet, the first publicly available hybrid AI model, which combines traditional transformer capabilities with advanced reasoning.
So it does kind of merge these two AI paradigms. It combines the traditional transformer model with reasoning capabilities, allowing it to switch between rapid responses and deeper logical thinking. So extended thinking mode is a key feature. Right now, extended thinking is only for paid users, and they can enable that mode where the model spends more time solving complex problems while also showing a summarized chain of thought for transparency. Free users can use the Claude 3.7 Sonnet model, but at least as of right now, they can't toggle on that extended thinking.
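For the developers listening: here's a minimal sketch of what toggling extended thinking looks like against Anthropic's API, assuming their Python SDK. The model string and token budgets below are illustrative; check Anthropic's docs for current values.

```python
# Minimal sketch: extended thinking with Claude 3.7 Sonnet via the
# Anthropic Python SDK (pip install anthropic). Model name and budgets
# are illustrative; check Anthropic's docs for current values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 4000},  # omit this for rapid responses
    messages=[{"role": "user", "content": "Plan a migration from REST to gRPC."}],
)

# The response interleaves "thinking" blocks (the summarized chain of
# thought) with the final "text" answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```

That thinking parameter is the same toggle the paid chat interface exposes: leave it off and the model behaves like a traditional fast transformer; turn it on and it spends its token budget reasoning first.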
Claude 3.7 Sonnet is very impressive when it comes to coding, software engineering, anything like that. So if that's something that you're in charge of at your company, you're probably going to want to check out 3.7 Sonnet. It scored a very impressive 70.3 on the SWE-bench coding benchmark, outperforming competitors like OpenAI's o1 and o3-mini. Claude 3.7 Sonnet is particularly strong in coding and front-end web development, making it a great choice for software engineers. Anthropic also released Claude Code, which, again, livestream audience, let me know if we should dive into that. It's slightly more technical, but essentially Anthropic released
I'm not going to say it's a Cursor competitor or a competitor to Bolt and Lovable and Windsurf and all these kinds of AI IDEs, but it kind of is, right? Even though Cursor has said, you know, hey, Anthropic's Claude is our default model, it does look like with Claude Code, Anthropic is trying to get into this AI development space head-on, which I think is a smart move. So they did launch Claude Code,
a command line tool that allows developers to interact with and update entire code bases directly from their computer's terminal.
So it integrates with GitHub and supports debugging, signaling Anthropic's push into the AI-powered coding assistant space. So despite these advancements, Anthropic stood firm on their API costs, which at the time seemed kind of silly, until we got OpenAI's API pricing for their latest model. So just wait on that one. It is still the same price: $3 per million input tokens and $15 per million output tokens. But here's what I think about this latest model, right? A lot of people are like, oh, it looks like Anthropic's trying to get the best of OpenAI with a logic-based model, a model that reasons. I'm being honest,
I was not very impressed with Anthropic Claude 3.7's ability to reason. But again, I am a heavy user of OpenAI's o3-mini. I'd say that's my most used model. I use o1 Pro as well. So, you know, looking at Anthropic's first foray into the kind of
reasoning models, not super impressed by that. Also, a lot of people are reportedly rolling back to 3.5 Sonnet for certain tasks, especially when it comes to coding, because it seems like sometimes Claude uses that reasoning when it maybe shouldn't, and it takes things a little further and changes a bunch of things that you maybe didn't even want changed. So I am using 3.7 Sonnet every single day for certain use cases. I think it's great.
It's the first, right? Like I keep saying, you have to tip your cap to Anthropic because they are the first company with a hybrid model. And I do think the future of large language models is going to be combining this quote-unquote old-school transformer approach with the new reasoning models. So yeah, I'm curious to
You know, because we're going to talk about GPT-4.5 at the end. I'm curious for our livestream audience, and hey, let me know if you're listening on the podcast as well: what are your thoughts on these newest releases? Right, so Sandra here, joining us from YouTube, says: I haven't been able to tell the difference yet. I personally have been able to tell the difference with Sonnet 3.7 when I'm testing it on niche coding tasks, which is something I don't necessarily do a lot unless I'm testing models.
It's great for that. Everything else that I might use Claude Sonnet for on an everyday basis, I'd say it's about the same. I think the Artifacts feature has actually improved for that reason, right? If you're trying to visualize data or something like that, I think it is better. But for non-coding, non-data-visualization tasks, I don't know if we necessarily saw a huge leap with 3.7 Sonnet. Again, that's in my testing. I'm using it maybe 45 minutes to an hour every single day since it came out, and for the show last week, I probably tested it for a good four or five hours. So I'm not using it five hours a day since it came out or anything like that. But I'm curious what everyone's thoughts are.
Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. All right.
Our next piece of AI news: Google is pushing hard for AGI, and apparently they just need to work more. All right. So Google co-founder Sergey Brin has reportedly called for increased productivity and more in-office presence at Google as the company intensifies its push to develop artificial general intelligence, or AGI.
So Brin believes AGI is within reach, according to reports, if employees just work a little bit harder. In an internal memo seen by the New York Times, Brin stated that Google has all the ingredients to win the AGI race, but needs to, quote-unquote, turbocharge its efforts. He suggested that employees work at least 60 hours a week, calling that the sweet spot of productivity. Also, return-to-office policies are being emphasized, as Brin recommended employees come into the office every weekday, exceeding Google's current three-day-a-week on-site policy. He argued that remote work and reduced hours are demoralizing to others.
So Google's AI teams are already working long hours. So according to CNBC, some staff working on Google Gemini's AI projects clocked up to 120 hour weeks to address critical flaws in their image recognition tool. Imagine putting in 120 hours a week.
Right. That's nuts. That's like, I don't know, I'm not great at math, but that's more than 15 hours a day. Imagine, right, 15 hours. Can someone do a live calculator? Let's do 15 times seven. I should be able to do that in my brain right now, but I can't. Okay, so that's 105 hours a week. So 120 hours is even more than that; that's about 17 hours a day working to debug this. No thanks. But imagine working 15-plus-hour days, and then they say, ah, we'll achieve AGI if you just work a little harder.
Probably not what most people want to hear. So high-pressure work culture in AI development is raising concerns. While the push for AGI could lead to groundbreaking advancements, the demanding workload, such as the routine 12-hour days reported by employees at xAI developing Grok, highlights the toll on the industry. So,
I don't know how I feel about this, right? Number one, I thought AI was supposed to allow employees to work less, right? And focus on higher-quality work. So I don't know, part of this to me just isn't adding up, especially since Google, I think, should have been light-years ahead of their closest competitors in the AI race, considering they essentially developed the transformer technology that GPT is built on, but it was other companies that really ran away with it. And I would say that Google did not even catch up in the AI race until probably late 2024. So now it seems like they really want to win the AGI race and are really just pushing
employees to work more, work smarter, be more productive in the process. So I don't know, it seems kind of ironic that we're all supposed to be benefiting from AI and large language models, supposedly focusing on higher-level creative and strategic tasks, and it's like, nope, just work double, just work triple, right? A little wild. Yeah, Suraj from YouTube just said: Google finally woke up after missing out for a couple of years. Yeah. It's like, oh, we weren't really in the race for 2022 and 2023, so now we just got to work double or triple to catch up. So much for that four-hour work week. We were all promised the utopia of AI. At least, not yet. All right.
All right, other big tech trying to play catch up. Meta is reportedly developing a standalone AI app to compete with OpenAI and Google. So a new report suggests that Meta is launching a dedicated app for its Meta AI assistant, potentially signaling a major shift in its AI strategy. So according to reports, Meta is working on a standalone app for its AI assistant, Meta AI. The app would mark a departure from Meta's current approach
of just integrating the AI services into social platforms like Facebook, Instagram, and WhatsApp. So the standalone app could help Meta reach users who avoid social media, like me, or who use messaging services from competitors, addressing a gap in its current strategy. This move could potentially bring in millions of new users who were previously out of reach.
So Meta AI, their online service, which launched in 2023, offers features like question solving, image generation and answer suggestions. So while it has seen improvements, it still lacks advanced functionalities offered by competitors like OpenAI's ChatGPT and Google's Gemini.
So Meta CEO Mark Zuckerberg has ambitious plans for AI, stating earlier this year that Meta AI could become the leading personalized AI assistant, reaching over 1 billion people. And the standalone app does align with that vision. Also monetizing, right?
It's obviously a key part of Meta's reported new strategy because they could roll out paid plans and premium features that they might not as easily be able to roll out, you know, in something like Facebook or Instagram or WhatsApp, where a lot of their users may be using the Meta AI technology.
Fred says: I just won't use Meta. Right? I use it online. It's actually a great online resource, right? We did a head-to-head on this show probably three or four months ago on just how accurate certain large language models are when using the internet. So I think we ran down OpenAI, we did Google, we did Meta, and we did Copilot. And I was actually surprised by how well Meta performed, right? Essentially, the biggest takeaway from that one was how well it surveyed the internet and could return an accurate answer using its Llama model connected to the internet. So I was actually pretty impressed. And yeah, augmented reality is what they're definitely focused on, but I mean, they're definitely wanting to marry those two different technologies, right? The wearable technology and the standalone LLM that's outside of their social media network. So yeah.
All right. Speaking of big tech, yeah, it's just been a big tech kind of week, but pretty impressed here with Microsoft. So Microsoft Copilot is now rolling out some major updates, including free unlimited access to voice and their Think Deeper capabilities.
So that is o1. Yes, the OpenAI o1 model. You can now use it for free with unlimited access. So if you just go to copilot.microsoft.com, you have to have an account, but you can use Microsoft Copilot's voice mode, which is not quite as good as OpenAI's Advanced Voice Mode, although it uses the same technology. And you can use the o1 model, which as of a couple of months ago was OpenAI's most powerful model, for free and unlimited. So huge move here from Microsoft that I think kind of got slept on.
So the voice feature allows users to interact with Copilot hands-free. Use cases include practicing a new language, preparing for a job interview with mock Q&A, or receiving step-by-step cooking advice. I've actually used Copilot for that exact reason.
I don't think it helped me, though. And that wasn't Copilot's fault. I just can't cook. You know, maybe I do need that Figure 02 robot that just silently does work in your kitchen for you. But Think Deeper, I think it's really good. We did a review of it when it first came out, I don't know, five months ago, so we're probably going to have to revisit that.
I think it's great. So anything that really requires some advanced reasoning, some logic, right? Maybe you're using ChatGPT or Claude or Gemini or something else, and you don't have a paid plan and you're like, wow, I'd love to be able to use a reasoning model. So right now you can use o3-mini in a very limited way for free on ChatGPT. But if you want to get your hands on o1, which is a very, very capable model,
Now you can do it. So Copilot Pro users, yeah, which includes myself; I'm on Copilot Pro, I pay $20 a month for that, and I was like, wait, what are we getting now for $20? But, you know, I do appreciate that Microsoft at least emailed Copilot Pro users and said, yo, we're making this free version really, really good; if you want to cancel, here's the link. More big tech companies should do that, right? Like, if they make something free and all of a sudden the paid plan isn't that good, they should be like, hey, it's fine if you cancel, here you go. However, here's what the Copilot Pro account still has. So, you know, think of this
differently. I'm still a Mac user. Yes, I have a Windows Copilot Plus PC; I still have to set it up. I've been so busy, but I'm actually super excited to do that. But maybe your company uses Microsoft 365 Copilot. For the most part, then, this is not going to impact you, unless you're using it not logged in, outside of your company's BizChat, maybe, right? So in that case, sure, you can use this that way. But this is, I think, going to appeal to a lot of people who are Mac users and who maybe don't have that Microsoft 365 Copilot integration. But Copilot Pro users, so those that still pay $20 a month, can still continue to enjoy and use Copilot products
across the different Microsoft 365 apps like Word, Excel, PowerPoint, et cetera. So yeah, even though as an example, I'm using a Mac right now, I still have Microsoft Word on my computer and I can use Microsoft Copilot via Copilot Pro in Microsoft Word, in Excel, in PowerPoint, et cetera. So pro users aren't like
SOL, right? They're not just out of luck. They still have some Copilot capabilities that normal free users do not have. But y'all, if you do not use Copilot yet, I would just go right away and try out the unlimited voice and the think deeper features. They're actually super impressive. Yeah. I like what Graham here from LinkedIn is saying.
Copilot is the gateway to AI for most people now. Yeah. So a couple of weeks ago, I would probably say, nah, probably free ChatGPT. But now I'll say those are like 1A and 1B. I still think the free version of ChatGPT is probably a little bit better than this free version of Copilot, but now they're at least hand in hand, especially if your company is a Microsoft organization. I think it's pretty big here. So, all right.
Next piece of AI news. One of the biggest names
in AI on the text-to-speech side is changing up their business model a little bit. So ElevenLabs, an AI startup recently valued at $3.3 billion, has introduced its first standalone speech-to-text model, called Scribe, following a $180 million funding round. So yeah, you've maybe heard of ElevenLabs for text to speech, but now they're flipping it and going speech to text. So yeah, even if you listen to this show on the podcast, there's a little intro, right? It's this AI intro, but that's ElevenLabs, right? That was like the very first version of ElevenLabs, by the way, which was, I thought, pretty impressive for two and a half years ago.
So the Scribe model supports over 99 languages and boasts exceptional accuracy for more than 25 of them, including English, French, German, Hindi, Japanese, and Spanish. The company claims an impressive 97% accuracy rate for English, with a word error rate below 5% for its top-performing languages. So according to ElevenLabs, Scribe outperformed competitors like Google Gemini 2.0 Flash and OpenAI's Whisper Large V3, that is OpenAI's speech-to-text model, in benchmark tests such as FLEURS and Common Voice, setting it apart in the speech recognition market.
So one cool feature that I liked, hopefully I pronounce this right, is smart speaker diarization, which is essentially just automatically identifying which speaker is talking. For someone that does a podcast and usually has guests on, that's huge. That's why, personally, I can't use something like Whisper or Google Gemini 2.0, right? Because I normally have a guest. So I actually use a tool called Castmagic that has that built in. So I'm going to definitely be checking out this new offering from ElevenLabs, which can auto-identify speakers. I think that's huge. It also has word-level timestamps for precise subtitles and auto-tagging of sound events like laughter.
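For the tinkerers: here's a rough sketch of what transcribing an episode with Scribe might look like. I'm assuming the REST endpoint path, the scribe_v1 model ID, and the diarize and tag_audio_events parameter names based on ElevenLabs' public docs, so treat the exact names as assumptions and verify against their current API reference.

```python
# Rough sketch of transcribing a podcast episode with ElevenLabs' Scribe.
# The endpoint path, model ID, and parameter names are assumptions based
# on ElevenLabs' public docs; verify against their current API reference.
import os
import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]

with open("episode.mp3", "rb") as audio:
    resp = requests.post(
        "https://api.elevenlabs.io/v1/speech-to-text",
        headers={"xi-api-key": API_KEY},
        data={
            "model_id": "scribe_v1",
            "diarize": "true",           # speaker diarization: who said what
            "tag_audio_events": "true",  # auto-tag events like laughter
        },
        files={"file": audio},
    )

resp.raise_for_status()
result = resp.json()
print(result["text"])  # full transcript; word-level timestamps live in result["words"]
```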
That's kind of cool, kind of creepy as well, but kind of useful. So right now, Scribe only works with pre-recorded audio, but it will soon offer low-latency real-time options. So that's pretty cool. When that comes out, I'm going to have to hit up ElevenLabs and always have that going live when I have a guest on the show. That would be super helpful for me.
You know, I'm curious: does anyone out here use tools like ElevenLabs? Like I said, I think it's been great for text to speech. It was one of the leaders in that field, and I think it still is, for when people just need voiceovers, audiobooks, things like that. I think people overused it maybe early on and didn't put enough care into it, but it's actually a very, very good platform.
So George here is saying ElevenLabs' GibberLink allows AI to talk to AI, yeah, in a faster-than-human language. I saw that. That was a pretty cool demo. Essentially, two AI agents were talking to each other, and once they identified that they were both AI agents, I believe this was an open-source project, they just used their own GibberLink technology to talk to each other. Sounded like two fax machines talking to each other. It was pretty cool. Yeah, Samuel here says: I pay the $5 a month to ElevenLabs just to be able to listen to my docs. Huge point, Samuel. I do that as well. That's one of the things I actually use ElevenLabs for the most. I pay for it. Sometimes I have a big block of text
And I don't want to go into, as an example, OpenAI's backend in their playground because you can only do a certain amount of text at once. So yeah, I often will do that if, you know,
Well, many times I'll just throw it into NotebookLM and get more of a summary or more of a conversation around it. But if I actually need to read something point by point and I'm super busy, a lot of times I'll just grab that couple thousand words, throw it into ElevenLabs, and crank the playback up to 2x. Because, you know, that's why I speak so quickly; because I listen to this stuff so quickly, I guess. But yeah, a great, great use case, I would say, for ElevenLabs.
Look at this. Here we are, just rounding out the big tech lineup for this week. So Amazon is denying reports that Anthropic's AI is powering their new Alexa Plus features. So yeah, if you pay attention to this show, we covered this: Amazon finally announced their smarter Alexa, right? Powered by large language models. And earlier reports said it was powered by Anthropic's Claude AI model, but apparently not, well, at least not entirely. So Amazon has publicly refuted claims that its recently announced Alexa Plus capabilities are powered by Anthropic's Claude AI models.
And this is sparking a lot of discussion now online.
So Amazon insists that its in-house model, which is called Nova, powers the majority of Alexa Plus conversations. In response to a CNBC report claiming Anthropic's Claude model was handling most customer interactions, Amazon stated that Nova has managed over 70% of conversations, including complex requests, in the past month. So, yeah, this is just starting to roll out over the next week or two to paid Amazon users, so maybe that was just in testing. They also didn't really say, hey, what's the other 30%? So I'm assuming maybe the other 30% is Claude. We'll have to see; this report just came out. But Amazon, obviously, is a key investor in Anthropic, and the company maintains that its own proprietary AI, Nova, is responsible for some of these advanced features. So the upgraded Alexa boasts generative AI capabilities, and dubbed Alexa Plus, the new iteration is designed to be more conversational and capable, handling tasks like grocery shopping, booking services, sending texts, and browsing websites. Yeah.
It was funny in the Alexa Plus kind of demo. I'm like, why are all these demos just like you buying more things from Amazon, right? Why can't you just show me an instance where Alexa just isn't dumb, right? It's like so much of the demo is just like, oh, you're buying more stuff from Amazon. Like, I don't know.
Like, literally, I'll ask for, I don't know, the weather or hours at a store, and then old-school dumb Alexa is still just like, do you want me to add this to your cart? And I'm like, I asked about the weather. So I don't know.
I'm not super pumped about this, but it's got to be better than what we currently have. And this was supposed to come out many, many months ago, but the launch has faced delays and early challenges as Alexa Plus was delayed due to issues with hallucinations and incorrect answers during testing. However, Amazon CEO Andy Jassy highlighted the transformative impact of
of Gen AI on making such advancements possible. So yeah, Michelle says Alexa and Siri are both useless. Don't worry, Michelle. It looks like Siri might be useless until 2027. So more on that here in a couple of minutes. All right. Speaking of voice assistants,
A new one is taking over the web. So yeah, all weekend, and even early today when I was sleuthing online to bring you all the latest news, Sesame, whose AI chatbot is named Maya, has really been grabbing a lot of headlines for its ability to mimic human conversation with uncanny realism. So Sesame's Maya aims to cross the uncanny valley of conversational AI. The company showcased Maya in a demo emphasizing its ability to replicate human speech and interaction, making it feel more like talking to a real person than a chatbot. So yeah, a lot of people that I actually follow and respect online
were losing their noodles over Sesame and Maya over the weekend. So Maya impressed a lot of users with its conversational flow and realism during test conversations, which you can go try right now; you don't even have to have an account. How am I going to say this? If you're not a heavy conversational AI user, you might really be impressed with this new Sesame voice assistant. All right. It is more natural. It responds with very low latency. The voice does sound more realistic and more human. For me, and this might sound crazy,
I absolutely hated it. I probably won't be using it. Sorry, Sesame, you're not going to be sponsoring the Everyday AI Show anytime soon. Is it great? Yes. Does it have a very high ceiling? Sure. Right. I don't know. One thing I noticed in testing this new Sesame AI voice model, and I'm curious, livestream audience, did any of you use this over the weekend or early today? It doesn't seem to do a good job of actually answering your question. So I think a lot of people are fooled by this low-latency claim that a lot of AI voice companies are putting out there, because usually the initial response to a question that you ask it is just kind of a delay tactic.
Right. Or it responds kind of like a human would, right? Like, they kind of laugh about your question, or they're like, oh, that's a good one, right? So is it actually low latency? I mean, yes and no, right? I think they achieve that immediate human-to-AI conversational rate, where it can respond to you almost immediately, because it just responds back with something useless, needless, and unrelated, right? It's just this little quip that buys it some time to then answer your question. Also, at least for me, the default personality of this Sesame assistant I found extremely frustrating, because when I talk to an AI voice assistant, I don't want fluff.
I don't. And that probably puts me in the minority of people, right? Maybe people want these, I don't know, unrelated quips and stories. No, I want facts. I want stats. I want fast. I don't want any gibberish, right? So if you're like me and maybe would prefer talking to
robots sometimes versus humans, right? Like, yo, I just want facts, stats, and I want it quick, right? So at least for me, Sesame wasn't really appealing, probably not something I'll be using a lot. And it did struggle, it seemed like, with just fact recall, right? Something simple. One thing I always do is just be like, tell me about the Everyday AI podcast, right? And it couldn't, right? And it should be in the training data, right? Because we've had hundreds of episodes dating back to 2022. Is that right? Have we been doing it that long? No, 2023. So it did struggle with just fact recall and some other things that I tried, but it is free. Go try it out for yourself. Let me know what you all thought.
So Samuel says Maya's EQ is next level. That's true. So if you are looking to get some EQ benefits out of an AI model versus just IQ, I've talked about this: I prefer IQ, but the EQ is super nice. It is pretty good. George says it is very touchy-feely, but the voice is really good. Yeah, I would agree with those observations. And Nisiani knows that's the former journalist in me. Yeah, anytime someone puts something out, I'm like, eh, I don't know about this, let me go test it, and I'll tell you guys how it is. But I still think it's pretty impressive. You can go check it out. Right. All right. Our last couple of pieces of AI news. So Apple has announced a $500 billion investment in the US, including a server factory in Texas.
So according to reports, Apple has announced a massive $500 billion investment in the U.S. over the next four years, which will include a new AI server factory in Texas and reportedly the creation of 20,000 research and development jobs nationwide, according to Reuters. So the investment will span a variety of areas, including purchases from U.S. suppliers, manufacturing expansions, and content production for Apple TV.
So Apple will reportedly partner with Foxconn to develop a 250,000 square foot facility in Houston that will assemble servers for its AI powered services. These servers are currently made outside of the US marking a shift toward domestic production.
The company also plans to double its advanced manufacturing fund from $5 billion to $10 billion, with a significant portion allocated to producing advanced silicon at Taiwan Semiconductor Manufacturing Company's facility in Arizona.
So most of Apple's products are assembled overseas, but many components such as chips from Broadcom and Skyworks Solutions are made in the U.S. So as part of this investment, Apple will launch a manufacturing academy in Michigan to offer free courses in project management and manufacturing process optimization to small and mid-sized companies. Hey!
More news on AI voice assistants that apparently aren't going to be super smart anytime soon. Well, at least not Siri, right? So we just heard that Alexa is getting smarter, and that's going to be rolling out to paid Amazon subscribers here in the coming weeks. But you might have to wait until 2027 to get that fully smart Siri from Apple and Apple Intelligence. Yes, you heard that right, and I did not get that wrong. New reports from Bloomberg, always on the spot, are suggesting that Apple's long-awaited overhaul of Siri, described as a modernized conversational version, is now reportedly delayed until 2027. So this, yeah, someone on the livestream this morning, I think Michael, said we might actually have AGI before we actually have
a smart Siri. So the upgraded Siri is expected to debut with iOS 20, combining a generative AI approach with the assistant's classic features for a more advanced, seamless experience. I think by classic features, they just mean an assistant that's not super useful.
So while Apple is planning to release a limited LLM powered version of Siri with iOS 18.5, so that would be pretty soon, it will reportedly run as a separate model and fall short of the significant improvements users are anticipating and Apple is marketing. Okay.
So Bloomberg's reports noted that the real upgrade to Siri will begin to take shape in iOS 19.4, but won't reach full maturity until iOS 20. Wow. The enhanced Siri is expected to feature contextual understanding and improved autonomy, potentially rivaling the advanced AI assistants currently dominating the market. But that's the AI assistants of today, right? Yeah.
I fully don't understand how Apple and their Apple Intelligence have so severely fumbled the bag. That's, I don't know, what's the expression for fumbling the bag, but 10 times worse?
Apple had all the money, all the resources. They know where this technology is headed. They're partnered with OpenAI and ChatGPT. There's a ChatGPT Siri integration. So they have to be getting a good amount of data from this partnership. Yet 2027, right? All right, I get it. So one part of me is like, all right, it's better to...
under-promise and over-deliver, right? A lot of companies are like, oh, we're releasing the world's best model tomorrow, and then it takes like three years. So I get it. I just cannot fathom how Apple is so, so behind, at least when it comes to bringing all of these things to market. Yes, Apple, I think, is one of the leaders in putting privacy and security first, right? But I don't know, at what cost, right? If there were actual smartphones from other companies that were just as intuitive as Apple's, right?
Samsung has great phones, right? There's a lot of great phones outside of Apple, but I don't know. Maybe it's just because the Apple interface is so stupid easy. When I pick up a Samsung phone or whatever, if I'm at Best Buy and I'm just scrolling through, I'm like, I don't even know how to use this thing, right? I don't know. Maybe Apple does that on purpose. Maybe they make...
All of their iPhone users, they just make it so easy that it seems impossible to pick up and use a non-Apple device. Maybe that's what we're looking at here. All right. Our last piece of AI news. OpenAI has launched its newest model, GPT 4.5.
So this is the company's latest and largest AI language model, offering improved writing skills, better world knowledge, and a more refined conversational experience. GPT-4.5 is OpenAI's largest model yet, and they describe it as their most knowledgeable model to date.
It is now available as a research preview for ChatGPT Pro users, with broader access rolling out in the coming weeks. I do assume it'll probably be by about mid-March that ChatGPT Plus users will have access to this new model, GPT-4.5. I do know that other third-party providers, such as Perplexity, Poe, and others, have it. So if you have a paid subscription to something like Perplexity, Poe, You.com, et cetera, you can probably start using this 4.5 model in a very limited capacity right now, and you don't have to wait for OpenAI to roll it out to other tiers. But they did say that it will be out to most paid tiers in the coming weeks. But right now, if you are using chatgpt.com, so if you're using ChatGPT on the front end, you are only going to have access to 4.5 if you are on the $200-a-month Pro plan.
So, speaking of EQ earlier, that's where this model shines. And yeah, I said, ah, I don't really need it, but in my use so far of GPT-4.5, I do see the benefits of having a model that just feels more natural, more intuitive, and just more human, right? Because that's the thing: OpenAI straight up said this is not a frontier model, which was kind of surprising. And they did say, hey, here's the big thing. I broke it down into two words; I would have liked Apple, or sorry, OpenAI, to break it down this way. You can go listen; I cover this in episode 472, which I believe was on Friday. I said what they're trying to do is make it more reliable and more relatable. So here's what that means. On the reliability side, OpenAI shared some benchmarks and metrics that showed the hallucination rate is going down and, essentially, its knowledge rate is going up.
So it is much more reliable than past models like GPT-4o, or even their reasoning models, o3-mini, o1, o1 Pro, et cetera. So it is more reliable, which is huge, right? That's one of the main reasons, I think, that a lot of companies and individuals don't even jump into these models to begin with: because they feel they can't trust them. So it's not hallucination-free, right? But it scored much higher on some of OpenAI's benchmarks in terms of just getting things right, and hallucinations are drastically down. So that's number one. And then number two, it's more relatable, right? Sometimes when you're speaking to ChatGPT, either verbally or just typing, well, let's just say typing, because right now the voice mode is still powered by GPT-4o, not by the new 4.5, it does feel more human. And so here's the thing.
If you're the type of person that likes to use ChatGPT as a friend, as a life coach, as a therapist, something like that, this is a no-brainer, right? Especially when it rolls out to the $20-a-month ChatGPT Plus plan, you're going to love it. For everyone else, where I've actually started to see some value in having this high-EQ large language model is as a business strategist, right? That's something I use large language models for a lot. And I've noticed that OpenAI's latest model, GPT-4.5, does a much better job sometimes of picking up on nuances of what I'm
trying to say but maybe might not be saying, right? Sometimes I might just be giving ChatGPT a ton of data and asking it for suggestions, right? Asking it for strategies. One thing, and I don't care what model you're using: you should always be using models to second-guess yourself, right? To fight back on a decision that you're making. Because if you do that, either you are going to have to defend your decision and make it even better, or you're going to be thinking about things that you maybe weren't thinking about before. In that case, GPT-4.5 runs laps around everyone else. So in certain use cases, I think it's fantastic. Traditional benchmarks? This thing was meh, right?
I mean, yes, the benchmarks across the board improved from their GPT-4o models, but this thing did not bench off the charts, which I think a lot of people were expecting, right? But here's the other thing with this model.
This is a foundation for future models, right? So in the same way that Anthropic is going to this hybrid approach, essentially combining transformer models with reasoning models, OpenAI has said that is their future as well. So when we get, quote-unquote, GPT-5, it's going to be a hybrid model like Claude 3.7 Sonnet is right now. And I think that's the thing that people are overlooking. OpenAI said this themselves: this is not a frontier model. It is not supposed to be benchmarking off the charts. It is a new, fresh model that understands humans, which I think is huge, because the reasoning models, even though they may not get a name, right? Essentially, OpenAI said, yeah, in the future, they're just all going to be one model. But the reasoning models are going to be exponentially better in future versions of OpenAI's offerings because of this stronger and much more capable GPT-4.5 model. That's how these models are built, right? The o-series models were built on GPT-4o, right? So now, when you have a much more
human 4.5 model, imagine what that means for the future of these reasoning models, or a hybrid model. So I think it's going to be extremely impressive. So like I said, following the launch for Pro users, GPT-4.5 will be expanding, according to OpenAI's release announcement, to Plus and Team users in the coming week. So that could be as soon as this week; I'm guessing it might be next week. Essentially, OpenAI said, yo, we're out of GPUs, we can't serve this thing, which I thought was interesting, right? And also, the API pricing on this is wild. It's wild, right? You know, we were kind of complaining, rolling our eyes, that Claude 3.7 Sonnet didn't reduce their prices, right? But the price for GPT-4.5 via the API is astronomically high: $75 per million input tokens, and then $150 per million output tokens. That's wild. So compare that to GPT-4o. On input, GPT-4.5 is $75 per million; GPT-4o is $2.50. And then on the output side, GPT-4.5 is $150; GPT-4o is $10. So that's 15x more expensive on the outputs, and on the input, that's what, 25x or 30x more expensive? Yeah, 30x more expensive.
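If you want to sanity-check that math, here's a quick back-of-the-napkin calculation using the per-million-token prices mentioned in this episode; the example prompt size is just illustrative.

```python
# Back-of-the-napkin API cost math using the per-million-token prices
# mentioned in this episode. The example prompt size is illustrative.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
    "claude-3.7-sonnet": (3.00, 15.00),
}

print(PRICES["gpt-4.5"][0] / PRICES["gpt-4o"][0])  # 30.0 -> 30x on input
print(PRICES["gpt-4.5"][1] / PRICES["gpt-4o"][1])  # 15.0 -> 15x on output

# Example: a 10,000-token prompt with a 1,000-token reply.
tokens_in, tokens_out = 10_000, 1_000
for model, (p_in, p_out) in PRICES.items():
    cost = tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out
    print(f"{model}: ${cost:.3f}")
# gpt-4.5: $0.900, gpt-4o: $0.035, claude-3.7-sonnet: $0.045
```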
So the API prices are out of this world. I'm guessing OpenAI may reduce them once they're able; they said they were trying to secure more GPUs, so I'm sure the costs are going to come down eventually. But maybe they're saying, hey, right now there are companies and customers out there, I'm sure, that are going to find value in the relatability and reliability of this new model. So, wow, it is mind-bogglingly expensive. All right, let's very quickly recap those top stories for the week. So first, Anthropic has released Claude 3.7 Sonnet, the world's first publicly available hybrid AI model. Google co-founder Sergey Brin is pushing Google to work harder to win the race to AGI, reportedly asking employees to work
up to 60-hour weeks, maybe more. Meta is reportedly developing a standalone AI app to compete with OpenAI and Google. Oh, funny, by the way: Sam Altman responded to that on Twitter and said, maybe we'll make a social media app. Microsoft Copilot is crushing it, offering free users unlimited voice and advanced Think Deeper features; that is using OpenAI's o1 model. ElevenLabs has launched Scribe, a standalone speech-to-text model that supports more than 99 languages. Then, Amazon is denying reports that Anthropic's AI is powering its new Alexa Plus features, saying it's their own internal models.
Next news story: the internet is going berserk over the new Sesame AI chatbot and its voice capabilities. Me? It's okay, but go check it out for yourself; it's free to try. Apple has announced a $500 billion US investment plan, including an AI server factory in Texas, part of a plan that will reportedly create 20,000 jobs.
A Bloomberg report shows that we might not get Apple's truly modern Siri until 2027. And then, last but not least, OpenAI has launched GPT-4.5, a model that really emphasizes relatability and reliability.
Woo. That was a lot of AI news, y'all. I hope this is helpful. If so, please share this, right? I know some of you everyday AI, it's like your secret. It's your cheat code. It's how you're the smartest person in AI at your company.
Please share the love, right? People are always like, hey, Jordan, how can I help? Click that repost button. That helps, right? If you're listening on the podcast, please follow the podcast. Leave us a rating. Tell someone about it, right? You can send individual episodes. Please share this as much as you can. I know AI can be tricky. It's hard to keep up with. It can be scary. Our team and I spend
countless hours trying to keep you up to date so you can grow your company and grow your career confidently. Speaking of that, if you haven't already, make sure you go to youreverydayai.com. Go listen to our 2025 AI Predictions and Roadmap Series. That's episodes 443 to 447. They are amazing.
I'm telling you, bangers, bangers. All right. So thank you for tuning in. Please go subscribe to our newsletter at youreverydayai.com. Thanks. We'll see you back tomorrow and every day for more Everyday AI. Thanks, y'all.
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.