
AI Daily News Rundown April 18 2025: ⚡️Google Launches Gemini 2.5 Flash with 'Thinking Budget' 🧬Profluent Discovers Scaling Laws for Protein-Design AI 📍Viral ChatGPT Trend: Reverse Location Searching Photos 🤖Meta’s FAIR New AI

2025/4/19

AI Unraveled: Latest AI News & Trends, GPT, ChatGPT, Gemini, Generative AI, LLMs, Prompting

People
Etienne Newman
Topics
I discussed the "thinking budget" feature introduced with Google Gemini 2.5 Flash, which lets developers control the AI's processing effort to balance quality, cost, and speed. I also covered the trend of people using ChatGPT to reverse-search the locations of photos, which has raised privacy concerns. In addition, I mentioned that Meta is seeking funding from Amazon and Microsoft for its Llama models, underscoring the enormous cost of large AI models. I also discussed other AI news, including Profluent's discovery of scaling laws in AI protein design, the AI formula integrated into Google Sheets, Meta FAIR's advances in perception AI, OpenAI's release of the O3 and O4-mini models, Perplexity AI's partnership with Motorola, OpenAI's potential acquisition of Windsurf, Meta disabling Apple's AI features in its iOS apps, Microsoft Copilot Studio's "computer use" feature, the trend of running AI models on local machines, the new Research feature in Anthropic's Claude, the dataset released by Wikipedia, the case of Cursor's AI support agent fabricating a company policy, Google's free AI Premium subscription for college students, and a technique developed at MIT that improves the syntactic accuracy of code generated by large language models.


Chapters
Google launched Gemini 2.5 Flash, featuring a "thinking budget" that controls processing power for tasks. A larger budget allows for more complex reasoning but increases cost and time, while a smaller budget prioritizes speed and cost-effectiveness. It's available via the Gemini API and is being tested in the main Gemini app.
  • Introduction of Gemini 2.5 Flash with a 'thinking budget' feature.
  • Thinking budget controls AI processing power, balancing quality, cost, and speed.
  • Available via Gemini API and being tested in the main Gemini app.
  • Significant reasoning boosts compared to Gemini 2.0.
  • Max thinking budget is 24,000 tokens (several thousand words).

Transcript


Welcome to a new deep dive from AI Unraveled. This is the podcast created and produced by Etienne Newman. He's a senior software engineer and a passionate soccer dad up in Canada.

Glad to be here. And hey, if you're getting value from these deep dives, keeping up with AI with us, please do take a second to like and subscribe on Apple Podcasts. It really helps us out. It does indeed. So today is Friday, April 18th, 2025, and we are diving deep again. Got a whole batch of AI news, research announcements. Yeah, quite a mix today. Our mission, as always, is to pull out the most important, maybe the most surprising bits for you, our listener.

help you get what matters without drowning in information. And there's a lot to potentially drown in these days. Yeah. The pace is just relentless. It really is. We've sifted through quite a bit. And yeah, the speed of innovation, it's striking. Okay, let's get into it. Sounds good. It feels like progress on, well, multiple fronts all at once. Foundational models, apps users actually see, it's all starting to connect. Yeah, exactly. Let's start with Google. They've got Gemini 2.5 Flash.

And this introduces something they call a thinking budget. Sounds intriguing. Can you break that down? Sure. The thinking budget. Think of it like setting limits on the AI's processing power for a specific task. Developers can basically decide how much thinking the model should do. Okay. So more thinking means better answers, potentially. Potentially, yes. Yeah. A bigger budget allows for more complex reasoning, maybe higher quality results, but...

The tradeoff is it might take longer and cost more. And a smaller budget? Quicker, cheaper. Good for simpler requests where you don't need super deep analysis. It's all about balancing quality, cost, and how fast you need the answer. Right, tailoring it to the job. And this 2.5 Flash is meant to be a step up from the 2.0 version in reasoning, I gather. Yes, the reports suggest...

Pretty significant reasoning boosts compared to its predecessor. And what's interesting is it's apparently performing well on tough benchmarks, reasoning, STEM, visual stuff, while being cheaper than some rivals. Impressive. They mentioned a max thinking budget.

24,000 tokens. What does that translate to, practically? Well, 24K tokens sets an upper limit on how much text or code the model can chew on for its reasoning in one go. It's roughly several thousand words, which gives developers pretty fine-grained control. Okay. And people can already use this? Yep. It's available via the Gemini API, through Google AI Studio and Vertex AI, and they're even testing it as an experimental option in the main Gemini app itself.
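For developers, that budget is just a per-request knob. Here's a minimal sketch using the google-genai Python SDK, assuming an API key in the environment and current SDK naming (the exact model ID at launch may be a dated preview name):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # model ID may differ, e.g. a dated preview name
    contents="Plan a week of workouts for a beginner runner.",
    config=types.GenerateContentConfig(
        # 0 disables thinking for speed/cost; larger values (up to the
        # documented max, roughly 24K tokens) allow deeper reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```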

So the big picture here is Google enabling more efficiency, more adaptability by letting devs tweak the AI's reasoning effort. Exactly. More control. Okay, let's switch gears. Something buzzing online, people using ChatGPT to figure out photo locations.

Even without the location data. Ah, yes, that trend. It's quite something. People are using OpenAI's newer models, like the O3 model specifically, to analyze pictures, looking for visual clues, building signs, whatever. Even if the photo's metadata, the location info, is stripped out. And the AI figures it out how? It combines those visual cues with web searches. So it sees a landmark, searches for it, cross-references, and can often pinpoint not just the area, but like,

Specific restaurants or shops. Pretty accurately sometimes. Wow. That's clever, but also a bit unsettling. Privacy bells ringing. Definitely. Yeah. Your point about privacy is crucial here. The fact that AI can potentially figure out where you were from a seemingly anonymous photo, well, it raises huge concerns. Like doxing potential. Precisely.

Malicious use is a real risk. Exposing someone's location publicly without consent, it really highlights the need for serious talks about ethical use and safeguards. Yeah, the tech is moving so fast, the ethics discussions need to keep pace. Okay, moving on. There's talk Meta is looking for funding for Llama from Amazon and Microsoft. That's the report, yes.

It points to just how incredibly expensive it is to build and run these huge AI models, like their planned Llama 4 Behemoth. We're talking massive computing power, right? And specialized engineers. Absolutely. The costs are astronomical.

So Meta approaching rivals like Amazon and Microsoft could be a strategic play to share that financial weight. Especially since they're pushing Llama into everything on their own platforms, Facebook, Instagram. Exactly. And integrating it everywhere also brings costs related to safety tuning, making sure it's not biased, dealing with data controversies. It all adds up. So external funding helps manage that while they keep developing. That seems to be the logic.

Makes you wonder if we'll see more big tech partnerships just to afford cutting-edge AI development. It feels increasingly likely as these models get bigger and hungrier for resources. Yeah, sharing the load through partnerships might become more common. Interesting times. Okay, let's shift to biotech. Profluent.

They've found scaling laws in AI protein design. What does that mean? Right. Scaling laws in protein design. It sounds technical, but the core idea is similar to what we've seen elsewhere in AI. Meaning bigger is better. Essentially, yes.

Profluent found that if you use larger AI models and train them on more and more protein data, the results in designing complex proteins like antibodies or gene editors get predictably better. And they've built a big model to prove this. A very big one: 46 billion parameters, trained on 3.4 billion protein sequences. That dataset is huge, way bigger than previous ones used for this.
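To make "predictably better" concrete: a scaling law is typically a power-law relationship, so performance plotted against model or data size forms a straight line on log-log axes, which is what lets you extrapolate to bigger models. A toy illustration with made-up numbers (not Profluent's actual data):

```python
import numpy as np

# Hypothetical (model size in parameters, error rate) pairs -- illustrative only.
sizes = np.array([1e8, 1e9, 1e10, 4.6e10])
errors = np.array([0.30, 0.21, 0.15, 0.12])

# Fit error ~ a * size^b; a power law is linear in log-log space.
b, log_a = np.polyfit(np.log(sizes), np.log(errors), deg=1)
print(f"fitted exponent b = {b:.3f}")  # negative b: bigger models, lower error

# The fit lets you extrapolate: predicted error for a hypothetical larger model.
predicted = np.exp(log_a) * (1e11) ** b
print(f"predicted error at 1e11 params: {predicted:.3f}")
```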

And has it worked? Have they designed useful things? Apparently, yes. They've managed to design new antibodies that work about as well as some existing drugs but, crucially, are different enough structurally to potentially avoid patent problems. Okay, that's significant. Anything else? They also created gene-editing proteins that are smaller than CRISPR-Cas9, the famous one. Smaller size could mean easier ways to deliver these tools into cells for therapies. Wow, that could be huge for gene therapy. Are they keeping this tech locked down? Actually, no.

They're being quite open. They're releasing 20 open antibodies through licensing deals, some royalty free, some with upfront fees aimed at diseases affecting millions. That's a great initiative. It really is. It aims to get these potential tools out there faster. So a takeaway here is AI is really starting to accelerate drug discovery, synthetic biology, potentially transformative stuff. Absolutely. A potential revolution in the making. OK, let's bring it back to something maybe more immediately usable for many people.

AI in Google Sheets. They've added an AI formula. They have. It's under the Help Me Organize banner, but the core is a new function, AI. You can put that in a cell, give it a prompt. Like what kind of prompt? Anything, really. Summarize this text, extract email addresses from this column, write a thank you note based on this customer feedback. You can also point it to other cells for context.

So =AI("summarize this", A1:A10), that kind of thing. Exactly. And it's designed to be pretty easy. You type the formula, give your instructions, and it generates the output right there in the cell. And you can, like, drag it down to apply to multiple rows? Yep. Just like a regular Sheets formula. Drag the fill handle and it processes in batches.

Huge time-saver potential there. Nice. Can you combine it with other functions, like IF statements? You can. You can embed the AI function within standard Sheets functions, like IF or CONCATENATE, whatever you need for more complex tasks. And there's a refresh-and-insert option if your source data changes. Okay, that makes AI much more practical for spreadsheet tasks.
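As a quick illustration of how those pieces fit together (treat the exact signature as reported at launch; it may evolve as the feature rolls out):

```
=AI("Summarize this customer feedback in one sentence", A2)
=AI("Extract any email address from this text", B2)
=IF(C2 = "", "", AI("Write a short thank-you note based on this feedback", C2))
```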

Saving time, maybe improving accuracy. That's the goal: making it part of the everyday workflow. Now, staying with the big players, Meta's research arm, FAIR, has been busy with perception AI. What's new there? Meta FAIR is pushing the boundaries of how well AI can understand the visual world. Their new Perception Encoder is apparently setting new records, state-of-the-art performance. In what kinds of tasks? Things like spotting camouflaged animals, which is really tricky,

or tracking complex movements in videos. Really nuanced visual understanding. Impressive. And they released models too, a perception language model? Yes, they've open-sourced the Meta Perception Language Model, PLM, and a benchmark called PLM-VideoBench, specifically for testing video understanding capabilities. And something about 3D understanding, Locate 3D? Right. That's focused on precise object understanding in three dimensions.

They've also released a big dataset with 130,000 spatial language annotations, descriptions of where things are in space,

to help train models for that. So really digging into making AI understand space and movement better and AI collaboration. Correct. They developed a framework called the Collaborative Reasoner. They found that having multiple AI systems work together on a problem yields significantly better results than one AI working alone. Interesting. Like AI teamwork. Kind of, yeah. All this suggests, well, better AI for things like robotics, augmented reality.

anywhere understanding the real world is key. Definitely paving the way for more capable applications in those areas. Okay, back to OpenAI. Two new models, O3 and O4-mini. What's the difference? So O3 is positioned as their top-tier reasoning model right now. It's designed to do more thinking, more complex analysis, before giving an answer. The heavyweight. And O4-mini? That one's smaller, faster, more efficient. It's about balancing cost, speed, and competence.

Good for a wider range of tasks where you don't necessarily need the absolute maximum reasoning power. And a key feature for both is understanding images now, like sketches? Yes, that's a big step. Both O3 and O4-mini can apparently take visual inputs, sketches, whiteboard photos,

and incorporate that into their reasoning, multimodal understanding. And they still have access to things like web browsing, coding. Yep. The full suite of ChatGPT tools is available to them, browsing, running Python code, generating images.

Makes it pretty versatile. Who gets to use these now? They're rolling out to specific subscriber tiers and also available through the developer APIs. So, O3 for the highest-performance needs, O4-mini as a more practical, efficient option. So OpenAI is pushing both reasoning power and this multimodal seeing ability.
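As a rough sketch of what a multimodal call to these models might look like with the OpenAI Python SDK (the image URL is a placeholder, and availability depends on your API access tier):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="o4-mini",  # or "o3" when you need heavier reasoning
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this whiteboard sketch describe?"},
                # Placeholder URL: point this at a real hosted image.
                {"type": "image_url", "image_url": {"url": "https://example.com/sketch.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```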

Okay, let's talk about AI on our phones. Perplexity AI making moves. That's right. Perplexity AI, which positions itself as a kind of AI-powered search engine or answer engine, has struck a deal with Motorola. A deal for what? To get the Perplexity AI Assistant pre-installed on upcoming Motorola smartphones.

And they're reportedly also talking to Samsung about a similar integration. Pre-installed, so competing directly with Google Gemini on Android phones? That seems to be the strategy. Positioning Perplexity as a built-in alternative.

The Motorola deal sounds closer to being finalized than the Samsung talks, based on reports. Interesting. So we might soon have more choices for the default AI assistant on our phones. Could be. More competition in the mobile AI space, which could lead to, well, better options for users eventually. Yeah, competition usually drives innovation. Now, speaking of competition and big moves, OpenAI potentially buying Windsurf.

For $3 billion. Yeah, that's a huge reported figure. Windsurf, which used to be called Codeium, is known for its AI coding assistant.

Right, the coding helper tools. Exactly. Windsurf's tool works in lots of different coding environments, and they put a big emphasis on enterprise data privacy. Apparently, they have around $40 million in annual revenue. So why would OpenAI want them? Just to get into coding assistance more? It would significantly boost their capabilities in that area, for sure. It gives them a mature, respected product and a stronger footing against Microsoft's Copilot and Google's offerings for developers.

A major strategic play in the AI for developers market then? If the deal goes through, absolutely. Okay. Another platform battle.

Meta apparently blocking Apple Intelligence features in its apps. Yes, this just came out. Meta is reportedly disabling Apple's new AI features, the writing tools, the Genmoji creation, within Facebook, Instagram, Threads, Messenger, and WhatsApp on iOS. So if you're using Instagram on an iPhone, you won't be able to use Apple's built-in AI writing help? That's the understanding.

Those specific Apple Intelligence tools just won't be accessible inside Meta's apps. Any reason given?

Meta hasn't given a specific public reason. Speculation points toward Meta wanting to promote its own Meta AI assistant instead, and maybe lingering friction from past disagreements between Meta and Apple. Right. Competitive tensions playing out, potentially at the expense of user convenience on iOS? It certainly highlights those tensions in the AI space. Okay, shifting to Microsoft now. Copilot Studio has a new feature called Computer Use. Sounds broad. What does it do? It's pretty broad and potentially powerful.

This feature lets AI agents built with Copilot Studio actually interact directly with websites and desktop apps. How? Like controlling the mouse and keyboard? Essentially, yes. Simulating human actions, clicking buttons, selecting menus, typing text into fields. The big advantage is automating tasks on systems that don't have APIs for AI to hook into directly. So you could automate stuff on older legacy software. Exactly. Things without modern integration points.

And it's supposed to be somewhat adaptable, using reasoning to handle minor changes in the interface, like if a button moves. That's clever. Where does the processing happen? Privacy concerns? Microsoft says all the processing is done on their infrastructure, and crucially, enterprise data used with this feature is not used for training their AI models.

addressing those privacy and security points for businesses. Okay, so this could really help businesses automate things like data entry or invoice processing, even on older systems. That's the idea: making automation more accessible, even without APIs.
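Copilot Studio agents are built in Microsoft's own low-code tooling rather than written in code, so this isn't its API; but the underlying idea of simulating a human's clicks and keystrokes is the same one behind generic automation libraries. A minimal, hypothetical Python sketch using pyautogui (the coordinates and values here are made up):

```python
import pyautogui  # generic GUI automation: simulates mouse and keyboard input

# Hypothetical example: fill one field in a legacy desktop form that has no API.
pyautogui.click(420, 310)                         # click the "Invoice number" field
pyautogui.write("INV-2025-0418", interval=0.05)   # type like a human, key by key
pyautogui.press("tab")                            # move to the next field
pyautogui.write("1499.00", interval=0.05)
pyautogui.press("enter")                          # submit the form
```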

Now, something a bit different: running AI privately on your own computer. We're hearing more about this. Why are people doing it? The big reasons are privacy and data control. When you run an AI model locally, your prompts and the AI's responses generally don't leave your computer. No data sent to external servers. Right, keeping it all in-house. What tools are people using for this?

There are a few popular options. GPT4All is one. Ollama is another widely used one, especially for command-line folks. And LM Studio provides a nice graphical interface for running the same kinds of local models. And these work on regular computers, Mac, Windows, Linux? Yep, they're designed for standard operating systems and generally run on decent consumer hardware. You don't necessarily need a supercomputer, though better hardware helps with bigger models. How hard is it to get started? It's becoming easier.

With Ollama or LM Studio, it's usually: download the software, download an AI model file (there are lots of open-source ones available), and then you can start chatting with it, either in the terminal or through the LM Studio interface. The main thing is matching the model size to your computer's power? Exactly. Smaller models work better on older or less powerful machines.
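As a minimal sketch of that local workflow in Python, assuming the Ollama daemon is installed and running and a model has already been pulled (the model name is just an example; pick one sized for your hardware):

```python
# pip install ollama  -- thin client for a locally running Ollama daemon.
# First pull a model from a terminal, e.g.:  ollama pull llama3.2
import ollama

response = ollama.chat(
    model="llama3.2",  # example model; smaller models suit older machines
    messages=[{"role": "user", "content": "Why does local inference help privacy?"}],
)
print(response["message"]["content"])  # the prompt and reply never left this machine
```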

But it definitely puts powerful AI capabilities in reach while keeping your data private. That's a significant development for individual control. Okay, let's check in on Anthropic's Claude. New Research feature? Yes, Claude is getting smarter about finding information. This new Research feature lets it autonomously search the public web and a user's internal company documents or resources. Autonomously? So you ask a question and it goes looking. Pretty much.

It aims to provide comprehensive answers and, importantly, it cites its sources, whether web pages or internal docs. And integration with work tools. Big one here: integration with Google Workspace.

So Claude can access your Gmail, Docs, Sheets, Calendar, contextually, without you needing to manually upload stuff. That makes its assistance potentially much more relevant. That sounds incredibly useful for work. Who gets this? The main research feature is in beta for Macs, Team, and Enterprise plans in the U.S., Japan, and Brazil right now.

But the Google Workspace integration is rolling out to all paid Claude users. So Claude's becoming a more proactive, context-aware assistant.

Boosting productivity. That's definitely the direction they're heading. Now, speaking of information sources, Wikipedia is releasing a dataset for AI developers. That's right. They've partnered with Kaggle, the data science platform, to release a curated dataset based on Wikipedia content. Why are they doing this? To stop people scraping their site? That's part of it. They want to provide a high-quality, structured alternative to bots hammering their servers trying to scrape data.

The hope is it promotes more ethical AI development and reduces the strain on their infrastructure. A responsible move: giving developers good data while protecting their resource. Exactly. A win-win, hopefully.
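For developers, fetching a Kaggle-hosted dataset typically looks like the sketch below. Note that the dataset slug is our guess at the Wikipedia release's name, so treat it as a placeholder and check Kaggle for the real one:

```python
# pip install kagglehub  -- requires Kaggle credentials configured locally.
import kagglehub

# Placeholder slug: verify the actual Wikimedia dataset name on Kaggle.
path = kagglehub.dataset_download("wikimedia-foundation/wikipedia-structured-contents")
print("Downloaded to:", path)  # local directory containing the dataset files
```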

Okay, a cautionary tale now: an AI support agent made up a policy. Yeah, this involved a company called Cursor. They make AI coding tools. Their AI support assistant apparently fabricated a policy when interacting with a user, just hallucinated it. Ouch. That's not good for customer trust. Not at all. The company apologized and blamed model hallucination under heavy load, but it really underscores the risks of letting AI interact with customers unsupervised. Right.

Need safeguards, human oversight, especially for important stuff like policy. Absolutely. A reminder that these systems aren't infallible and need careful management in customer-facing roles. Okay, on a brighter note for students: Google's giving away AI Premium. Yes. Pretty big news for students.

Google is offering its Google One AI Premium subscription, which includes Gemini Advanced with the 1.5 Pro model, the Docs and Gmail integrations, all the AI tools, for free to college students with a verified .edu email. For free? Until when? Until spring 2026. That's a subscription that normally costs about $20 a month. Wow, that's generous. What's the play for Google? Likely investing in the next wave of users: get students familiar with, and reliant on, their advanced AI tools early in their education and potential careers.

Smart long-term strategy. Definitely gets their tools into the hands of future professionals.

OK, one more research item. MIT figuring out how to make AI write syntactically better code. Yes, researchers at MIT have a new technique. It's about guiding large language models to follow programming-language syntax rules more reliably when generating code. Is it about retraining the models? No, that's the interesting part. It doesn't require retraining. It uses clever prompting strategies that can work with different models, so it's model-agnostic.

It aims to improve accuracy in code generation and in things like formatting data correctly, like JSON. Less buggy AI-generated code would be a huge help for developers. Absolutely. Reducing syntax errors means more reliable code and better developer productivity, a potentially significant improvement.
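The MIT technique itself is more sophisticated than this, but the simplest version of the same goal, syntactically valid output without retraining, can be sketched as a validate-and-retry loop wrapped around any model call (generate_fn below is a stand-in you'd supply for your own LLM client):

```python
import json
from typing import Callable

def generate_valid_json(generate_fn: Callable[[str], str],
                        prompt: str,
                        max_attempts: int = 3) -> dict:
    """Keep sampling until the output parses as JSON, re-prompting with the error."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = generate_fn(attempt_prompt)  # any LLM call: OpenAI, Gemini, local...
        try:
            return json.loads(raw)  # purely syntactic check: valid JSON or not
        except json.JSONDecodeError as err:
            # Feed the parser error back so the model can correct itself.
            attempt_prompt = (
                f"{prompt}\n\nYour previous output was not valid JSON "
                f"({err}). Output only corrected, valid JSON."
            )
    raise ValueError("No syntactically valid output after retries")
```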

Okay, wow. That's a lot. Before we wrap, let's quickly recap some other notable bits that flashed by. Sure. OpenAI's O3 model scoring 136 on that Mensa IQ test, higher than Gemini 2.5 Pro's reported score. UC Berkeley's Chatbot Arena spinning off into its own company, LMArena. Right. We mentioned Perplexity's Motorola deal and Samsung talks. Yep. And xAI's Grok getting memory and a Workspaces feature. Alibaba releasing that open-source model Wan 2.1 for making videos from two images. That sounds cool. Deezer reporting 20,000-plus AI song uploads daily and filtering them.

And OpenAI apparently looked at buying Cursor's creator, Anysphere, before the Windsurf talks. Man, it just never stops. The pace is incredible across the board. Truly highlights the dynamism, the sheer breadth of what's happening in AI right now. It really does. And look, if navigating all this feels daunting, or if you want to upskill for this AI-driven world, I do want to give another shout-out to Etienne's AI-powered Djamgatech app. Good reminder.

Yeah, it's built to help anyone master and actually pass over 50 different in-demand certifications: cloud, finance, cybersecurity, health care, business, you name it. If you're serious about advancing your career as these changes happen, definitely check out the Djamgatech app. Links are in the show notes. So, stepping back from all these individual developments, the sheer integration we're seeing.

AI in spreadsheets, AI helping design drugs, AI on our phones. Yeah. It prompts a bigger thought, doesn't it? It really does. And maybe that's the final thought for our listeners today. With AI weaving itself deeper into our tools, our devices, even potentially our biology through things like protein design.

What are the long term implications for privacy? Sure. But also just for human machine collaboration, how we work, how we live. Yeah. How does society adapt and shape this and how does it shape us? Lots to mull over. Definitely something to think about. Well, thank you again for joining us on this deep dive into the whirlwind world of AI. We hope it helped make sense of some of the key shifts. Always a pleasure. Until the next deep dive.