A wild week in AI

2025/4/21

Elon Musk Podcast

People
Host
Podcast host and content creator focused on electric vehicles and energy.
Topics
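Host: A lot of incredible things happened in AI this week. For example, Google trained an AI to communicate with dolphins, marking a major breakthrough for AI in cross-species communication. Google's DolphinGemma is a compact AI model that can run directly on a phone and is used to understand and generate dolphin vocalizations. The model uses an audio tokenization framework that converts the various sounds dolphins make into discrete audio tokens, processed with Google's SoundStream codec.

DolphinGemma's architecture is general-purpose: in the future it could be used to understand and emulate the vocalizations of other animals, and even to enable real-time two-way communication with certain animal species. This would give researchers an unprecedented opportunity to understand animal language and behavior, and could foster communication between humans and animals.

Meanwhile, new animation tools also showed AI's huge potential in creative work. The UniAnimate-DiT plugin lets users animate any character image with a motion reference video, and the results are surprisingly good: even characters with complex appearances or unusual anatomy can be animated with minimal artifacts. Tencent's InstantCharacter tool can place a virtual character into new scenes while keeping its attributes consistent, which is crucial in visual content creation.

In addition, Nvidia's PartField project is a part segmentation model for 3D objects that can break complex meshes into individually labeled components, with clear implications for building physical simulations, robotics, or game assets.

Finally, the humanoid robot half marathon held in Beijing reflects how quickly robot mobility is advancing; humanoid robots may soon be deployed in public-facing or physical labor roles. The race tested the robots' endurance, adaptability, and real-world stability, traits that machines find hard to master.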

Deep Dive

Chapters
This chapter announces the evolution of the podcast, broadening its focus to include all tech titans and its relaunch as Stage Zero. It also encourages listeners to support the show through Patreon.
  • Podcast is expanding its focus to cover all tech titans.
  • Relaunching as "Stage Zero".
  • Listeners encouraged to support the show on Patreon for exclusive content and early access.

Transcript

Welcome back to the Elon Musk Podcast. I'm thrilled to share some exciting news with you. Over the next two weeks, we're evolving. We'll be broadening our focus to cover all the tech titans shaping our world, and with that, our show will become Stage Zero. You'll still get the latest insights on Elon Musk, plus so much more, so stay tuned for our official relaunch as Stage Zero, coming soon.

For the past four or five years, I've been bringing you in-depth, no-nonsense insights from the world of Elon Musk. But I need your help to keep the show alive and growing.

If you love what you hear, consider supporting Stage Zero on Patreon at patreon.com slash stagezeronews.

By joining our Patreon community, you'll get exclusive content, early access to some episodes, and a chance to shape future topics. Everyone has a voice. Your support goes directly into making this show better, and it helps me keep bringing you the content you enjoy every single day. If you're getting value from Stage Zero News, becoming a patron is the best way to make sure this journey keeps going.

So let's make the next five years even bigger together. There's a link in the show notes just for you.

Why would Google train an AI to talk to dolphins? That's not something you hear every day, and it's exactly the kind of question that sums up why this might be the most unpredictable, chaotic, and fascinating week in AI so far this year. Not only are researchers now attempting to decode animal communication using lightweight neural networks, but we also saw humanoid robots run a literal half marathon, AI tools that can animate pets into dancers or bring comic panels to life with one click, and OpenAI launching its most intelligent models yet. Every one of these stories is a question in itself. Why now? How does it work? And what does this mean for the future of interaction, creativity, and intelligence itself?

Now, to start, Google made headlines this week with something called DolphinGemma, a compact AI model trained to understand and even generate dolphin vocalizations. What makes this unusual isn't just the application; it's that the model runs directly on a phone.

Researchers used Google Pixel devices to process real-time dolphin chatter with a framework based on audio tokenization. They recorded every sound dolphins make (clicks, squawks, whistles) and converted those sounds into discrete audio tokens using Google's SoundStream codec.
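
The episode doesn't go into how the tokenization works. SoundStream compresses audio with residual vector quantization (RVQ): each encoder output frame is matched to the nearest entry in a first codebook, the leftover error is matched against a second codebook, and so on, and the chosen indices serve as the discrete tokens. The sketch below illustrates only that quantization step, using tiny made-up 2-D codebooks and a made-up frame; it is not Google's code, and the real codec's encoder, codebook sizes, and dimensions are not modeled here.

```python
# Toy sketch of residual vector quantization (RVQ), the scheme SoundStream uses
# to turn continuous audio embeddings into discrete token IDs.
# Codebooks and the input frame are invented numbers, not real SoundStream values.

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(frame, codebooks):
    """Quantize one embedding frame into one token per codebook stage."""
    residual = list(frame)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen code so the next stage quantizes what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens

def rvq_decode(tokens, codebooks):
    """Approximate the original frame by summing the selected code vectors."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Two hypothetical 2-D codebooks: a coarse stage and a finer stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]
frame = [1.2, 0.9]  # stand-in for one encoder output frame of a whistle
tokens = rvq_encode(frame, codebooks)
print(tokens)                          # [1, 1]
print(rvq_decode(tokens, codebooks))   # [1.25, 1.0]
```

Each stage only has to encode the error the previous stage left behind, which is why a short stack of small codebooks can describe a frame fairly precisely with a handful of integers.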

This tokenized data set was then used to train a smaller variant of Google's Gemma model, which comes in at around 400 million parameters. That's small enough to run efficiently on mobile hardware without external computing.
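
To see why roughly 400 million parameters counts as phone-sized, here is back-of-the-envelope arithmetic for the memory taken by the weights alone at common numeric precisions. The episode doesn't say which precision the on-device model uses, so the byte widths below are assumptions for illustration, and activations and runtime overhead are ignored.

```python
# Approximate weight memory for a ~400M-parameter model at different precisions.
# Which precision DolphinGemma actually runs at is an assumption, not stated in the episode.
PARAMS = 400_000_000

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "int4": 0.5,
}

def weight_megabytes(params, bytes_per_param):
    """Size of the weights alone, in megabytes (10**6 bytes)."""
    return params * bytes_per_param / 1_000_000

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_megabytes(PARAMS, nbytes):,.0f} MB")
# fp32 ≈ 1,600 MB, fp16/bf16 ≈ 800 MB, int8 ≈ 400 MB, int4 ≈ 200 MB
```

Even at 16-bit precision the weights fit in well under a gigabyte, which is within reach of a modern phone's memory.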

And beyond understanding, the model can also synthesize new dolphin-like sounds, which is a potential breakthrough for researchers aiming to translate interspecies communication into something that humans can eventually understand.

Now, the significance of this extends well beyond dolphins. The architecture is general enough that it could be retrained to understand and emulate the vocalizations of other animals, and in theory it could eventually support real-time two-way communication with certain animal species. Could you imagine talking to your own pets? That pushes the question into strange territory: if an AI can mimic a language-like structure in dolphin chatter, are we on the verge of developing machine-driven cross-species translators?

Now, while AI was speaking to marine life, animation tools were speaking to the internet's favorite content creators. UniAnimate-DiT, a new plugin built for the open-source model Animate Diff 1.2, allows users to animate any character image with a motion reference video. Upload a static image (a photo of a person, a cartoon, or even a pet) and combine it with a short clip of someone dancing or moving around.

The tool extracts the pose data from the video, then applies it to the static image, producing a fully animated clip with smooth transitions. What stands out is that the model can guess unseen angles, like the back of the character, and animate flowing fabric or hand movements convincingly. All of this runs locally with a minimum of 14 GB of VRAM, meaning creators can use the tool without relying on cloud services. And I've used it. The results are surprisingly good. Even characters with complex appearances or unusual anatomy, like fictional anime designs or animals, can be animated with minimal artifacting. And since everything is released open source, artists and animators now have access to a new kind of puppetry that is as accessible as downloading a GitHub repo.

Now, a companion tool emerged this week from Tencent as well. It's called InstantCharacter, and it's focused on accuracy in reference-based generation. Say you have an image of a fictional character: InstantCharacter can place that same character, down to the facial structure, outfit details, and accessories, into a new scene. You can render them in a studio playing piano or walking in a snowstorm in full anime style. The model is based on Flux, one of the highest-fidelity open-source diffusion models available, and it uses LoRA adapters to style outputs in everything from Studio Ghibli to Makoto Shinkai's signature look.
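
LoRA (low-rank adaptation) keeps a model's original weight matrix W frozen and learns a small low-rank update, so the effective weight becomes W + (alpha / r) · B · A, where A and B share a small inner rank r. Because only A and B are trained, several style adapters can be swapped on top of one shared base model. The sketch below shows just that merge with toy numbers; the matrices are invented and have nothing to do with the actual weights of Flux or InstantCharacter.

```python
# Minimal LoRA merge sketch: effective weight = W + (alpha / r) * (B @ A).
# All matrices are toy values, not taken from any real model.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A); r is the adapter rank (number of rows in A)."""
    r = len(A)
    delta = matmul(B, A)          # low-rank update, same shape as W
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]             # frozen 3x3 base weight
A = [[1.0, 0.0, 1.0]]             # r x d_in  (rank 1)
B = [[0.5], [0.0], [-0.5]]        # d_out x r
merged = lora_merge(W, A, B, alpha=1.0)
print(merged)  # [[1.5, 0.0, 0.5], [0.0, 1.0, 0.0], [-0.5, 0.0, 0.5]]
```

At this toy size the adapter (6 numbers) is barely smaller than W (9 numbers), but for large layers with a small rank the savings are substantial, which is what makes per-style adapters cheap to store and swap.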

Now, unlike most existing character transfer models, this one keeps attributes consistent across varied scenes, and does so across 2D, 3D, and photorealistic styles. Why this matters isn't limited to cosplay creators or fan artists: in a world increasingly dominated by visual organizations and virtual characters, the ability to preserve identity across generated media becomes critical, especially as avatars, VTubers, and AI-generated influencers grow more complex and more integrated into media ecosystems.

Now, there's a new project from Nvidia called PartField. It's focused on a very different kind of segmentation, this time in three dimensions. It's a part segmentation model for 3D objects, capable of breaking complex meshes into individually labeled components. Take a 3D model of a robot or a car: with PartField, each part (arm, leg, wheel, mirror) is isolated into its own labeled region, enabling texture swaps, physical simulations, or animations to be applied to each section independently. This has obvious implications for anyone building physical simulations, robotics, or gaming assets. Compared to previous segmentation models, it not only performs better but is also much faster, completing tasks in a fraction of the time thanks to a more efficient tokenization and inference architecture.
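
The episode doesn't explain how PartField turns its output into labeled parts. One generic way to get part labels from a model that predicts a feature vector per mesh face is to cluster those vectors, so faces with similar features end up sharing a label. The sketch below shows that idea with a tiny k-means over invented 2-D features for six hypothetical faces; it illustrates clustering-based labeling in general and is not Nvidia's implementation.

```python
# Sketch: label mesh faces by clustering per-face feature vectors with k-means.
# The feature vectors are invented; a real model would predict one per face.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain k-means. Initial centroids are the first k points, so the demo is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: give each face the label of its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the faces assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Six hypothetical faces: three with "wheel-like" features, three with "mirror-like" ones.
faces = [[0.1, 0.0], [5.0, 5.1], [0.0, 0.2], [4.9, 5.0], [0.2, 0.1], [5.2, 4.8]]
print(kmeans(faces, k=2))  # [0, 1, 0, 1, 0, 1]: faces sharing a label form one part
```

Once every face carries a part ID, operations like texture swaps or per-part animation reduce to selecting the faces with a given label.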

Now, one more thing, and a slightly surreal real-world scene: a half marathon for humanoid robots just took place in Beijing. More than 20 companies from across China participated, entering bipedal robots that walked, jogged, and ran across and around a racetrack in full view of cheering spectators. Some entries were clunky, barely able to maintain balance, while others, like Unitree's G1 and the Beijing Humanoid Robot Innovation Center's Tiangong Ultra, managed smoother gaits and even completed longer runs. Footage shows some robots falling or freezing mid-run, but others powering through the full event. Tiangong Ultra in particular drew attention for its speed and stability, suggesting real progress in bipedal locomotion design.

The event might sound like a novelty, but it reflects a changing reality: robotic mobility is evolving fast enough that humanoids may soon be deployed in public-facing or physical labor roles. Holding a half marathon may look like a publicity stunt, but it's also a benchmark for these robots. It tests endurance, adaptability, and real-world stability, traits that are notoriously difficult for machines to master.

Hey, thank you so much for listening today. I really do appreciate your support. If you could take a second and hit subscribe or the follow button on whatever podcast platform you're listening on right now, I'd greatly appreciate it. It helps out the show tremendously, and you'll never miss an episode. Each episode is about 10 minutes or less to get you caught up quickly. And please, if you want to support the show even more, go to patreon.com slash stage zero.

And please take care of yourselves and each other, and I'll see you tomorrow.