The End of Language-Only Models | Amit Jain, Luma AI

2025/5/13

High Agency: The Podcast for AI Builders

This week Raza is joined by Amit Jain, CEO and co-founder of Luma AI, to explore why the future of artificial intelligence lies beyond language. Amit shares Luma’s bold mission to build world models through multimodal training and why video is the most overlooked and critical data source in AI today.

Chapters:
00:00 - Introduction
03:40 - Competing with Big AI Labs: Language vs. Multimodality
08:09 - Joint Training and Why Current Multimodal Models Fall Short
11:01 - Language is Discrete, the World is Continuous
14:36 - Do These Models Have World Models?
18:18 - Planning, Counterfactuals, and Causal Reasoning in AI
22:08 - Capabilities of Ray 2 and Real-World Use Cases
26:14 - Rethinking Video Length and Creative Workflows
29:18 - Solving Coherence Across Shots and Characters
30:00 - When Will AI Create a Feature-Length Film?
31:27 - What You Can Build with Luma’s API Today
35:49 - Overlooked Ideas and Noise in the AI Industry
38:34 - Why Video is the Missing Link in AI
