Digital avatars aim to replicate the human experience of communication, offering a more lifelike interaction compared to text, which abstracts away nuances of speech and expression.
The terms 'avatar' and 'digital human' both refer to AI-generated representations, but 'digital human' implies a closer resemblance to actual human life, especially as AI improves and becomes more lifelike.
Synthesia requires around 3-4 minutes of video footage, which can be recorded with a webcam or phone. The avatar can then be customized with different voices, scripts, and languages, and can be used in various environments.
Users can either choose from off-the-shelf avatars or create custom avatars of themselves, with both options being equally popular.
Many people are initially self-conscious, but they often appreciate the final result, especially if it allows them to avoid the discomfort of being on camera.
Digital avatars allow businesses to create video content more easily and affordably, enabling them to communicate with customers and employees in a more engaging way than text alone.
Synthesia has strict content moderation policies and requires explicit consent for creating avatars. They also work to ensure that avatars are not used for harmful purposes, though they acknowledge that no system is perfect.
Synthesia plans to launch real-time avatars within the next year, which will allow for more lifelike, interactive experiences, potentially indistinguishable from human conversations.
While some roles like customer service may be fully automated, others that require human connection, like sales or hospitality, will likely retain a human element, as people value the personal touch.
Riparbelli believes that media literacy will become even more critical: people will need to critically evaluate the authenticity of content, presuming that online content is fictional unless proven otherwise.
Hi, I'm Bilawal Sidhu, host of TED's newest podcast, The TED AI Show, where I speak with the world's leading experts, artists, and journalists to help you live and thrive in a world where AI is changing everything. I'm stoked to be working with IBM, our official sponsor for this episode.
Now, the path from Gen AI pilots to real-world deployments is often filled with roadblocks, such as barriers to free data flow. But what if I told you there's a way to deploy AI wherever your data lives? With Watson X, you can deploy AI models across any environment, above the clouds helping pilots navigate flights, and on lots of clouds helping employees automate tasks, on-prem so designers can access proprietary data,
and on the edge so remote bank tellers can assist customers. Watson X helps you deploy AI wherever you need it so you can take your business wherever it needs to go. Learn more at ibm.com slash Watson X and start infusing intelligence where you need it the most.
Hi, I'm Bilawal Sidhu, host of TED's newest podcast, The TED AI Show, where I talk with the world's leading experts, artists, and journalists to help you live and thrive in a world where AI is changing everything. I'm stoked to be working with IBM, our official sponsor for this episode. In a recent report published by the IBM Institute for Business Value, among those surveyed, one in three companies paused an AI use case after the pilot phase.
And we've all been there, right? You get hyped about the possibilities of AI, spin up a bunch of these pilot projects, and then crickets. Those pilots are trapped in silos. Your resources are exhausted and scaling feels daunting. What if instead of hundreds of pilots, you had a holistic strategy that's built to scale? That's what IBM can help with.
They have 65,000 consultants with generative AI expertise who can help you design, integrate, and optimize AI solutions. Learn more at ibm.com slash consulting. Because using AI is cool, but scaling AI across your business, that's the next level.
Proving trust is more important than ever, especially when it comes to your security program. Vanta helps centralize program requirements and automate evidence collection for frameworks like SOC 2, ISO 27001, HIPAA, and more, so you save time and money and build customer trust.
And with Vanta, you get continuous visibility into the state of your controls. Join more than 8,000 global companies like Atlassian, FlowHealth, and Quora who trust Vanta to manage risk and prove security in real time. Now that's a new way to GRC. Learn more at vanta.com slash TED Audio. That's vanta.com slash TED Audio.
Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying the TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved, and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.
If you could jump online and be able to chat with your favorite musician anytime you like for as long as you'd like, what would that be worth to you? What if you could connect with a personal dating coach as often as you wished to help sharpen up your online dating skills? Would that be appealing? Or what if you could make a digital copy of yourself and release your doppelganger to the web to take care of some of your online identity work for you?
Much of this is actually within reach. Companies are learning to pair AI tech with video, audio, and animation tools to effectively mimic real people and real-ish interactions all at the same time. Musician FKA Twigs, for instance, built a digital clone of herself and uses it to let fans interact with a version of her. The founder of Bumble, the dating app, talked about how the future of dating might begin with digital avatars pre-interviewing each other.
And that sort of flips the AI argument on its head a little bit, doesn't it? We've talked a lot about the potential and risks of AI becoming too human-like, but this is the reverse story. This is about human beings becoming more digital-like to become, in a sense, digital humans. If that's something you'd find useful, there's a handful of companies ready to help you create the digital version of you. One of those is called Synthesia.
Using a short five-minute video you can record with your phone or webcam, you can build a reasonable facsimile of a human being. You can then choose a voice, give it a script, get it translated to dozens of languages, add a few design flourishes, and now you can push relatively pro-looking video content to your followers, your employees, whoever. No sets, no actors, no sweat.
Many of Synthesia's clients aren't individual people. They're massive global companies like Heineken, Zoom, Xerox. Synthesia says more than 50,000 customers have built digital avatars into their comm strategies.
In today's demanding market, we as team leaders need to be more than just experts at our jobs. This means that we need to be a leader, a coach, and a trainer. And we also need to embody the values, mission, and vision of our company. That probably sounds to you like a generic, typical computer-generated voice. And sure, it is. But it's also the voice of a Synthesia avatar that Electrolux, the global appliance company, uses to distribute video modules to help train its workforce.
The tech is impressive enough that last summer, investors lifted Synthesia's valuation to unicorn status, hitting that vaunted $1 billion valuation. It seems like a lot of people are very interested and now very invested in seeing digital humans take off and take over how we communicate with each other now and into the future.
But in this quest to build lifelike, useful digital avatars of ourselves, are we rewriting our understanding of what communicating human to human looks like? Who are we in a world that could soon be dominated by digital doppelgangers? I'm Bilawal Sidhu, and this is the TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.
Does your AI model really know code? Its specific syntax, its structure, its logic? IBM's Granite code models do. They're purpose-built for code and trained on 116 different programming languages to help you generate, translate, and explain code quickly. Because the more your AI model knows about code, the more it can help you do. Get started now at ibm.com slash granite. IBM, let's create.
What does it mean to be human in a world of digital doppelgangers? Big, juicy, philosophical question, I know. But Victor Riparbelli is one of those real humans who thinks about this a lot. He's the co-founder of Synthesia.
Hey, Victor, welcome to the show. Thanks, man. Glad to be here. Just to level set this conversation first, we already have so many tools for communication, and you've talked about how text was the original data compression for human communication. But now we have video calls, messaging, social media, podcasts, newsletters, emails, the list goes on. Why are digital avatars necessary? I think at its very core,
almost any technology we've invented for communication abstracts something away, right? Text is the most obvious example: the experience of me talking to you in real life and delivering some kind of message, versus the way you'd perceive and interpret that message, would be very different than if I sent you exactly the same words written in a text message. I mean, even pre-text, right? We had cave paintings and all sorts of other technologies that essentially helped us store information and deliver it to someone else in a different time and a different space. And what we've been doing since then is really just trying, as much as we can, to make these technologies as close to the experience we have in real life as possible.
And I think there are lots of ways we've gone about that. But obviously, you know, the ultimate way of doing this is to replicate the actual human experience of speaking to someone. And digital humans and digital avatars are, of course, an important part of that. And on that note, I've heard you refer to your avatars as digital humans. What's the difference in your mind?
I think there are a lot of different terms that get thrown around: AI clones, AI avatars, AI humans. Ultimately, I think they all represent roughly the same thing. If you say it's an avatar or a face or a character, that implies it's a non-human entity, whereas if you use the word human, that implies something different about it. And with the era we're living through right now, with computational intelligence improving very, very rapidly,
maybe the reason people are talking about digital humans now is because it actually feels like we can create something that very closely resembles human life, right? Both in the real world, but before that, in the digital world. All of us have interacted with ChatGPT and LLMs. We've seen firsthand how powerful they are, how convincingly they can pretend to be human. And if we can give them the visual expression and the audio expression as well, it actually does feel like we're going to get pretty close to
be able to create something that feels like a digital human, not just because we use the word, but because it actually feels like it when we interact with it, right? So next year, we'll launch a real-time avatar we can actually talk to. And I think there's probably something there where that's when we begin to think of it more as a human than we think of it as just a technology. And I think maybe a good way to anchor that is
When you think about a chatbot like ChatGPT, one thing that's very interesting, and I do this myself, and I think most people do, is that when you're interacting with these systems, people are actually quite polite. Yeah, definitely. You talk to ChatGPT like it's actually a coworker. You say please. And it's kind of weird, right? Because you're interacting with a computer that has no feelings, as far as we're aware. But the technology is now so powerful that it's very hard for us not to feel that way, despite consciously knowing that we're interacting with a large language model. And I think that's our relationship with machines, which is about to change quite dramatically. And digital humans are going to be the most obvious expression of that. There are two ways that a person can use Synthesia to create a digital human. They can pick from these off-the-shelf avatars that you own and build, or they can get custom avatars made of themselves. I'm curious, which is the more popular route?
It's actually roughly 50-50. In the beginning, we wondered which one would be the more important, right? And as time has gone on, it's very clear that there's no answer to that. They both serve different purposes. One of the things we learned very early on when we started the company was that a big reason people loved the product so much was that they didn't have to be on camera themselves. They don't like how they sound. They don't like their accent. And so a big part of the value proposition around Synthesia was actually that people
could make video without having to be in it themselves, right? And that was a pretty big unlock. But then it's also very obvious that there are a bunch of use cases where you want to be yourself, right? So if you're
a CEO creating a video about your company strategy for next year, that's kind of weird coming from an anonymous avatar. If you're a salesperson sending out videos to your prospects or to your existing customers to update them on something that's happening in the product, it makes a lot of sense that it's you and so on and so forth. So I think it's just, there will be many different types of use cases. And I think we'll see a mix of people's own avatars. We'll see
entirely generated avatars that are specific to companies and their customers, right? So you can build your own IP, if you will. And for existing real celebrities,
there's going to be a big unlock in terms of how they can work with brands in a much more scalable way than they could before. Look, even for myself, I would love to have my digital avatar, my digital human, delegated to do a bunch of this stuff; the setup process of recording a video in particular is painful. But I'm curious about the demographic you talked about that is super excited about not having to go through that pain, or perhaps didn't grow up with selfie culture in a world with cameras all around them.
When those folks first encounter their digital avatars, what kind of reactions do you typically see? A lot of people are very self-conscious, like they would be if they recorded, you know, just a screen recording of themselves or like a selfie video.
But people like it when they like the result. And I think one interesting anecdote here is, you know, in the early days of Instagram, for example, the big growth hack that Instagram employed was actually filters on images and on videos, right? It's actually very simple. It's like you take a picture and you make it, you know, slightly more saturated. You make it like black and white or whatever. But that makes that picture appear to look much, much better than before, right? Whereas every single image that people are taking before on their home cameras would look
fairly crappy unless someone actually edited it, which was out of bounds for most people. And I think what we see with avatars is a lot of the same.
People want to touch themselves up. They want to make sure they're being shot in a nice environment with nice lighting, that they're wearing their best clothes. They want to be the best representation of themselves. But in general, people love it, right? Especially people who don't want to be on video; once they're happy with their avatar, it unlocks so much for them. Executives who are otherwise asked to record videos several days a week now don't have to do that. They can work with their team to just create the content automatically.
And then I think also people have this sort of... On a personal level, right? It's pretty odd the first time you see your avatar. It's pretty odd the first time you hear yourself speaking a language that you don't actually speak. And it's clearly your voice. It sounds like you. And I think that's a very interesting glimpse for people into kind of like the future, right? A lot of these...
What I love about Gen AI as a cultural and technology movement is that it's so accessible that all of us actually get to feel firsthand what these technologies mean, right? What can they do? How powerful are they? And this is just such a visceral experience, I think, of some of the things AI can do. And I think everyone also feels like, well, this is only going to get better and better, right? Even though, of course, we've made a lot of progress, there's still so much more to go.
I mean, these avatars are really cool. And I will say, especially coming from a VFX and CG background, you can at this stage tell that they're still an avatar. There's that whole uncanny valley question. And I'm curious, on the consumption end of this, what are the reactions like? And does the context matter there? Like if people are reacting to a video in a sales inbound email, versus encountering it on a banking website, versus a virtual CEO address.
How do people react to these digital humans in these various contexts?
So I think you nailed it there, right? It's very much about the context. I'm pretty sure that if I used my avatar to record a love letter to my girlfriend... It's like, "you outsourced this?" She'd probably be a bit disappointed that I sent my avatar to do that and not my real self. But if you're a user trying to understand your mortgage application on a banking website and you're presented with 10 pages of text with very complex information,
almost everyone prefers to watch a video that just simplifies it for them, right? And what we generally see with a lot of our customers, almost all of them, I think, is that they introduce the avatars: hey, this is your virtual facilitator. This is not a real person. This is an avatar, and they're going to help you through the buying process, or help onboard you to your company, whatever. And what we see overwhelmingly is that people really love interacting with these videos, especially if the alternative is text, right?
We just did a big study with UCL here in London, because we wanted to investigate how people react to these videos. There are a few interesting stats. One of them is that people actually completed the videos with avatars faster than the ones with real humans. That's because when they watch videos of humans, humans are more imperfect, right? We use a few too many words, or we say something a little bit clunky, and so people scroll back in the video to watch a section again. But with the avatars, because the script has been written from the get-go, the information is actually more concise.
And it also shows, very overwhelmingly, that people by far prefer to learn by watching these videos rather than ingesting text. The stats you just mentioned make total sense to me, right? You're distilling down the information and communicating it in a far crisper fashion than, say, a long meandering conversation from a human. Though, you know, some humans are more concise than others, right?
When it comes to that CEO example, though, how important is photorealism to you? Maybe to level set: if I asked you to grade the photorealism of your avatars right now on a scale of one to 10, where would you put it? I think you have to dissect it a little bit. If you take photorealism as in how real it looks, I think it's very close to 10.
If you took a still frame of the video, I think it's very difficult to tell that it's an avatar, which is in very large part due to AI being very good at rendering. I think where avatars still have a way to go is the body language matching what you say. There's a beat to what we say. When I speak to you now, my eyebrows move in a specific way, my hands move in a specific way. We have this whole language with our bodies, and we don't notice it in the real world because all of us do this.
But we notice it when we see a video of a digital avatar whose body language is out of sync. What most avatar products on the market today usually do, not ours, but most avatar companies, is take a real video of someone, loop it in perpetuity, and just change the lips. The illusion works pretty well in shorter bursts, but you begin to get this weird sense that the head movement is out of tune, that the hands don't match what's being said, and that throws you off quite a bit, right? And there, I think, there's still a bit to go. Our new model that we're launching soon has full body language, including hands. That makes a big difference.
And then I think there's still a little bit of imperfection in the voice. But I think the visual quality is more or less there. It's more about the last percentage of body language and the emotional expressiveness of these avatars, right?
What you're saying makes sense to me. So it's almost like the visual fidelity, if you just look at it that way, is pretty cool. It's kind of crossed the uncanny valley. But on the other hand, yeah, you're totally right: that emotive quality and the body language, in motion, still need a little bit of work. And that part, again, I think the models we have in-house have more or less solved. But basically, what we've seen is that no matter how many human animators you throw at animating a digital human,
we cannot animate it to perfection. And as humans, we are so, so, so sensitive to even the slightest inconsistencies, right? And what's amazing about AI and generative AI is that
The old school way of doing this, right, is that you sit down as a human being and we try to make a list of instructions of exactly how someone should move. And of course, with AI, what we're doing is kind of like the opposite way around. We're saying, we're not going to tell you what to do. We're just going to show you a bunch of examples of how people actually move. And you can yourself learn what that means, right? So we don't tell the computer, hey, there's like...
six, seven facial bones and muscles and all those kind of abstractions in some sense that we as humans have built to animate digital humans. We can kind of throw those out the window and say to the machine, you figure out your own taxonomy of how the body works and how people move. And that could be like a 5 billion parameter model that a human being would never be able to sit down and comprehend. But if the computer understands it,
Who cares, right? It can produce an output that actually looks and feels very realistic. And I think that's what we've seen in every modality, right? AI is extremely good at this because it can think far more abstractly, in way more parameters and dimensions, than human beings ever could, right?
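To make Victor's contrast concrete, here is a minimal sketch of the learned approach he describes: instead of hand-writing animation rules about facial bones and muscles, you show a model paired examples of speech and observed motion and let it learn its own parameterization. Everything here (the SpeechToMotion module, the feature dimensions, the dataset placeholder) is a hypothetical illustration, not Synthesia's actual architecture.

```python
import torch
import torch.nn as nn

AUDIO_DIM = 256    # per-frame speech features (e.g., an audio-embedding vector)
MOTION_DIM = 512   # the model's own learned "taxonomy" of face/body motion

class SpeechToMotion(nn.Module):
    """Learns speech -> motion directly from examples of real people talking,
    rather than from a human-written list of animation instructions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(AUDIO_DIM, 1024),
            nn.ReLU(),
            nn.Linear(1024, MOTION_DIM),
        )

    def forward(self, audio_features):   # (batch, frames, AUDIO_DIM)
        return self.net(audio_features)  # (batch, frames, MOTION_DIM)

model = SpeechToMotion()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Training loop sketch: show the model (speech, motion) pairs and let it infer
# its own representation of how bodies move while talking. The empty list is a
# placeholder for a real (speech features, motion capture) dataset.
for audio, motion in []:
    loss = nn.functional.mse_loss(model(audio), motion)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```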
1-800-Flowers takes the pressure off by helping you navigate life's important moments by making it simple to find the perfect gift.
From flowers and cookies to cake and chocolate, 1-800-Flowers helps guide you in finding the right gift to say how you feel. To learn more, visit 1-800-Flowers.com slash ACAST. That's 1-800-Flowers.com slash ACAST. Getting engaged can be stressful. Getting the right ring won't be at BlueNile.com. The jewelers at BlueNile.com have sparkle down to a science with beautiful lab-grown diamonds worthy of your most brilliant moments.
Their lab-grown diamonds are independently graded and guaranteed identical to natural diamonds and ready to ship to your door. Get $50 off your purchase of $500 or more with code LISTEN at BlueNile.com. That's BlueNile.com, code LISTEN for $50 off.
Ryan Reynolds here for, I guess, my 100th Mint commercial. No, no, no, no, no, no, no, no, no. I mean, honestly, when I started this, I thought I'd only have to do like four of these. I mean, it's unlimited premium wireless for $15 a month. How are there still people paying two or three times that much? I'm sorry, I shouldn't be victim blaming here. Give it a try at mintmobile.com slash switch whenever you're ready.
I love this because this is certainly what you're describing as a huge difference to the way Hollywood has traditionally done it, where it's like, you know, crazy light stage scan where you're essentially in this dome with a bunch of lights pointed at you or, you know, a Medusa scan where you have to do these explicit expressions.
So that really makes me curious, you know, for a lot of these off the shelf avatars you offer, you do capture a ton of your own training data when generating those. And of course, there's a process for folks to make their own digital twin, their own replica as well. Yeah. What does that process look like now and what is it going to look like in the future?
So right now, we need around three to four minutes of footage of someone. And that can be recorded with your webcam, you can record with your phone, or you can go into a studio. Today, basically, the input is the output, as we generally say. So if you record with your webcam, you're going to get a video back where your avatar is you sitting, recording yourself on a webcam. If you go into a studio, it's going to be you in a studio. The big thing we're launching very soon is being able to essentially create an avatar of you once and then
create new variations of your avatar in different environments. So let's say you've recorded one where you're sitting at home in your podcast studio, but now you actually want to record a video where you're on top of a mountain or you're flying a plane or you're skydiving and doing like a million different other things.
we can then create that avatar for you, by you essentially using text to prompt yourself into new scenarios. Cool. This is going to be a big, big, big unlock. So the way it works is that we still need some video of you. And the reason is that if we started from just an image of you, which is basically the modality you'd want this to work in, right? You take a single image, and from that you can generate a scene of you. Then we wouldn't know anything about how you look,
how you move, how your head kind of goes around, right? Even my teeth, you know? Even your teeth, the way you talk, we can never infer this from just a single image, right? Because the information is just not there. But what we want to be able to do is we want to build a model that says, this is exactly, you know, like how you move and how you speak and how your hands kind of work in conjunction with what you're saying.
And then once we have that model, then we can much easier just say, okay, here's a picture of you standing on top of a mountain. Here's you in a supermarket. Here's you behind a bar or whatever. And then we can begin to create these kind of new scenes. And I think, you know, this is going to be one of those advancements that's going to have like a huge impact in terms of what people use the product for and how much fun you can have with it.
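As a rough sketch of the workflow being described: capture once, then prompt the same identity into new scenes. The class, function, and parameter names below are invented for illustration; they are not Synthesia's real API.

```python
from dataclasses import dataclass

@dataclass
class AvatarModel:
    """Identity model distilled from ~3-4 minutes of real footage:
    voice, gaze, head motion, gesture style."""
    avatar_id: str

def create_avatar(footage_path: str) -> AvatarModel:
    # In the described pipeline, a few minutes of video is enough to learn how
    # you look, move, and speak -- a single image would not carry that information.
    return AvatarModel(avatar_id="avatar_123")

def generate_scene(avatar: AvatarModel, scene_prompt: str, script: str) -> str:
    # Re-render the same identity in a newly prompted environment.
    return f"video({avatar.avatar_id}, scene='{scene_prompt}', script='{script}')"

me = create_avatar("webcam_capture.mp4")
for scene in ["on top of a mountain", "flying a plane", "in a supermarket"]:
    print(generate_scene(me, scene, script="Welcome to this week's update."))
```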
I love that. It's kind of replacing the whole kind of green screen visual effects workflow, right? If you just go capture it in reasonably diffused, decent lighting, and suddenly you can kind of, you know, choose a bunch of different backgrounds. Like that's like virtual production democratized. Before I get carried away and get too excited about that, I do have a question. Like, so if someone creates this avatar, let's say I made it, who owns it? And can I license my digital doppelganger?
So you own it 100%. And if you wanted to delete it, we'll of course fully delete it. No questions asked. And that'll always be the case. We are thinking about what to do with kind of likenesses and should we create a marketplace where people can rent out their likeness to work with like brands or creators. It's not a functionality we have yet. What's exciting about it is that it opens up like so many new ways of using your likeness, right? So let's say that you're a celebrity, for example. The traditional way a celebrity would engage with a brand is
you say, okay, Miss Big Celebrity, we're going to go into this warehouse. We're going to shoot an advertisement with you. We're going to take a bunch of still photos. And this then becomes the material for all of our campaigns moving forward. And maybe they'll record some social media clips as well. And then you're kind of done. You've recorded all the content, and now the brand can use that.
What this unlocks is what if you have an e-commerce store and every time someone buys a product, you want to send a thank you message from a well-known celebrity. All of a sudden, it doesn't necessarily need the celebrity to do much else than just say, "Yes, I'm fine with this. I'll license up my likeness." And maybe instead of that being kind of like a big upfront payment to the celebrity, celebrity is just paid $1 every time someone buys a product in that store, right?
And the store can quickly switch out the celebrity with someone else if they want to try someone else. Or maybe they think that for one segment of their customers, the celebrity A is the best choice. For another group of customers, celebrity B is the right choice. And because everything here is generated with code, you can actually begin to do these kind of things.
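Because the rendering is driven by code, the segment-routing and micro-licensing idea can be expressed in a few lines. This toy sketch uses the $1-per-purchase figure from the conversation; the names and the fee model are illustrative, not a real marketplace API.

```python
# Map customer segments to licensed likenesses; swapping a celebrity is a
# one-line config change rather than a reshoot.
LICENSED_AVATARS = {"segment_a": "celebrity_a", "segment_b": "celebrity_b"}
PER_PURCHASE_FEE = 1.00  # the hypothetical $1-per-sale licensing model

royalties: dict[str, float] = {}

def thank_you_video(customer_segment: str, customer_name: str) -> str:
    celebrity = LICENSED_AVATARS[customer_segment]
    # Each render accrues a micro-payment instead of a big upfront campaign fee.
    royalties[celebrity] = royalties.get(celebrity, 0.0) + PER_PURCHASE_FEE
    return f"render(avatar={celebrity}, script='Thanks for your order, {customer_name}!')"

print(thank_you_video("segment_a", "Dana"))
print(royalties)  # {'celebrity_a': 1.0}
```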
And so what I think we'll see is actually a democratization of working with celebrities, in some sense. Today, you need millions of dollars and big budgets to work with a big celebrity. In this world, the celebrity can actually pick who they want to work with, right? Maybe a celebrity would prefer to work with 500 small artisanal shops all over the US that each pay them less, but in aggregate pay the same as one big Coca-Cola campaign. I think that's actually pretty interesting, because my guess would be that if you asked these celebrities who they would prefer to work with, they'd probably prefer small artisanal shops with products they actually love, rather than some mega brand that just throws millions at them, right? So I think we'll see a lot of new business models emerge. And I personally think that's pretty exciting. That is exciting indeed. And it brings me back to the B2B focus of your company, given that most of your customers are businesses, right?
What are the types of things they're using it for? In the past, you've described this as a vitamin for the entertainment industry, but a painkiller for businesses. Why is that?
So when we started the company, as you said, we initially set out to build tooling for video professionals to be more efficient. And the first thing we did was build this AI dubbing functionality: you take a real video, and we did a very famous one with David Beckham, speaking obviously in English, and we could take that advertisement and recreate it in 10 different languages. So it looks like David Beckham, in this case, was speaking a different language. It's definitely a very cool product, there was a lot of interest in it, and it did okay in the marketplace.
But we just had this kind of feeling that if we disappear tomorrow, they will find another way of solving the problem, right? And it was kind of like a cool thing, but it wasn't really a painkiller, right? It was a nice thing to have. And it's very difficult to build a big company around something that's nice to have. You want to sell something that people really, really need to have. And so as we kind of went through the motions of taking that product to market and really just trying to build an understanding of video from first principles, we suddenly had this feeling that
There's a lot of people in the world who are not making video today, and they're desperate to make video. And when we spoke to those people, they obviously did not work in the video industry, right? They worked in big companies. They were like a marketing manager, training instructor, sales professional, something like that. And they're all telling us that they are desperate to make video. They have a lot of great content, a lot of great knowledge that they want to share with their customers and with their employees.
but nobody reads anymore, right? They send out these emails that just end up in the archive. So they wanted to make videos. They tried to make videos. The thing is, if you work in a big company, there's often a lot of content to produce, which means the quantity of videos you have to make is very high. They often need to be translated. They need to be updated after you've shot them, because something changed in your business.
And that's just impossible to do with real video. And so for these people, if we can give them a way to make video that is a thousand times easier and a thousand times more affordable than shooting it with a camera, they'd probably be okay with the quality of those videos being lower than what the video industry would produce. Because for these people, the alternative is not a real video from a camera. The alternative is text. So it's a question of what you compare it to: a real video, or text?
It's not like people are saying, all this content we used to shoot with a camera, we'll now make with Synthesia instead. It's people saying, all this text that we have, all these slide decks and all this static information, we can now turn into video content. And that became the inflection point for us, once we figured that out. And I think there's...
I love what you said before, because we had the same feeling, right? How weird is it that the biggest market for visual effects is potentially going to be corporate communication in a couple of years, not Hollywood? That's very contradictory. No one would have thought that would ever happen. But in many ways, I think the biggest ideas, the most impactful ideas, always feel very weird and very contradictory. Like Airbnb: what if people just invited strangers to sleep in their home for a bit of money? Everyone would be like, you're absolutely crazy, right? But I think that's what technology does. It challenges a lot of these inherent assumptions. And in our little world, this is a pretty good example of that. Because ultimately what we do, to your point, is special effects, right? It's visual effects. We call it AI because we use AI, but at its core, it's not too different from what Hollywood has been trying to do for many years.
It definitely is the art and science of visual effects. And I'm kind of curious, right? On the consumer side, there's this short-form video fatigue, just video fatigue: everyone's doing video all the time. But on the enterprise side, as you mentioned, there's a bunch of content that would just never have been converted into video form.
If you take that to the limit, do you think there is a similar risk where we just end up polluting our feeds with a bunch of throwaway content? It's just going to be like an onslaught of enterprise B2B video content. But I think what's going to happen is that video is going to become the table stakes. So today, email is table stakes, right?
you don't operate a company without sending out emails. At some point, if you're sending emails with lots of text in them, people just aren't going to open them, right? Your inbox in the future is going to look more like your TikTok feed, where you quickly scroll through what's interesting. And as always, just like it is with email today, just because something gets easier to produce, you still have to be a great storyteller. You still have to figure out the right hook to get my attention, to get me to watch your video all the way through and get in contact, or whatever it is you want me to do. All those things around storytelling, building a good product, and being good at communicating it, none of that goes away. So what's true now is going to be true in the future: it's about curation and standing out.
So we are seeing an explosion of content. And of course, every time tools like the ones you're creating come out, people use them for misinformation and disinformation, right? There have been instances in the past where Synthesia avatars were used to spread misinformation. How much did those incidents push you to lock down or put rails on the abilities of these avatars?
So the safety aspect has always been very important to us. Since we founded the company in 2017, we've operated on an ethical framework called the three Cs: consent, control, and collaboration. Consent means we never create avatars of anyone without explicit consent, and that's a hard stop. It means we lose out on some virality, because we don't make funny satire videos of celebrities or whatever, right? But that's a choice we decided to make.
The second one is control, right? That's basic content moderation: we take a very strong view on what you can and can't use the platform for. We're a B2B product. We work with the enterprise. And so we're probably a bit overly strict in some senses. There are categories of legal content that we're still very restrictive around. And we put a lot of effort, both with machines and with humans, into making sure that people don't use our platform for things they shouldn't.
I think with these incidents that happened in the past, we'll always get judged by the one video that makes it through, and we learn something from that every single time. In many ways, when you do content moderation, a lot of people disagree with you no matter what direction you go in. Yeah, you're not going to make everyone happy. Exactly. And especially, of course, when it comes to things like news and politics,
religion, et cetera, this gets very, very hairy. And no matter what you do, there will be people who don't like it, right? There was one of these instances specifically that we discussed a lot internally, where someone made a video, and I'll leave out the details, but it was essentially a video about a pretty hairy topic, right? A topic that divides people in two: either you're very pro or you're very against. And the video was actually entirely factual.
But it was perceived by one big newspaper as being a piece of propaganda. And that was a very interesting one for us, because we fact-checked it, and there was nothing in it that wasn't factual. You could argue that talking about the topic in a specific way was a ploy to make people believe something specific. But all communication has those properties.
And so what we've decided to do is, again, to be overly restrictive. We don't allow news and current events content unless you're an enterprise customer, for example. That's actually a shame, because we had a lot of NGOs, citizen journalists, and those kinds of folks making great content on the platform, but eventually it was just too difficult to manage. And so we decided
to make that rule. It's something we always work on. As I said, we're not claiming we're perfect, but I think we have very, very good systems in place today that keep bad actors off the platform.
I've got to say, the stance you're taking is indeed more restrictive. I hear most platform creators punting this to the point of distribution: they say the creation tool shouldn't be responsible for this, the distribution platforms should be the ones bringing the hammer down. Look, I think these questions are so difficult, right? And there are so many different ways you can think about them. You can think about them philosophically, as a question of freedom of speech. From a practical perspective, is this just about keeping out
the bad people that we all agree are bad people? Is it an economical question? Am I hindering my growth as a company because I'm overly restrictive and leaving the door open for other competitors? There's so many angles. It's not an easy question, right? And what we have talked a lot about is that there is a shift happening right now, specifically in AI, where a lot of companies are moving the point of moderation to the point of creation, right?
Where, of course, with the big language models, we see this all the time, right? There's a bunch of things they just won't talk about. They'll definitely not help you with the recipe for a bomb or something like that. But even on more vanilla topics, politics being the obvious one, they'll tiptoe very much around those kinds of things. In our case, it's sort of the same thing, where we actually limit you from creating the content in the first place. And I always explain it like this: that is actually very new, right? Imagine that when you're using PowerPoint or Microsoft Word,
it would stop you from making a slide about how to do something horrible, right? That's a very weird thought for most people. But in many ways, that's actually what we're doing and what we're building. And no one has ever
held Microsoft responsible for the fact that a school shooter can write their manifesto in Microsoft Word, right? And I'm sure PowerPoints have been made about how to do evil, horrible things in wars and so on. But we've never seen that as Microsoft's responsibility. We've always seen that as the distribution platform's responsibility, once that content actually gets uploaded somewhere.
But I do think that as a society, it's probably good that we're extra careful when we roll out these things in the beginning. And then, maybe in 10 or 15 years, we'll have a different view on
how these technologies should be used and governed. But as a starting point, my own moral inclination, and the rest of the company's, is that it's good to be a little bit on the back foot and be a little more restrictive than what some people feel comfortable with. Now, building off this discussion and looking toward the future: you said that next year you're going to have these avatars you can talk to in real time. There's an interesting thing we came across when we did this episode with ChatGPT Advanced Voice Mode,
where sort of the guardrails and restrictions that are put on it almost prevent the avatar from being like fully human-like, you know? Like if it's too much in a box, you can kind of see those seams and that kind of pops the illusion. How do you think about that tension, especially as you're moving towards these more expressive product experiences? I totally agree with you. And I think
It's so deeply fascinating to me how, as humans, we're so good at detecting something that's non-human. When you talk to the voice mode, right, you understand: okay, this will help you answer practical, factual questions. And every time you ask it for an opinion, or to be a little bit human, it'll just default back to
robot speech, to some extent. At some point, I think these restrictions will be lifted. There's a big market and a big appetite for interacting with computers that feel very, very lifelike, right? So I think we will see that boundary disappear over time. As for us,
I think, again, we've made a decision to be a B2B company. And so we're not going to be offering virtual boyfriends and girlfriends anytime in the near future. But I think a lot of those properties are also very interesting in a business context, right? For example, if you're a salesperson and you do sales training, if you can role play with a prospect,
that can be programmed and prompted to act in a specific way, you could probably ramp up a lot faster than if you have to read documents about how to come back from different objections. And there are a lot of other, potentially more controversial, applications of this. Think about psychology, therapists, and doctors. I think we'll see a lot of those pop up in the next couple of years. And I think ultimately,
for a lot of these use cases to really work, it has to feel very lifelike. I think if you're interacting with a sales simulator, which looks like a computer game from the 90s,
you're just going to disconnect from it. It's not going to work, right? And I think right now we're very, very close to passing through that uncanny valley, where it actually will feel very close to having a Zoom call with a real human being. It's interesting: even with your B2B focus, you just outlined a bunch of scenarios where the box is large enough that you can have a very meaningful, interactive experience. So I have to ask you: how far away are we from AI avatars that feel indistinguishable from a human conversation?
I don't think we're very far, to be honest. I think in 12 months' time, you could probably simulate Zoom calls at a pretty good fidelity. I think the voice component of this is getting to full maturity. There are a lot of great technologies out there. And the video part of it depends a bit on what you're trying to simulate, but
If you look at the videos that we're watching each other on right now, right? And that's a compressed Zoom feed, then that's not the most challenging thing to replicate. And you're already going to expect a whole bunch of artifacts and compressions and all those sort of things, right? So if that's kind of the goal, then I think you're not very far from it.
Let me ask it in a slightly different way, especially on the visual fidelity side. To use your example from earlier: how long before you can send that digital love letter to your girlfriend and she believes it was actually from you? I think next year. I don't think it's far away. Looking at what we're building right now, we have the components. We've taught a system how to
predict the correct body language, facial expressions, and gestures that go with what you're saying. We can generate the voice at a high enough quality that it sounds deeply felt and emotional. So I really don't think it's more than 12 months away. And it'll be very interesting. Internally, we talk about this as the ChatGPT moment for video. I think what's so powerful about ChatGPT is that it truly broke through the uncanny valley, right? The first time you use ChatGPT, it's so human that you begin talking to it like a human subconsciously, without even thinking about it. I think audio and text-to-speech have kind of gotten there. And video, I think, is getting very close. So internally, we think of it like this: when you can generate a video of a vlogger on YouTube, the traditional style, sitting in my bedroom, talking at you, when you can generate that at a high enough quality, a high enough fidelity, that you'd come home after work one day, put on an avatar video, and sit down and watch an avatar talk for 18 minutes, like a lot of people do with vloggers, that's when the total market for these technologies explodes by a factor of a thousand. When that happens, Pandora's box is open. There are going to be lots of ethical questions, lots of cultural questions, lots of questions about art and what this all means. And I think that will be a pretty meaningful and powerful moment.
So let's get into those ethical questions. I mean, it's fascinating, right? Let's say you have these photorealistic avatars that you can talk to in real time. Could this tech eventually replace humans completely in, let's say, like customer service roles?
And how do you think about that tension, right? How do you ensure this tech enhances rather than replaces human interactions? Because the thing that keeps popping into my head is pulling up to a hotel at 11 p.m., and instead of a human there, there's a freaking iPad. You know, it's multimodal. It can see me. It'll check me in and do everything. It's perfect. It can work around the clock. But there's not a human. And you're already seeing some hotels try this, where they've got essentially a remote worker playing that role right now,
but eventually it'll be autonomous. And that's just one example. So how do you think about that Pandora's box opening? I think there are ultimately two types of use cases. If you're calling customer support, for example, you don't really care who the customer support agent is, right? You just care about solving your problem the fastest way you possibly can. And if we replace that with an agent or a bot, I think no one will care. And that will definitely happen. It's a matter of when the technologies are good enough.
If you take the example of a salesperson, or maybe a hotel receptionist: some hotels will compete on the cheapest room. They'll want to give you the fastest experience, just getting the key card and getting into your room.
Other hotels will put a lot of emphasis on meeting and greeting you at the door, taking your luggage for you, explaining what's happening in the city this weekend and so on and so forth. That's a product that's pretty heavily service dependent. And I think for those kinds of things, we will really value the human connection. I think it's a bit the same thing with a salesperson. A lot of people want to talk to a salesperson because...
It's a relationship that you build with someone else, right? And I don't think we can replace that. I think the human touch and the human element will become much more important in the future. AI is going to be much faster at replacing people typing in Excel spreadsheets all day than at replacing a waiter giving you a great experience at the local restaurant.
I think that's well said. But I want to ask you: do you foresee a world where having a digital avatar is as common as having a social media profile? Meta recently announced digital avatar tools for creators on their platforms, for instance. Absolutely. I think it's just an evolution of
the profiles we all have today. In some sense, your profile on a social media network is also a clone of you. It's maybe not as visible as like an avatar of yourself, but that is what it is, right? It's a digital representation of who you are.
And if I go back to my childhood when I was on forums, right, we'd have a username. And then the next generation of forums, you'd have a username and a profile picture. And then you'd have like a profile picture with a profile page where you can write something about yourself and your interests or whatever. And then we all graduated to like social media. And now we have not just one picture of ourselves, we have a whole gallery of pictures that talks about us. And on TikTok, we have a whole library of videos that explain something about ourselves and who we are and our place in the world and so on and so forth. So I think in many ways, it's just a natural evolution of that.
We will have digital personas that represent us in digital space. So are you imagining this tech evolves to a level where, let's say, my digital self not only represents me in the virtual world, but in a sense lives my virtual life for me? Yeah.
I don't think it's off the table, you know. Again, I don't think I'll enjoy interacting with my friend's bot as much as I'll enjoy interacting with my friend in the flesh, knowing it's actually him. I think it'll be more practical, again. Maybe we'll have agents that say: hey, you haven't seen Simon for six months,
why don't we arrange something? And I'll say, yeah, that's actually a good idea, right? Then my AI will go to Simon's AI and say, hey, these guys haven't met up for a while. Why don't we set up something for them in a couple of months' time, right? We know that they both love listening to techno music. So let's find a concert or
a rave somewhere close by, you know, and set that up for you, right? So I think, again, it's more utilitarian. I don't think it's going to be our AIs catching up on behalf of us and then giving each of us the lowdown on what was discussed in our stead. I hope that's not going to be the case. But those kinds of things, I think we will see a lot more of, right? And for one, as someone who has a pretty busy life, I think that'd be pretty awesome, actually. But from a very philosophical perspective, you can argue that
Basically, everything online is already not real, right? Like your Instagram profile is not a real representation of you. We present ourselves in the best light possible. And I think our avatars and all the digital content we create around ourselves will probably just be an extension of that.
I think what we'll have to learn, and what I actually sense the younger generation are to some extent learning, is that this is fiction grounded in reality, right? I usually use the example of going to a dinner party, or when your parents went to a dinner party, and when you do too, for that matter, just in a different time and age, right? You sit down at the dinner table and you ask people, how's it going? And people do exactly the same thing in real life as they do on Instagram, right? Very few people sit down at the table and say, actually, you know what?
I'm really tired of my wife. I want a divorce. I hate my job. Most people say, yeah, it's going pretty well. We project a version of ourselves to the world. And so this idea of projecting yourself is not something Instagram created. It's always been the case. It's amplified, perhaps. It amplifies it and makes it more concrete in many ways. But I think most human behavior has been the same for thousands and thousands of years. We just express it in a different way.
So in this future where these digital humans are photorealistic, where they've crossed the uncanny valley, what does that mean for individuality? Will we be confused: I can't even tell if this is Victor I'm interviewing, or whether you delegated your deepfake to come do the interview, and it's indiscernible to me. What happens to transparency and individuality in that context? I think if you look at text: you have been able to produce text and share it with anyone online
for the last many years. And I think by now, most of us have some sort of critical sense that just because something exists as text on the internet somewhere does not make it true. If you see a tweet from some random account saying World War IV just kicked off or whatever, your first instinct is going to be: that's probably not true. You've got to triangulate that information with a news source or whatever. And I think what's going to happen now is that
we're going to have to move from a world in which, in general, if someone has been recorded with a microphone or a camera, most people assume that the mere fact that the recording exists means it's true. That's not going to be the case anymore, right? And so it'll be even more important that all of us learn how to be literate with media. We need to look at things from different angles. Who created this piece of content? When was it created? Is this from a reputable source? And I think these technologies are developing very fast.
I think it's going to bridge into a world where we, by definition, believe nothing of what we see online. We presume that everything is fiction, that everything is a Hollywood film, right? And we basically go back to saying we can only trust things if they happened in front of us, if we saw them in real life.
That doesn't mean we can't trust anything we read or see online. We're just going to have to be more critical, presuming that just because something exists does not actually make it true, right? And I think that's actually going to be a good thing: we, by definition, assume that almost everything is fake, and we work backwards from that. And there are a couple of ways we can work backwards from that. We're working with Adobe and some other tech companies on something called C2PA, which is the idea that you fingerprint and watermark content, essentially.
I think we'll move into a world where content is verified by default. So when you take a picture with your phone, when you make a video on Synthesia, when you create an image in Photoshop, you choose to register that piece of content in a global database of all the world's content. I hate the word, but I actually think a blockchain can be a good solution here, because it's immutable.
When you then upload it to YouTube, or whatever your social media platform is, the platform will look at the content, identify it in that database of all the world's content, and say: this video was originally created by Victor in 2019, it was made with Photoshop or with Synthesia or whatever. Here's some information around it; we know where this came from originally. And that will move us into an internet, I think, where most content will be verified. That'll help you evaluate every single piece of content, essentially, and we'll then be in a world in which content that is not verified will stick out like a sore thumb.
I think you're right. We are going into a world where authenticating content will be the default, and we'll have provenance for most pieces of content that are created.
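To make that provenance flow concrete, here is a minimal sketch: hash a piece of content at creation time, register the hash with creator and tool metadata in an append-only ledger, and let a platform look the hash up at upload time. This is illustrative only; the record fields, the in-memory ledger, and the function names are hypothetical stand-ins, not the actual C2PA specification, which defines cryptographically signed manifests embedded alongside the media.

```python
import hashlib
import json
import time

# Hypothetical in-memory stand-in for the "global database of all the
# world's content" -- in practice this could be an immutable ledger
# such as a blockchain, as Victor suggests.
LEDGER: dict[str, dict] = {}

def fingerprint(content: bytes) -> str:
    """Derive a stable fingerprint by hashing the raw bytes."""
    return hashlib.sha256(content).hexdigest()

def register(content: bytes, creator: str, tool: str) -> str:
    """Register content at creation time (camera, Synthesia, Photoshop...)."""
    digest = fingerprint(content)
    LEDGER[digest] = {
        "creator": creator,
        "tool": tool,
        "created_at": time.strftime("%Y-%m-%d"),
    }
    return digest

def verify(content: bytes) -> str:
    """What a platform like YouTube might do at upload time."""
    record = LEDGER.get(fingerprint(content))
    if record is None:
        # Unverified content "sticks out like a sore thumb".
        return "UNVERIFIED: no provenance record found"
    return f"Verified: {json.dumps(record)}"

# Example: register a video at creation, then check it on "upload".
video = b"...raw video bytes..."
register(video, creator="Victor", tool="Synthesia")
print(verify(video))              # Verified: {...}
print(verify(b"tampered bytes"))  # UNVERIFIED
```

One caveat on this sketch: a naive byte hash changes under benign re-encoding, so a real system would need perceptual hashing or manifests embedded in the file itself, which is closer to what C2PA actually specifies.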
Leaving aside the concerns about the technology, what is it about the potential of digital avatars that excites you most, about humans wanting to interact, live, work, and play in this future? What can go right if you execute your mission correctly? I think the beautiful thing about technology is that it enables everyone to essentially have a voice, to be able to bring their ideas to life,
to share their knowledge with the world. The two main vectors there are, of course, distribution, which means you can share the content once you've created it, and creation, right? And I think we've seen in many modalities how powerful it is when you allow more people to create.
If you look at more recent examples, just in my own life, you know, I love music, and I've seen firsthand how the fact that we can produce digital instruments and sample things has led to new genres like electronic music, house and techno, for example, right? That wouldn't have been possible with real instruments. Yeah.
And just more recently, camera technology becoming very accessible gave us YouTube and, I mean, podcasts like the one we're doing right now; those are essentially formats that didn't exist before we invented technologies that massively democratized that. And so for me, the promise of all this is, well, what if everyone could be a Spielberg, right? What if any film student can go out and say: I have a great idea, and all I need to realize it is a lot of time and a good idea, right?
There'll be a whole bunch of content, like we discussed, that's never going to be watched by anyone. It's going to be crappy content. But there will also be a film student somewhere in some small country in the world who manages to produce amazing art despite not being connected to Hollywood. And I think that's really the thing that excites me the most: freeing creativity. Culture and art are such an important part of moving humanity forward, of creating peace in the world, of bridging all the gaps that we have between us.
And I think that's going to be a massively positive thing for the world. We've already seen it play out in many other types of media, and getting video there as well, I think, is going to be transformational for the world. Love it. Victor, thank you so much for joining us. Thank you. Victor Riparbelli is the co-founder and CEO of Synthesia. And yes, I'm quite sure I spoke with the real Victor, not his digital twin.
Though, in a year or two, even that certainty might be up for debate. What fascinates me is how we've inadvertently paved the way for digital humans through our everyday tech compromises. I mean, think about it. We've grown completely comfortable with grainy video calls, audio glitches, and awkward Zoom delays. These imperfections have actually created the perfect landing pad for digital avatars. We're already operating in a world where good-enough video quality is, well, you know, good enough.
But what Synthesia shows us is that this isn't just about making believable digital humans. It's about transforming how we create and share ideas at scale. When I started making videos, it meant countless hours of shooting, reshooting, and painstaking editing just to get a simple message across. Now we're approaching a world where anyone with an idea can spin up a video presentation in minutes in any language with any number of perfectly delivered takes. And that power to create is incredible.
but it also means we're racing towards a fascinating cultural crossroads. Soon, everything we see online might come with its own digital birth certificate, a verified chain of creation that tells us exactly where it came from and how it was made. It's like we're building a new trust architecture for the digital age. In a world where anyone can create any video featuring any person saying anything, maybe what becomes most valuable isn't the tech that makes it all possible.
but the story underneath it all. The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker. And our engineer is Asia Pilar Simpson. Our researcher and fact checker is Christian Aparta. Our technical director is Jacob Winnick. And our executive producer is Eliza Smith.
And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.
And guess what? Life insurance is probably a lot more affordable than you think. In fact, most people think life insurance is three times more expensive than it is. So with State Farm Life Insurance, you can protect your loved ones without breaking the bank. Not sure where to start? State Farm has over 19,000 local agents that can help you choose an option to fit your needs and budget. Get started today and contact a State Farm agent or go to statefarm.com.
No matter what branch you serve, military roots run deep. At American Military University, we recognize the sacrifices of service members and their loved ones. That's why we extend our military tuition savings to your family tree. Parents, spouses, legal partners, siblings, and dependents all qualify for our preferred military rate of just $250 per credit hour. American Military University. Savings for the whole family. Learn more at amu.apus.edu slash military.
Nothing delivers comfort and joy quite like the unrivaled quality and taste of Omaha Steaks. It's guaranteed perfection in every single bite. And right now, you can save on unforgettable gifts with 50% off site-wide at omahasteaks.com. Plus, score an extra $30 off with promo code HOLIDAY. With five generations of experience, they consistently deliver the world's best steak experience.
And the gifting experts at Omaha Steaks have made it easy to deliver the perfect gift with thoughtfully curated gift packages featuring gourmet favorites. From legendary steaks to mouth-watering desserts and more, save 50% off site-wide for a limited time at omahasteaks.com.
Plus, our listeners get an extra $30 off with promo code HOLIDAY. That's 50% off at omahasteaks.com, and an extra $30 off with promo code HOLIDAY. Minimum purchase may apply.