The Future of AI Art? OpenAI’s Latest Model Might Be It

2025/4/15
Lex Fridman Podcast of AI

People
Jaeden Schafer
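Jaeden Schafer: I'm impressed by OpenAI's newly released AI image generation model, which is integrated into ChatGPT and can generate text inside images, something earlier models struggled with. The model excels at text; for example, it can generate a boarding pass with perfect text. It can easily create high-quality designs such as infographics, which could hit companies like Canva. It keeps image generation consistent and can change an image's style on request while keeping the subject unchanged. For example, it can recreate an uploaded image in a different style, such as turning a photo into a passport photo. It can handle complex prompts and generate images from detailed instructions, which earlier models could not do. It can blend text and images, for example by adding a generated image to a real-world photo. It lets users edit images, such as adjusting the aspect ratio and colors and adding a transparent background, which is very useful for graphic designers. It can generate images in different styles from a user's sketch or image, such as turning a sketch into a comic or an image into a sculpture. It can reproduce images, even screenshots containing a lot of text, although it may crash on complex images. Overall, this is a very powerful tool that generates high-quality images and poses a threat to other image generation tools.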

Chapters
OpenAI has integrated a groundbreaking image generation model into ChatGPT, capable of producing stunning visuals with text. This technology surpasses previous models in accuracy and design capabilities, potentially disrupting existing graphic design tools like Canva.
  • OpenAI's new image generation model is embedded in ChatGPT.
  • It excels at generating text within images.
  • Its infographic creation capabilities are noteworthy, potentially challenging platforms like Canva.

Transcript

OpenAI, for the first time in years, has just launched their brand new image generation model, and they have it embedded into ChatGPT. Today on the podcast, I'm going to be breaking down demos and how this is working. I've actually got a chance to play with this and use it, and I am absolutely blown away by what this is actually able to do.

So today on the podcast, we'll be diving into it. Now, the first thing I wanted to mention is the fact that as they've rolled this out, the number one feature that I'm excited about is the fact that it can generate text inside of the images. So this is something that has been notoriously terrible, you could say, for these image generation models in the past.

They recently came out with a tweet. They said, "4o image generation has arrived. It's beginning to roll out today to ChatGPT and Sora, to all Pro, Plus, Team, and free users." So literally everybody is getting this. They then had a picture right below it where it's literally someone holding a boarding pass.

It says, "Boarding pass. Introducing 4o image generation, now in ChatGPT and Sora, March 25th, 11 a.m. PDT." Okay, look, as you can tell now, it's very good at text. Look at all this accurate text. All that's written on the piece of paper, and I am blown away by, like, how clear this is. So you can tell it generated a boarding pass with all of this information on it, and the text looks perfect. So I decided to actually test this out because...

I was a little skeptical. Sometimes you can see these demos and these tweets and it's like, wow, this looks amazing, but you're not exactly sure where it sits on this. And so I decided to give it a test myself. I was trying to just one-shot an infographic. They said it could do infographics. I said, "Make an infographic on why Arizona is so hot."

And literally, without giving it any more information on what I wanted, it created a very well-designed infographic. It's got this really cool desert-y yellow feel to it. It says, "Why Arizona's hot: desert climate, low elevation, high pressure." It's got explanations on each of those below them. And the text looks perfect. It's all the same font. It's all super cohesive. I didn't have to choose any design. In my opinion, this, or what comes after this, is going to almost kill companies like Canva, or at least you're going to need to be able to generate something like this and open it in Canva.

Canva is going to have to figure out some AI tools to make it so you can just edit this directly. Because I don't really see myself in the future, if I want to create graphics or something, trying to go find a template or a design. I'm just going to one-shot it. And it's very good at listening to your instructions. So I gave it virtually no instructions. I just said make an infographic.

But I could have said make an infographic, include cactuses, include the sun. So they actually went through demos of what it's capable of doing. And it's very, very impressive. One of the things that it can actually do is, as you're working with it in a chat, it can be super consistent. So you can create the same character. They showed a demo of this where essentially they were creating the exact same character. He had it create this, you know, geometric penguin character, for example.

And then he got it to create the exact same geometric penguin, but all of a sudden he made it in, you know, a realistic miniature style, as if a professional made it and painted it. And all of a sudden it creates the same thing, but now it looks like a little miniature sculpture. It's the exact same penguin from the exact same angle, holding the exact same keys. And so to me, this is very, very impressive.

Now, the other thing that they were then able to do after they did that was they went through and got it to generate this in a crystal style, as if it was turf, as if it was lava, as if it was a gummy bear, as if it was metal, all of these different styles. And what's so impressive to me is that it is literally the exact same. It's the exact same penguin. We're just looking at it in a whole bunch of different ways.

This is really good for creativity. You can essentially upload an image and get it to recreate it and then change the style. And you can imagine doing this yourself. I saw a demo where someone was essentially able to upload a photo. So this was Allie K. Miller on LinkedIn. She uploaded a podcast cover that she had done with, you know, her profile picture or professional studio photo or whatever. And then she said, create a...

And so, by the way, the tool she's using here isn't even this same one. Google has released this one. So OpenAI is coming up with sort of this response to this tool from Google, and it's able to do pretty much the same things. But for the Google product anyway, she uploaded a podcast cover and said, "Create an official passport photo for this woman. Be sure to use the exact same woman." It created what was called a passport photo, which looks just like a passport photo. And it looks exactly like her.

You could tell it's obviously recreated with AI, but it is her. And so we're getting to this point where these tools are so good that you upload a character and it just recreates it in a bunch of different variations. So that was a really cool demo. The next thing that they showed off that this thing is very good at is following complex prompts. So they essentially created a prompt that they used for this, in which they had 15 different things.

There was a pair of googly eyes, a thumbs up emoji, a pair of blue scissors, a white giraffe, the word "OpenAI." They had all of these different things that they wanted it to create. And then it created a graphic with all 15 of the things that they described inside of that graphic. So the reason why they showcased that, and why I'm so blown away and think it's important, is this.

You know, we had image models that were good before. I think Midjourney was pretty good. It would look quite realistic. You could generate really realistic photos of people. Now it's useful. Now you can say, I want there to be a camera. I want there to be this specific product. I want there to be this specific lighting, this specific angle. I want you to have, like, 10 of these things in the background, and it will listen exactly to what you say, right?

You're like, I want them to be wearing green shoes and I want there to be seven pairs of green shoes on the windowsill in the background. I want there to be five jackets hanging up in the closet. This was not something that previous AI models were able to do. And so it's really, really incredible that it has this capability now.

So the next thing that it is now able to do is to essentially blend text and images. And I kind of went over that with my example of the infographic that I thought was really impressive. But I saw so many other examples. Imagine now you create that infographic, but then you want to merge that with a real-world photo. So they did a demo where they created an infographic.

And then, essentially, they had somebody holding that infographic on the front cover of a textbook in front of the Arc de Triomphe in the real world. So it looks like a real photo with that infographic being something on a piece of paper inside of it. That to me is really cool. It's very meta.

You can generate graphics. And then, because you're chatting with the chat interface, you generate a really cool graphic and it's like, now take that graphic, stick it on the front cover of a textbook and put a man doing this, and it will then generate the next photo. And then, if you wanted to, you could say, now take that photo and put it on the front cover of a newspaper and have someone reading it. And it's like, now take that picture of a newspaper... You can just keep going. You're creating graphics that coincide with graphics, and they get so detailed.

This is really, really cool. I think for the first time, these are very useful. Okay. A couple other features that I think are definitely worth mentioning. One of the big ones is how you can actually edit these photos. So there's a couple of cool things you can do. Obviously you're sitting there chatting with it, describing how you want to edit the photo. You can say things like specific aspect ratios, which is really cool. You can say exact colors. You can use hex codes.
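
As a rough illustration of the kind of edit request described here, the Python sketch below assembles a plain-text prompt from an aspect ratio and a list of brand hex codes. The function name `build_edit_prompt` and the prompt wording are hypothetical and not anything OpenAI documents; the model just takes chat text, so this only validates the inputs and builds the string you would paste into ChatGPT.

```python
import re

# Pattern for a 6-digit hex color such as #FF6600 (assumed input convention).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
ASPECT_RATIO = re.compile(r"\d+:\d+")


def build_edit_prompt(instruction, hex_colors, aspect_ratio="1:1"):
    """Build a plain-text image edit request (hypothetical helper).

    Raises ValueError if a color is not a 6-digit hex code or the
    aspect ratio is not written as width:height.
    """
    bad = [c for c in hex_colors if not HEX_COLOR.fullmatch(c)]
    if bad:
        raise ValueError(f"not a 6-digit hex color: {bad}")
    if not ASPECT_RATIO.fullmatch(aspect_ratio):
        raise ValueError(f"aspect ratio must look like 16:9, got {aspect_ratio!r}")

    # Normalize the instruction so it ends with exactly one period.
    parts = [
        instruction.strip().rstrip(".") + ".",
        f"Use an aspect ratio of {aspect_ratio}.",
    ]
    if hex_colors:
        colors = ", ".join(c.upper() for c in hex_colors)
        parts.append(f"Use only these brand colors: {colors}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_edit_prompt("Recreate our logo", ["#ff6600", "#003366"], "16:9"))
```

Checking the hex codes before sending the request catches typos like a missing digit, which would otherwise leave the model guessing at the intended color.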

My gosh, this is incredible for graphic designers that are like, hey, our brand colors are, you know, these five or these three hex codes. You put those hex codes in, it's going to recreate your logo or recreate, you know, the background of whatever your photo is. Now it's all going to match your brand colors. This is amazing. And of course, you can also do transparent backgrounds. So they showed a demo where they created a sticker of a dog and they made a transparent background. They actually were able to pull it off and literally download that as a PNG with a transparent background. They made a bunch of different stickers. I thought that was really cool.

The last thing I wanted to show off was they did a demo where they essentially were able to go and create images in a bunch of different styles using GPT-4o. So the first thing they did is they made a comic book. She drew out a comic book, took a picture of it, and uploaded it. So this is what I then went and actually tested out, and I'll show you what it was able to do.

But she just did kind of a sketch of a comic book, and then she said, you know, can you make this into a real comic of a dragon? So then it went and actually illustrated it. It took her sketch and illustrated it in color. It was pretty funny. But then she said, hey, here's a picture of a crystal penguin, one of the crystal penguins they generated earlier in their demo, and she's like, now change out the dragon for this crystal penguin, and it threw it straight into the comic book.

So it's like, I think, the ability to upload images and get it to do these in real time. She also then took the crystal penguin and said, generate a lifelike statue of this in my living room. And it then was able to generate it in the living room. So you're uploading images inside of images. This is just incredibly useful, incredibly useful.

So I decided to test whether it's actually able to regenerate images. I tried with a bunch of memes, where I took a screenshot of a meme and I said, remake this photo. At first it kind of glitched out when I said remake this photo, and it just created the text for the photo. Then I told it to create an image, and it wasn't very good based off of that. So I was a little discouraged. I think this probably has something to do with the way it created the text first.

So I tried it one other time, and while it actually did crash on the video generation, I took a screenshot of literally Riverside. It's the software I use to record my podcast. And I said, recreate this image exactly, even including all the text. And we're talking about a screenshot of tons of UI, tons of text elements all over the screen. It generated about half of the image before it crashed. But in that half of the image, it has perfectly written out text that looks absolutely amazing.

I'm very, very blown away and impressed by this. So overall, it looks like we are seeing some absolutely incredible things from what I've been able to demo and test so far. I mean, we're talking like the text is amazing. Like what? We're recreating screenshots of whatever's on my screen. We're making one shot graphics. We're making stickers. We're editing things, transparent backgrounds. This is literally the image generator of, I think, many people's dreams.

I, to be honest, had completely kind of written off image generation on ChatGPT for over a year now. There's just so many better options. And this blows everybody, I mean, literally everybody, out of the water. This becomes an incredibly useful tool, to the point where I think it threatens Canva. It threatens so many other players.

And so I'm impressed. Google, like I mentioned, has that one other tool that they have rolled out that's able to do some similar things. ChatGPT is just the biggest at this point. And so I think they didn't let Google steal their thunder for long. They came out with this and it is incredibly impressive. Highly recommend checking this out. If you're a Pro user, if you pay for it, or even a free user, this is rolling out to literally everybody. You have to go check it out. The one thing you need to make sure to do is make sure that ChatGPT 4o is selected.

You don't need to go and select DALL-E or go select any sort of image thing. Just make sure it's ChatGPT 4o. That's where you're getting the best version of this image generation. Thanks so much for tuning in to the podcast. If you enjoyed it, make sure to like and subscribe over on YouTube. Drop us a comment or a review on Apple or Spotify. Thanks so much for tuning in. And I hope that you all have an amazing rest of your day.