OpenAI’s New Model Transforms the World of AI-Generated Images

2025/4/27
AI Education

People
Jaeden Schafer
Topics
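I was completely blown away by OpenAI's newly released image generation model. It is integrated into ChatGPT, it is powerful and highly detailed, and it produces stunning visuals. The feature that excites me most is its ability to generate text inside images, a major breakthrough compared with previous image generation models.

I tested the model myself. From simple instructions it produced high-quality images such as infographics, with clear and accurate text, consistent fonts, and polished design. This convinces me the model will deal a heavy blow to image design companies such as Canva. In the future, people will likely generate images directly with AI instead of hunting for templates or designs.

The model can also keep a character consistent across images and render it in many styles, for example showing the same penguin character in a geometric style, a miniature-model style, a crystal style, and more. This is very helpful for creative workers and can greatly improve their productivity.

In addition, the model can handle complex prompts and generate images from detailed instructions, such as specifying the elements, colors, angles, and lighting an image should contain, which earlier AI models could not do. It can also blend text and images, for example placing a generated image into a real-world photo so that the image and reality merge seamlessly.

The model also has strong image editing capabilities. Users can adjust the aspect ratio and colors (including hex color codes) and add transparent backgrounds, which is very practical for professional designers who need precise control over image details.

I tested the model's image regeneration feature. It can recreate an uploaded image and change its style, for example turning a podcast cover into an ID photo, and it can even regenerate screenshots containing many text elements. Although it may crash on some complex images, the results are still impressive.

Overall, OpenAI's new image generation model is powerful and highly practical. It not only generates high-quality images but also performs complex image editing and style transfer. It will completely transform the field of image generation and poses a threat to many image design tools and companies. It has already become the image generation tool many people have dreamed of.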

Transcript

OpenAI, for the first time in years, has just launched a brand-new image generation model, and they have it embedded into ChatGPT. Today on the podcast, I'm going to be breaking down the demos and how this works. I've actually had a chance to play with it, and I am absolutely blown away by what it's able to do. So today on the podcast, we'll be diving into it.

Now, the first thing I wanted to mention is that, as they've rolled this out, the number one feature I'm excited about is that it can generate text inside of the images. This is something image generation models have been notoriously terrible at in the past. They recently came out with a tweet. They said, "4o image generation has arrived. It's beginning to roll out today in ChatGPT and Sora to all Pro, Plus, Team, and Free users." So literally everybody is getting this. Right below it they had a picture of someone holding a boarding pass.

It says: "Boarding pass. Introducing 4o image generation, now in ChatGPT and Sora. March 25th, 11 a.m. PDT." Okay, look, as you can tell, it's very good at text. Look at all this accurate text written on the piece of paper. I am blown away by how clear it is. It generated a boarding pass with all of this information on it, and the text looks perfect. So I decided to actually test this out.

I was a little skeptical. Sometimes you see these demos and tweets and it's like, wow, this looks amazing, but you're not exactly sure where it really stands. I was trying to just one-shot an infographic, since they said it could do infographics. I said, "Make an infographic on why Arizona is so hot."

And without giving it any more information on what I wanted, it created a very well-designed graphic with a really cool desert-y yellow feel to it. It says "Why Arizona's Hot": desert climate, low elevation, high pressure, with explanations below each of those. And the text looks perfect. It's all the same font, and it's all super cohesive. I didn't have to choose any design. In my opinion, this, or whatever comes after it, is going to almost kill companies like Canva. Or at the very least, you'll generate something like this and open it in Canva, and Canva is going to have to figure out some AI tools so you can edit it directly. Because in the future, if I want to create graphics, I don't see myself going to find a template or a design. I'm just going to one-shot it. And it's very good at following your instructions. I gave it virtually no instructions; I just said make an infographic.

But I could have said make an infographic, include cactuses, include the sun. They actually went through demos of what it's capable of doing, and it's very, very impressive. One of the things it can do is stay super consistent while you're working with it in a chat, so you can create the same character again and again. In one demo, he had it create, for example, a geometric penguin character. Then he got it to create the exact same geometric penguin, but in a realistic miniature style, as if a professional had made and painted it. Now it looks like a little miniature sculpture, yet it's the exact same penguin, from the exact same angle, holding the exact same keys. To me, this is very, very impressive. After that, they got it to generate the penguin in a crystal style, as if it were turf, as if it were lava, as if it were a gummy bear, as if it were metal: all of these different styles. What's so impressive to me is that it is literally the exact same penguin. We're just looking at it in a whole bunch of different ways.

This is really good for creativity. You can essentially upload an image, get it to recreate it, and then change the style. I saw a demo of exactly this from Allie K. Miller on LinkedIn. She uploaded a podcast cover she had made with her professional studio photo on it. By the way, the tool she used isn't even this one; it's a similar tool Google has released. OpenAI is essentially coming out with a response to Google's tool, and it can do pretty much the same things. With the Google product, anyway, she uploaded her podcast cover and said, "Create an official passport photo for this woman. Be sure to use the exact same woman." It created what looks just like a passport photo, and it looks exactly like her.

You can tell it was obviously recreated with AI, but it is her. So we're getting to the point where these tools are so good that you upload a character and it recreates it in a bunch of different variations. That was a really cool demo.

The next thing they showed off is that this model is very good at following complex prompts. They wrote a prompt listing 15 different items: a pair of googly eyes, a thumbs-up emoji, a pair of blue scissors, a white giraffe, the word "OpenAI," and so on. And it created a graphic containing all 15 of the things described. The reason they showcased that, and the reason I'm so blown away and think it's important, is that these images are now useful. We had image models that were good before. I think Midjourney was pretty good; its images looked quite realistic, and you could generate really realistic photos of people. But now it's useful. Now you can say: I want there to be a camera, I want this specific product, this specific lighting, this specific angle, I want 10 of these things in the background. And it will do exactly what you say. You can say, I want them to be wearing green shoes, I want seven pairs of green shoes on the windowsill in the background, and I want five jackets hanging up in the closet. Previous AI models were not able to do this, so it's really, really incredible that it has this capability now.

The next thing it can do is essentially blend text and images. I kind of covered that with my infographic example, which I thought was really impressive. But I saw so many other examples. Imagine you create that infographic, and then you want to merge it with a real-world photo. They did a demo where they created an infographic, and then had somebody holding it as the front cover of a textbook in front of the Arc de Triomphe. It looks like a real photo, with the infographic printed on the cover inside it. To me that's really cool. It's very meta. Because you're working in the chat interface, you generate a really cool graphic, then say, now take that graphic, stick it on the front cover of a textbook, and have a man holding it, and it generates the next photo. Then, if you wanted to, you could say, now take that photo, put it on the front cover of a newspaper, and have someone reading it. Then take that picture of the newspaper, and so on. You're creating graphics nested inside other graphics, and they get incredibly detailed.

This is really, really cool. I think for the first time, these images are genuinely useful. Okay, a couple of other features are definitely worth mentioning. One of the big ones is how you can edit these images. You're sitting there chatting with it, describing how you want the image edited. You can specify exact aspect ratios, which is really cool. You can specify exact colors, including hex codes.

My gosh, this is incredible for graphic designers. Say your brand colors are these three or five hex codes: you put those hex codes in, and it will recreate your logo, or the background of whatever your photo is, so that everything matches your brand colors. This is amazing. You can also get transparent backgrounds. They showed a demo where they created a sticker of a dog with a transparent background, and they actually pulled it off and downloaded it as a PNG with a transparent background. They made a bunch of different stickers. I thought that was really cool.

The last thing I wanted to show off is a demo where they created images in a bunch of different styles using GPT-4o. First, they made a comic book. She drew out a comic book sketch, took a picture of it, and uploaded it. (This is what I then went and tested myself, and I'll show you what it was able to do.) It was just a rough sketch of a comic, and she asked it to turn it into a real comic about a dragon. It actually illustrated it, turning her sketch into a full-color comic, which was pretty funny. Then she uploaded a picture of one of the crystal penguins they had generated earlier in the demo and said, now swap the dragon for this crystal penguin, and it dropped the penguin straight into the comic book.

I think the ability to upload images and have it work with them in real time is huge. She also took the crystal penguin and said, generate a lifelike statue of this in my living room, and it generated the statue in her living room. So you're putting images inside of images. This is just incredibly useful.

So I decided to test whether it can actually regenerate images. I tried a bunch of memes: I took a screenshot of a meme and said, "Remake this photo." At first it kind of glitched out and just wrote out the text from the photo. Then I told it to create an image, and the result wasn't very good. I was a little discouraged; I think this probably had something to do with it generating the text first. So I tried one more time, and while it did crash partway through the image generation, the result was still impressive. I took a screenshot of Riverside, the software I use to record my podcast, and said, "Recreate this image exactly, even including all the text." We're talking about a screenshot with tons of UI and tons of text elements all over the screen. It generated about half of the image before it crashed, but in that half, the text is perfectly written out and looks absolutely amazing.

I'm very, very blown away and impressed by this. Overall, from what I've been able to demo and test so far, we're seeing some absolutely incredible things. The text is amazing. We're recreating screenshots of whatever's on my screen. We're making one-shot graphics. We're making stickers. We're editing images and creating transparent backgrounds. This is literally the image generator of, I think, many people's dreams.

To be honest, I had pretty much written off image generation in ChatGPT for over a year; there were just so many better options. This blows literally everybody out of the water. It becomes an incredibly useful tool, to the point where I think it threatens Canva and so many other players. And I'm impressed. Google, like I mentioned, has that other tool they rolled out that can do some similar things, but ChatGPT is just the biggest at this point, and I think OpenAI didn't let Google steal their thunder for long. They came out with this, and it is incredibly impressive. I highly recommend checking it out. Whether you're a Pro user, another paying user, or even a free user, this is rolling out to literally everybody, so go check it out. The one thing you need to make sure of is that GPT-4o is selected.

You don't need to select DALL·E or any separate image tool. Just make sure it's GPT-4o; that's where you get the best version of this image generation. Thanks so much for tuning in to the podcast. If you enjoyed it, make sure to like and subscribe over on YouTube, and drop us a comment or a review on Apple or Spotify. I hope you all have an amazing rest of your day.