
10 Things Transformed by ChatGPT's New Image Generation Model

2025/3/30

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

People
Balaji Srinivasan
NLW
Well-known podcast host and analyst, focused on cryptocurrency and macroeconomic analysis.
Topics
Balaji Srinivasan: ChatGPT's new image model is a revolutionary change. It changes how image filters work, letting us generate images in different styles with just a few keywords; it will change the workflow for producing online ads by automating ad unit generation; it will raise the quality of memes, because lowering the prompting effort makes good results easier to get; it may change how books are presented, turning text into comic form and making books more accessible; it will change how slides are made, automatically generating more visually appealing slide images; it will change website design, automatically generating placeholder images consistent with a site's style; it may change how movies are made, making it easy to remake old films or create new visual styles; it will change how social networks are used, with a generate-image option appearing next to every image upload button; it will change image search, with a generate option appearing alongside search results; and it makes visual styles extremely easy to copy, which will change how people differentiate themselves visually.

NLW: I agree with Balaji that ChatGPT's new image model has broad and far-reaching effects. It doesn't just change how filters work; more importantly, it can apply an aesthetic to an entire experience, such as a website. On memes, while the model raises meme quality, most people so far are just redoing old memes in the new style rather than creating entirely new ones. The model can be used to create comics or graphic novels, and it allows fine-grained control and modification of images, which will change how people read and consume content. It will combine with vibe coding tools to change how software is built; the combination of text-to-code and text-to-UI-design capabilities will be powerful. It can change how movies are made, for example by remaking old films in new visual styles. Most importantly, it will fundamentally change the online advertising industry, lowering production costs, changing the creative process, and expanding the possibilities for creative testing.


Chapters
The new model allows applying aesthetics to entire experiences, not just single images. This is exemplified by the transformation of websites with a consistent aesthetic, showcasing the model's impact beyond simple Instagram filters.
  • Effortless application of aesthetics to entire experiences.
  • Transformation of websites with consistent aesthetic.
  • Impact beyond simple Instagram filters.

Transcript

Today on the AI Daily Brief, 10 things that are transformed by ChatGPT's new image model. Hello, friends. Welcome back to another long reads episode of the AI Daily Brief. Although once again, today we're doing things a little bit differently. This week's big topic of conversation has, of course, been ChatGPT's new image generation model. The tangibility of image generation made it sweep aside even other really important news like Google's Gemini 2.5 release.

What's more, this was one of those model moments where the new performance was not just incremental, but actually opened up entirely new categories of use cases. To the extent those use cases had been explored with previous models, they relied on either complex wrapper software or complicated workarounds and workflows; now they're just built into the model at a core level.

And so what we're going to do today is read a long tweet from Balaji Srinivasan about 10 things that this new model release changes. I'm then going to pick out a few of them that I think are most important or most interesting to discuss and build the conversation from there. So let's do this. Let's read through Balaji's tweet first, and then I'll dig in for myself. Balaji writes,

1. This changes filters. Instagram filters require custom code. Now all you need are a few keywords like Studio Ghibli or Dr. Seuss or South Park. 2. This changes online ads. Much of the workflow of ad unit generation can now be automated. 3. This changes memes. The baseline quality of memes should rise because a critical threshold of reducing prompting effort to get good results has been reached.

Four, this may change books. I'd like to see someone take a public domain book from Project Gutenberg, feed it page by page into Claude, and have it turn it into comic book panels with the new ChatGPT. Old books may become more accessible this way.

Five, this changes slides. We're now close to the point where you can generate a few reasonable AI images for any slide deck. With the right integration, there should be fewer bullet-point-only presentations. Six, this changes websites. You can now generate placeholder images in a site-specific style for any image tag as a kind of visual lorem ipsum. Seven, this may change movies. We could see shot-for-shot remakes of old movies in new visual styles with dubbing, just for the artistry of it. Though these might be more interesting as clips than as full movies.

8. This may change social networking. Once this tech is open source and/or cheap enough to widely integrate, every upload-image button will have a generate-image option alongside it. 9. This should change image search. A generate option will likewise pop up alongside available images. 10. Visual styles have suddenly become extremely easy to copy, even easier than front-end code. Distinction will have to come in other ways.

All right, so that's the frame set. I'm not going to talk about all of these. I'm going to bop around a little bit to the ones that I find most interesting to explore a little bit more deeply.

First of all, let's talk about Balaji's first one, the idea that this changes filters. Now, obviously, we have seen this happen over the past few days, where a huge number of people have Ghiblified themselves or their families. Sam Altman himself has a Studio Ghibli-style image now as his avatar for X. But I think it's not just changing filters. I think it's the fact that filters can now apply to entirely new domains.

Basically, instead of just applying a filter to a single image or photo, you can now effortlessly apply an aesthetic to an entire experience.

Take, for example, VC and builder Yohei of Untapped VC, who Ghiblified their entire website. For those of you who are listening, not watching, this is another time that it's really worth checking out the visual, even if it's just you going to untapped.vc. In addition to the background of the website feeling like a Miyazaki movie, all of the portco logos are once again rendered as cover images that look like a Studio Ghibli film.

Now, on the one hand, you could dismiss this as just a very in-touch VC and AI community member riding the AI trend. But I think what it shows is the idea of being able to port entire aesthetics onto big categories of content on the scale of an entire website. So Balaji is right that it does change filters, but it's not just Instagram filters. Filters can now be applied to a much wider range of assets and domains.
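
(For readers who want to try this kind of whole-site restyling themselves, here is a minimal sketch of how it might look once the capability is reachable via API. At the time of the episode the new model lived only inside ChatGPT, so the "gpt-image-1" model name, the folder paths, and the prompt below are all illustrative assumptions, not anything confirmed in the episode.)

```python
# Batch-restyle every image asset on a site with one aesthetic prompt.
# Assumes the OpenAI Python SDK and an image model that supports edits;
# the model name, folder paths, and prompt are illustrative assumptions.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
STYLE_PROMPT = ("Redraw this image in a hand-painted Studio Ghibli style, "
                "keeping the original composition and subject intact.")

out_dir = Path("site/assets_ghibli")
out_dir.mkdir(parents=True, exist_ok=True)

for img_path in Path("site/assets").glob("*.png"):
    result = client.images.edit(
        model="gpt-image-1",           # assumed model name
        image=open(img_path, "rb"),
        prompt=STYLE_PROMPT,
        size="1024x1024",
    )
    # Assumes the model returns base64 image data; older models return URLs.
    styled_bytes = base64.b64decode(result.data[0].b64_json)
    (out_dir / img_path.name).write_bytes(styled_bytes)
```

The same loop works for any aesthetic; only the style prompt changes, which is exactly the "filters for entire experiences" point.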

Next up, let's talk about memes. Balaji says the baseline quality of memes should rise because a critical threshold of reducing prompting effort to get good results has been reached. What we don't have yet, now just whatever it is, four or five days after this model was released, is the first example of a specific meme. We have a meme template in that we have Ghiblified everything, but we don't have a native ChatGPT image generation meme that has arisen specifically because of the new capabilities.

Instead, where everyone's been for the last couple of days is just copying old memes in the new style. Dan Romero did the classic bar scene from Good Will Hunting, obviously in Studio Ghibli style, with the text: Of course that's your contention. You're a first-day ChatGPT image prompter. You just got finished converting popular internet memes to anime. Studio Ghibli, probably. You're going to be convinced of that till next week when you get to SpongeBob. And then you're going to be talking about how the visual styles of late 1990s Nickelodeon translate perfectly to the format.

That's going to last until next month. Then you're going to be in here regurgitating "diffusion models are actually better" posts, talking about, you know, the superior techniques available in the upcoming Midjourney V7. Now, there is a very specific audience for that meme, of which I am probably the epicenter. But the point is that every old meme that has ever been on the internet at this point is being Ghiblified in this way.

Pixlossopher got even more meta when they said, "Okay, this is a meme created by GPT when I asked it to make a meme about humans using AI to make memes. It shows a four-panel cartoon titled 'The Evolution of Meme Creation.' In 10,000 BC, a caveman draws a mammoth and says, 'Me draw funny mammoth tribe laugh.' In 2005, a programmer writes, 'Me using cool fonts for memes.' In 2025, a reclining person says, 'Hey AI, make a meme about humans using AI to make memes.'

And in 2030, a humanoid robot says, wait, am I making fun of myself? Am I the meme now? Next up, let's talk about number four. This may change books. A couple of things here that are interesting to me. First of all, you are seeing a lot of comic book or graphic novel style creation already. Midas Quant, for example, gave ChatGPT four images and asked it to turn them into a comic book and actually got something back.

I saw other people using the character consistency dimension of this to make storybooks for their kids. Basically, in other words, one of the capabilities of this new model is that because it is natively integrated with the text model, you can use text to have fine-grained control and change very specific parts of the image.

So you can start with one base image and then ask to put that same character in a new pose or a new context. And it's going to do that in a much better way than previous versions of the model could, since those had to go out to the separate DALL-E model and then bring the result back in. So right there on its own already, this is going to be way better for any sort of visual storytelling like that.
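
(As a concrete illustration of Balaji's book idea combined with the character-consistency point above, here is a hedged sketch: a text model proposes panel descriptions for a page of public-domain text, and an image model renders each panel against a shared character sheet. The model names, the input file, and the prompts are illustrative assumptions, not a workflow described in the episode.)

```python
# Sketch of the "old book -> comic panels" workflow: Claude proposes panel
# descriptions for a page of public-domain text, then an image model renders
# each panel against a shared character sheet so characters stay consistent.
# Model names, the input file, and prompts are illustrative assumptions.
from anthropic import Anthropic
from openai import OpenAI

claude = Anthropic()   # reads ANTHROPIC_API_KEY from the environment
oai = OpenAI()         # reads OPENAI_API_KEY from the environment

CHARACTER_SHEET = ("Alice: a young girl in a blue dress with blonde hair, "
                   "drawn in a soft watercolor style.")

def page_to_panels(page_text: str) -> list[str]:
    """Ask Claude for a few one-sentence panel descriptions for this page."""
    msg = claude.messages.create(
        model="claude-3-7-sonnet-latest",   # assumed model id
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": ("Break this page into 3-4 comic panel descriptions, "
                        f"one per line, no numbering:\n\n{page_text}"),
        }],
    )
    return [line.strip() for line in msg.content[0].text.splitlines() if line.strip()]

def render_panel(description: str, index: int) -> None:
    """Render one panel; the result is a URL or base64 payload depending on the model."""
    panel = oai.images.generate(
        model="gpt-image-1",                # assumed model name
        prompt=f"Comic book panel. {CHARACTER_SHEET} Scene: {description}",
        size="1024x1024",
    )
    print(f"panel {index}: {panel.data[0]}")

# Hypothetical input: one page of a Project Gutenberg text saved locally.
for i, desc in enumerate(page_to_panels(open("gutenberg_page.txt").read())):
    render_panel(desc, i)
```

Keeping one shared character sheet in every prompt is the simple lever for consistency here; a production pipeline would likely pass the previous panel back in as a reference image instead.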

I think Balaji is right, though, that there may be some other types of capabilities that aren't just generating totally new books from scratch, but actually change the way that we interact with existing material as well. Interestingly enough, Ryan Hoover from Product Hunt posted separately, Request for Startup, Audible 2.0.

Books are too verbose. Voice readers are often sterile. Note-taking is clumsy. But thankfully, we have LLMs today that can rewrite to be more concise and adapted to my preferred style of communication. Allow me to select a preferred reader. Morgan Freeman, please. Bookmark key concepts via dictation. E.g. save the point about X. Now, Ryan says he doesn't think that this would be a good business, and obviously the licensing is tricky, but he still wants it.

I do think that the choice that's going to be offered in the future around how to consume content is really powerful and what this new model opens up is the visual aspect of that. Next up, let's talk about coding for a minute. In number 10, Balaji writes, "In general, visual styles have suddenly become extremely easy to copy, even easier than front-end code. Distinction will have to come in other ways."

What I think is interesting about this is the way that this tool is going to hybridize and blend with the rise of vibe coding tools. For example, Riley Brown fed in a bunch of code to ChatGPT and asked it to render it as an image, which it did flawlessly. I've seen other people go the other way, asking it to design a particular UI and then turn it into code, which it once again did really well. And in general, this is one more thing that is transforming what it's going to mean to build production software.

On the one hand, we have text-to-code capabilities coming up. And on the other hand, we have text-to-UI design capabilities coming online via this sort of image generation. And where those two meet will be a very powerful place. Now, as an aside, the CEO of Replit has officially come out saying that he no longer thinks you should learn to code, which is probably a longer conversation. But as these categories of tools converge, you can kind of see why he might feel that way.
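
(Here is a rough sketch of that text-to-UI-image, image-to-code roundtrip, assuming the image model is reachable via the OpenAI API; the model names and prompts are assumptions rather than what Riley Brown actually used.)

```python
# Roundtrip sketch: text -> UI mockup image, then mockup image -> front-end code.
# Model names and prompts are assumptions; this is not the exact demo described.
from openai import OpenAI

client = OpenAI()

# 1) Text -> UI mockup image (assumes a base64 response from the image model).
mockup = client.images.generate(
    model="gpt-image-1",     # assumed model name
    prompt=("A clean landing page for a note-taking app: hero section, "
            "three feature cards, and a simple footer."),
    size="1024x1024",
)
mockup_b64 = mockup.data[0].b64_json

# 2) Mockup image -> single-file HTML/CSS via a vision-capable chat model.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write a single-file HTML/CSS page that matches this mockup as closely as possible."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{mockup_b64}"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```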

Number seven, this may change movies. We could see shot-for-shot remakes of old movies in new visual styles with dubbing, just for the artistry of it. Though these might be more interesting as clips than as full movies.

Well, count on the internet to get this one sorted right away. AI filmmaker PJ Ace posted within hours of this model going live: What if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and nine hours re-editing the Fellowship trailer to bring that vision to life. And sure enough, we have the full Fellowship of the Ring trailer as a Studio Ghibli film, rendered incredibly impressively.

Now, one could be tempted to write this off right now as just simple novelty or toy. But novelties and toys are so often the way that we experiment with what will eventually become transformational.

I would expect that the first wave of this transformation will be things exactly like this, scoring viral hits by applying one aesthetic filter to a popular media asset in a different aesthetic. But I'm also quite sure that that's not where this will stay. And this sort of weird blend and hybridization will just become something that has a bigger, more fundamental impact on creation.

Finally, let's talk about number two. This changes online ads. This has maybe been the most obvious transformation and the one that feels like it has the most disruption to an existing business. Lorenzo Green writes, the AI image generation war for ads is over.

Ad teams are about to get smaller, way smaller. By way of example, he took a book, Dopamine Nation, and asked ChatGPT to create an image of Mark Zuckerberg reading the book, which it did flawlessly. He took Liquid Death and an Apple ad and said, basically, create an ad for Liquid Death in this style.

He points out that if you have an asset like a shoe, but no model, that is no longer a problem, creating an image of a happy nurse wearing a particular shoe and so on and so forth. In fact, after Studio Ghibli memes, this is probably the most prominent type of generation that you've seen on your timeline.

What's significant too is that while people are mostly showing their one-shot generations, again, the native capability of the model to custom modify very particular pieces of a generation means that you're not just stuck hoping your one-shot generation gets it right. You can go back and actually have fine-grained editing. So where does this leave the ad industry? I do not think that it just ends it overnight. The world is awash in visual ads of all types.

And some are better than others. Taste, creativity, concept, these are not things that are limitless even when you introduce AI. Think about Super Bowl ads. Super Bowl ads are literally the most important ad asset of any given year.

Everyone who's making a Super Bowl ad has spent at least, and I'm not joking, at least $10 million on that ad between the ad time and the ad creation process. And usually it's closer to $15 or $20 million. And still most of them are absolute garbage. Still, what does absolutely change is that there's no way that the cost structure for visual or print ads doesn't come down. There's no way that the creative process around these assets doesn't change.

We're back once again to the Doctor Strange theory of AI work, where I think part of what will be different is that creatives will test out a huge variety of ideas. Instead of sitting there in pitch meetings with a very small number of mock-ups, creatives will test hundreds of concepts. They'll design swarms of agents to test concepts based on dozens or hundreds of different styles.

They'll probably have other agents which test all of those ads against panels of theoretical people. And then ultimately, they'll take all of the advice and the ideas from AI and use their human taste to make a judgment call. Still, it is undeniable that this is a massive, massive structural change moment for the ad industry. And trying to view it as anything less than that is sure to be trouble for businesses in that space who take that opinion.
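
(A minimal sketch of that concept-swarm idea: sweep a handful of styles and hooks for one brief, render candidates, then get a rough first-pass ranking before a human makes the final call. The brief, the style list, the hooks, and the model names below are all illustrative assumptions.)

```python
# Concept-swarm sketch: sweep styles x hooks for one ad brief, render candidates,
# and get a rough first-pass ranking; the final judgment stays with a human.
# The brief, styles, hooks, and model names are illustrative assumptions.
from itertools import product
from openai import OpenAI

client = OpenAI()

BRIEF = "Print ad for a recyclable water bottle aimed at trail runners."
STYLES = ["retro travel poster", "minimal Swiss design", "hand-drawn sketch"]
HOOKS = ["lighter than your phone", "zero plastic, zero excuses"]

candidates = []
for style, hook in product(STYLES, HOOKS):
    prompt = f"{BRIEF} Style: {style}. Headline text: '{hook}'."
    image = client.images.generate(model="gpt-image-1", prompt=prompt, size="1024x1024")  # assumed model name
    candidates.append({"style": style, "hook": hook, "image": image.data[0]})

# Rough ranking of the concepts themselves (not the rendered pixels).
ranking = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Rank these ad concepts for clarity and appeal, best first:\n"
                   + "\n".join(f"- {c['style']} / {c['hook']}" for c in candidates),
    }],
)
print(ranking.choices[0].message.content)
```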

Now, again, we are just a couple days out after this release. We're barely scratching the surface of what it can do. And already we've got these 10 areas or more where things really have changed. I, for one, can't wait to see what comes next. But for now, let's close this so we can go mess around and Ghiblify all of our family photos before that trend dies entirely. Appreciate you listening or watching as always. And until next time, peace.