
Authoring Creativity With AI: Researcher Patrick Hebron

2024/6/12

Me, Myself, and AI

People
Patrick Hebron
Sam Ransbotham
Topics
Patrick Hebron: His book explores the intersection of machine learning and design. Machine learning can help designers solve complex configuration problems, but its inherent imprecision poses challenges for software design, requiring designers to rethink the user experience, especially when the software makes a mistake or misunderstands. Ways to handle the uncertainty of AI systems include falling back to conventional functionality when the AI system fails, and having the machine present back to the user what it has understood so that it doesn't act on a misconception. Discoverability is also a challenge for AI tools: compared with the menu systems of traditional software, an AI tool's capabilities can be buried more deeply and be hard to find. AI systems based on inductive learning produce uncertain results, because you can never guarantee that every possibility has been considered. His background is in philosophy and filmmaking, which sparked his interest in design tools; he learned to program in order to build design tools for making films, and combining design tools with AI became a lasting interest. AI can improve design efficiency by upgrading the capabilities of existing tools without changing the user experience; Adobe's Content-Aware Fill, which AI can improve through neural inpainting, is one example. AI also brings unprecedented design capabilities, such as generating images from text or reposing a human body. Latent space navigation is a powerful design mechanism that lets users explore and discover new design possibilities within a machine learning model's internal representation. Successful tools should be usable in unexpected ways, not only as intended; building an 8-bit computer in Minecraft illustrates the value of open-ended tools, with which users can create unanticipated results. AI's range of application keeps expanding, from early spam classification to today's creative fields, and perhaps on to science and engineering, where it could play a transformative role in helping people solve complex problems, much as artists continually adjust and refine a work in progress. AI need not be a zero-sum game; it can collaborate with humans on complex problems. One challenge in applying AI to science is simulating the system being acted upon. An omniscient AI also has drawbacks: it may lack a distinctive perspective or point of view. Training language models with reinforcement learning from human feedback can give a model more individuality and perspective.

Sam Ransbotham: The user interface design of AI tools faces the challenge of exposing new capabilities to users without overwhelming them with information, and it needs to strike a balance between familiarity and open-endedness.

Chapters
Patrick Hebron discusses the integration of generative AI in creative fields, emphasizing its potential to elevate human creativity and the challenges of designing user-friendly interfaces.

Transcript


Today, we're airing an episode produced by our friends at the Modern CTO Podcast, who were kind enough to have me on recently as a guest. We talked about the rise of generative AI, what it means to be successful with technology, and some considerations for leaders to think about as they shepherd technology implementation efforts. Find the Modern CTO Podcast on Apple Podcasts, Spotify, or wherever you get your podcasts.

How does the use of generative AI in creative fields translate to opportunities for the future? Find out on today's episode. I'm Patrick Hebron, author of Machine Learning for Designers, and you're listening to Me, Myself, and AI. Welcome to Me, Myself, and AI, a podcast on artificial intelligence and business. Each episode, we introduce you to someone innovating with AI. I'm Sam Ransbotham, professor of analytics at Boston College.

I'm also the AI and Business Strategy Guest Editor at MIT Sloan Management Review.

And I'm Shervin Khodabandeh, senior partner with BCG and one of the leaders of our AI business. Together, MIT SMR and BCG have been researching and publishing on AI since 2017, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build, deploy, and scale AI capabilities and really transform the way organizations operate.

Hi, everyone. Today, Sam and I are excited to talk with Patrick Hebron. He's the author of Machine Learning for Designers and he's held roles at NVIDIA Omniverse, Stability AI, and Adobe. Patrick, thanks for taking the time to talk with us. Thanks for having me. It's great to be here. So I have to say right off the bat, I'm curious about why machine learning is different for designers. What does the for designers part of that mean?

When I wrote the book, this was not an intersection that made sense to most people. I'd been working on that intersection since the time of my master's, and I sort of got into the idea that in design there can be really challenging configuration problems, and machines can help play a role in figuring out how to sort through lots of different permutations and come to an arrangement that might be useful to people. So that was what was happening in my own work at the time.

And then, as the technology was starting to advance quite a bit, it seemed to me that there were going to be some really big differences in how we thought about the production of software as a result of AI. Conventional software is always correct about mundane things like, say, 2 plus 2.

And machine learning, of course, enables you to do much more complex things like identify faces in photos or a million things. But it's not always right about those things. There's an inherent imprecision to that. And this fact alone carries a huge implication when you're designing software, thinking about how the user navigates through a process and particularly what happens when they hit a dead end or a misunderstanding.

O'Reilly approached me about writing that book, and I was really excited to tackle this subject and start to help designers to think about how this would transform their practice. That is a fundamentally different approach because we're used to software being very deterministic. We're used to processes working the same way they work whenever you test them in the lab. But then they work differently when...

You introduce noise and fuzziness into the whole thing. So how do people deal with the fact that what they test and what they work on isn't necessarily what happens when it goes into production? Yeah, it's funny because...

I don't want to liken machine learning models too much to a human, but I guess one thing we do have in common with them is this kind of imprecision. We're capable of grand notions, but you can never guarantee that what's in someone else's head is cohesive in exactly the same way that it is in yours. Yeah, I find most people are not cohesive with what I think. Right, same.

So one thing is to remember that there is conventional software still around, right? And so having sort of a backup plan or reverting to conventional functionality when a more complex AI system fails is one mitigation. Of course, there's a challenge with that, which is that probably if what your software is doing required AI in the first place, then the fallback may be difficult because the conventional software is not up to the job.

But having the machine present back to the user what it's understood, I think, is very important, so that it doesn't just go off and act on a misconception.
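To make those two mitigations concrete, here is a minimal sketch, assuming a confidence-gated design: the AI path runs only when the model is reasonably sure, the machine's interpretation is echoed back before acting, and a conventional path is the fallback. The parser, fallback function, and threshold below are hypothetical stand-ins, not any product's actual API.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    action: str  # the machine's interpretation of what the user wants

def ml_parse(query: str) -> tuple[Intent, float]:
    """Hypothetical ML intent parser; returns an interpretation and a confidence."""
    if "background" in query:
        return Intent("remove the background"), 0.92
    return Intent("unknown"), 0.35

def keyword_search(query: str) -> str:
    """Conventional, deterministic fallback path."""
    return f"Showing standard menu results for: {query!r}"

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune per application

def handle_query(query: str) -> str:
    intent, confidence = ml_parse(query)
    if confidence < CONFIDENCE_THRESHOLD:
        # The AI system has likely failed: revert to conventional functionality.
        return keyword_search(query)
    # Present the machine's understanding back to the user before acting,
    # so it doesn't go off and act on a misconception.
    return f"It sounds like you want to {intent.action}. Proceed?"

print(handle_query("remove the background from my photo"))
print(handle_query("make it feel more autumnal"))
```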

Another challenge is discoverability. We see this with, say, Alexa, right? There are all these features, but they're hidden somewhere in there. And so how do you know what you are or are not able to do? This, I think, is in certain ways a regression from traditional software. Giant menu systems have been sort of the enemy of my career, I guess. But at the same time, they do have a certain upside, which is

There's kind of an obvious path to learning what the software can do, right? So you go find some particular feature that you need at the moment. You find it in this menu system, and it's probably adjacent to some related features. And so this exposes you at least to seeing their names, and perhaps this will lead you to explore towards them. You don't necessarily have that with, say, an emergent feature set or the ability to speak to your computer and ask for something.

That's an interesting dimension. I hadn't really thought about that. But as you were saying that, I was thinking back on my own past life. And there used to be a product called Microsoft FoxPro, which was one of the early database systems. And I, at the time, was super into knowing every possible thing that that piece of software would do. And one of the things we did was we opened up the executable and looked for signatures of commands that were in there, even if they were not in the documentation.

But that doesn't exist anymore. I mean, the world you're talking about here is very different. There is no executable to open. There is even often no documentation to go through. So, you know, this rapid evolution of everything at the same time seems really fascinating. I hadn't thought about that.

Yeah. And there's another kind of devious point that comes with what you're saying, which is that, you know, the software could work time and time and time again in relation to, say, some particular type of query. And then the thousandth time, it doesn't understand correctly. It completely goes in a different direction. You know, I think that's just the nature of inductive learning: you never have sort of a strong guarantee

that you have kind of seen all possible things, right? Like by learning from experience, you know, we see, say, two cars, and now we have some sense of the range of car sizes, right? We see, you know, a million cars, and now we feel pretty confident that we have a real understanding, a mapping of the range of sizes that a car could be. But we really never have a guarantee that we've seen the largest or the smallest car, right?
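As a toy illustration of that inductive limit, consider estimating the range of car lengths from samples. The distribution below is invented purely for illustration; the point is that the observed range tightens toward the truth but no amount of sampling certifies the true extremes.

```python
import random

random.seed(0)
true_min, true_max = 2.5, 20.0  # assumed true range of car lengths, in meters

for n in (2, 1_000, 1_000_000):
    sample = [random.uniform(true_min, true_max) for _ in range(n)]
    # The observed range approaches the truth but never provably reaches it.
    print(f"n={n:>9}: observed {min(sample):.2f}-{max(sample):.2f} m "
          f"(true {true_min}-{true_max} m)")
```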

And this kind of fuzziness at the edges is a challenge, but at the same time gives us everything else that we get with AI. Yeah, that's pretty fascinating, too. Again, I hadn't really thought about that sort of how we go through that inductive process there. Patrick, how did you become interested in these types of problems? Can you tell us a bit more about your background?

My path is actually probably a little bit unconventional. As an undergrad, I studied philosophy, particularly aesthetics and semiology, and then also did a second major in film production. For my film production work, I ended up making a narrative film, so not really that connected to this work. For my philosophy work, I got really interested in the American philosopher Charles Peirce, a semiologist working on the theory of science.

For that project, I talked about special effects, essentially, as an artistic medium, thinking about the implication of a medium where you have the ability to portray anything imaginable, as you do in painting, but with the apparent credibility of photography. That combination is a really kind of interesting, powerful, perhaps even dangerous thing.

At the time, I was imagining how this would come together through essentially an advancement in computer graphics. AI wasn't really on my radar at that point, and it was also a long way from reaching that kind of ability. But this was really interesting to me. And so when I got out of undergrad, people said, what kind of movie would you point me to that kind of goes along with what you're saying? And I said, well, I don't know that any movie that fits this description has actually been made yet.

So, you know, I started to try to make those movies. And pretty quickly, this led me to feel that the tools that were out there for CG production were not really set up for the things I was trying to do. And so as a kid, I had done like, you know, a little bit of like electronics experimentation and a little bit of like programming and stuff, but not a ton. So it was then that I started to learn how to write software in order basically just to build these design tools to make the movies that I was trying to make.

And then very quickly, you know, I realized that that was my real interest. Thinking about how design tools and particularly AI and design tools could work became my central interest. And that interest has kind of paid off. Maybe tell us a bit about some of the projects you've worked on in the past. Sure. Shortly after writing the O'Reilly book, I...

was approached by Adobe. And at the time, the vice presidents of research and design were talking, and they were anticipating what was going to come in the next couple of years with AI. They could see that there were many places where AI could come in under the hood of Adobe products and transform the quality of a capability without necessarily changing the user experience in any particularly meaningful way.

An example of that might be something like Content-Aware Fill. Prior to the last couple of years, that feature was implemented using kind of a pattern extension algorithm.

And so, you know, this would work pretty nicely if you were, say, trying to remove a beach ball from sand. Because, of course, sand is a pattern that extends perfectly nicely. And so, you know, it works well. But if for some reason you were trying to fill in a missing like mouth region of a human face, then extending the cheeks is, of course, not going to give you what you want. And so instead using neural inpainting,

You'd be able to do that much better because, of course, you're sort of learning what to paint there from a large statistical sample about how these different image features relate to one another. And so there, of course, you can get much better functionality. But from a user perspective, this tool doesn't need to operate in a very different way.
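As a rough sketch of that contrast, OpenCV's classical inpainting below stands in for the old pattern-extension style of fill; a neural inpainting model would instead predict the fill from what it has learned about how image features relate. This is illustrative only and is not Adobe's implementation.

```python
import cv2
import numpy as np

# Build a synthetic "sand" texture and a circular "beach ball" hole to remove.
rng = np.random.default_rng(0)
sand = np.full((256, 256, 3), 180, dtype=np.uint8)
sand += rng.integers(0, 30, sand.shape).astype(np.uint8)  # grainy noise

mask = np.zeros((256, 256), dtype=np.uint8)
cv2.circle(mask, (128, 128), 40, 255, -1)  # region to fill in

# Classical inpainting extends the surrounding texture into the hole. That
# works for sand-like patterns but cannot hallucinate a missing mouth; for
# that, you need a model that has learned what belongs there.
filled = cv2.inpaint(sand, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("filled.png", filled)
```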

That was the stuff that would be sort of easy for Adobe to integrate into its products. The things that would be more difficult are the things that we're starting to see now that have no real direct predecessor because previous technologies just couldn't approach them at all. Give us some examples there. Oh, sure. For example, being able to generate an image from text or do something like completely repose a human body.

Perhaps another area that I should touch on in answer to your question is latent space navigation. So within the mind of a machine learning model, it produces a kind of interior representation of the variability in the things it's seen and that it's learned from.

And so then that space can be navigated by linear traversal. In one part of that space, you might have things that look like sneakers, and then fairly nearby, you might have things that look like work boots. And then, you know, very far away, you might have things that look like, I don't know, teddy bears. And so navigating from one to the other is a process by which you can explore and discover what you're looking for. This could be a really, really useful design mechanism, because it's like, well, that's not quite right, but I want something close to that. Being able to look at that and move in that space really lowers the barrier to exploration and experimentation in design, where traditionally, maybe you drew a sneaker and now you want to try out a work boot, you have to completely redo your entire drawing, and this is a very involved process. This is a really powerful feature. But of course, this way of thinking about designing something is without precedent, perhaps. And so thinking about what the interfaces for that should look like is a bit of a new design exercise.
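For readers who want the mechanics, here is a minimal sketch of latent navigation as linear traversal: interpolate between two latent codes and decode each step. The codes and the decoder below are hypothetical stand-ins; in a real system, the decoder is a trained generative model.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64

z_sneaker = rng.normal(size=latent_dim)    # assumed latent code for a sneaker
z_workboot = rng.normal(size=latent_dim)   # assumed latent code for a work boot

def decode(z: np.ndarray) -> str:
    """Hypothetical decoder; a real model would render an image from z."""
    return f"<image, latent norm {np.linalg.norm(z):.2f}>"

# t=0 is the sneaker, t=1 is the work boot; the points in between are the
# "not quite right, but close to that" designs the traversal exposes.
for t in np.linspace(0.0, 1.0, 5):
    z = (1 - t) * z_sneaker + t * z_workboot
    print(f"t={t:.2f} -> {decode(z)}")
```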

That seems kind of interesting, too, because if you go back to your generative fill, yeah, it's pretty clear that, hey, I just want to fill this area, and you could use a new algorithm to do that better or stronger, faster, quicker through some ML magic. But there's not an analogy in the user interface for some of these other tools or some of these other ways of working.

So that seems complicated, because in the end, there's a human sitting there. How do you let them know that they can twist a person, or move something in space, or make one shoe look like a work boot, or move it toward the teddy bear end of the spectrum? How do you let them know that in an interface without overwhelming them?

Yeah, it's a great question. It's funny because I have opinions about this sort of going two completely opposite directions. In one case... That's always fun when there's tension. Exactly. You need the tension in design, especially.

From one side, as I was alluding to a little bit just a moment ago, like you want to use familiarity where you've got it. If there's some trope from some related area, then why not help acclimate the user to that? So in the case of something like latent navigation, this could look a lot like a map, right? These sort of destinations of sneaker and work boot and stuff, right? You could think about them as existing on a surface. And if you sort of

drive a little further west if you want to get to the beach, or similarly drive a little further in this direction if you want to get to this type of shoe, right? So, you know, I think those kind of cues are really useful. At the same time, I think you have to be careful there because, you know, an artistic medium or a design medium is something where the properties of the medium itself are going to have a huge impact on the nature of the output. Going back to, say, Clement Greenberg, the art theorist,

who basically said, you know, you shouldn't make a sculpture that should have been a painting, to kind of truncate his point. I think similarly, you don't want to necessarily forever have people making art with AI in the same mindset as they would with pre-AI Photoshop. I think

You want to try to engender some open-endedness. And of course, the users are going to end up doing most of the work for you there. Because generally speaking, I think at first what they will do is something very close to the previous paradigm, right? Just like film editing tools really borrow from Steenbeck tabletop film editors, AI generation at first looks really close to what people were doing with the previous generation of tools.

And then they start to explore outward. And so that's what you don't want to get in the way of: their ability to explore outward. To me, as a creator of tools, the most important thing is this: in a business setting, we always talk about use cases and user needs and user pain points, and we try to work out workflows that are well researched in terms of what a user is trying to do.

But I always feel that if the tool gets used in exactly the way that we anticipated, then we have really failed catastrophically. I think that the most interesting things that people do with software are things that are kind of at the margins of what they were supposed to be for.

An example I like to give with that is people building things like 8-bit computers in Minecraft. You know, it's kind of crazy from a practical perspective, right? Like it's, I mean, obviously this is one of the most inefficient ways you could possibly produce the simulation of a processor. But at the same time, it's so great, right? It's fascinating.

Actually, can you go back and explain a little bit about that? I think we all have kids who pretty much understand what that is. But maybe explain the 8-bit processor in Minecraft for people who don't know. Yeah, absolutely. Minecraft is a low-fidelity-looking, I guess you could say, building-block open-world game.

And in it, a user is able to place these blocks or remove blocks. And most of those are static, like sort of meant to represent concrete and things like that. But also there can be water blocks and fluids and dynamic systems. And so this means that you can move blocks around the space.

As a result, you can simulate data flow. And as a result, it's actually possible to build a kind of simulation of how electrons would be moving through a chip. And therefore, you could build essentially an emulation of a computer processor. So these kinds of projects, in an environment like that, are, to me, the best of an open-ended tool.
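The principle underneath that project is composition: once a medium exposes one universal primitive (Minecraft's redstone gives you the equivalent of a NAND gate), every other digital component can be built from it. The plain Python below sketches that idea; it is an illustration of the composition, not Minecraft code.

```python
def nand(a: int, b: int) -> int:
    """The single primitive everything else is composed from."""
    return 0 if (a and b) else 1

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One bit of addition; chain eight of these for an 8-bit adder."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = nand(nand(a, b), nand(partial, carry_in))
    return total, carry_out

print(full_adder(1, 1, 1))  # -> (1, 1): 1 + 1 + 1 = binary 11
```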

A lot of what you come up against here is, I'm going to frame it in exploration-exploitation terms: you want to make it easy to do these incremental improvements, because that's the filling in of the beach ball in the sand. On the other hand, you also want to support the ability to do something crazy. And it seems like most of what you've talked about is in the space of visual design. Does this tension play out elsewhere? What other ways are we...

We started off with machine learning to classify spam, yes, no, one, zero, yes, no. Is this spam? Is this fraud? And now we're talking about filling in mouths and faces. Where's this design going? What else can we use these tools to design? We've seen over the last two or so years, this real explosion of applicability of AI to creative fields.

Perhaps in our education system, we've come to see too stark a contrast between the arts and the sciences, or between design and engineering, or whatever you like. I'll use an art example because I think it's easier to talk about here. But, you know, if you're trying to, say, draw a portrait of a human face, I think most artists would say that it would be a mistake to try to, say, draw the nose directly

to full resolution and detail before moving on to the next feature of the face. Instead, it's probably advisable to kind of plot out, okay, the nostrils will be approximately here and here, and the two eyes will be approximately here and here. And now we sort of take a step back, we look at the whole picture. Okay, these look about right in relation to one another. Okay, now we start to drop into one of those features and add some detail, perhaps one of the eyes, right?

Okay, but now we go back to the big picture view and we realize that, you know, the eye now looks great, but it's a little wrong in proportion to the nose. So we've got to go adjust that, right? And so we're always kind of moving back and forth between these different considerations. And I think that's very much true in software engineering or in scientific work that we have to rejigger all of the pieces in relation to one another and then always return back to how this all fits together.

If you think about a technology that can reason from first principles, it's not just, say, reading books that we've written about a disease. It can do trial and error from scratch and come up with solutions that way. Naturally, this would be groundbreaking in the sciences and perhaps lead us to all the things that we have blind spots about. So, process-wise, I think there's a lot to be learned from design tools for how we think about tools for engineering and for the sciences.

I think we're really on the precipice of AI playing a very, very transformative role in those fields as well. I'm particularly excited about this because

It seems to me that naturally and understandably, many people are concerned about the role AI is playing in the world. I think particularly if you look at its embedding in consumer products today, a lot of that does feel very much like sort of a replacement of the human role. This thing can write an essay for you. This thing can draw a picture for you. So it kind of seems like one in, one out.

But, you know, it needn't be a zero-sum game. And particularly when we think about things like pharmaceutical discovery or, you know, curing diseases, there's no reason not to want more help. And I think this can very much be a positive-sum game, where we use this technology to help us in areas where we just can't deal with the complexity of the possibility space.

We, of course, are by design the people who sort of point this towards something. Right. So I think the motivation, the ethos, if you will, comes from us. But the AI can play a very meaningful role in how we get there.

So what makes it hard? This all sounds wonderful, but we all know it's not easy. What makes it hard? Fair question. One thing that has actually been a real problem in the application to the sciences is the ability to simulate the system that we are acting on. If we look at something like reinforcement learning and say DeepMind's application of it to games such as Go and chess,

These games are very easily simulated. Their rules are not particularly complicated, and so of course the game can be fully simulated. So then you can apply a reinforcement learning system, which is basically a learning system that operates similarly to how you might train a dog: if it takes the right action, you give it a reward, and if it takes the wrong action, you give it sort of a punishment. In the case of a machine learning model, that's a numeric reward as opposed to food, but it's the same basic idea.
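Here is a minimal sketch of that reward-and-punishment loop: tabular Q-learning on a tiny invented "game" where the agent must walk right to reach a goal. Systems like the ones DeepMind built add self-play and deep networks on top of this idea; none of that machinery appears here.

```python
import random

random.seed(0)
N_STATES, GOAL = 5, 4          # states 0..4; reaching state 4 earns the reward
ACTIONS = (-1, +1)             # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0
    while s != GOAL:
        # Mostly exploit the best-known action, occasionally explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == GOAL else -0.01   # reward vs. mild punishment
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, every non-goal state should prefer moving right (+1).
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)})
```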

These kinds of systems are able to navigate possibility spaces that are just astronomically large. The possibility space in Go is larger than the number of atoms in the universe. In Go, you just have to play out, you know, millions and millions of games inside of a computer.

I think about that compared with the large language models right now. Those things have read more than I will ever read. I mean, just fundamentally, they have ingested much more information and understanding of language than any single human ever will. Yet, you know, I outperform them in much of the writing. So there must be some missing link in there that we're not quite hooked on. And maybe it's that scaffolding and transfer that is a chunk of that.

It's true. Your point leads me to something that I find very, very interesting, something that I don't know would have been in any way obvious to me if it weren't for what has happened in the last couple of years of AI, which is that omniscience actually has some downsides.

I'm well aware of that. It turns out that reading everything can actually be sort of a way to be kind of unopinionated or not really have like a perspective in the world. And so, you know, there's this process that is used as part of the training of language models.

called RLHF, which stands for Reinforcement Learning from Human Feedback. And this is used for multiple purposes. One of them is to kind of condition the model to speak in a more kind of conversational way, as opposed to a kind of autocomplete way, where it ends the sentence for you.

And that's important from a user experience point of view. But it also kind of helps to perspectivize the model to understand what is a good answer. The way this process works is basically you sort of take an already largely trained model. You

give it a prompt, you know, some kind of query. You ask it to generate multiple responses, and then you get a human to give a score to which answer they found preferable. And so this can, you know, help to condition things like how wordy or chatty or friendly or unfriendly to be in the response. But it also kind of has the effect of kind of singularizing the perspective of the model, right? So that it's not kind of looking at everything from all angles, which is kind of equivalent to looking at it from no angle.
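As a toy sketch of that preference step: a "human" repeatedly picks the better of two responses, and a scalar reward model is nudged, Bradley-Terry style, to score the preferred one higher. The single "wordiness" feature and the preference data below are invented for illustration; real reward models are neural networks trained on large sets of human-labeled comparisons.

```python
import math

def wordiness(response: str) -> float:
    return float(len(response.split()))  # toy feature standing in for a network

w = 0.0    # the reward model's only parameter
lr = 0.1

# Invented comparisons: (preferred, rejected). This "human" consistently
# prefers the shorter, more direct answer.
comparisons = [
    ("Paris.", "The answer to your question is the city of Paris, France."),
    ("Use a dict.", "There are many possible data structures one could use here."),
] * 50

for preferred, rejected in comparisons:
    diff = wordiness(preferred) - wordiness(rejected)
    p_preferred = 1.0 / (1.0 + math.exp(-w * diff))  # Bradley-Terry probability
    # Gradient ascent on the log-likelihood of the human's choice.
    w += lr * (1.0 - p_preferred) * diff

# A negative weight means the model now rewards brevity: a single perspective,
# learned from feedback, rather than a view from everywhere and nowhere.
print(f"learned weight on wordiness: {w:.2f}")
```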

That's an interesting analogy there. Let me transition, given your omniscience here. We have a segment where we try to ask some rapid fire questions. So we're just looking for the first thing that comes off the top of your mind here. What do you think is the biggest opportunity for AI right now? Advancing science. What's the biggest misconception that people have? That there's no future for humans. What was the first career that you wanted?

- You know, when I was about five, this was when virtual reality was kind of coming around for the first time, and I was very excited about that. I thought it was very cool and would make kind of replicas of VR headsets out of aluminum foil. Not functional ones, but just of the design. So VR designer was really my first major career aspiration. - When is there too much artificial intelligence? - That's a great question.

When we reach the point that we sort of don't feel the motivation to try to make the world better ourselves, if we lose touch with that, then there's kind of no point in AI being able to do it. All right. So what's one thing you wish artificial intelligence could do for us now that it cannot?

It's funny because like in my own life, I feel pretty content about things. I don't feel like, oh, there's some kind of missing capability. So I get very excited about the technology, but at the same time, it's almost sort of for the sake of just seeing what's possible more than feeling like there's something missing, but perhaps more tangibly.

Can you build an 8-bit computer in Minecraft? Exactly. Exactly. It's, yeah, kind of an intellectual conquest, I guess. But no, I mean, I think it would be very cool to have home robots.

This is quite fascinating. I can't believe all the topics we've covered. I think you've really opened my eyes to the potential here for design. There's a lot that's already been done with design, but the potential is really pretty fascinating. Thanks for taking the time to talk with us today. Thanks so much for having me. It's been really a pleasure. Such a fun conversation.

Thanks for listening. Next time, Sam and I close out Season 9 by speaking with Joelle Pineau, Vice President of AI Research at Meta. Please join us.

Thanks for listening to Me, Myself, and AI. We believe, like you, that the conversation about AI implementation doesn't start and stop with this podcast. That's why we've created a group on LinkedIn specifically for listeners like you. It's called AI for Leaders, and if you join us, you can chat with show creators and hosts, ask your own questions, share your insights, learn more about AI, and gain access to valuable resources about AI implementation from MIT SMR and BCG. You can access it by visiting mitsmr.com forward slash AI for Leaders. We'll put that link in the show notes, and we hope to see you there.