Topics
Swix: This episode is a crossover with the Practical AI podcast, exploring the latest trends in AI, particularly the applications and challenges of large language models. Swix emphasized Practical AI's long history and rich back catalog in the AI space, and the value of the oral history of AI that it provides.

Alessio: Introduced Dan Whitenack and his podcast Practical AI, and briefly covered Dan Whitenack's background and career, particularly his work at SIL International and his focus on low-resource AI scenarios. Alessio also mentioned Dan Whitenack's current work on PredictionGuard.

Dan Whitenack: Described how the Practical AI podcast was founded, its emphasis on practicality, and its goal of helping listeners learn something useful. He also introduced the PredictionGuard project, which aims to help enterprises use generative AI in a compliant way, addressing compliance and structured-output problems. He shared some of his personal favorite Practical AI episodes, including the ones that dig deep into specific AI models and the series on AI in Africa. He discussed the shift from MLOps to LLMOps and the challenges of evaluating large language models, particularly the gap between benchmarks and real-world use. He introduced Masakhane, a grassroots organization of African NLP researchers focused on meeting the real needs of African language communities. He explained how he keeps up with the latest AI models, and the workshops and consulting he offers through the datadan.io website. He advised enterprise users to dig into prompt engineering and LLM operations and to follow a hierarchy when using LLMs, from prompt engineering to fine-tuning to training their own models. He argued that "prompt engineering" as a term is overhyped, but that the engineering and operations around prompts and LLMs are a real workflow. He believes AI engineering is becoming a sub-specialty of software engineering, and discussed the different challenges traditional machine learning engineers and software engineers face when moving into AI engineering. He also discussed the evolution of NLP datasets, particularly the emergence of tools like Label Studio, and the role of unlabeled datasets in self-supervised learning. Finally, he encouraged people to get hands-on and explore the available tools.

Swix: Swix emphasized Practical AI's long history and rich back catalog in the AI space, and the value of the oral history of AI it provides. He also shared some of his personal favorite Latent Space episodes, including the news-driven ones, especially the episode on the ChatGPT plugins release. Swix discussed the importance of AI UX in AI applications, and noted that the generality of large language models has exceeded his expectations. He believes English and Chinese still dominate the AI field, while model performance in other languages has room to improve.

Alessio: Alessio introduced Dan Whitenack and his podcast Practical AI, and briefly covered Dan Whitenack's background and career. He also mentioned Dan Whitenack's current work on PredictionGuard, and that he considers AI UX very important in AI applications.

Deep Dive

Key Insights

Why did Dan Whitenack start the Practical AI podcast?

Dan started Practical AI with Chris Benson to create a podcast that focused on practical, hands-on AI applications, as opposed to overly hyped or theoretical discussions. They wanted to provide actionable insights that listeners could use in their daily work.

What is PredictionGuard, and what problem does it aim to solve?

PredictionGuard addresses the challenges enterprises face when implementing generative AI technologies, such as data privacy, compliance, and the need for structured, consistent outputs. It provides tools for running AI models in a compliant manner and offers layers of control for structuring and validating model outputs.
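
To make the "structuring and validating output" idea concrete, here is a minimal, hypothetical Python sketch of the general pattern of wrapping free-form model text in a validation layer. It is not PredictionGuard's API; the expected keys, the example output, and the retry policy are invented for illustration.

```python
import json
import re

# Hypothetical illustration of the "layers of control" idea: take raw LLM text,
# try to pull out a JSON object, and validate it against the fields we expect.
EXPECTED_KEYS = {"customer_name", "sentiment", "follow_up_required"}

def extract_structured(raw_output: str) -> dict:
    """Pull the first JSON object out of free-form model output and validate it."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if not match:
        raise ValueError("model output contained no JSON object")
    data = json.loads(match.group(0))
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data

# Usage: wrap the model call with this check and retry or fall back on failure.
raw = 'Sure! Here you go: {"customer_name": "Acme", "sentiment": "negative", "follow_up_required": true}'
print(extract_structured(raw))
```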

What are some of the key trends in AI that Dan Whitenack has observed?

Dan has observed the shift from traditional MLOps to LLMOps, the growing importance of multilingual and low-resource language models, and the increasing use of models to evaluate and generate data for training other models. He also notes the rise of AI engineering as a distinct skill set.

What are some of the favorite episodes of Practical AI according to Dan Whitenack?

Dan's favorite episodes include those that focus on fully connected discussions between him and Chris Benson, such as episodes on ChatGPT, Stable Diffusion, and AlphaFold. He also highlights episodes on AI in Africa and the use of AI in low-resource scenarios.

What is the most popular episode of Practical AI, and why?

The most popular episode is the one featuring Ville Tuulos discussing Metaflow, a Python package for full-stack data science developed at Netflix. The episode resonates with listeners because it addresses the challenges of moving from notebooks to production, which is a common struggle for data scientists.

What does Dan Whitenack think about the term 'prompt engineering'?

Dan believes that 'prompt engineering' as a term is overhyped, but the engineering and operations around large language models are very real. He emphasizes the importance of understanding how to structure prompts, chain processes, and fine-tune models to achieve practical results.

What are the unique challenges for engineers transitioning into AI engineering?

Engineers transitioning into AI engineering face challenges with non-deterministic systems and the lack of control over model drift, as well as the need to explore the latent capabilities of models. They also need to adapt to the new workflows required for working with large language models.

What does Dan Whitenack think about the role of AI UX (User Experience) in AI applications?

Dan believes that AI UX is crucial and can make or break the adoption of AI technologies. He gives the example of ChatGPT, where the UX innovation played a significant role in its success. He also mentions GitHub Copilot as an example of how UX can enhance the integration of AI into software development.

What are some of the trends in NLP datasets that Dan Whitenack has observed?

Dan has observed trends towards using augmented tooling for fine-tuning models with human feedback and the increasing use of models to generate data for training other models. He also notes the challenges of data quality and the need to filter and curate datasets to improve model performance.

What is something that has already happened in AI that Dan Whitenack thought would take much longer?

Dan is surprised by the generalizability of large language models beyond traditional NLP tasks. He found that these models could be applied to tasks like fraud detection without needing traditional statistical models, which he thought would take much longer to achieve.

Chapters
This chapter introduces the crossover episode between Latent Space and Practical AI podcasts, highlighting the significance of exploring the history and trends in AI. It emphasizes the value of podcasts offering a comprehensive overview of the AI field.
  • Crossover episode between Latent Space and Practical AI podcasts.
  • Focus on AI history and trends.
  • Recommendation of podcasts with longer backlogs for comprehensive learning.

Shownotes Transcript


Hello again, it's Swix back again with the second of our crossover episodes. This time with the Practical AI folks who have been covering the AI space since before it was cool as we often like to say. Something that Alessio and I are pretty mindful of every time we take to the mic is that we're pretty new podcasters to this space and there is a longer history and heritage behind all this as well as a lot of

areas and niches that we'll probably never even touch. And so I always like recommending these podcasts with a much longer backlog that actually give you a way to travel back in time for kind of an oral history of AI, which is very rare and very valuable. These are tokens that are generated in real time, just like we're doing it now for this current wave of AI trends. And so I

What we did, podcaster to podcaster, I always love talking to other podcasters because they know what's up, is we went through our respective stats and picked out the crowd favorite episodes as well as some personal picks. I do have to note that this was recorded just before our George Hotz episode, which obviously is now our top downloaded episode across podcasts, as well as our newly launched YouTube. Go subscribe, please, if you haven't seen it.

But anyway, if you're looking for good AI podcasts, these guys have been covering it for a long time, very, very consistently and covered a lot of topics. Definitely check them out. And please enjoy our special crossover episode with Practical AI in the studio.

Welcome to Practical AI. If you work in artificial intelligence, aspire to, or are curious how AI-related technologies are changing the world, this is the show for you. Thank you to our partners at Fastly for shipping all of our pods super fast to wherever you listen. Check them out at Fastly.com. And if you're new to the show, welcome to Practical AI.

And to our friends at Fly, deploy your app servers and database close to your users. No ops required. Learn more at fly.io.

Well, hello. We have a very special episode for you today. I got the chance to sit down with the guys from Latent Space, Swix and Alessio, out in San Francisco. They were kind enough to let me into their podcast recording studio, and we got a chance to talk about our favorite episodes of both of our shows and some of the overall takeaways we've had from those discussions. We cover some of the trends that we've been seeing in AI,

And they even get a chance to grill me on my opinions about prompt engineering. So enjoy the show. Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my co-host Swix, writer and editor of Latent Space.

And today we're very excited to welcome Dan Whitenack to the studio. Welcome, Dan. What's up, guys? It's great to be here. This is a podcast crossover. If you recognize this voice, Dan is the host of Practical AI. He's been in my ear off and on over

the past five years covering the latest and greatest in AI before it was cool. Yeah, yeah, yeah. Before the AI hype back in these like weird data science times, whatever that is now. Yes. Everything is merging and converging. So I'll give a little bit of your background and we can go into a little bit on your personal side. You got your PhD in mathematical and computational physics.

And then I spent 10 years as a data scientist, most recently at SIL International, which I actually, I thought it was like an agri-tech thing. And then I went to the website. It's actually a nonprofit. - International NGO, yeah. So they do language related work all around the world. So I spent the last five years building up a team that's been working on kind of low resource scenarios for AI, if people are familiar with that. So doing like machine translation or speech recognition, that sort of thing in languages that aren't yet supported. - Yeah.

Yeah, and we'll talk about this later, but I think episode three on Practical AI was already featuring the global community that AI has and addresses. Yeah, yeah, yeah. It's been an important theme throughout the whole time, throughout over 200 episodes. Yeah, yeah. And you recently left SIL to work on PredictionGuard, which we can talk a little bit more about that. You are also interim senior operations development director at NT Candle Co. Yeah, yeah.

And yeah, what else do people know about you? Yeah, I mean, I, as probably can be noted from the intro, I love working on various projects and having my hands in a lot of things. But yeah, I code on the side for fun, and that's how I usually get into these side projects and that sort of thing. But outside of that, yeah, I

I live in Indiana. I was telling you guys that I'm trying to coin the term cerebral prairie. So we'll see if that catches on. Probably not. You're the second guest in a row from Indiana. Linus from Notion was from Indiana. We were talking about how there's a surprising number of international students there. Yes, very true. Purdue is a...

strong university. Yeah, yeah, very strong university. It's a great place to spend time. And there's a lot of fun things that happen around that area too. So I'm also very into music, but not any sort of popular music. I play like mandolin and banjo and guitar and play folk music. Low resource. Yeah, low resource music, low resource languages. Yeah, all those things. Anything low resource is in my territory for sure.

And maybe we can cover the story of Practical AI. How'd you start it? Tell us what the early days were like and just fill everyone in. Yeah, it was kind of a winding journey. Some people might be familiar with the ChangeLog podcast, which I think they've been going now for like 11 or 12 years. It's pretty...

prolific in around, I think originally around more open source. Now it's kind of software development in general, but they have a network of podcasts now and at a Go conference actually. So I'm a fan of the Go programming language. That's another fun fact. But

But at GopherCon, I think it was in 2016, maybe, I met Adam Stacoviak, who is one of the hosts of the ChangeLog. At the time, I was giving a talk about data science, something, I forget. But he kind of pitched me, he's like, we've been thinking a lot about doing like a data or data science podcast. And at the time, he had a name for it.

I think it was Hard Data or something like that, which never caught on for obvious reasons. But I kind of stored that away. I didn't really do anything with it. But then over the next couple of years, I met Chris Benson, who's my co-host on Practical AI.

and helped him with a couple of talks at conferences. We met through the Go community as well. And eventually, he was working at a different company at the time. Now he's a strategist with Lockheed Martin working in AI stuff. But he reached out to me and said, hey, would you ever consider doing kind of like a co-host podcast thing? And at that point, I remembered my conversation with Adam. So I reached back out to Adam with the changelog. And

And then we kind of started working on the idea. We wanted it to be practical. So at the time, well, there's a lot of people doing things now with AI, like hands-on. Back then, there were some podcasts that really hyped AI, like not practical at all, which is why we kind of came to Practical AI, something that would actually benefit people. And that's like a great

thing to hear from people that when they listen to the show, they do actually learn something that's useful for their day to day. That's kind of the goal. Yeah. Nice. And I think that's one of the things in common with our podcast. You know, there's a lot of content out there that can get a lot of clicks with a

Fear of AI, you know, and all these different things. And I think we're all focused on more practical and day-to-day usage. Yeah. Tell us more about PredictionGuard, you know, that kind of fits into making AI practical and usable. Yeah. Yeah, sure. Appreciate that. So, yeah, PredictionGuard is what I've been working on since about Christmas time-ish.

Originally, I was thinking a lot about large language model evaluation and model selection, but it's kind of morphed into something else. What I've realized is that there's this market pressure, there's internal company pressure for people to implement these kinds of generative AI technologies, this new

generation of models, into their workflows because enterprises realize the benefits that they could have. But in practice, they go to like ChatGPT, they type in something, it's amazing. And then like, how do we do this in our enterprise where we have maybe rules around data privacy or compliance issues?

And also like we want to automate things maybe or we want to do data extraction, but I just get like text vomit out of these models. Like what do I do with that? It was based on structured text. How do I build a robust system out of inconsistent text vomit?

So Prediction Guard is really focused on those two things. One is kind of compliance and running kind of state-of-the-art AI models in a compliant way. And then layering on top of that layers of control for structuring output and validating output. So some people might be familiar with projects like guardrails or guidance or these things. So we've

integrated kind of some of the best of those things into the platform, plus some ways to easily do like self-consistency checks and factuality checks and other things on top of large language model output. Nice. We did have Shreya Rajpal from Guardrails as a guest. Yeah, yeah. So yeah, that's another episode that people really like. Yeah, maybe, you know, just to give people a sense of what Practical AI is as a podcast, you want to talk about maybe like the two, three

favorite episodes that we have. And we can go maybe alternate, you know, like our favorites. We've done some prep for this episode. Yes, yes, yeah. So this is kind of, I think our conception of this is kind of like a review for listeners who are new to us, either of our podcasts, to go back and revisit the favorites. Yeah, yeah. I think I can talk about some personal favorites of mine and then maybe like favorites from the audience. I think some of my personal favorites have actually been

We call them like fully connected episodes where Chris and I actually talk through a subject in detail together without a guest. To be honest, those are great episodes just for like me to learn something, like have an excuse to learn something. And we've done that recently, like with ChatGPT and instruction-tuned models. We did it with Stable Diffusion and diffusion models. We did it with AlphaFold. So all of those are episodes with us too. And just talking through like

How practically can you form a mental model for how these models were trained and how they work and what they output? Those are some of my favorites just because I learn a lot because I do a little bit of prep. We talk through all the details of those and it helps me form my own sort of intuition around those things.

Another personal favorite for us was that we did a series about AI in Africa. That was really cool. You mentioned like the global AI community. We did actually a series of those. They're all labeled AI for Africa, highlighting things like Masakhane. So people don't realize that like some of the models that we develop here, like

in the West Coast or wherever, they don't work great for all use cases around the world. And there's a lot of thriving grassroots communities like Masakhane and Turkic Interlingua and other communities that are really building models for themselves. Machine translation, speech recognition,

models that work for their languages around the world or agriculture, you know, computer vision models that work for their use cases around the world. So those are a couple of highlights on my end. Do we go with our personal highlights? Yeah.

Go ahead. I think you already picked one out. Yeah, I think mine is definitely the episode with Mike Conover from Databricks, who's the person leading the Dolly effort there. I think obviously the content is great and Mike is extremely smart and prepared, but I think the passion that he had about these things, you know, the RedPajama dataset came out the morning that we recorded it.

And we're all kind of like nerding out or like, yeah, why is that so interesting? Like he was so excited about it. And it's great to see people that have so much excitement about things that they work on. You know, it's kind of like an inspiration in a way to do the same for us. Yeah.

I think personally, so I tend to drive the news driven episode ones, like the event driven ones where something will happen in AI and like I'll make a snap decision that we'll have an episode recording on Twitter spaces and we'll have just a bunch of people tune in. I think the one that stood out was the ChatGPT app store, the ChatGPT plugins release where like 4,000 people tuned in to that one. That's crazy. And we did like an hour of prep, right? And yeah,

I think it's important for me as a "journalist" to be the first to report on something major and to provide a perspective on something major, but also capture an audio history of how people react at the time.

because this is something that we're talking about in the prep, ChatGPT plugins have become a disappointment compared to our expectations then. But we captured it. We captured the excitement back then. And we can sort of compare and contrast where we thought things were going and where things have actually ended up. It's a really nice piece of, I guess, audio journalism. Yeah. Yeah. I mean, it was just last year. I mentioned stable diffusion and all that. We were talking about this. It was like,

I always had in my mind, oh, everything's going to image generation. Like, should I quit doing NLP and start thinking about image? And now all I do is NLP and language models. But at the time, that was, you know, that's what was on our mind. Same thing. I was working on a web UI for Stable Diffusion, just like a thousand other front-end developers were. And yesterday was the first time I opened Stable Diffusion in six months.

And a lot has changed and it's still an area that's developing, but it's not, yeah, it's not driving thought process at the moment. Yeah. Well, especially cause I think just, it depends on what you're, what you think you want to do. And I'm definitely in less visual. I'm more of a text driven person. So I naturally lean towards LLMs anyway, like NLP. Yeah. Um,

I can hit some listener favorites. Yeah, crowd favorites. So we have like one clear favorite, which is actually, I would say it's a surprise to me. Not because the guest wasn't good or anything, but just the, so the topic was Metaflow. So I don't know if you've heard of Metaflow. It's a Python package,

for kind of full stack data science modeling work developed at Netflix. And we had Ville Tuulos on, who was the creator of that package. And that has had, it's like maybe 30% more listens than any other episode. And I think the title... we titled it, I think, From Notebooks to Production or something like that. Yeah. So it's like this idea of...

From notebooks to production, there's all sorts of things that prevent you from getting the value out of these sorts of methodologies. And my guess would be that talking about that is probably like the key feature of that episode. And Metaflow is like really cool. People should check it out. It is one way to kind of do this, both versioning and orchestration and deployment and all of these things that are really important. But I think a takeaway for me was that

like practically bringing into the, some people might call it like full stack data science or like model life cycle things. Like the model life cycle things interest people so much. So beyond making like a single inference or beyond doing like a single fine tuning,

What is the life cycle around a machine learning or an AI project? I think that really fascinates people because it's like the struggle of everyday life in actual practical usage of these models. So it's one thing to go to Hugging Face, try out like a Hugging Face space and like create some cool output or even just pull down a model and get output done.

But how do I handle model versioning and orchestration of that in my own infrastructure? How do I tie in my own data set to that and do it in a way that is fairly robust? How do I take these data scientists who use all this weird tooling and mash them into an organization that deals with DevOps and non-AI software and all of that? I think those are questions people are just wrestling with all the time. Yeah.
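
For context on the Metaflow episode discussed above, this is roughly what a minimal Metaflow flow looks like; the steps and data are stand-ins for illustration, and real flows layer on parameters, dependencies, and remote execution.

```python
# Minimal sketch of the Metaflow pattern: each @step is versioned, orchestrated,
# and resumable, which is what helps with the "notebooks to production" gap.
# Run locally with: python train_flow.py run
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # In a real flow this would load a dataset; here it's a stand-in list.
        self.data = [1, 2, 3, 4]
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for model training; Metaflow snapshots self.* as artifacts.
        self.model = sum(self.data) / len(self.data)
        self.next(self.end)

    @step
    def end(self):
        print(f"trained 'model': {self.model}")

if __name__ == "__main__":
    TrainFlow()
```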

Yeah. It feels a little bit in conflict with the trends of foundation models where the primary appeal is you train once and then you never touch it again. Or you'd release it as a version and people kind of just prompt based off of that. And I feel this evolution moving from essentially the MLOps era.

into, for lack of a better word, LLM ops. How do you feel about that? No, I think you're completely right. I think there will always be a place for these models in organizations that are task-specific models, like scikit-learn models or whatever that solve a particular problem. Because organizations like finance organizations or whatever will always have a need for explainability or whatever it might be.

But I do think we're moving into a period where I've had to rebuild a lot of my own intuition as a data scientist.

from thinking about gather my data, create like my code for training, output my model, serialize it, push it to some hub or something, deploy it, you know, handle orchestration to now thinking about, okay, which of these pre-trained models do I select and how do I engineer my prompting and my chain? Maybe going to fine tuning, like that is still like a really relevant topic.

But some of these things that like I've been working on with prediction guard, I think are the things that have a parallel in ML ops, but they're slightly like, there's just a slightly different flavor. I think it's,

how MLOps is graduating to something else versus, like, people still being concerned about ops. It's just, like you say, kind of a different kind of ops. Yeah. And I think that's reflected in our most popular episodes too. So I think all three of our most popular episodes are model-based; they're not more infrastructure-based. So I think

I think number one is the one with Reza Shabani, where we talked about how they trained the Replit code model and the Amjad vibes that they used to figure out whether or not the model was good. And I think, you know, that makes sense for our community. It's mostly software engineers and AI engineers. So code models are obviously a hot topic. Yeah, that was really good. And I think like it was one of the first times where we kind of went beyond just listing traditional benchmarks, you know, which is why we did a whole thing about AmjadEval.

It's like a lot of companies are using these models and they're using off-the-shelf benchmarks to do it. And what, you know, in other episodes that we'll talk about is like the one with Jonathan Frankle from MosaicML. And he also mentioned a lot of the benchmarks are multiple choice, but most production workloads are like open-ended text generation questions. So how do you kind of reconcile the two? Yeah.

Did you all get into at all, you know, the whole space of LLMs evaluating LLMs? This was something on a recent episode we talked to Jerry from LlamaIndex about, in terms of, on the one hand, generating questions like you're talking about to evaluate

LLMs or using an LLM to look at a context and a response and provide an evaluation. I think that's definitely something that I think is interesting and has come up in a few of our episodes recently where people are struggling to evaluate these things. And so, yeah, we've seen a similar trend in

one direction thinking about benchmarks and in another direction thinking about this sort of on the fly or model based evaluation, which has existed for some time, like in machine translation, it's very common. So like Unbabel uses a model called Comet and

That's like one of the most popular, highest performing machine translation evaluators as a model. It's not a metric like BLEU and that sort of thing. So yeah, that's a trend that we've seen is evaluation and specifically evaluation for LLMs, which can kind of get dicey. Yeah, we did a Benchmarks 101 episode that is also well-liked.
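
To see the distinction Dan is drawing (BLEU as a surface-overlap metric versus COMET as a learned, model-based evaluator), here is a minimal Python sketch. It assumes the sacrebleu and unbabel-comet packages; the COMET model name and the shape of predict()'s return value can vary between versions, so treat it as a sketch rather than a reference.

```python
# BLEU scores n-gram overlap with a reference; COMET is itself a trained model
# that scores source/translation/reference triples.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["Der Hund schläft."]
hypotheses = ["The dog is sleeping."]
references = ["The dog sleeps."]

# Surface metric: purely string-based overlap with the reference.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print("BLEU:", bleu.score)

# Learned evaluator: download a COMET checkpoint and let it judge the triples.
comet_path = download_model("Unbabel/wmt22-comet-da")  # model name is an assumption
comet = load_from_checkpoint(comet_path)
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(sources, hypotheses, references)]
print("COMET:", comet.predict(data, batch_size=8, gpus=0).system_score)
```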

And we talked about this concept of like a benchmark driven development, you know, like the benchmarks used to evolve every like three, four years. And now the models are catching up every like six months. So there's kind of this race between,

the benchmark creators and the model developers to find, okay, the state-of-the-art benchmark is here and GPT-4 on a lot of them gets like, you know, 98th percentile results. So, you know, GPT-4 is not AGI. Therefore, to get to AGI, we need better evals for these models to start pushing the boundaries. And yeah, I think a lot of people are experimenting with using models to generate these things, but I don't think there's a clear answer yet. Something that I think we were quite

surprised to find was specifically in HellaSwag, where the benchmarks, instead of being manually generated, were adversarially generated. And then I was very interested in our, I mean, this is kind of like segueing, we're not really going in sequence here, segueing into our second most popular episode, which was on Roboflow,

which covered Segment Anything from Meta. I think you guys had a discussion about that too. Yeah, it's been mentioned on the show. I don't think we've had a show devoted to it. Well, the most surprising finding when you read the paper is that something like

less than 1% of the masks that they released were actually human generated. A lot of it was AI assisted. So you have essentially models evaluating models, and the models are themselves trained on model-generated data. We're very, very many layers in at this point. Yeah. Yeah.

And I know that there's been a few papers recently about the sort of things that were done with LLaMA and other models around model-generated output and data sets. It'll be interesting to see. I think it's still early days for that. So I think at the very minimum, what all of these cases show is that models, either evaluating models or using simulated data, I think...

Back a few years ago, we would probably call this simulated data, right? I don't think that term is quite as popular now. Augmented? Yeah, or augmentation, data augmentation, simulated data. So I think this has been a topic for some time, but the scale at which we're seeing this done is kind of shocking now and encouraging that we can do quite flexible things by combining models together, both

at inference time, but also for training purposes. Well, have you ever come across this term of mode collapse? What I fear is, especially as someone who cares about low resource stuff, is that stacking models on top of models on top of models, you just optimize for the median use case or the modal use case. Yeah. Yeah. I think that one maybe... So yeah, that is a concern. I would say it's a valid concern. I do think that

these sort of larger models, and this gets, I guess, more into multilingualism and the makeup of various datasets of these LLMs. The more that we can have linguistic diversity represented in these LLMs, which I know, I think Cohere for AI just announced a community-driven effort to increase multilinguality in LLM datasets.

But I think the more we do that, I think it does benefit the downstream lower resource languages and lower resource scenarios more because we can still do fine tuning. I mean, we all love to use pre-trained models now.

But like in my previous work, when you were looking at maybe an Arabic vernacular language rather than standard Arabic, there's so much standard Arabic in data sets. Making that leap to an Arabic vernacular is much, much easier if that Arabic is included in

the LLM datasets because you can fine tune from those. So that is encouraging that that can happen more and more. There's still some major challenges there. And especially because most of the content that's being generated out of models is not in, you know, central Siberian Yupik or one of these languages, right? So we can't

We can't purely rely on those, but I think my hope would be that the larger foundation models see more linguistic diversity over time. And then there's these sort of grassroots organizations, grassroots efforts like Masakhane and others that rise up kind of on the other end and say, okay, well, we'll work with our language community to develop a data set that can fine tune off of these models. And hopefully there's benefit both ways in that sense. So

Since you mentioned Masakhane a couple of times, we'll drop the link in the show notes so people can find it. But what exactly do they do? How big of an impact have they had? Yeah, I would say, so if people aren't familiar, if you go to the link, you'll see it. They talk about themselves as a grassroots organization of African NLP researchers creating technology for Africa. So we have our own kind of biases as researchers

in an English-driven sort of literate world of what technology would be useful for everyone else. It probably makes sense for maybe listeners to say,

Well, wouldn't it be great if we could translate Wikipedia into all languages? Well, maybe, but actually the reality on the ground is that many language communities don't want Wikipedia translated into their language. That's not how they use their language. Or they're not literate and they're an oral culture, so they need speech, right? Text won't do them any good. So that's why Masakhane has started as a sort of grassroots organization

of NLP practitioners who understand the context of the domain that they work in and are able to create models and systems that work in those contexts. There's others, you can hear them on the AI for Africa episodes that we have that talk about agriculture use cases.

Agriculture use cases in the US might look like John Deere tractor with a cam. I don't know if people know this, but John Deere tractors or these big tractors, they literally have a Kubernetes cluster on. Some of them have a Kubernetes cluster on the

tractor. It's like an at-the-edge Kubernetes cluster that runs these models. And like when you're laying down pesticide, there's cameras that will actually identify and spray like individual weeds rather than like spraying the whole field. So that's like at the level that, you know, maybe is useful here. In Africa, maybe the more useful thing is around disease or drought identification or, you know,

disaster relief or other things like that. And so there's people working in those environments or in those domains that know those domains that are producing technology for those cases. And I think that's really important. So yeah, I would encourage people to check out Masakhane. And there's other groups like that. And if you're in like the US or Europe or wherever, and you want to get involved, there's open arms to say, hey, come help us do these things. So yeah, get involved too.

What else is in your top three? Oh, yeah. So one recent one from Raj Shah from Hugging Face. Some people might have seen his really cool videos on LinkedIn or other places. He makes TikTok videos about AI models, which is awesome. Yeah.

And his episode is called The Capabilities of LLMs. And I thought it was really a good way to help me understand the landscape of large language models and the various features or axes that they're kind of situated in. So one axis is,

for example, closed or open, right? Can I download the model? But then on top of that, there's another axis, which is, is it available for commercial use or is it not? And then there's other axes like we already talked about, multilinguality, but then there's like task specificity, right? Like there's code gen models and there's language generation models. And there's, of course, image generation models and all of those as well.

So yeah, I think that episode really helps set a good foundation, no pun intended, for language models to understand where they're situated. So you can kind of, when you go to Hugging Face and there's, what is there like 200,000 models now? Maybe there's, I don't know how many models there are. How do I like navigate that space and understand what I could pull down? Or do I fit into one of those use cases where it makes sense for me to just connect to OpenAI or Cohere or Anthropic?

helps kind of situate yourself. So I think that's why that episode was so popular is he kind of lays all of that out in an understandable way. How do you personally stay on top of models? You know, there's leaderboards, there's Twitter, there's LinkedIn. Yeah, I think it's a little bit spread out for me between the sources that you mentioned. As podcasters, I think that's one of the... Yeah, it's our job. Yeah, well, it's also a benefit for us. I think like if I didn't have...

every week on Wednesday, like, I'm going to talk about this topic, whether I'm planning to think about a certain thing or not. It kind of helps you prompt and look at what's going on. So I think that is an advantage of content creators: it is kind of a responsibility, but it's also an advantage that we have, to have the excuse to have great conversations with people every week.

But yeah, I think Twitter is a little bit weird now, as everybody knows, but it's still a good place to find out that information. And then sometimes, too, like, to be honest, I go to Hugging Face and like I'll search for models, but I also search and I look at the statistics around the downloads of models.

because generally when people find something useful, then they'll download it and download it over and over. So sometimes when I hear about like a family of models, I'll go there and then I'll look at some of the statistics on Hugging Face and like try a few things. Yeah. And some of these forks, I see the download numbers, but I've never heard of them outside of Hugging Face. Yeah, it's true. It's true. Yeah. And some of them, like there'll be a fork or...

like a fine-tune or something. And you do have to do a little bit of digging around licensing and that sort of thing too, but it is useful. Like there's tons of people doing amazing stuff out there that aren't getting recognized at the, you know, Falcon or MPT level, but there's a lot of people doing cool stuff that are releasing models on Hugging Face, maybe that they've just found interesting. Yeah.
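
As a small illustration of the "check the download numbers" habit Dan describes, here is a sketch of sorting Hub models by downloads and peeking at their license tags with the huggingface_hub client. The search term is arbitrary, and list_models' parameters can differ a bit between library versions, so check the docs for the version you have installed.

```python
# Sort Hub models by downloads and inspect a few, including license tags.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(search="mistral", sort="downloads", direction=-1, limit=5):
    info = api.model_info(model.id)  # fetch full metadata, including download counts
    licenses = [t for t in (info.tags or []) if t.startswith("license")]
    print(info.id, info.downloads, licenses)
```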

Any unusual ones that you recently found? Well, there's one that I'll highlight, which I thought was cool, because I don't know if you all saw that Meta released this six-modality model. Yeah, yeah. And it was interesting because we did this work with Masakhane when I was at SIL. We did this work with Masakhane and Coqui, which is a speech tech company, to create these language models in like six African languages.

And, um, I was like, okay, that's cool. Like we did that. We formed the data sets. It was satisfying. But now I'm learning that Meta went and found that data on Hugging Face, and that's kind of incorporated in

these new models that Meta has released. So it's cool to see like the full cycle thing happen where there was grassroots organizations seeing a need for models, gathering data, doing baselines. And now there's like extended functionality in kind of like a more influential way, I guess, at like that higher level. Yeah.

Yeah, I think, I mean, talking about open and closed models, when we started the podcast, it kind of looked like a cathedral kind of market where we had Cohere, Anthropic, OpenAI, Stability, and

those were like the hottest companies. I think now, you know, as you mentioned, you go on Hugging Face, like I just opened it right now. There's the Nous Research 13-billion-parameter model that just got released, fine-tuned on over 300,000 instructions. It's like the models are just popping up everywhere, which is great. And yeah, we had an episode, as I mentioned, with Jonathan Frankle and Abhinav from MosaicML to introduce MPT-7B and some of the work that they've done there. And I,

I think like one of their motivations is keeping the space as open as possible, like making it easy for anybody to go, obviously ideally on the MosaicML platform, and train their own models and whatnot. So that's one that people really liked. I thought it was really technical. So I was a little worried at first. I was like, is it going to fly over most people's heads? But

It was actually super well received. No, we're going more technical. Exactly. Now that was a good learning. Leaning in. Exactly. And Jonathan is super passionate about open source. He had this rant halfway through the episode about why it's so important to keep models open. And I actually edited in the crowd applause

into the podcast, which I kind of love. I love little audio bonuses for people listening along. And I think the changelog guys do that really well, especially in their newer episodes. Yeah, we need to, there is a way for us to integrate some of those things. Yeah, like the soundboard thing. And we've never got into it too much. I need to work with Jared from the changelog and see. It just spices it up. Exactly, exactly. You can only have so many hour-long conversations about ML. Yeah.

We're, we're, yeah, I, I keep thinking that, but then we keep going. Right, right, right, right. Sorry. I didn't mean, I didn't mean like it was like, no, it just switches it up and makes it audio interesting to, to add variety. Cool. I don't know. I don't know if there are any other highlights that we want to do for, um,

I'll just highlight maybe one more. Kirsten Lum was on, she had an episode about machine learning at small organizations. I think that's a great one. Like if you're a data scientist or a practitioner or an engineer at like either a startup or a mid-sized company where I think the thing that she emphasized was these different tasks that we think about, like

Whether it's curating a data set or training a model or fine-tuning a model or deploying a model. Sometimes at a larger organization, those are functions in and of themselves. But when you're in this sort of mid-range organization, that's like a task you do, right? So to think about those tasks as tasks of your role and timebox them and understand how

how to do all of those things well without getting sucked down into any one of those things. That was an insight that I found quite useful in my day-to-day as well as to sort of start to get a little bit of like spidey sense around, hey, I'm spending a lot of time doing this, but which probably means,

I'm stuck in too much, like I'm making my MLOps too complicated, right, to track versions and tie all this stuff together. Maybe I should just do a simple thing and just

paste a number in a Google Sheet and move on or something. I think that's a good segue into some of the other work that you do. You run the datadan.io website, which covers the different types of workshops and advising that you do. I think a lot of founders especially are curious about how companies are thinking about using this technology. There's a lot of demos on Twitter, a lot of excitement, but when founders are putting together something that they want to sell, they're like, okay,

What are the real problems that enterprises have? What are like some of the limitations that they have? We talked about commercial use cases and something like that. Can you maybe talk a bit about, you know, two, three high-level learnings that you had from these workshops on like how these models are actually being brought into companies and how they're being adopted? Yeah, I think maybe one higher level comment on this is

Even though we see all these demos happening, everybody's using ChatGPT. The reality in enterprise is most enterprises still don't have LLMs integrated across their technology stack, right? So that might be a bummer for some people, like, oh, it's not quite as pervasive, but I actually find it refreshing, maybe, because some of us feel like stuff happens every week. It's, it's

exhausting to keep up. Like, oh, if I don't keep up with this stuff, then I'm getting left behind. But it takes time for these things to trickle down, and not everything, like we were talking about the Stable Diffusion use case and others, not everything that's hyped at the moment will be a part of your day-to-day life forever, right? So you can kind of take some comfort in that.

I think it's really important for people, if they're interested in these models, to really dig into more than just kind of a single prompt into these models. The practical side of using generative text models or LLMs really comes down to more than what some people might call prompt engineering,

but understanding things like giving examples or demonstrations in your prompt, using things like guardrails or regex statements or prediction guard to structure output, doing fine-tuning for your company's data. There's kind of a hierarchy of these things. I think

I think you all know Travis Fisher. He was a guest on Practical AI and talked about this hierarchy from prompt engineering through like data augmentation to fine tuning to eventually like

training your own generative model. I've really tried to encourage enterprise users and those that I do workshops with to think in terms of something like that hierarchy with these models: get hands-on, do your prompting. But then if you don't get the answer that you want immediately, I think there's a tendency for people to say, oh, well, it doesn't work for my use case.

But there's so much of a rich environment underneath that, with things like LangChain and LlamaIndex and, you know, data augmentation, chaining, customization, fine tuning, like all this stuff that can be combined together.

It's a fun new experience, but I find that enterprise users just haven't explored past that most shallow level. So I think, yeah, in terms of the trends that I've seen with the workshops, I think people have gone to ChatGPT or one of these models. They've seen the value that's there, but they have a hard time connecting these models to a workflow that they can use to solve problems. Like before, we all had intuition, like,

I'm going to gather my data. It's going to have these five features. I'm going to train my scikit-learn model or whatever. I'm going to deploy it with Flask. And now I have a cool thing. Now all of that intuition has sort of been shattered a little bit. So we need to develop a new workflow around these things. And I think that's really the focus of the workshops is kind of rebuilding that intuition into a practical workflow that you can think through and solve problems with practically.
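
As a concrete illustration of the workflow Dan describes, here is a minimal, hypothetical Python sketch of one rung of that hierarchy: a prompt that carries demonstrations, plus a structure check and fallback instead of shipping whatever the model emits. The task, labels, prompt, and the `call_llm` stand-in are all invented for illustration and are not tied to any particular provider.

```python
import re

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever model or provider API you actually use.
    raise NotImplementedError("replace with your model call")

FEW_SHOT_PROMPT = """Classify the support ticket's urgency as LOW, MEDIUM, or HIGH.

Ticket: "The export button is misaligned by a pixel."
Urgency: LOW

Ticket: "Production API is returning 500s for all customers."
Urgency: HIGH

Ticket: "{ticket}"
Urgency:"""

def classify(ticket: str, retries: int = 2) -> str:
    for _ in range(retries + 1):
        raw = call_llm(FEW_SHOT_PROMPT.format(ticket=ticket))
        match = re.search(r"\b(LOW|MEDIUM|HIGH)\b", raw.upper())
        if match:                      # structure check: only accept the allowed labels
            return match.group(1)
    return "MEDIUM"                    # fall back rather than passing free-form text downstream
```

The point of the sketch is the wrapping, not the prompt: demonstrations in the prompt, validation on the way out, and an explicit retry/fallback policy, before reaching for fine-tuning.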

You have a live prompt engineering class. Prompt engineering overrated or underrated? Yeah, I think prompt engineering as like a term is probably too hyped. I think engineering...

and ops around large language models though is a real thing. And it is sort of what we're transitioning to. Now, how much you wanna say is like, that term gets used in all sorts of different contexts. It could mean just like, oh, I wrote a good prompt and I'm gonna like sell it on Twitter or something.

PromptBase. The marketplace of prompts. I wonder how they're doing, to be honest, because they get quoted in almost every article about prompt engineering. They got really, really good PR. Yeah, yeah. I mean, if people can sell their prompts, I mean, I'm all for that. That's cool. I got prompts right here. You know.

But I think it goes like some people might just mean that. And I think that's maybe overhyped in my view. But I do think there's this whole level of engineering and operations around prompts and chaining and data augmentation that is a real workflow that people can use to solve their problems. And that's more what I mean when I'm referring to like, whatever, however you want to combine the word engineering. Yeah.

with prompting and language models. Yeah, I've just been calling it AI engineering. AI engineering, that's good. Wrangling with the AI APIs, knowing what to do with them. That is a skill set that is developing as a sub-specialty of software engineering. Yeah, yeah. It is what it is. And I think part of something I'm really trying to explore is this,

is this spillover of AI from the traditional ML space, like where you needed a machine learning researcher or machine learning engineer; it's spilling over into the software engineering space. And there's this rising class of what I'm calling AI engineers that is specialized and conversant in the research, the tooling, the conversations and themes. What do you think are the unique challenges that someone coming from that latter group, like engineers that are advancing into this AI engineer position, faces versus

probably more like my background, where I was in data science for some time and now I'm kind of transitioning into this world. What do you think are the unique challenges for both groups of people? Oh, I mean, so I can speak to the software side and you can speak about the data science side. It's simply that we are, for many of us, dealing with a non-deterministic system for the first time that, by the way, we don't fully control, because

there's this conversation about GPT-4 regressing in its quality. And we don't know, because model drift is not within our control; it's a black box API from OpenAI. But beyond that, there's this sense that the latent space of capabilities is not fully explored yet. There's 175 billion or 1 trillion parameters in the model. We're maybe using like 200 of them.

It's literally like that meme where we're using 10% of our brain. We are probably using 10% of what is capable in the model. And it takes some ingeniousness to unlock that. Yeah, I think from the data science perspective, there's probably a desire to jump quickly to these other things around fine tuning or training your own models, where if you really do take this

prompting, chaining, and data augmentation seriously, you can do a lot with models sort of off the shelf and don't need to jump immediately into training. So I think that is like a knee-jerk reaction on our end. And fine tuning is going to be around for the foreseeable future as far as I can tell. But data scientists have maybe a different perspective

Because we've been dealing with the uncertainty or non-deterministic output for some time and have developed some intuition around that. But that's mostly when we've been controlling the data sets, when we've been controlling the model training and that sort of thing. So to throw some of that out but still deal with that, it's a separate kind of challenge for us. I just remembered another thing that we've been developing on the Latent Space community, which is this concept of AI UX.

That the last mile of showing something on the screen and making it consumable, easily usable by people, is perhaps as valuable as the actual training of the model itself. So I don't know if that's an overstatement, to be honest. Obviously, you're spending hundreds of millions of dollars training models, and putting it in some kind of React app is not the biggest innovation in the world. But a lot of people from OpenAI say ChatGPT was mostly a UX innovation. Yeah.

Yeah, I think, like, leading up to ChatGPT, when I saw the output of ChatGPT, I don't think I had the same earth-shattering experience that other people had in believing, like, oh, this output is coming from a model like that. Sure, it came from a model.

But the reception to that interface and the human element of the dialogue, that was... So maybe it's both and, right? You're not going to get that experience if you don't have the innovation under the hood and the modeling and the data set curation and all of that.

But it can totally be ruined by the UX. I typically give the example: one day in Gmail, I logged in, and I was typing my email and then had the gray autocomplete, right? I did not get a pop-up that said, do you want us to start writing your emails with AI? It just was so smooth, and it happened, and

it created value for me instantly, right? So I think that there is really a sense to that, especially in this area where people have a lot of misgivings or fear around the technology itself. And we're going to have Alex Graveley on in a future episode, but GitHub, when they had the initial Codex model from OpenAI, they spent six months tuning the UX just to get Copilot to a point where it's not a separate pane, it's not a separate text box, it's kind of in your code as you write the code.

And to me, that's more the domain of traditional software engineering rather than ML engineers or research engineers. Yeah. Yeah. I would say that is probably, yes. To circle back to what we were talking about, challenges that are unique to engineers coming into this versus data scientists coming into this.

That's something data scientists, I think, have not thought about very much at all. At the very most, it's data visualization that they've thought about, right? Whereas engineers, generally, like there's some human, I mean, unless you're just a very pure backend systems engineer, like thinking about UI, UX is maybe a little bit more natural to that group. Yeah.

You mentioned one thing, which is about dataset curation. We're in the middle of preparing a long-overdue episode on Datasets 101. Any reflections on the evolution in NLP datasets that has been happening? Yeah, great question. I definitely, like, I think, are you all familiar with Label Studio? That is one of the most popular kind of open source frameworks for data labeling. And they've been, I think they've been on

We have them on the show. We try to have them on the show every year as data labeling experts. Maybe it's time for that. It's just reminding me. They just released... So Erin McHale is in the Latent Space Discord. I think you had her on at ODSC. Yeah, she was at ODSC. Yeah, that's right. So they just released new tools for fine-tuning generative AI models. Exactly, yeah. It's a good occasion. I think maybe that being an example of this is maybe a trend now

that we're seeing there is around augmented tooling or tooling that's really geared towards

an approachable way to fine tune these models with human feedback or with customized data. So like I know with Label Studio, a lot of the recent releases had somewhat to do with putting LLMs in the loop with humans during the labeling process, similar to, I think, what Prodigy has been doing for some time, which is from the spaCy folks.

So this sort of human-in-the-loop labeling and update of a model, they brought some of that in. But now there's this new kind of set of tooling around specifically instruction tuning of models. I think before, maybe people, and I've seen actually this misconception, I was in an advising call with a client and they're really struggling to understand, like,

okay, our company has been training or fine-tuning models. Now we want to create our own instruction-tuned model. How is that different from what we've been doing in the past? And kind of what I tried to help them see is,

Yes, some of the workflow that happened around reinforcement learning from human feedback is unique, but reinforcement learning is not unique. There's an element of training in that. There's data set curation in that. There's pre-training that happened before that whole process happened. So the elements that you're familiar with are part of that. They're just not packaged in the same way that you saw them before. Now there's this...

clear pre-training stage and then the human feedback stage and then this reinforcement learning happens. So I think the more that we can bring that concept and that workflow into tooling, like what Label Studio is doing, to make it more approachable for people to where it's not like this weird, like reinforcement learning from human feedback sounds very confusing to people, like PPO and helping people understand like how reinforcement learning works. It's very difficult.
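
To make the "familiar pieces, new packaging" point concrete, here is a schematic Python sketch of the three stages Dan is unpacking. Every helper is a hypothetical stub, not a real training API; it only shows how the stages connect.

```python
# Schematic of the RLHF-style pipeline: nothing here trains a real model.

def pretrain_language_model(corpus):              # stage 1: self-supervised next-token training
    raise NotImplementedError("hypothetical stub")

def finetune(model, instruction_pairs):           # stage 2: supervised fine-tuning on (instruction, response)
    raise NotImplementedError("hypothetical stub")

def train_reward_model(model, preference_pairs):  # fit a scorer on human A-vs-B preference comparisons
    raise NotImplementedError("hypothetical stub")

def ppo_optimize(model, reward_model):            # stage 3: RL (e.g. PPO) against the reward model
    raise NotImplementedError("hypothetical stub")

def rlhf_pipeline(corpus, instruction_pairs, preference_pairs):
    base = pretrain_language_model(corpus)            # familiar pretraining
    sft = finetune(base, instruction_pairs)           # familiar fine-tuning, new data shape
    reward = train_reward_model(sft, preference_pairs)
    return ppo_optimize(sft, reward)                  # the genuinely new packaging
```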

So the more the tooling can just have its own good UI/UX around that process, I think the better, and probably Label Studio and others are leading the way on that front. I was thinking, so labels are one thing. And by the way, okay, I'll take this side tangent on labels and I'll come back to the main point. I actually presumed that Scale would win everything. Yeah. And it seems like they haven't.

Yeah. And, sorry, there's Scale, there's Snorkel, there's this generation of labeling companies that came up. Like data-centric AI companies. Right, right. What happened, like, how come there's still new companies coming up? There's Labelbox, there's Label Studio. I don't have a sense of how to think about these companies. Like, obviously, labels are important. Yeah. Yeah, I think also, even before that, there was...

like tooling, or at least features, even from cloud providers or whatever, like AutoML, that came before that: upload your own data, create your own custom model. So I think that maybe it's that companies that want to create these sorts of custom models, and this is just my own opinion, I'll preface that, maybe they don't want, like, when they're thinking about that problem,

They're not thinking about, oh, I need a whole platform to create custom models using our data. They're more thinking about, how do I use these state-of-the-art models with my data? Those statements are very similar. But if you notice, one is more model-centric and one is more data-centric.

So I think enterprises are still thinking like model centric and augmenting that with their data, whether that be just through augmentation or through fine tuning or training. They're not necessarily thinking about like a data platform for AI. They're thinking about bringing their data to the AI system, which is why I think like

APIs like Cohere and OpenAI that offer fine tuning as part of their API. People love that. It makes sense. Like, okay, I can just upload some examples and it makes the model better. But it's still model-centric, right? Yeah. I get the sense that OpenAI doesn't want to encourage that anymore because they don't have fine tuning for 3.5 and 4. And then the last thing I'll do about datasets, and then we can go into the lightning round, is I was actually thinking about unlabeled datasets

for unsupervised learning or self-supervised learning, right? That is something we're trying to wrap our heads around: Common Crawl, the Stack Overflow archive, the books, you know. I don't know if you have any perspectives on that, the trends that are arising here, the best practices.

As far as I can tell, nobody has a straight answer as to what the data mix is, and everyone just kind of experiments. Yeah. Well, I think that's partly driven by the fact that for the most popular models, you don't really have a clear picture of what the data mix is, right? So when people try to recreate that and don't achieve that level of performance, one of the things they're thinking about is, well, what are all the different data source

and mix options that I can try to replicate some of what's going on? Right. So I think it's partly driven by the fact that we don't totally know what data mix is sitting behind the curtain at OpenAI or others. Yeah.

But I think there are a couple of trends, which you've already sort of highlighted. One is, how can I mix up all of these public datasets and filter them in unique ways to make my model better? So I listened to a talk, I believe it was at last year's ACL conference.

And they did this study of Common Crawl, right? And they found that a significant portion of Common Crawl was mislabeled all over the place. Like trash. Yeah. I think it was 100% of the data that was labeled as Latin-character Arabic, so Arabic written in Latin characters, that was not actually Arabic. Like 100% of it.

And there were all sorts of other problems like that. So I think there's one group of people,

or one set of experiments, that you could think about as: how do I take these existing datasets, which I know have data quality issues, or maybe biases or other problems I would like to filter out, like not-safe-for-work data, that sort of thing, and create my own specially filtered mix of them to train a model? So that's one genre.
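As a rough illustration of that first genre, here is a minimal sketch using the Hugging Face datasets library to stream a slice of a public web corpus and apply simple quality filters. The dataset ID, thresholds, and heuristics are assumptions chosen for the example, not a recommended recipe.

```python
# Minimal sketch: filter a public web-text corpus with simple quality heuristics.
# The dataset ID, filters, and thresholds here are illustrative assumptions.
from datasets import load_dataset

# Stream a C4-style corpus so we don't have to download the whole thing.
raw = load_dataset("allenai/c4", "en", split="train", streaming=True)

BLOCKLIST = {"lorem ipsum", "click here to subscribe"}  # toy stand-in for NSFW/boilerplate lists

def looks_clean(example):
    text = example["text"]
    long_enough = len(text.split()) > 50                                  # drop tiny fragments
    mostly_ascii = sum(c.isascii() for c in text) / max(len(text), 1) > 0.9  # toy heuristic only
    not_boilerplate = not any(b in text.lower() for b in BLOCKLIST)
    return long_enough and mostly_ascii and not_boilerplate

filtered = raw.filter(looks_clean)

# Peek at a few surviving documents.
for i, ex in enumerate(filtered):
    print(ex["text"][:200], "...\n")
    if i >= 2:
        break
```

In practice the interesting work is in the heuristics themselves (language ID, deduplication, toxicity filters, and so on), which is exactly the "specially filtered mix" being described.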

And then there's the other genre, which is maybe taking those but augmenting them with simulated or synthetic data that comes out of a model, like a GPT model or something like that. You could combine those approaches in all sorts of unique ways. And I think it is a little bit of the Wild West, because we don't totally have a good grip on what the winning strategy is there. So that's where I would also encourage people to try a variety of models.
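For the second genre, a sketch like the following shows the general shape of mixing filtered real data with model-generated examples. The generate_synthetic function is a hypothetical placeholder for whatever LLM call you would actually use.

```python
# Minimal sketch: combine filtered real examples with model-generated ones.
# generate_synthetic() is a hypothetical stand-in for a call to an LLM of your choice.
from datasets import Dataset, concatenate_datasets

def generate_synthetic(prompt: str, n: int) -> list[str]:
    # In practice this would call a hosted or local model; here we fake the output.
    return [f"{prompt} (synthetic example {i})" for i in range(n)]

real = Dataset.from_dict({"text": [
    "A real document that survived the quality filters.",
    "Another filtered web document.",
]})

synthetic = Dataset.from_dict({
    "text": generate_synthetic("Write a short product FAQ entry.", n=2)
})

# A crude 50/50 mix; the real-to-synthetic ratio is itself something to experiment with.
mixed = concatenate_datasets([real, synthetic]).shuffle(seed=42)
print(mixed["text"])
```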

So this is maybe a problem with benchmarks in general, right? Like, you can see an

open large language model benchmark on Hugging Face with certain models at the top, and you could come away from that and say, well, anything below the top three I'm not even going to use. But the reality is that each of those models had a unique flavor of data under the hood that might actually work quite well for your use case. So one example that I've used recently in

some work is the Camel 5 billion parameter model from Writer. It doesn't work great for a lot of things, but there are certain things, around marketing copy and such, that it does a really good job at. And it's a smaller model that I can host and run myself.

And I can get good output out of it if I put some of that workflow and structuring around it. But I wouldn't use it for other cases. And that has a lot to do with the data; I'm guessing Writer focused on that kind of copy generation and such.
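To build that kind of intuition yourself, something like the following is usually enough to kick the tires on a smaller self-hosted model. The Hugging Face model ID Writer/camel-5b-hf and the generation settings are assumptions for this sketch; swap in whichever model you're evaluating.

```python
# Minimal sketch: try a smaller open model locally and eyeball its outputs.
# The model ID and generation settings are assumptions; substitute the model you're testing.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Writer/camel-5b-hf",  # assumed Hugging Face repo for Writer's Camel-5B
    device_map="auto",           # requires accelerate; drop this if running on CPU
)

prompt = (
    "Write two sentences of upbeat marketing copy for a reusable water bottle "
    "aimed at commuters."
)

out = generator(prompt, max_new_tokens=80, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])
```

Swapping the model ID and prompt is usually all it takes to compare a few candidates side by side on your own task.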

So, yeah, on this topic specifically, I would encourage people to think about what's going on under the hood and also give a few different models a try, to gain your own intuition about how a model's behavior might change based on how it was trained and the mix of data that went in. Awesome. Let's jump into the lightning round. We have three questions for you.

It's lightning, but you can take 30 seconds to answer. All right, cool. So the first question is around acceleration. What's something that already happened in AI that you thought would take much longer? Yeah, the thing I was thinking about here was how general-purpose these large language models are beyond traditional NLP tasks. It doesn't surprise me that they can do sentiment analysis or even NLI or something like that.

Those are things that have been studied for a long time. But, for example, at ODSC I was in a workshop on fraud detection, and they were using, I forget exactly which, some statistical models to do fraud detection. And I wondered, if I just

do a bit of chaining and insert some examples of these insurance transactions into my prompts, can I get the large language model to detect a fraudulent insurance client?

And it seemed like, well, I got pretty far doing that. So the fact that you can do something like that with these models, that they're that generalizable beyond traditional NLP techniques, I think is surprising to me. Awesome. Exploration. What are the most interesting unsolved questions in AI? Yeah, I think there is still such a focus on English and Mandarin.

Large language model wise, there's a big drop-off in performance after you get past English, Mandarin, German, and

Spanish to some degree, though German is actually better than Spanish because of how much it's been studied in NLP, and of course Mandarin has a lot of data. Spanish still does well, but there are languages, even in the top hundred languages of the world, spoken by many millions of people, that don't

perform well in these models. So that's thing one. But even modality-wise, I know there's a lot of work going on in the research community around sign language, but there are all of these different modalities of language. Written text does not equal communication, right? Written text is a synthesis of communication into a written form that some people consume. But with the

combination of all of these modalities along with all of these languages, there's just so much room to explore and so many challenges left, which will eventually, I think, help us learn a lot about communication in general and about the limitations of these models. It's definitely a challenge, but an exciting area. Awesome, man.

So one last takeaway: what's a message that you want everyone to remember today? Yeah, similar to when you were asking about my workshops, I would just encourage people to get hands-on with these models and really dig into the new sets of tooling that are out there. There's so much good tooling to go from a simple prompt, to

injecting your own data, to forming a query index, to creating a chain of processing, even trying agents and all those things. Get hands-on and try it. That's the only way you're going to build that intuition. So yeah, that would be my encouragement. Excellent. Well, thanks for coming on. Yeah. Thank you guys so much. This is awesome.