
879: Serverless, Parallel, and AI-Assisted: The Future of Data Science is Here, with Zerve’s Dr. Greg Michaelson

2025/4/15

Super Data Science: ML & AI Podcast with Jon Krohn

Topics
Greg Michaelson: The Zerve platform has made significant progress over the past year, releasing many new features. One example is the Fleet feature, which uses serverless technology to massively parallelize code execution, dramatically speeding up processing, especially for large language model calls, without requiring extra code or adding cost. In addition, Zerve's integrated AI assistant can help users write code and build blocks, streamlining the whole coding process. Zerve's graph architecture allows multiple code blocks to run at the same time and supports multiple people working together, improving team collaboration. Zerve is not a low-code/no-code tool but a code-first data science environment; through its collaborative graph environment, parallelization, and related features, it helps code-first data teams cut model development cycle times by up to nine times. Each node is a code window, and users can view the code, its inputs, and its outputs in full screen, which makes previewing and debugging convenient. Zerve supports interacting with multiple large language models, including OpenAI, AWS Bedrock, and Hugging Face, so users can choose the right model for their needs and data security considerations. Zerve's AI assistant is an agent that can operate on the canvas and create entire project workflows from natural language instructions, greatly improving development efficiency.

Jon Krohn: As the host, Jon Krohn mainly guides the conversation, asks questions, and summarizes and builds on Greg Michaelson's answers. He approaches the discussion from a user's perspective, raising many practical questions, such as how to understand Zerve's DAG architecture, how to integrate LLMs, and how to address the challenges facing SaaS companies.


Transcript


This is episode number 879 with Dr. Greg Michaelson, co-founder of Zerve. Today's episode is brought to you by Trainium 2, the latest AI chip from AWS. And by the Dell AI Factory with NVIDIA.

Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.

Welcome back to the Super Data Science Podcast today, the highly technical but also highly hilarious Dr. Greg Michaelson returns to this show for the first time. Greg is a co-founder of Zerve, a super cool platform for developing and delivering AI products that launched to the public on this very podcast a little over a year ago.

He previously spent seven years as DataRobot's chief customer officer and four years as senior director of analytics and research for Travelers Insurance. He was a Baptist pastor while he obtained his PhD in applied statistics from the University of Alabama.

Today's episode is on the technical side, and so we'll appeal most to hands-on practitioners like data scientists, AI or ML engineers, and software developers. But Greg is such an engaging communicator, a Baptist background coming in handy, that anyone interested in how the practice of data science is rapidly being revolutionized may enjoy today's episode.

In it, Greg details how Zerve's collaborative graph-based coding environment has matured over the past year, including their revolutionary Fleet feature that allows massive parallelization of code execution without additional cost.

He talks about how AI assistants are changing the coding experience by helping build, edit, and connect your data science projects, why the rise of LLMs might spell trouble for many SaaS businesses as building in-house solutions becomes increasingly viable, and the innovative ways companies are using RAG, Retrieval Augmented Generation, to create more powerful AI applications. All right, you ready for this entertaining and educational episode? Let's go. ♪

Greg, welcome back to the Super Data Science Podcast. It's great to have you back. Where are you calling in from this time? I'm home. I'm in Elko, Nevada. Very nice. Which is a small town. We're about three hours west of Salt Lake City. Cool. So that's like the nearest city, like you're nowhere near Las Vegas, for example. Vegas is like eight hours south. Oh boy, yeah. Yeah, and then Reno is probably four hours to the west. So we're about midway between Reno and Salt Lake City.

Turns out Elko is the gold mining capital of the U.S., so. Oh, really? Not, not many people have heard of it, but they pull an awful lot of gold out of the ground here. No kidding. Still today. Yeah. Still going on. That's cool. Well, you know, we still need it. So we still desperately need it. Uh,

I guess there are actually real world applications of gold, but I think most of it goes into jewelry, which I guess some people also. I don't know. I think most of it probably goes into electronics. Oh, yeah. Yeah. I have no idea. I'm speaking out of complete ignorance here. Well, listeners can let us know. Dunning Kruger at work here. Yeah. Yeah. We're not going to we're not going to do research right now to figure that out. But it's not it's not data science related enough.

Yeah, this is a Joe Rogan. So what's been happening since you were on the show a little over a year ago? I hear you have big personal news, actually. Nevada-appropriate personal news. What?

Well, I did get married last week. Went down to the courthouse and got hitched for the second time. Nice. Congrats. Congrats, Greg. Thanks. That's big news. So we did have you on, as I said, a little over a year ago. That was episode 753. And so in that, our listeners did get an overview of what Zerve was like. But I understand that you've released quite a few new features since then.

Yeah, it's been a wild year. We did some real basic stuff like integrating with GitHub and Bitbucket and stuff like that.

But we also have added a feature called the fleet, which is coming out like this week. And that's a way to massively parallelize code execution using serverless technology. So you can, you know, if you wanted to make, let's say you wanted to make a call to a large language model, but you wanted to do it, say, a thousand times.

you know, they're slow, right? Everybody's used ChatGPT. You know, you can type it, ask it a question. It might take, you know, 20, 30, 40 seconds for it to come back. So if you're trying to do that a thousand times, doing it in series is a kind of a pain, but.

So you could do like multi-processing or something like that, but then you got to like manage the pools and like figure, you know, there's a coding challenge there. Not, maybe not a challenge for the experts, but it takes more than one line of code. But Zerve is block-based, you've seen it. The code is arranged as a DAG. And so each block when,

when you execute it, spins up serverless compute and executes. Well, it turns out that it's dead easy to just parallelize by spinning up lots of serverless compute. And the upside is, you know, you don't have to write any code to do it. And it doesn't cost more because it's the same amount of compute. It just happens all at the same time. So the fleet is pretty awesome.
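To make the do-it-yourself route Greg is contrasting with the Fleet concrete, here is a minimal sketch of fanning slow LLM calls out over a local thread pool; call_llm() is a hypothetical stand-in for whatever client you actually use, and the pool size, rate limits, and error handling all become your problem:

```python
# Minimal sketch of DIY parallelization for slow, I/O-bound LLM calls.
# call_llm() is a hypothetical placeholder, not a real client library.
from concurrent.futures import ThreadPoolExecutor
import time

def call_llm(prompt: str) -> str:
    """Pretend LLM call: in real life each request can take tens of seconds."""
    time.sleep(0.1)  # stand-in for network plus generation latency
    return f"answer to: {prompt}"

prompts = [f"Summarize document {i}" for i in range(1000)]

# Serial version: total time is roughly len(prompts) * latency.
# results = [call_llm(p) for p in prompts]

# Parallel version: faster, but you now manage the pool yourself.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(call_llm, prompts))

print(len(results), "responses")
```

The pitch of the Fleet, as Greg describes it, is that this fan-out happens per block without the pool management, and because serverless billing charges for the same total compute whether it runs in series or all at once, the parallel version costs no more.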

And then our AI assistant that kind of like can help you to write code within the app, can build its own blocks, can do some of that stuff. It's sort of modern coding, you know, because the large language models really revolutionize the way people code. It's just not at all the same.

same kind of endeavor as it was. So yeah, some really, really cool stuff. 100%. And so Zerve in general is about making creating code faster, easier to understand, easier to collaborate on. So we'll get into those details like the AI system, which accelerates that even more. We'll talk about the LLMs and that kind of parallelization, the serverless aspects of that as well. But

quickly first, we should for folks who haven't listened to episode 753, we should fill them in a bit more. So now we know it's a directed acyclic graph. So maybe we should kind of explain that term a little bit. So for example, so a directed acyclic graph, it's basically what it says, everything's in there and in the term. But to break it down, you have a graph, which means you have nodes and edges connecting those nodes.

It's directed, which means you have some kind of flow. It's not just that the points in the graph are connected, meaning like, oh, these are all my friends who are connected in a graph. No, it's directed. There's some kind of flow of information between all those nodes in the graph. So it's like

outlining a process, like a data science process, a data modeling process, a data engineering process, all that stuff could be nodes and directed edges in your DAG. And then the final term there, the A, means that nowhere in your directed graph is there a loop. Did I get that right?
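As a quick illustration of Jon's definition, and nothing Zerve-specific, here is a toy sketch that represents a small pipeline as an adjacency dict and checks the "acyclic" part with Kahn's algorithm:

```python
# Toy DAG: nodes are pipeline steps, directed edges are the flow between them.
from collections import deque

edges = {
    "load_data": ["clean_data"],
    "clean_data": ["train_model"],
    "train_model": ["evaluate"],
    "evaluate": [],
}

def is_acyclic(graph: dict) -> bool:
    """Kahn's algorithm: if every node can be peeled off, there is no loop."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for t in graph[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return visited == len(graph)

print(is_acyclic(edges))  # True: nowhere in this directed graph is there a loop
```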

You got it. Yeah, that's a good description. Yeah. In Zerve's graph, we call it a canvas, the code lives in the nodes in your graph, and then the connections indicate

data and memory flow, right? So if I execute my first block and it is say a SQL query to a Snowflake database, then the data frame that gets created will pass down that node to the subsequent blocks. And you can fork and go one to two, and then you can merge back together and go two to one and so on.

The nice thing about the architecture is that you can run as many blocks simultaneously as you want. So you can have multiple people in the same canvas, all running code at the same time, writing code in Python, in R, in SQL. It's just kind of a wild mix and match environment where anybody can do anything. It's really cool.
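To picture that data and memory flow, here is an illustrative sketch, not Zerve's actual engine, where each block is a plain function, edges pass its outputs downstream, and the two forks share no edge, so nothing stops them executing at the same time:

```python
# Illustrative data-flow "canvas": block 1 feeds two forks, which merge in block 3.
def query_snowflake():                # block 1: pretend result of a SQL query
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]

def add_tax(rows):                    # block 2a: one fork
    return [{**r, "amount": r["amount"] * 1.2} for r in rows]

def flag_large(rows):                 # block 2b: the other fork
    return [{**r, "large": r["amount"] > 20} for r in rows]

def merge(taxed, flagged):            # block 3: merge the two forks back together
    return [{**t, "large": f["large"]} for t, f in zip(taxed, flagged)]

raw = query_snowflake()
taxed = add_tax(raw)                  # these two forks have no edge between them,
flagged = flag_large(raw)             # so they could run simultaneously
print(merge(taxed, flagged))
```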

That's one of the cool things about it. It reminds me of the shift, and I can't remember if I said this in 753 or not. So my apologies to listeners who listened to that if I'm recycling this analogy again. But what Zerve reminds me of is the shift from having Microsoft Word docs that you emailed around to each other and you completely lose track of what versions you're on. Somebody's working on, you know, you have a big legal document.

and your legal team is making corrections while their legal team is making corrections and you're reading it and taking notes. And then that merger becomes a nightmare. Zerve

does the same kind of thing in data science that Google Docs did, where you could all of a sudden be collaborating altogether on the same, say, legal document. And you can see when you know that somebody is typing, you can see that they're leaving a comment. You can do things in real time. You could be potentially on a Zoom call at the same time and talking it over. And so Zerve allows that same kind of interactivity and visibility and parallelization that Google Docs did.

Yeah, exactly. So if you've ever worked in like a Jupyter notebook, then you know that sharing and collaborating on those is a nightmare, right? Like, oh, and commit. And like, every time you open it, the metadata changes. So yeah, you end up with tons of merge conflicts and, and all that stuff, which is a nightmare.

And so you end up with files that are like, you know, document final, document final, final, final, like document really final this time. Use this one. So just the file name is hilarious.

Yeah. And so the net effect with a tool like Zerve is that according to your materials, and so you're going to have to explain this to me, it says that it empowers code-first data teams to cut cycle times by up to nine times. So there's a couple of things that's interesting about this. So the first thing is code-first data teams. It sounded like when you described that DAG that it could be a low code, no code tool.

But in fact, we're talking about empowering code-first data teams. So that's something to clarify first. And then how does Zerve empower code-first data teams to cut cycle times on model development up to, yeah, basically 10x? Well, yeah, code-first, it's definitely not a no-code, low-code tool. Although large language models are making that a little bit fuzzier. The 2010s was really like the era of the low-code, no-code tool.

I was at DataRobot, you know, one of the original low-code, no-code data science tools. I was there for seven years or so. And I think everybody has pretty well realized that low-code, no-code, you know, the emperor's got no clothes on that.

Uh, you know, like you, anytime you run into a complicated problem, you're going to, you're going to exceed the bounds of what you can do in a low code, no code tool. And you're going to have to, and that's why like they all, like DataRobot bought a, uh, a notebook environment, Databricks introduced a notebook environment. Like they're all kind of shifting to coding environments because they're realizing that the only people that actually generate value from data are the experts, the ones who are writing code.

And so we very much started out as a coding environment, a place where you could write code.

We still are. Like we have, there's been no pivot. We are a coding environment, but now with large language models, it's, you know, it's not exactly the same type of thing. Like you're not coding the same way that you were where you'd sit down at a blank, you know, a blank screen and just type code. Instead, you're sort of describing to the large language model, what you want to build. And then you're taking that code and maybe you're tweaking it, or maybe you're interacting with the large language model to make it do what you want it to do. And so it's still very, very code based, but it's,

It's just a different experience. It's really wild how coding has changed in such a short time over the last two, three years. For sure. It's completely transformed. It does allow people who have never coded before to suddenly kind of be able to get into that. We had an episode recently with Natalie Monbiot. It's episode 873. And in that episode, she talked about how she never learned how to be a developer, but now she can pick up

tools and use something like Claude or ChatGPT and be able to generate code yourself. Now, it's interesting because

It's not as turnkey as you might like it to be today, where she described how she kind of, she was like, okay, cool. I can just code. But then she quickly realized, well, you know what? I actually do also need to read like an introductory Python book at the same time. Yeah. Yeah. There's, there's like little, like you couldn't sit down with your grandma and just do it. Right. But you know, after, after some time, grandma could pick it up probably, you know, but there's, there's still, there's still value in experts and you still need to know how to read it and,

And stuff like that, especially when projects get much more complex. You know, it's good for getting started. But then at some point, you know, we're not exactly to the place where it's no expertise required. And I don't think we will be for a while.

And so it's very easy for me to imagine how in a tool like Zerve, you can arrange your directed acyclic graph so that you're taking in some data input or maybe multiple data inputs. You're doing data pre-processing on each of those streams separately. And because they're different things, like one is weather, another one's stock prices, another one's images, you have all these different data inflows that need completely different kinds of data pre-processing. And so it makes sense to see that laid out in a Zerve DAG.

And then you talked about how you can have multiple streams in that DAG kind of combining into each other or forking away from each other. And so, for example, if you have three data inputs, you might want to have them all go into the same models. You merge them together onto a node and then you have them flow into a neural network or something like that. That's all easy for me to visualize and probably for our listeners to visualize. So how does the code component work?

manifest in that environment? How do you see it? - How do you see the code? - Yeah, like how do you, like, you know, if you have these nodes, I feel like it's now a dumb question with the way you reacted, but you know, like, do you like click on a node and then you see some code under the node? - Sure. So the node itself is a text editor. It's a code window. We use Monaco, it's open source. It's the same code editor that VS Code uses.

So each node is a little text window. And it turns out when we looked at the way people are using it, they actually go into full screen mode. So for each node, you can click into a full screen mode that gives you, imagine kind of a heads up display on that particular block. Like the central area is the actual code.

And then on the left side of the screen, you have all of the inputs, like what data is being fed in from upstream blocks. And then you have all the outputs on the right side of the screen. So you can actually, one of the cool things about working in Zerve is that if I run my code and I say I do something to a data frame, maybe I convert a variable from character to numeric or something like that, then I can click on it on the left side and have a preview. And I can click on it on the right side and have a preview. And I can compare what did my code actually do

without having to type like, you know, print df.head and, you know, all that sort of stuff in order to actually see it. Because, you know, if I go back and look at, you know, notebooks or projects that I'd worked on in the past, those are everywhere. And this is a whole task around like, okay, how do I take out all my print statements? Because I need to look at my variables. So yeah, it's neat to be able to preview. This episode of Super Data Science is brought to you by AWS Trainium 2, the latest generation AI chip from AWS.

AWS Trainium 2 instances deliver 20.8 petaflops of compute, while the new Trainium 2 Ultra servers combine 64 chips to achieve over 83 petaflops in a single node. Purpose built for today's largest AI models.

These instances offer 30 to 40% better price performance relative to GPU alternatives. That's why companies across the spectrum from giants like Anthropic and Databricks to cutting edge startups like Poolside are choosing Trainium 2 to power their next generation of AI workloads. Learn how AWS Trainium 2 can transform your AI workloads through the links in our show notes. All right, now back to the show.
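For readers who want something like the input and output preview Greg described before the break, but outside Zerve, the closest analogue is simply snapshotting a data frame on either side of a block's change; a minimal pandas sketch:

```python
# Compare what a "block" actually did to a data frame, without print statements
# scattered through the code. Purely illustrative, nothing Zerve-specific.
import pandas as pd

df_in = pd.DataFrame({"amount": ["10", "25", "7"]})    # what flowed in from upstream
df_out = df_in.copy()
df_out["amount"] = df_out["amount"].astype(float)       # the block's one change

print(df_in.dtypes)    # amount is object (strings) on the way in
print(df_out.dtypes)   # amount is float64 on the way out
```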

- So let's talk more about the LLM aspect of this now. So you kind of got into it a little bit there. Many companies are struggling with leveraging large language models

into their businesses. And part of why that is a struggle for them, despite the obvious value that you get as a data scientist or a software developer, when you're working on your own projects outside of a business. And part of what makes that tricky for companies is that they're worried about their intellectual property. They're worried about sending off their company software, their most potentially, if they're a software company, the most valuable IP, just sending that off

to OpenAI. In a prompt. Yeah, in a prompt. Yeah, exactly. And it should be. Yeah. So how do you resolve that with now your integration of LLMs into the product? And actually, it would be helpful to know in the same way that you just... Here's a term. I just realized that node and code rhyme so nicely with each other. You talk about no code environments. You've created a node code environment. Yeah.

I appreciate that so much, actually. It's a code node. Yeah. And so then how do you integrate LLMs into your code nodes? How do those manifest? How do you experience them? And then how can companies feel comfortable using those? Yeah, so there's a lot to say there. There are really three options at the moment inside Zerve for interacting with large language models. Well, four, really.

The first is OpenAI. So at the end of the day, OpenAI's models are the best. They're better than all the open source ones, at least in my experience. I'm not sure if I believe all the benchmarks that are out there about evaluating performance of LLMs and which ones give better answers and all that sort of thing.

But anyway, so we have OpenAI and, you know, if you interact with ChatGPT, then you are sending stuff to OpenAI and there's kind of no way around that. So many people are not comfortable with that. And so we also integrate with AWS Bedrock, which is Amazon's hosted large language model service. And they have some security stuff around that so that those models are, are open source or hosted by Amazon.

AWS. And so you, you know, if you can trust AWS, all your data lives there anyway, you know? So, so maybe you're a little more comfortable with the IP issues there. And then the third thing is Hugging Face. So the, the open source models that are out there, you can actually instantiate those within a project of yours and use GPUs. And so in order to have that thing actually physically hosted in your environment so that you're sending your prompts to yourself and

And I guess the upside there is you can do some fine tuning. You can really make those models yours in a significant way. So Zerve is designed to be self-hosted. So all of your data and all your compute lives in your environment anyway. So you've got some choices around, OK, where can I send my prompts? Is it low risk so I can use OpenAI? Am I comfortable with Bedrock?

or do I want to actually host the things on my own infrastructure? It's nice that you offer that flexibility. And so it seems like that would suit kind of everyone. Is it more complicated for somebody to get set up if they're running, say, Hugging Face models on-prem, or that's kind of just as easy, just as turnkey as using OpenAI API? Yeah, so if you can see it, then you can talk to it. So if you have it hosted somewhere else, it doesn't have to be hosted in Zerve.

Although, yeah, I guess it depends on the infrastructure. That's complicated. Everything that we do is serverless. So you don't have like long running servers. And so it may be more cost effective to operate that way.

But, you know, that's a complex conversation. So you might be running on like Kubernetes stack and you've got like existing hardware and stuff. So it doesn't, you know, all that sort of stuff. So, yeah, we work with each individual customer to figure out their situation in terms of where do they want their data to live? What compute do they want to use? Which cloud provider? All that kind of stuff to figure out. Perfect. Sounds great. Okay. And then kind of my last question around.

features, new features in Zerve since you were last on the show is you talked about having an AI assistant. And so how does that, is that kind of the same thing? Is this LLM that's helping you generate code or is that something separate, a conversational kind of thing in natural language? Yeah, it's different. So we started, the first LLM integration that we built was what we call Gen AI blocks.

So in those blocks, you're not typing code, you're typing in a prompt. And those prompts can be dynamic. So you might take and connect a code block to a Gen AI block and pass it some variables that you want to include in your prompt in order to get that sort of query back. And then you can take the output of that query and use it

downstream. So those are sort of internal. The AI assistant is above that. So you have a prompt, like a text field, an interaction, a chat space with a large language model where the AI assistant is actually an agent and it can do stuff to your canvas.
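A rough approximation of the Gen AI block idea, assuming a hypothetical call_llm() in place of any particular model client, is a prompt template that gets filled in with variables produced by an upstream code block:

```python
# Sketch of a dynamic prompt "block": upstream code computes variables, the
# prompt block injects them, and downstream blocks consume the model's answer.
def call_llm(prompt: str) -> str:
    return f"[model output for prompt: {prompt[:60]}...]"   # hypothetical stand-in

# upstream code block: compute something worth asking about
sales_by_region = {"EMEA": 1.2e6, "NA": 2.4e6, "APAC": 0.9e6}
worst_region = min(sales_by_region, key=sales_by_region.get)

# "Gen AI block": a prompt template instead of code
prompt_template = (
    "Quarterly sales by region were {sales}. "
    "Write two sentences on why {region} may be underperforming."
)
prompt = prompt_template.format(sales=sales_by_region, region=worst_region)

summary = call_llm(prompt)   # downstream blocks can use `summary` like any variable
print(summary)
```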

So I might say, hey, build me a canvas that does, you know, that takes this data frame, and you could reference a table in a Snowflake cluster or Databricks setup or whatever, and, you know, write me an analysis that does whatever, you know. And then it would actually create a plan around that, and then, uh, actually it would be able to create the blocks and connect them and all that sort of stuff.

So yeah, the AI assistant is really an agent that will actually do stuff for you. You can kind of direct it to edit different parts of your project or create new bits. That sounds pretty damn cool. That sounds really cool. So you could potentially have, so if you had like three people, three humans collaborating in real time on a Zerve canvas, you could have three people plus their three assistants all kind of generating nodes and figuring out how these flows work in some complex machine learning flow.

Yeah, exactly. Nice. Okay. So this now starts to feel like we're talking about a completely new era in data science. Like Zerve is at the forefront.

of building this new industrialized data science, as opposed to a more artisanal data science where you have individuals working alone in the Jupyter notebook typing out each character of code. Yeah, locally. Exactly. And so this is now this kind of industrialization of data science that involves adopting platforms,

machine learning ops systems and automation. What role does Zerve play in that? And obviously you have a ton of experience with kind of automating things with your years of experience at DataRobot before that. And so fill us in on how you see this transformation happening, how organizations can

best harness that transformation and be more successful on developing and deploying AI projects? Yeah, we think about ourselves as kind of a full stack kind of data science environment. We don't do like the hardware and the data warehousing type stuff, but we connect to all of those solutions. But everything else we do, you know, we've played with the term operating system for AI for

Uh, in terms of, like, how do we talk about ourselves. But we really do all the things. So you can connect to data, you can explore, you can create visualizations, you can publish, like, reports. Uh, you know, we integrate with a bunch of dashboarding things. I just did an AWS QuickSight dashboard for a conference that I was speaking at. You can train models, you can use GPUs, you can

And then when it comes to deploy, you can really easily build your own APIs that you can host within Zerve or download and take somewhere else. You can deploy using SageMaker. You can schedule jobs like...

The whole lifecycle of these data science projects is built into one application, and you can do it in any language you want. So you could have your data engineers writing SQL. You could have your old school statistician folks writing R code. You could have your machine learning engineers doing Python, and it all syncs to GitHub or Bitbucket or whatever.

source control you use. So that if, you know, you've got an engineer down the line and they want to stay in VS Code, it's fine. You know, it all pulls together on, uh, on GitHub. So yeah, it's awesome. It's full stack. Very cool. Uh, yeah, any thoughts on the kind of, my question there about how

I don't know if you have any guidance for organizations who are trying to take advantage of this kind of shift. So, you know, it could be Zerve or maybe some other kind of tool that's industrializing data science in some way. How can organizations be successfully adopting those, taking advantage of them to get more AI projects into the real world that are profitable? Yeah, there's definitely this...

businesses have always had to sort of juggle between build versus buy, right? There's a million vendors out there that are doing a million different things. And, you know, some organizations are like, okay, we'll go buy that. Others are like, okay, let's staff a team that can build some of this stuff. And so you still have that sort of set of trade-offs when you, when you go down that road, but the large language model stuff has made it so much easier to build and

that I think the calculation on build versus buy is super different now than it was a few years ago. It's just so much, it's going to be so much cheaper to build than it is to buy these sort of things. I actually think it's sort of an existential crisis for a lot of these SaaS vendors that are building kind of custom built use case type solutions.

and then charging for them. Yeah, who knows what the space is going to look like in a few years. I think it's got ramifications for the VC space, certainly for the software as a service space. Yeah, it'll be interesting to see what happens. Yeah, it is interesting. Everything is moving really quickly. And so hopefully people are taking advantage of things like their favorite data science podcasts to stay on top of all those kinds of things. You mentioned earlier how...

how Zerve integrates with existing data stacks. And so with this kind of flexibility, it seems like it's kind of obvious to me, but maybe you can give us some specific examples, maybe with existing clients that you have or other experience that you have in your career on why it's important to be able to integrate with such a broad range of different kinds of data stacks. What are the main data stacks out there that you see people using? And yeah, what's the advantage of having flexibility across all

all those different stacks. By data stacks, you mean like warehousing type stuff, like Snowflake Databricks type? Yeah, exactly. Yeah. Well, I mean, it's kind of a tail wagging the dog type situation, I think. Like there are a lot of considerations in terms of like picking your applications that you use to store and interact with your data.

that are more than just like, okay, where does my data science happen? You've also got integrations with your finance systems and all the different applications that actually interact with your data. It's not just data science. There's way more that goes on there. So it's not often that the data scientists get to decide

you know, what, what storage is used, particularly in larger organizations when, you know, they've been storing data for years and years. And so there's entrenched sort of legacy systems where, where it's stored. So newer companies tend to be using, you know, things like Databricks and, and Snowflake and things like that. Whereas older companies might have, you know, you might even see mainframes at a big bank or, you know, whatever it is. So,

So our approach is just like people's data is going to be where it is. And, you know, you can access it via code. And so whatever, wherever, wherever it is, we want you to be able to get to it.

This episode of Super Data Science is brought to you by the Dell AI Factory with NVIDIA, delivering a comprehensive portfolio of AI technologies, validated and turnkey solutions with expert services to help you achieve AI outcomes faster. Extend your enterprise with AI and GenAI at scale, powered by the broad Dell portfolio of AI infrastructure and services with NVIDIA industry-leading accelerated computing. It's

It's a full stack that includes GPUs and networking, as well as NVIDIA AI enterprise software, NVIDIA inference microservices, models, and agent blueprints. Visit www.dell.com slash superdatascience to learn more. That's dell.com slash superdatascience.

Nice. All right. So let's talk about an application that takes advantage of all those kinds of data stacks. You could have tons of information stored in whatever kind of database and a really kind of buzzword technology today, which I would say was even buzzier in 2024, is this idea of RAG, retrieval augmented generation. But it is a really powerful thing. And me and companies that I've worked for, we've had a lot of great experience leveraging RAG. So

For example, as a concrete example that will make the value of RAG clear at a company that I co-founded, Nebula, they are a human resources platform that's allowing you to search over all the professional profiles of everybody in the US in seconds. And so you're talking about hundreds of millions of professional profiles, and each one of those professional profiles is a document filled with natural language.

And so we can pre-compute vectors. So you can take a deep learning model of some kind, often today a large language model. And so you can take the natural language on each of those hundreds of millions of documents, encode it into a vector, so just a series of numbers.

And so every single one of those hundreds of millions of documents gets encoded as a vector. And so you can imagine you kind of end up with this huge table with, say, 100 million rows representing each of my 100 million profiles that I have in the database. And then you have

however many number of columns you think is important for your vector space. And so basically there's a classic computer science trade-off of you can double the number of columns, but then that's going to double the amount of compute that's required. So you kind of find this sweet spot for your particular application. And yeah, so you might have 64 or 128 or 3000 columns depending on your needs.

But basically, you end up with 100 million rows, something on the order of hundreds or thousands of columns representing the location of each of those 100 million documents in your high dimensional space, in your 100 or your 1,000 or 3,000 dimensional space is kind of the highest number I went to there in my little example. So you precompute all those.

And then a user comes into the platform. In our case at Nebula, it was, I'd like to hire a data scientist in New York. Then we can convert that in real time in milliseconds into its own vector and find related documents across the hundred million profiles. And

Yeah, and then you can, with today's big context windows, you could take all of those documents, potentially, like, you know, the top hundred documents that come back, throw them all into the huge context window of a generative large language model. And then the generative LLM can be answering questions or pulling out information across those hundred documents. So hugely powerful technology.
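Condensing that description into code, a toy end-to-end RAG sketch looks like the following; the embed() here is a tiny bag-of-words stand-in for a real embedding model, and call_llm() is hypothetical:

```python
# Toy RAG: embed documents offline, embed the query at request time, take the
# top-k nearest neighbours by cosine similarity, and stuff them into the prompt.
import math
from collections import Counter

VOCAB = ["data", "scientist", "new", "york", "engineer", "chef", "london"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]          # unit-length vector

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = [
    "data scientist in new york",
    "chef in london",
    "data engineer in new york",
]
index = [(doc, embed(doc)) for doc in docs]   # precomputed offline, once

query = "hire a data scientist in new york"
q_vec = embed(query)                          # computed per request, in milliseconds
top_k = sorted(index, key=lambda pair: cosine(q_vec, pair[1]), reverse=True)[:2]

context = "\n".join(doc for doc, _ in top_k)

def call_llm(prompt: str) -> str:             # hypothetical generator call
    return f"[answer grounded in:\n{prompt}]"

print(call_llm(f"Using these profiles:\n{context}\nWho should I contact?"))
```

In production the vectors would live in a vector database rather than a Python list, and the number of dimensions would be the compute-versus-accuracy trade-off Jon describes, but the shape of the flow is the same.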

Definitely worth exploring for people. There's probably some listeners out there that are like, I know, Jon, I know what RAG is. But for those who don't, it's definitely worth exploring and understanding. And so,

Zerve enables RAG to be scalable. So it distributes compute workloads automatically, because there's a lot to RAG that can be quite technical and difficult to get right. So yeah, how does Zerve's approach to parallelizing these RAG workflows compare to if you tried to do that on your own, if you tried to figure out all the pieces of what I just described on your own? Yeah. The...

Every example is different. We worked with an organization that was doing media recommendation and it was something similar there. So you would want to go and type in, you know, like recommend for me sports comeback movies or whatever. And they would come back with, oh, you should watch Rocky, you know, whatever it might be.

But it turned out that naively querying these large language models would often give you results that weren't very good. So, like, I recall one example, we fed it Dune, you know, the movie Dune. I like Dune. Give me more movies like that. And it ended up giving the top five responses. Four of them were other versions of Dune.

So I was like, all right, that's not ideal. So we ended up doing something similar on the RAG front where we went out and we got, you know, like top podcasts and New York Times reviews. And we brought in all sorts of other documents to add into the

to the context window, like you say, and we pulled it all back together into a recommendation that was then significantly better. And it included things like video games and podcasts and books and magazines and stuff like that. So more than just movies. So it became kind of a much livelier, much richer source of recommendations.

So yeah, that RAG stuff is super, super convenient. Being able to lay it out in a Canvas-style graph as a DAG makes it really easy to see what's going on. And it makes it really fast to experiment with how it works. And then to the extent, like in your example, where you have 100 million people, processing all of that stuff, parallelization is key for a lot of different use cases.

So, like, one use case that we just worked on was trying to evaluate the performance of a call center based on audio files,

uh, audio recordings, like MP3 recordings of conversations with customers. This was at a bank. You know, that's a lot of processing, right? Converting all of the recordings into text that you could then pass to a large language model. You need to be able to do parallelization for stuff like that. Otherwise it's going to take, you know, it's, uh, it's just not feasible to do it.

So the parallelization turns out to be key for a lot of these use cases. But it doesn't have to be just large language model stuff. It could be heavy compute loads of other types. We're agnostic in terms of what loads are actually getting parallelized.
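A minimal sketch of that call-center pattern, with transcribe() and score_call() as hypothetical stand-ins for a real speech-to-text service and an LLM scoring prompt, might look like this:

```python
# Fan the heavy transcription step out across processes, then score transcripts.
from concurrent.futures import ProcessPoolExecutor

def transcribe(path: str) -> str:
    # stand-in for a real speech-to-text call on an MP3 recording
    return f"transcript of {path}"

def score_call(transcript: str) -> str:
    # stand-in for an LLM prompt that rates agent performance
    return f"score for '{transcript}': 4/5"

if __name__ == "__main__":
    recordings = [f"call_{i}.mp3" for i in range(1000)]
    with ProcessPoolExecutor() as pool:                   # embarrassingly parallel step
        transcripts = list(pool.map(transcribe, recordings))
    scores = [score_call(t) for t in transcripts]
    print(scores[:3])
```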

We just make it easy to do it. All right. So yeah, that's cool, Greg. And so I've got another great soundbite for you here. So in addition to your code nodes, did you realize that you have RAG DAGs?

Oh, I don't hate that. That is definitely going to be reused. Yeah. And so it sounds to me like the key, whether it's a RAG DAG or some other kind of high-compute-load DAG, one of the key wonderful things that Zerve is doing is distributing those workloads automatically, which is cool. Nice. So another kind of tricky thing that data scientists, maybe even myself, have difficulty with is

is deploying AI models. So something that's been intuitive for me for literally decades is opening up

some kind of IDE, Jupyter Notebook, something like that, and getting going on inputting some data, doing some EDA, and building a model. But the thing that hasn't been intuitive to me, and it's probably just because I haven't been doing it as much, I've had the luxury of working at companies where machine learning engineers or software developers, backend engineers, then take the model weights that I've created, and they put them into a production system. So

on a smaller team or on a team where there's huge demand for software engineers, which is often the case, you can end up having more data scientists creating models than there are software engineers to deploy them in a lot of companies. That creates a bottleneck. So how does Zerve's built-in API builder and GPU manager remove those kinds of barriers?

Yeah, it's not just a bottleneck. It's also kind of a problematic dependency because at the end of the day, the software developers that are deploying these things probably aren't data scientists. So it's not obvious that they are going to understand what is supposed to be done. And, you know, there's a lot of subtlety to this sort of thing. So you can get mistakes introduced really easily here as well.

So, yeah. So like if you think about the deployment process and, you know, there's a lot of hurdles to overcome. If you've ever been Slacked or emailed a Jupyter notebook and tried to run it, you know what some of them are. Right. Like you have the wrong version of this package installed. Oh, you got to pip install a whole bunch of other stuff to make that work. And so you might spend an hour

trying to even get the code to run, assuming that you have the data and that all the folders and file paths are the same and all that sort of stuff. So, you know, at the end of the day, what data scientists spend most of their time doing today is building prototypes. And then those prototypes get handed off to another team to kind of like recode

in another environment with, you know, Dockerized and deployed and managing servers and stuff like that. But it's not obvious to me that data scientists know how to do that. And it's really not obvious that they have the privileges to do those kinds of things in terms of just like the infrastructure and all that kind of stuff. So Zerve kind of like,

handles all of those problems. So every canvas inside Zerve has a Docker container that's supporting it. So anybody that logs into that canvas doesn't have to worry about dependencies because it's all saved in that project. And so those environments are reusable and shareable and so on. So if I wanted to start a new project using the same Docker container that

another project was in, it's really easy to do that. And so, you know, when you have a new data scientist join your team, they don't have to spend their first week getting Python installed and making sure everything, oh, we use NumPy 0.19 and you've got 0.23 installed. And like none of those conversations have to really happen anymore because we manage all of that.

And then let's say that I did train like a random forest. I mean, you mentioned using your weights. Like if I train a linear model or a logistic regression or something, then maybe it's just a vector of weights that need to be handed off. But if it's a more complicated model, like a random forest or an XGBoost or a neural network or something like that, it's not as simple as just like, here's some weights to put into a formula.

It's a more complex thing. And so then you've got to figure out, okay, I'm going to serialize this model, pickle it, and then dump all the dependencies out and dockerize it and then hand that thing off. And that's also beyond the skill set of a lot of data scientists too. So Zerve handles all of that. So every block inside of Zerve, when you execute it, it creates serialized versions of all of the variables that you've worked through.

So if I train a random forest in a model or in a block, then it's there and it's accessible. So I can access it from external to Zerve using like an API. I can reference it in other layers. So when it comes time to say make an API, maybe I want to make a POST route where I send in a payload of predictor columns.

and then I want a prediction back from that random forest. Well, then I just say, hey, remember that random forest? And I just point at it instead of having to figure out, you know, like how to package that thing up so that it could be deployed as an API. So we...

We handle all of that stuff. And then when you deploy and serve, you also don't have to worry about the infrastructure stuff because all of our APIs utilize lambdas, like serverless technology again, so you don't have long-running services that are out there. It's just there. So a lot of the infrastructure stuff and the DevOps stuff and the kind of picky engineering stuff that can trip you up is stuff that we've just sort of handled so that it's easy for the user.
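For a sense of the manual packaging Greg says Zerve absorbs, here is a generic sketch, not Zerve's actual API, of serializing a trained model and exposing a handler that takes a payload of predictor columns and returns a prediction:

```python
# Serialize a trained model, then serve it from a serverless-style handler.
# The handler shape is generic; nothing here is Zerve-specific.
import pickle
from sklearn.ensemble import RandomForestClassifier

# "block" that trains the model on toy data
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with open("model.pkl", "wb") as f:       # the serialization step
    pickle.dump(model, f)

def predict_handler(event: dict) -> dict:
    """POST-route style handler: payload of predictor columns in, prediction out."""
    with open("model.pkl", "rb") as f:
        clf = pickle.load(f)
    return {"prediction": int(clf.predict([event["features"]])[0])}

print(predict_handler({"features": [0, 1]}))
```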

And that means that data scientists can start to deploy their own stuff. But in some organizations, they still might not be allowed. So then we have like a handoff system where it's really easy to take something that a data scientist has done, who, by the way, aren't building prototypes anymore. Now they're building software that can actually be deployed in Zerve. And we can hand that off to other teams to actually do the deployments. Awesome.

Awesome. That does sound like something that would be useful to me. And I'm sure a lot of data scientists out there helping me get my models into production and feel confident about what they are. I like how you kind of turn that on its head as well and made me feel good a bit about myself, about the skills that I do have and a software engineer might not in terms of understanding the model that I've built. So I really appreciate that.

I think that's the end of my questions that are directly related to Zerve in any way. But kind of as I transition more broadly, how can somebody get started with Zerve today? How can a listener who's heard all these great features that Zerve offers, this completely new way of working on data science, how can they pick that up and get going today?

Well, we've got a free tier that folks can get in and utilize. And so you have all the flexibility and stuff. There are some caps with respect to like compute and stuff like that. We originally had a free tier, but we had to shut it off for a little bit because of the Bitcoin miners. They went a little bananas. So we had to turn that off and build some controls and stuff like that. But the free tier is back.

And so anybody can get on there and get in and give it a go. Those damn Bitcoin miners. What won't they ruin? And so, yeah, so now moving beyond Zerve-specific questions, or questions that could really be directly related to Zerve in any way. You mentioned earlier in this episode how AI could kill, how large language models could kill a lot of SaaS, software as a service businesses. So some companies like Klarna

are combining AI standardization and simplification to shut down SaaS providers. Specifically, they said, I mean, they have a specific quote in Inc.com that I'll be sure to have in the show notes about that. Similarly, Microsoft CEO Satya Nadella predicts that business logic will move from SaaS applications into AI agents. And so...

It sounds like you're in that same kind of boat as Klarna and Satya Nadella around thinking that with LLMs empowering them, internal processes can be simplified, standardized, and lots of different SaaS vendors can be removed, cutting down costs. There are other views out there that I want to highlight just quickly. So for example, the CEO of Zoho,

argues that in contrast, AI will fuel new vertical SaaS companies because you kind of get new problems to tackle. So we'd love to hear, you know, kind of more about what you think of this and maybe which kinds of SaaS businesses should be most concerned. Yeah, I think this goes back to the whole build versus buy thing. It's just becoming so much easier to build stuff in-house. So

You know, nobody's going to ever go out and go, OK, let's, you know, like a CRM system like Salesforce. There's no risk that somebody is going to go, OK, I'm just going to rebuild Salesforce from scratch. Although as a user of Salesforce, I say that with some regret. So, you know, there are big, big complex things that that exist. And it's not likely that an organization is going to rebuild that sort of stuff.

But there are an awful lot of vendors out there that do things that could easily be built by an organization if they wanted to do it in-house. So I think the newer stuff, the more cutting edge stuff, every SaaS company seems to be bolting on generative AI onto their products. But I think there's a difference between like a simple bolt on where you add a prompt, a place where you can write a prompt or something like that.

That's kind of like a dime a dozen as compared to a SaaS offering that is kind of like

deeply integrated with these kinds of things that sometimes, you know, actually seem magical in some sorts of ways when they, when the agents begin doing things that, that are just sort of intuitive and just sort of works. So, so yeah, it's hard to say what, what's going to happen, but I just see it being so much easier for organizations to build their own stuff these days. And it seems to me like that's going to be a real problem for a lot of software vendors.

Thank you.

But regardless of who you are, if you're looking for a team that combines decades of commercial experience in software development and machine learning with internationally recognized expertise in all the cutting edge approaches, including Gen AI, multi-agent systems and RAG, well, now you've found us. We have rich experience across the entire project lifecycle from problem scoping and proof of concept through to high volume production deployments. If you'd like to be one of our first clients, head to ycarrot.com and click partner with us

to tell us how we can help. Again, that's ycarrot, Y-C-A-R-R-O-T dot com. It seems like part of what's so tricky for those SaaS vendors is that instead of wanting to pay a monthly fee, that's just kind of this flat fee. You instead, we've kind of gotten used to now with calling APIs and paying per number of tokens that we send or receive back from that API.

We as developers of AI products and decision makers within software businesses, or I guess businesses in general, we're getting more and more used to this idea of an economic model where it's consumption based on specifically what I need instead of subscription based. I'll give you an example that's kind of interesting. So a friend of mine, Sean Kosla, brilliant software developer,

He has built a really simple user interface that looks exactly like ChatGPT or pretty close. You know, it has the same look and feel as ChatGPT. But instead of paying a $20 a month subscription for ChatGPT+, he uses that platform.

simple interface, and he calls OpenAI models in the backend. And he's like, my cost of using OpenAI APIs went from $20 a month to $2 a month. And yeah, so it's kind of an interesting thing there. I'll let you speak now. I've been speaking for way too long. Yeah, I think I've used something like that, a product called Typing Mind, I think is the name of it. It's doing the same stuff. You put in an OpenAI key and then you might spend 20 cents a day or something like that.

Of course, I'm on the $200 a month plan for OpenAI. Me too. Me too. I dig it. It's so good. It's crazy. I mean, it's like, I can understand for listeners. I did an episode entirely on this recently. So that episode number was number 870. And so I make the case in that episode. It sounds like you agree 100%. Yeah, I know it's expensive. $200 a month sounds like a lot.

But as soon as you start using it, there's so many everyday use cases where a single report that it creates for me, I'm like, that is worth $200 to me. And I can do it 100 times a month. So it's a no brainer.

Yeah, totally. I was on there yesterday and I was looking at, I was having it review and shorten a document to do like a first pass of it. And it opened up this canvas mode, which was interesting. I couldn't, I didn't, I couldn't figure out how to actually use it, but it looked like you could edit the document in real time while it was editing it and work together somehow. I didn't have time to kind of like explore it, but yeah, they're putting out new stuff all the time. It's cool. Yeah, they are.

Um, but yeah, I think I kind of interrupted you. You were talking about Typing Mind. It's just another, just a product that, uh, I, I think that's the name of it. I could be wrong, but, but yeah, I'll try to find it and include it in the show notes. Um, but, uh, it is kind of interesting there. So for me as well, like Sean Kosla, who I was just describing, um,

He was like, why don't you just use this? I'll just give you another login. He's like, it costs me nothing for you to be doing here. But I was like, eh, but can it do deep research like this? And he's like, no, can't do that. Yeah.

So, yeah, so I think that's definitely worth it. It is highly interactive. Like you said, I haven't seen that canvas yet. But, you know, the level of interactivity, the way that it asks me before it goes off and does a long research request for clarification on some of the key points, even just those questions that it asks, I'm like, those are incisive and exactly spot on the questions you should be asking as a well-educated analyst.

or a PhD student or a PhD graduate being asked to take on this task. So in addition to the broad topics that we've already covered, another big topic related to some of the tools that we've just been talking about, like LLMs, which you've both incorporated into Zerve and which we use in our professional and personal lives through things like a ChatGPT Pro subscription and Deep Research,

Billions and billions of dollars have been spent on developing these LLMs in aggregate across the planet, LLM-based services and projects. But there are still some shortcomings today around bias and inaccuracy. And the LLMs, you know, I start to notice it so much less frequently that...

that I start to just trust the AI systems, which is maybe risky. So when they, when they were frequently, you know, inaccurate or having biases, you were constantly on the lookout for those kinds of issues. But, uh,

But now that I rarely see those, especially something like Deep Research or an O1 or an O3-mini or a DeepSeek R1 kind of model that can iterate over its responses and fact check. It just seems like they're basically always right.

That's maybe a different issue. Well, I mean, it's funny, though. I was, I don't know, maybe six months ago, I was trying to write some code that would do, I wanted to use the OpenAI API to submit an image

and have it edit the image, which is not something that they could do then. In fact, I don't know that they do that now even. I don't do much image stuff, but I asked ChatGPT for some code to do that, and it invented an API out of nothing that didn't exist.

And it was like, oh, here's the code to do that. And I have no doubt that when they do introduce that feature, that the code would run. But no, it was just made up. It just completely hallucinated because they want to be so helpful. But the one thing I found about these models is that if they get it right or at least close to right the first time, you're probably OK. But like, you know, there are some problems that I've given them and it's like.

all right, it didn't get that right at all. And then I'm like, okay, there's no hope. Like you take another route or something. Cause they don't, if they don't get it, then, then they're not, in my experience anyway, easy to kind of correct. Yeah. That makes perfect sense. Speaking of your inventing an API to call, similarly, I had the experience actually with Deep Research recently where I was trying to use the O1 model. And at least at the time that I was doing this a week or two ago,

It turns out O1 doesn't have internet access yet. Or was it O3? Oh, it was O1 Pro. It was O1 Pro specifically that I was trying to use. And so I was like, I want to use this as a big problem. I want to get the chunkiest model that OpenAI can give me running in the back end in my deep research.

And it turns out that they, at least at the time that I ran this, that didn't yet have internet access. So if I use something like O3 Mini High, I could get internet access, but not with O1 Pro. And so I didn't know that though. And so I provided a link. I said, go to this link and summarize the information or something like that.

And luckily, I at least have that trace. Like when you do deep research, it's kind of an explanation of what it's thinking about. And so in what it's thinking about, it was like, because I don't have Internet access, I'm just going to assume what kind of thing was there. And I'm like, no, that's like. So at least I could see that it had that trace, because otherwise with the output itself, you're kind of like, oh, cool.

That makes sense. And so if I hadn't gone and looked at the website or looked at the trace, I would have been led completely astray. So yeah, so interesting, these kinds of issues that we do still have. Yeah, I don't know. Do you have any insights yourself on how we could potentially be navigating this thorny situation? So as we increasingly automate judgment and decision-making, how should we balance these kinds of biases that happen with both humans and AI evaluation processes to create more fair, transparent systems?

Yeah, I've done a lot of the iterative stuff. Like one kind of fun project that I was interested in is I wanted to figure out if the models were biased towards their own answers, like if they thought that their own answers were better than the answers coming out of other large language models.

And so I built a project where I asked for like 100 SAT essay questions, like give me a list of essay questions. And then I passed those essay questions to the large language models and asked them to answer them.

So hundreds of answers coming from, I think I used four different large language models. And then I asked those same four large language models to grade the sets of four, like rank them, whose answer was the best. And so hundreds more requests there.
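A skeleton of that experiment, with ask() and rank() as hypothetical stand-ins for the real model calls and the LLM-as-judge ranking prompt, might be wired up like this:

```python
# Do the judges prefer their own answers? Every model answers every question,
# then every model ranks the full set of answers, and we tally self-preference.
import random
from collections import Counter

MODELS = ["model_a", "model_b", "model_c", "model_d"]

def ask(model: str, question: str) -> str:
    return f"{model}'s answer to: {question}"        # stand-in for a real call

def rank(judge: str, answers: dict) -> list:
    # stand-in for "which of these answers is best?"; a real judge would be
    # another LLM prompt, not a random shuffle
    order = list(answers)
    random.shuffle(order)
    return order

questions = [f"SAT essay question {i}" for i in range(100)]
self_preference = Counter()

for q in questions:
    answers = {m: ask(m, q) for m in MODELS}
    for judge in MODELS:
        if rank(judge, answers)[0] == judge:         # judge put itself first
            self_preference[judge] += 1

print(dict(self_preference))   # how often each judge preferred its own answer
```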

And then I sort of plotted it and looked to see, you know, does the GPT like Titan or did, you know, whatever, right? Like evaluating them. It was really interesting and wildly unstable. Like I could rerun that analysis and it would dramatically change. But there didn't seem to be a bias there in terms of like ChatGPT likes its own answers, that sort of thing. But in terms of like,

I found it really effective to use LLMs to evaluate the answers from LLMs. So that whole iterative thing that they're now sort of doing under the hood, kind of explicitly doing that. I heard a talk at a conference last week, the AWS conference in Ireland. They were talking about

distilled models where you have a teacher model and a student model. And you're kind of like, sort of like a reinforcement learning kind of thing that they, that they get going on there. I don't know the specifics of it. I haven't looked into it, but that was another thing that was interesting. So kind of playing the models off one another, a good cop, bad cop kind of situation, or like maybe you have a cheap model and an expensive model.

and you use the cheap model to evaluate your answers and only call the expensive model when the answers look a little questionable, that sort of thing. So yeah, LLM committees, I guess, is the way to go. Nicely answered. As I asked that, I was thinking this is maybe a really out-there question that Greg might not have any ideas on, and you nailed it. That was a really great answer. I think that's absolutely right. An LLM committee,

and depending on your particular use case, there are clever ways of potentially doing it. Like you said, having a cheap LLM for most use cases, but then another cheap LLM that's evaluating that, and bringing in a more expensive LLM to handle those trickier situations. - Yeah, I think there's a lot of research that needs to be done to figure out what the most effective committee structures are. But thinking about it like a committee is actually a really interesting metaphor for figuring it out. - Ask your three favorite LLMs how to tackle the problem.
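To make the cheap-model/expensive-model cascade concrete, here is a minimal sketch of that pattern. The model names, the judging prompt, and the threshold are all illustrative assumptions rather than anything Greg or Zerve prescribes; it again assumes the OpenAI Python SDK purely for the sake of a runnable example.

```python
# A minimal sketch of the cascade described above: answer with a cheap model,
# have a cheap judge score that answer, and only escalate to an expensive
# model when the answer looks questionable. Model names, the judging prompt,
# and the threshold are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

CHEAP_MODEL = "gpt-4o-mini"   # placeholder cheap model (also used as the judge)
EXPENSIVE_MODEL = "gpt-4o"    # placeholder expensive model
CONFIDENCE_THRESHOLD = 7      # escalate when the judge scores below this


def ask(model: str, prompt: str) -> str:
    """Send one prompt to one model and return its text reply."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


def answer_with_cascade(question: str) -> str:
    draft = ask(CHEAP_MODEL, question)

    # Ask a cheap judge to score the draft; real code should parse more defensively.
    score_text = ask(
        CHEAP_MODEL,
        f"Question: {question}\n\nAnswer: {draft}\n\n"
        "On a scale of 1-10, how complete and correct is this answer? Reply with a number only.",
    )
    try:
        score = int(score_text.strip())
    except ValueError:
        score = 0  # if the judge rambles, treat the draft as questionable

    # Only pay for the expensive model when the cheap answer looks shaky.
    if score < CONFIDENCE_THRESHOLD:
        return ask(EXPENSIVE_MODEL, question)
    return draft


print(answer_with_cascade("Explain the difference between bagging and boosting."))
```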

So this is an exciting time, obviously, to be a data scientist, now that we have all these kinds of tools, tools like Zerve that allow us to deploy models, to parallelize workloads more easily than ever before, and to collaborate on data science workflows much more easily than ever before.

Without going into your proprietary plans, and I'm not asking you to do that, just generally speaking, you must spend time, Greg, thinking about where the future is going with all this. Where do you see things in five years, 10 years, 20 years? What does a data scientist's or a software developer's workflow look like?

Yeah, it's so crazy. I'm not looking five years ahead, I'm looking one year ahead at this point. It's just moving so fast; no five-year prediction is going to be reliable the way things are going. We're trying to stay flexible because these models are just getting better and better. I do think large language models are going to be the core of how things move forward in the engineering space and in the data science space.

And so I think the most important priority for us right now is to build the best AI-assisted coding experience that we can, so that it becomes very streamlined and very easy for a user to just type what they want and then get it.

So that's kind of the path we're on. We're very focused on the data space. We're not looking at JavaScript or any of the others, because there are lots of programming languages out there and you could kind of do the same thing there. But we're very focused on the data space because, well, that's the audience we're working with: code-first data users. Yeah.

But yeah, we want to build the best AI-powered coding environment that we possibly can. Nice, that sounds great. A very exciting future ahead, no question about that, and fast-moving. It makes sense that you wouldn't try to speculate about a five-year or 10-year plan. Assuming we're not all on universal basic income and just painting pictures in the park. That's the 10-year prediction. Playing fetch with our dogs. Yeah.

Yeah, maybe the five-year prediction. So, very cool. I guess we might check in with you again soon. It's always so much fun having you on the air; I had a lot of laughs today and really enjoyed this. So maybe we'll be checking in with you again in the near future, and we'll see how your predictions have played out. In the meantime, we do need another book recommendation from you, if you would. Ah, fiction or nonfiction? Nonfiction.

Whatever comes to mind first. I just finished a book called Mickey7. And it's funny, because I saw it on my recommended list, and it must have been there because it's about to come out as a movie; Robert Pattinson is supposed to play the main character. The movie hasn't come out yet, but I really enjoyed the book. It's about a guy: they build the technology to imprint your consciousness and save it.

And so on a dangerous mission, they have what are called expendables, which are people who are basically immortal, but not really immortal, because you just reprint their bodies every time they die.

So it's sci-fi. It was an entertaining read, a little lighthearted, I guess. Nice, it sounds interesting. All right, and then for our interested listeners who enjoyed your good humor and your incisive insights on the field of data science, how should they follow you after this episode? Well, I'm on LinkedIn. I haven't been doing much TikTok stuff lately, not since I sold my cereal business. But yeah, I'm on LinkedIn.

You can always find me there; I try to post interesting projects and interesting stuff from time to time. And of course, Zerve. Yeah, yeah. And there are probably other Greg Michelsons out there, so we'll disambiguate by having a link in the show notes specifically to the Greg Michelson at Zerve on LinkedIn. Although you can also just type that; it seems to work pretty well on LinkedIn for disambiguating. Greg Michelson Zerve, I'm sure, will work.

Awesome, Greg. Thank you so much for being on the show. And yeah, as I said, hopefully we'll be checking in with you again soon. Really enjoyed this today. Always a pleasure. Thanks for having me.

What a mensch Dr. Greg Michelson is. In today's episode, he covered how Zerve organizes code in a directed acyclic graph, a DAG, where nodes contain code and edges show data flow, enabling real-time collaboration across Python, SQL, and R. He talked about their new fleet feature and how it parallelizes code execution using serverless technology, dramatically reducing processing time for tasks like LLM calls without requiring additional code.
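For readers of the transcript, the speedup that kind of parallelization targets comes from fanning independent LLM calls out concurrently instead of looping over them one at a time. The sketch below is not Zerve's fleet API or its serverless mechanism; it is just a plain-Python illustration of the fan-out idea using a thread pool, with a hypothetical model name and workload.

```python
# Generic illustration of fanning independent LLM calls out in parallel.
# This is NOT Zerve's fleet API; it is a plain-Python stand-in using a thread
# pool, which helps because each call spends most of its time waiting on I/O.
# The model name and the prompts are hypothetical.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()


def ask(prompt: str) -> str:
    """One LLM call; stands in for whatever client and model you actually use."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


prompts = [f"Summarize customer review #{i} in one sentence." for i in range(100)]

with ThreadPoolExecutor(max_workers=20) as pool:
    # map() preserves input order; the calls themselves run concurrently.
    summaries = list(pool.map(ask, prompts))

print(len(summaries), "summaries collected")
```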

And he talked about how Zerve's AI assistant can now act as an agent that creates entire project workflows based on natural language instructions. Greg predicts that, with LLMs making custom development increasingly accessible, many SaaS businesses may face an existential threat as companies find it cheaper to build solutions than to buy them. And when working with LLMs, Greg recommends using committee approaches,

where multiple models evaluate each other's outputs to reduce bias and improve accuracy. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Greg's social media profiles, as well as my own, at superdatascience.com slash 879.

And next month, if you want to meet with me in person instead of just on social media, I would love to meet you in real life at the Open Data Science Conference, ODSC East,

which is running from May 13th to 15th in Boston. I'll be hosting the keynote sessions, and along with my longtime friend and colleague, the extraordinary Ed Donner (you've got to see him speak if you haven't already, seriously), we'll be delivering a four-hour hands-on training in Python to demonstrate how you can design, train, and deploy cutting-edge multi-agent AI systems for real-life applications.

All right. Thanks to everyone on the Super Data Science podcast team: our podcast manager, Sonja Brajovic; our media editor, Mario Pombo; our partnerships manager, Natalie Ziajski; our researcher, Serg Masís; our writer, Dr. Zara Karschay; and our founder, Kirill Eremenko.

Thanks to all of them for producing another entertaining and educational episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors. You can support this show, listener, by checking out our sponsors' links. Give them a click. They are in the show notes. And if you'd ever like to sponsor an episode yourself...

you can go to jonkrohn.com slash podcast to get information on how to do that. Otherwise, support us by sharing the episode with someone who would love to listen to it or view it. Review the episode on your favorite podcasting platform or on YouTube. Subscribe if you're not already a subscriber. Edit videos into shorts.

whatever you want. But most importantly, I just hope you'll keep on tuning in. I'm so grateful to have you listening and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.