This is episode number 859 with Vaibhav Gupta, founder and CEO of Boundary. Today's episode is brought to you by ODSC, the Open Data Science Conference.
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.
Welcome back to the Super Data Science Podcast. Today we've got a highly technical episode, I know a lot of you love that, with the brilliant, well-spoken engineer and entrepreneur Vaibhav Gupta.
Vaibhav is founder and CEO of Boundary, a Y Combinator-backed Seattle-based startup that has developed a new programming language called BAML, B-A-M-L, that makes working with LLMs easier and more efficient for developers. Across his decade of experience as a software engineer, Vaibhav has built predictive pipelines and real-time computer vision solutions at the likes of Google, Microsoft, and the renowned hedge fund D.E. Shaw. He holds a degree in computer science and electrical engineering from the University of Texas at Austin.
As mentioned at the episode's outset, this is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs hands-on with code.
In today's information-dense episode, Vaibhav details how his company pivoted 13 times before settling upon developing a programming language for AI, why creating a programming language was, quote-unquote, his words, really dumb, but why it's turning out to be brilliant, including by BAML already saving companies 20-30% on their AI costs.
He talks about fascinating parallels between today's AI tools and the early days of web development, and his unconventional hiring process (I've never heard of anything remotely close to it) and the psychology behind why that unconventional hiring process works. All right, ready for this awesome episode? Let's go. Vaibhav, welcome to the Super Data Science Podcast. I'm so excited to have you here. Where are you calling in from today?
I'm actually in SF, but normally I would be in Seattle. You know, you say that, but in fact, I have only ever seen you in San Francisco. That is true. And in fact, many people have only ever seen me in San Francisco, mostly on Tuesdays while I'm here. My sample size is small, though. It's an N equals one sample size of where your location is. I've only met you in person one time. That was in December at an event run by the Gen AI Collective.
It was a really cool event. And you were already, you'd already surpassed quite a big hurdle to be invited to be one of the 10 startups presenting at the Gen AI Collective. Hundreds of startups applied. And then from that, you won an award as well, didn't you? You were one of the companies that won an award.
The Best Technology award, which was kind of surprising, but it also felt awesome to see the community react well to what we had been building. Yeah. And then I would say, for my own personal award to you, I don't know if I mentioned this in person, and I was about to embarrass you with this right before we started recording, but I was like, let's do it on air, which is that your presentation was easily the funniest.
And some of the things that I remember, so you had things like, I don't know if these are zingers that you've learned to bake in over time. You get audience reactions and you're like, that one I've got to remember for future pitch stand-up opportunities like this one. But things like, you started off by saying, we did something really dumb or really stupid, which was creating a programming language. Definitely something you shouldn't do.
Yeah, maybe it's funny, but it's also kind of true. You probably should never make a programming language. And you're kind of setting me up for failure, because now I have to meet the expectation of trying to be funny on this podcast. And now we'll see what the people expect while they're listening in. Yeah, you've got it naturally. I spent some time with you afterward and I know you'll nail it. Although sometimes people probably just have bad days. You could be having the worst day. Yeah.
So you're the CEO and co-founder of Boundary, which is the creator of BAML, B-A-M-L, a programming language. And it's an expressive language for text generation specifically. So our listeners, we probably have a ton of listeners out there who are
calling LLMs, fine-tuning them for various purposes, and BAML is designed for them. So tell us about BAML, what the acronym means, why you decided to do this stupid thing. Yeah, so let's start with the acronym first. BAML stands for Basic Ass Machine Learning, but if you tell your boss, you can say basically a made-up language. So...
But the premise of BAML really came from this theory around how web development started out. So when we all started coding, at least for me, when I started coding websites, it was all a bunch of PHP and HTML kind of hacked together to make websites work. And then I remember interning at Meta, and they were the ones that made React. I think part of the reason why they made React was because their code base was starting to get atrocious to maintain.
Imagine having a bunch of strings concatenated into your HTML syntax, and now an intern comes in, like myself, forgets a closing div, and now your newsfeed is busted. It's not really the way we want to write code, where multi-billion dollar corporations rely on an intern closing strings correctly. And it's not really even the intern's fault, because how could they really read a giant blob? I barely read essays. How could the intern do that? But a compiler like React...
could actually correct for those mistakes. If you add HTML and JavaScript into the same syntax by creating a new syntax, those ideas become much more easily expressed. And now, in two milliseconds, you get a red squiggly line saying unclosed div tag. And in that web development loop, it just reframed the way we all started thinking about web development. Instead of being like, things are going to be broken, we could do state management because React handled it for us.
We could do things like hot reloading a single component and having the state around it persist, because React did that for us. It was tastefully done, even though it required learning something new. And we asked, in this AI world that we're all headed towards, we think a few things are going to be true. One, every code base will have more prompts in every subsequent year than it did the previous year.
And if that is true, we probably don't want all these unclosed-div types of mistakes existing around forever. And when you say prompt, you mean like an LLM prompt? Yeah, like an LLM, yeah, calling an LLM of some kind. And LLMs, I think, are just the start; I think all models in general are going to be used long-term. Models are only going to become easier to use for people that know nothing about machine learning in the future.
So, yeah, so we've done episodes recently, for example, people can listen to episode 853, where we talked about this generalization of LLMs to foundation models more broadly, maybe a vision model, for example, where you don't necessarily need to have a language input or output. But even with that kind of model, even kind of in a vision use case,
It could be helpful. It could make things easier for people calling on that vision model if instead of having to write code, they can use a natural language prompt. And so I 100% agree with you. More and more often, the models that we're calling, whether they're big foundation models or
including specifically LLMs, or the smaller models, having natural language prompts in there just makes it very easy to get what you're looking for, maybe even just out of a plot. Yeah, exactly. And I think the thing that we have to think about as this stuff becomes more and more prevalent is actually the developer tooling that has to come with it. Just like how React had to exist for Next.js, TypeScript, and all these other things to come out and make our lives a lot better
in the web development world. We ask what has to exist in the world of LLMs and generally AI models as a developer, not as the people perhaps producing the models because that's a different world, but just the people consuming the models. And no matter how good the models get, at some point you have to write bits on a machine that flip and that's code. And it has to plug into your code base in a way that makes sense.
And just like JavaScript sucks and TypeScript is way better because of the type safety and static analysis errors that we get, we wanted to do a bunch of algorithmic work that reframes the problem for users when we made BAML. Nice. I learned from you just before we started recording, and this was probably something that I intended to have later in the episode, but I feel like it's kind of interesting now as you're talking about why you created BAML, is that
You were a Y Combinator company. And when you were accepted to Y Combinator, you were expecting to be a competitor to Slack. That's what was on your application deck. In reality, you had 13 pivots before landing upon BAML. Do you want to tell us about that adventure? I mean, you could even kind of tell us about why you did Y Combinator at all. Yeah. And yeah, the initial idea and all those pivots.
My co-founder and I met over eight or nine years ago, when we were just graduating from college in 2015. We both graduated from UT Austin. And we just became friends because we're both gluttons for misery. And eight or nine years later, he had worked for a while at Amazon, I had worked for a while at some other companies, Microsoft, Google, and a few hedge funds, and I was like, that was really fun, but I kind of want to do something dumb.
And the dumb thing was, I wanted to go and innovate and create something from scratch that no one had ever seen before. Because we're both builders, and building things just gives us that level of satisfaction. So when we started out, I remember playing around in the ed tech space for some time. We tried out the creator economy.
And we thought we had an edge because I had done stuff in the ed tech space before that. My co-founder used to run a YouTube channel, funnily enough, with hundreds of thousands, I think millions, of views for a while. So we kind of played to our edge. And then eventually we were like, oh, we both hate remote work. Let's go solve this Slack thing, because Slack sucks. I don't know, Slack is way better than email, for sure,
but it's so distant, and everyone on Slack feels like a colleague. And what I love about work is having friends. Same reason me and Aaron, my co-founder, are friends. So we tried to solve that problem, and we applied to Y Combinator with it. I've actually applied to Y Combinator like four times over the last 10 or 12 years, and I've gone to interview three of those four times. This last time
was the first time we got in. But I remember very distinctly the partners in the interview being like, "This Slack thing is not going to work. Do you guys have other ideas?" It's fortunate that we had both worked on hard technical problems beforehand so we could talk about them. But I also remember distinctly saying, "We think it's going to work. We probably shouldn't change our idea." But the key insight to us was just that Aaron spent nine years in distributed systems,
I spent nine or 10 years building algorithms in assembly. We probably shouldn't be building UIs, just a small hunch. And a while later, during the Y Combinator batch, we actually decided to move away from Slack and try to go do something in the machine learning space. And the reason was, during the batch, the batch we joined was Winter '23. That was the batch when ChatGPT came out.
It was wild in SF that month, or I guess those three months we were there. Everything was moving so fast. There was so much information. But one thing was constant. Just due to our backgrounds and us being slightly more seasoned engineers than some folks, I mean, not saying that they're bad, just that we had worked in industry a little bit more than a lot of other folks, we were naturally answering a lot of questions for a lot of our batchmates.
And we figured out most of the questions were about machine learning, and we just helped them. And it was fun. We did it on top of our startup, which was not related to AI at all. So then it kind of just made sense for us to move in that direction. And when we realized the real direction we wanted to play in, it wasn't fine-tuning. It wasn't even
the SDK layer. Because the fundamental problem is that no matter whether you build a Python SDK or a TypeScript SDK, and there's the Vercel AI SDK, there's Pydantic AI, and a whole bunch of other systems, the problem is expression. The expression that you can make in those SDKs is restricted by the language.
And lastly, I think every person in the world, every piece of software in the world, is going to use AI in some form or another. Whether you're a C++ shop, whether you're a Java shop, whether you're a Go shop, all of them will use something. And anytime we need something that has cross-language compatibility, it's always been a new syntax. JSON, YAML, TOML, they're all new syntaxes that are globally supported.
And I think that's a really overlooked point in a lot of systems that I've seen. AI is transforming how we do business. However, we need AI solutions that are not only ambitious, but practical and adaptable too. That's where Domo's AI and data products platform comes in. With Domo, you and your team can channel AI and data into innovative uses that deliver measurable impact.
While many companies focus on narrow applications or single-model solutions, Domo's all-in-one platform is more robust with trustworthy AI results, secure AI agents that connect, prepare, and automate your workflows, helping you and your team gain insights, receive alerts, and act with ease through guided apps tailored to your role. And the platform provides flexibility to choose which AI models to use.
Domo goes beyond productivity. It transforms your processes, helps you make smarter, faster decisions, and drive real growth. The world's best companies rely on Domo to make smarter decisions. See how you can unlock your data's full potential with Domo. To learn more, head to ai.domo.com. That's ai.domo.com. Nice. That was really great background on the problem that BAML solves.
One of the key things in there is obviously prompts, solving some of the interfacing problems around prompts. And we'll get to a number of details related to that. But first, I wanted to kind of address this idea
of prompt engineering more generally and just get some of your thoughts. Our researcher here wrote that it's a joke discipline, which, I mean, it is in the sense that it's definitely not engineering. There's no future where there's people doing mechanical engineering and electronics engineering and then you've got a four-year major in prompt engineering. Yeah.
So, you know, when a word as strong as engineering is applied to this idea of prompt engineering, it is kind of a joke. And so, yeah, how can this prompt engineering, which is part craft, part alchemy, and it's getting better all the time with LLMs
kind of anticipating what humans are providing, but you've been studying prompt inputs and outputs at a granular level for a long time. How can creating high-quality prompts become a rigorous practice with the boring reliability we've come to expect from other software tasks? - Yeah, I think the key part that you mentioned is reliability, 'cause that's what we all want. The prompt is just a way to transform some data that we provide to some other data that we really want.
In the case of a classification problem, you take a transcript and you categorize it: is this a conversation between a doctor and a patient? Is this a conversation about healthcare, general well-being, or some other category? When I think about the rigor and the exercise there, it's funny that you say you feel this isn't rigorous enough or not complex enough, because I used to say the same. I remember when this stuff first happened and Aaron was like, AI engineer is becoming a term.
And I was so vehemently against it. I was offended that that term was being used to refer to someone prompt engineering, because the term that I wanted to use was, AI engineers are people that study neural networks and actually have a PhD and can describe it and have a detailed conversation around this. And I felt like the old grandfather on the lawn saying, get off my lawn. It was attacking me at my core. But one thing I've actually come to appreciate
over time is, I actually think there's a little bit more nuance that goes into using a model than people give credit for. For example, we think of prompt engineering as just this idea of putting strings into a model. But we used to think about web development as just this idea of putting plain text out there, and it evolved into reactive components, which have all sorts of complexity over time.
And I think it's the same way I used to look at websites before. I was like, ah, that's not hard. That's so easy. It's not real software engineering. Front-end engineering isn't real. But now when we look at front-end engineering, it's complex. It's an art. It takes a tremendous amount of skill to build a beautiful website. And when I think about prompt engineering, I think we're still in the early days of that plain-text HTML, no CSS, no interactivity kind of world. But I think we're headed
towards a place where eventually there will be a term called prompt engineer. I do think it'll be a well-studied discipline over time. Interesting. Yeah. Okay. So hot take, I guess. Yeah. I mean, that is interesting.
What are the kinds of things... we've talked about the kinds of problems that BAML is solving. Can you give us maybe some key use cases or provide us with some color? We were talking about how you've been on other podcasts where you've done screen sharing to make it visual; I think we're going to try, if we can here, to stick mostly with audio descriptions. Yeah.
What are the kinds of things that BAML is doing that I wouldn't be getting if I just go and I use an OpenAI API and I provide it with a natural language prompt? What am I missing out on that I would be getting if I was using BAML instead? I think there's two big things that people usually, at least people have told us that they love about BAML.
The first one is the syntax is so clear that even their PMs can use it, which is high value for... Zing. Which is what I find to be high value for folks. The second part is really the hot reload loop.
And again, I'll allude back to web development because I think it's a really good analogy for how machine learning works. When I do web development, I change my code, I look at the browser. If it doesn't match what I want, I go back, change my code, hit Command S, and the thing is refreshed. And in about 10 minutes, I can try 15 different styles and make it work with Tailwind and React and whatever else I want to go do. Today, when I see people trying out AI pipelines without using BAML,
I see them try maybe five prompts in 20 minutes because their testing loop is totally busted. They have no hot reload loop. With BAML, in 20 minutes, they can do 240 prompts because it takes five seconds to test each one. And that hot reload loop is...
Like, you're just not going to get bored, because you're not sitting around twiddling your thumbs. That XKCD of "my code is compiling" is so true in the world of prompt engineering. You're literally just waiting for the model to run. You're like, oh, I have to run these five commands to run my test case. Or, I have to run my whole pipeline end to end to experience this one prompt that's in, like, the deep end of it. When really you could just write a quick little unit test. But unit tests aren't really fun to write in Python or TypeScript or any of these languages, because they're not designed for it.
Rust has this really cool thing where you can write a test in any file. In any Rust file, you can write a test, and in VS Code there's this little plugin that says run test. You just click it and it runs the test for you. It's so fast, you just write more code. Rust code doesn't just magically work. It's just more testable, so it more magically works, because people write better code.
We do the same thing for prompting. Nice. I like that soundbite. That's great. Maybe we can turn that into a YouTube short.
Nice. So thanks for the kind of general overview of why BAML would be helpful. Something that is super trendy right now, other than agentic AI, another big trend right now is RAG, retrieval augmented generation, where people have some large number of documents that they want to be able to search over and get natural language responses over. So let's say...
You've got a million documents in your company. They're all like insurance claims or something. And you can use an encoding
large language model to take each of those million documents and convert them into this vector representation, which then allows you to search very rapidly for related documents. So now you're an insurance claims person. I don't know what titles are in insurance companies. I don't know why I went into insurance. I don't know anything about it. But you're an insurance company employee,
and you have this new claim come in and you kind of want to look at similar claims to that one. So you can use retrieval augmented generation to say, here's a question I have, or here's a claim that I have: how does this relate to previous claims that we've dealt with in the past? And that new claim, or that new natural language inquiry that the insurance company employee has,
is in real time, very rapidly, in hundreds of milliseconds, converted into a vector representation as well. And then you can do really simple, fast math. And there are tricks to have this work over very large spaces, even if you have billions or trillions of documents.
And you bring back the most relevant documents. So let's say, in that insurance company example, we had a million; there are maybe six documents that are determined to be closely related in semantic meaning, where the natural language is similar to the user's query. Those six documents come back, and then we can use all six of those documents as context for a large language model that can come up with a great answer based on them. So retrieval augmented generation is
linking together a bunch of different technologies, encoding LLMs and generative large language models, in order to give potentially great insights over vast amounts of documents that a human could never look over manually, and where keyword searches would miss a lot of information.
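(A quick aside for readers who want to see the shape of that retrieval step in code: below is a minimal Python sketch. The embed function here is a toy stand-in, a hashed bag-of-words, where a real system would call an embedding model and use a vector database instead of brute-force matrix math, and the documents are made up.)

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag-of-words.
    In practice, this would be a call to an embedding model or API."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec

# Offline step: embed the corpus once and unit-normalize each vector.
docs = [
    "Water-damage claim for a 2019 sedan, filed 2024-03-02.",
    "Update to hail-damage claim C-0831: new repair estimate attached.",
    "New customer asking about homeowner coverage tiers.",
]
doc_vecs = np.stack([embed(d) for d in docs])
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most similar documents by cosine similarity."""
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q  # dot products of unit vectors = cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

# Online step: the retrieved documents become context for the generative LLM.
print("\n\n".join(retrieve("new water damage claim for my car")))
```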
So RAG: very cool, very powerful. So what are some kinds of insights or pain points in RAG, in retrieval augmented generation? It sounds like, from some research we did on past interviews you did, that RAG itself was part of what led you to pivot towards developing BAML. Yeah, so we actually started off heavily indexing on the RAG pipelines
as kind of the original journey we started; that was one of our 13 pivots. And the reason that we really moved away into the more general world of just using LLMs is that the thing that makes RAG really good is really high-quality data. And one element of really high-quality data that you touched on was this ability to pull the relevant documents.
And that's just a thing that someone has to do. You can't really help them with it. It's very data dependent, which is kind of why we moved away from it, because we don't believe that there's a general RAG solution for everything. Just like there's no general web component for the perfect accordion everywhere. You have to use shadcn and build your own accordion that matches your theme and your styles. One element of RAG that not a lot of people think about is really this idea of how you actually put the context into the prompt itself.
So imagine if I was saying like and um between every other sentence. You guys would immediately tune out. This would not be a fun podcast and conversation to listen to. But whenever you add a bunch of JSON blobs, for example, as context into your prompt, you're doing the same thing. You're putting a lot of things that the model doesn't care about. You're putting a bunch of quotation marks. You're putting a bunch of escape characters, a bunch of colons.
And that doesn't actually make the prompt easier for the model to read. It just makes it possible for you to run JSON.parse on it. And those are likes and ums that the model has to go remove. If you put an image into the prompt, the way that you orient the image, the sizing that you use, even like the text you put around the image to help it understand what the context of the image is, whether you put it above or below the image, matters.
And these are things we found not a lot of people pay attention to. But with BAML, we kind of made it obvious what you're doing and where you're doing it, and we made it possible to detect these likes and ums. Because right in VS Code, like you do with Markdown files, where you can see what the Markdown file renders as, we actually show you what the prompt renders as, with the images, with the audio, as you're typing.
So you can find these likes and ums and actually be like, oh, that looks ugly to me. That probably means it's ugly to the model, which probably means it's hard for the model to understand because these models are trained on human data.
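(As a trivial illustration of that likes-and-ums point, not BAML itself, just the idea in plain Python with a made-up record: the same fields carry noticeably fewer meaningless tokens rendered as plain text than as a raw JSON blob.)

```python
import json

claim = {"claim_id": "C-1042", "status": "open", "amount_usd": 1250.0}

# Context as a raw JSON blob: the braces, quotes, and colons are the
# "likes and ums" the model has to read past.
print(json.dumps(claim))
# {"claim_id": "C-1042", "status": "open", "amount_usd": 1250.0}

# The same fields rendered as plain text.
print("\n".join(f"{key}: {value}" for key, value in claim.items()))
# claim_id: C-1042
# status: open
# amount_usd: 1250.0
```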
All right, so in addition to RAG, another kind of common thing that happens... in addition to likes and ums, those kinds of verbally meaningless things that can end up being in a prompt, and which are relatively innocuous, they're like a benign tumor, there are also malignant tumors in a prompt:
ambiguity and errors. So how does BAML handle instances where input data might lead to ambiguous or incorrect outputs? How does this capability improve reliability in AI-driven applications? Yeah, I think that's where the syntax really shines. Because the problem with English is that it's a really, really poor language for actually describing what you want. It's amazing for rapid, fluid conversation like we're having right now. It's horrible for written instructions.
If you've ever given someone a 20-bullet-point list of instructions you want them to follow, they will mess up on one of them. And they do it for a couple of reasons. One, because you probably have some contradictions in there, just naturally, by having written them out. And you're relying on the reader's inference to know which ones are relevant to your context. And the more informed they are about you, the better they'll do at it. But what we did is we said: every prompt is really just a function.
A function takes in parameters, and every parameter is type-safe. So it's not just a message, it's a message that is a class that has a role and a content. It's not an invoice, it's actually an invoice and all the parameters that exist in it. So it's strongly typed. And every function has to describe what it's going to return. So if you're going to return a bunch of categories, you're returning an enum, which has a specific set of categories that are described.
And instead of injecting your prompt as a giant string, you actually break down your prompt into semantically meaningful chunks. So in the case of, let's say, that insurance example you were talking about earlier, your insurance company might be processing millions of different types of documents. One of those documents might be a new claim. One of those might be an update to an existing claim. And one of those might be just a regular new customer inbound.
Instead of describing all those rules in the core prompt, which is in English, you would actually attach a description next to each one of those categories in the enum. So now your code becomes more readable. And to understand what it means to be a new customer inbound, you only have to read that one section of the prompt. And that natively makes you write shorter prompts and almost makes you write your prompts like code, but with the flexibility of English. And I think that gives a balance of both worlds, but we want these prompts to be able to do anything.
But we also don't want a two-page essay that no one ends up actually reading over time, that we just keep adding to, until we eventually realize we have a list of contradictions, and no wonder the model's been behaving poorly for the last six months.
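(In BAML, those per-category descriptions hang off the enum in its own syntax. Here's the same idea roughed out in plain Python, with invented categories, just to show the shape: each rule lives next to the category it explains, and the prompt is rendered from them.)

```python
from enum import Enum

class Category(Enum):
    # Each description sits next to the category it explains, instead of
    # being buried somewhere in a two-page prose prompt.
    NEW_CLAIM = "A brand-new insurance claim being filed for the first time."
    CLAIM_UPDATE = "New information or a correction to an existing claim."
    CUSTOMER_INBOUND = "A prospective customer asking about coverage or pricing."

def build_classify_prompt(document: str) -> str:
    """Render the enum names and descriptions into the prompt, so reading
    one section tells you everything about that one category."""
    options = "\n".join(f"- {c.name}: {c.value}" for c in Category)
    return (
        "Classify the document into exactly one category.\n"
        f"Categories:\n{options}\n\n"
        f"Document:\n{document}\n\n"
        "Answer with the category name only."
    )

print(build_classify_prompt("Hi, I'd like to add photos to claim C-1042."))
```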
Do you ever feel isolated, surrounded by people who don't share your enthusiasm for data science and technology? Do you wish to connect with more like-minded individuals? Well, look no further. Super Data Science Community is the perfect place to connect, interact, and exchange ideas with over 600 professionals in data science, machine learning, and AI. In addition to networking, you can get direct support for your career through the mentoring program where experienced members help beginners navigate.
Whether you're looking to learn, collaborate, or advance your career, our community is here to help you succeed. Join Kirill, Hadelin, and myself and hundreds of other members who connect daily. Start your free 14-day trial today at superdatascience.com and become a part of the community.
Nice. Yeah. So you've provided lots of examples here of ways that BAML makes calling an LLM, or some other kind of model, with a prompt better, getting better results back. So we've talked about things like being able to do prompt testing rapidly, being able to iterate quickly, being able to handle RAG use cases, being able to handle ambiguity and errors. One last thing
that I want to get into in terms of an advantage, and there might be others that come up organically, but something that you talked about at the Gen AI Collective pitch day that I was at in San Francisco in December, is token efficiency.
Yeah.
Yeah, and this was really inspired by just having spent like 10 years in performance optimizations and hand-rolling assembly for a while. And what I really learned in that journey was, I was a pretty damn good performance engineer, but the compiler beat me every time. Not because it wrote better code than me, but because, just on a time-per-dollar value, the compiler could, in the same amount of time, optimize way more code than I could.
So it made sense for me to optimize some parts of the code, but not all the code. And I think with prompting and token efficiency, we have a similar take. You should probably hyper-optimize one or two of the prompts that are super, super critical to you. But for 90% of the prompts, you just want something to do a really damn good job at it. And what we thought about with performance optimization is this idea that, one, everyone is using structured outputs. And structured outputs, or function calling, is this idea where an LLM
is given a bunch of tools. Let's say I give it access to a weather API, and it's described that the weather API takes in either a zip code or a city and a state. And then the LLM also has access to a restaurant booker, where it has to take in the name of the restaurant and its address of some kind. And then lastly, it gets a restaurant finder or something, where I give it, again, a city and a state. And I ask the model, what's the weather today?
It should pick out the weather tool and fill out the parameters based on whatever context I provided. And the idea is that the best way to send that data between the model and your software today is JSON. JSON, as we talked about earlier, has a bunch of likes and ums. And it doesn't make sense that we're going to force the model to follow a standard that we built because it was amazing for web development,
with all these quotation marks, this strictness in its definition: you have to have that quotation mark there, you can only have single-line strings, you can't even put comments inside of a JSON file. And to us, what we said was: what if there was a different format? So what we did is we spent about eight months writing a new algorithm called schema-aligned parsing, which is actually able to take the model's response
and automatically align it to the data model that you provided. If it made some mistakes, like it forgot the quotation marks in your data model, it forgot a comma at the end of a line, it gave you a string when you expected a number, all sorts of mistakes that models will make because they're probabilistic in nature, we algorithmically correct for that in under five milliseconds. And again, I'm not saying that it's perfect,
but it does the same thing that the C compiler does, which is it just does it more often, and more correctly, than I do. Nice. And so by having token efficiency, even if it's, I don't know... so on average, what do you think it's like? It saves you maybe like 10% in terms of the number of tokens? We've seen customers save like 20 to 30% of tokens easily on outputs. Oh, really? Yeah. So it's a lot faster. Oh, wow.
But I think one thing that's overlooked is not just that it's faster and cheaper, but the fact that it works with every existing model. So, like, DeepSeek R1 came out recently, and they released that model without function calling. And same with OpenAI's o1 models, they released those without function calling. We have users of BAML using function calling with those models today, because schema-aligned parsing requires no modification to the model to make that work. Oh, that's super cool.
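(To give a feel for what "algorithmically correcting" a sloppy response means, here is a deliberately tiny Python sketch. To be clear, this is not Boundary's algorithm; schema-aligned parsing is far more general. It only shows the flavor: accept imperfect output, then coerce it toward the types you declared.)

```python
import json
import re
from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    total: float  # the model may hand this back as a string

def lenient_parse(raw: str) -> Invoice:
    """Tolerate markdown fences and trailing commas, then coerce fields
    toward the declared types. A toy flavor of the schema-aligned
    parsing idea, not the real algorithm."""
    lines = [ln for ln in raw.strip().splitlines() if not ln.startswith("```")]
    cleaned = re.sub(r",\s*([}\]])", r"\1", "\n".join(lines))  # drop trailing commas
    data = json.loads(cleaned)
    return Invoice(vendor=str(data["vendor"]), total=float(data["total"]))

# A typically imperfect model reply: fenced, trailing comma, stringy number.
reply = '```json\n{"vendor": "Acme Corp", "total": "1250.00",}\n```'
print(lenient_parse(reply))  # Invoice(vendor='Acme Corp', total=1250.0)
```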
That's amazing. So in addition to that use case you just gave, which was a really interesting one, with DeepSeek and o1 tool calling, what are some other kind of unexpected or innovative uses of BAML that you've seen from your users? One cool use case that I've seen is this company that's making Cursor, but for Xcode.
Because obviously, if you're in the Mac world, you have to use their own proprietary tools. And knowing Apple, they're probably going to take forever to build anything like it. And one interesting thing that we learned is that you need tool calling to go do these things, to build something like Cursor for Xcode. But when you generate code, I don't know if you've ever seen this in Cursor, it always messes up Markdown stuff for me, because it tries to put triple quotes and their parser for some reason doesn't handle that correctly.
And if you're generating code diffs inside of JSON blobs, they mess up all the time, because you need so many escape characters. Every newline needs to instead be a backslash n. Every quotation mark needs to be a backslash quote. But because schema-aligned parsing is so flexible, we actually kind of found it funny that they just told the LLM, hey, output my code in triple quotes.
And schema-aligned parsing took the triple-quoted text, which doesn't need to be escaped, and just converted it to a string, a regular string that is properly escaped. I think that was one of the coolest things I've seen in a while. And a lot of dynamic UIs and generative UIs that I've been seeing with BAML. Those, I think, have been the coolest visual things to experience. Can you give an example of a generative UI? Yeah.
Yeah, it's hard to describe generative UIs, but I'll do my best, because it's just a new concept that doesn't really exist in many places. Let's take the idea of a recipe generator. When I have a recipe generator, we can all go to ChatGPT and ask it to dump out a recipe. And that'll be fine. It'll do the thing.
But what I really want is something which can almost show me cards of, here's all the ingredients I have and here's what amount I want. And once the cards are done, then I want it to show up and say, hey, there's a separate section for here's the steps that you have to follow. And here's the preparation steps at the end. And wouldn't it be nice if there was a spinner moving along each one of those sections showing exactly what it's currently working on?
That sort of stuff, again, in the old days of web dev, would take you a lot of work, like state management. It took a lot of code to go do. We released this new thing called semantic streaming that allows you to have that data available in a type-safe way, and you can just build a UI like that now. And now your chat app all of a sudden doesn't just respond with text; you can have dynamic graphs, you can have dynamic modals. And your chat app suddenly stops feeling like ChatGPT.
Because if you're out there trying to build a company around chat, I think you have to ask yourself, why would my users not just go to ChatGPT? And one huge value prop I think you can do is build the best UX for the thing that you are trying to provide. If you're trying to analyze stocks, show me ticker symbols in real time right there. Show me graphs.
fun things like that that ChatGPT is just not going to do. Nice. Yeah, those are great examples. I love that: recipe cards, stocks. That does make it easy to see what a generative UI means. Nice. Good thinking on the fly, or maybe those are examples that you're aware of in the real world. But those are cool.
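(Semantic streaming is easier to picture with a toy. Rather than raw text deltas, the application receives type-safe, progressively filled objects. The sketch below fakes the stream with hard-coded data, so it is not BAML's actual API, but it shows why a UI can render per-section cards and spinners.)

```python
from dataclasses import dataclass, field

@dataclass
class PartialRecipe:
    ingredients: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    done: bool = False

def stream_recipe():
    """Simulated semantic stream: each yield is a typed snapshot the UI
    can re-render, with a spinner on whatever section is still filling."""
    recipe = PartialRecipe()
    for ingredient in ["2 eggs", "100 g flour", "a pinch of salt"]:
        recipe.ingredients.append(ingredient)
        yield recipe  # the ingredients card updates
    for step in ["Whisk the eggs.", "Fold in the flour.", "Season and cook."]:
        recipe.steps.append(step)
        yield recipe  # the steps section updates
    recipe.done = True
    yield recipe

for snapshot in stream_recipe():
    print(snapshot)
```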
So where do you think you're going to evolve next? You seem like a really creative thinker, really sharp. You probably have a million potential directions, and it's probably going to be any number of pivots relative to what you think might happen. But maybe in the near future, what kinds of new features
or capabilities might we expect from BAML? In some ways, designing a programming language, the hardest thing about it is knowing what not to do. Because if you're truly inventing a new syntax, you can technically make anything happen. If your compiler can read it, you can do it. So we often try and practice saying what we're not doing instead of what we are doing, because it's so easy to try and commit to everything. But the most important thing is we're trying to bring more powerful capabilities into BAML.
One paradigm that we think is going to become more common in the future is you'll want to say something like, "Hey, I want to send all my free users to models like GPT-4o mini, and I want to send all my paid users to models like o1." How do you represent that in your code in a way that's elegant? How do you repro exactly what a paid user sees versus what a free user sees, in that hot reload loop that we referred to earlier?
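(In plain Python, that routing is a one-line branch, as in the sketch below with illustrative model names. The challenge being described is expressing that branch inside the prompt tooling itself, declaratively enough that the hot reload loop can replay exactly what each tier of user saw.)

```python
def pick_model(user_tier: str) -> str:
    """Free-vs-paid routing as a plain Python branch.
    Model names are illustrative, not a recommendation."""
    return "gpt-4o-mini" if user_tier == "free" else "o1"

assert pick_model("free") == "gpt-4o-mini"
assert pick_model("paid") == "o1"
```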
We're going to try and introduce concepts like that, where you can do if statements and for loops and conditionals in general into BAML, because this has mostly been the direction users have been asking for, like make it more powerful, which is kind of scary to us. But really, the other element of it is around the tooling that you can expect to see around BAML.
One of the most important things that I've learned about machine learning in the last 10 years is that, and I'm sure we've all heard this a thousand times, but data is key. Data is the most important thing that you can have. And so many times people get their data pipelines wrong.
And it's not like they do it wrong because they're intentionally doing so. It's just that there are a thousand footguns that you can step on and get wrong. Another big problem is that a lot of companies develop some kind of AI functionality assuming that the data they need exists, and they just don't have the data. Exactly. Right. And the worst thing people can do is make their data pipelines so rigid that every change is painful.
And then what they do is they change their code, and now their code is sending different data, which means the data pipeline needs a lot of work. So every change is a massive change, and now you're shipping slower because you have to go update your data pipelines. BAML is backed by a version of data schemas that's similar to protobuf, but without all the kerfuffle that comes with protobufs.
So for those of you that don't know, protobuf is like a way to represent your data models in a language agnostic way that is able to version control them. So if you change a schema or if you change like an enum and add a new category to it over time, you can know that the enum has changed and still serialize and deserialize old values. BAML lets you do that in a super ergonomic way without actually maintaining that in your code. And our data pipelines automatically evolve to do that and address that. So if you have
an enum with five categories, and three months later you have an enum with 50 categories, we're actually able to render that difference. And we're going to share more about the data platform over the next quarter or two. Sweet. So we know lots of the advantages now of using BAML versus just sending a prompt without the kind of structure and reliability that BAML provides. If we have a listener out there who wants to start with BAML right now,
how do they do that? What's it like your first time using BAML? How do you install it or get experience with it? Yeah, so we work really hard to make BAML super easy for you to install. We appreciate anyone that is willing to learn the new syntax and go with it. So we have two things to help you out. One, we have an online playground for you to experience BAML without installing it at all in your repo.
So you can just go to promptfiddle.com and you can experience what it will be like to use BAML in VS Code or Cursor or anything else you want, right there. And the second thing you can do is, depending on whether you're using TypeScript, Ruby, Python, Java, or anything else, we support every language, you can just install BAML using the package manager of your choice. And we have instructions on how to do that in our repo. You just do pip install baml-py, then you add a couple of BAML files to your repo.
And that's it. Like, that's the work. And for anyone that really, really doesn't want to learn the BAML syntax at all, we have a chat that you can ask: you can just describe your problem to it, and it will actually generate the BAML code, plus a couple of test cases, plus Python or TypeScript snippets to show you how to use that BAML code in the language of your choice.
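(For orientation, the day-one workflow looks roughly like the sketch below. This assumes the generated-client layout from Boundary's docs around the time of recording, and ExtractResume is a hypothetical BAML function you would have defined in one of your .baml files; check the current docs for exact names.)

```python
# pip install baml-py
# Then add .baml files to your repo and run the BAML generator, which
# emits a typed client package (imported below as baml_client).

from baml_client import b  # the generated, typed entry point

# Calling a BAML-defined function feels like ordinary typed Python:
resume = b.ExtractResume("John Doe\nSoftware engineer, 10 years of Rust...")
print(resume)  # a typed object, not a raw JSON blob you have to parse
```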
Very cool. All right. So listeners, you've now got another tool for your tool belt. Go out there and check out BAML by Boundary. You won't have a hard time finding it. We've of course got a link in the show notes as well.
I haven't used it personally myself yet, but next LLM project, it seems like a no-brainer to be using BAML, to be taking advantage of all the efficiencies and capabilities that BAML offers relative to just providing my plain text prompt and getting back whatever I get back from the model API that I send it to.
If we have listeners that want to be working with you, they could be checking out the Boundary website and seeing if you're doing any hiring. But I thought you might want to fill them in on the interesting hiring process that you have. Yeah, so we're a little weird, like all things we do. We like to do things a little atypically.
But we've actually never posted a job posting online. And part of the reason for that is because, for hiring people that are willing to make a compiler, we just want a bunch of people that want to do that. And there's just not that many on the internet that are necessarily actively seeking that out.
But the approach that we've taken to hiring is, we want to build a team of engineers that have good taste and know how to build tools for developers, and build the complicated tools so our users don't have to. So our approach so far has been kind of simple, which is: you just send me an email, and I guess we'll put my email down below if we want, titled "Why I'm Awesome."
And just brag about yourself, right? Three amazing things you've done. And really what we're indexing for is like complexity, but really how well you communicate as well. Because our syntax is a way of communicating with our users. And we have to do that exceptionally well. And if that goes well, then we just get on a quick little call and we chat and make sure we don't hate each other. And if that goes well, we just call up three of your references.
And instead of interviewing you, we interview your references, and we go deep into the tech that they worked on with you. And that actually gives us a better signal for what you have done, how you work with other people, and gives us really good insight into how we think you'll fit into a team. And that's been the strategy so far. And after that, you get a job offer.
Very cool. So they don't get asked any technical questions directly? Yeah, usually not directly. And then one last thing we do after we give you the job offer is we give you the opportunity to come and spend a week with us so you don't have to actually officially commit to our company. And you can get a feel for what we're actually like in person because we are five days a week in person. And the last thing that we do once everything is good and dandy and we've got a feel for each other and you're really, really excited and hopefully we're really, really excited too,
is you just tell me what company you want a job at and I'll help you go interview there and I'll help you land that job personally. And we hope that if you get your dream job and you get us, you still choose us. Excited to announce my friends that the 10th annual ODSC East, the Open Data Science Conference East, the one conference you don't want to miss in 2025 is returning to Boston from May 13th to 15th. And I'll be there leading a hands-on workshop on agentic AI.
Plus, you can kickstart your learning tomorrow. Your ODSC East Pass includes the AI Builders Summit, running from January 15th to February 6th, where you can dive into LLMs, RAG, and AI agents. No need to wait until May. No matter your skill level, ODSC East will help you gain the AI expertise to take your career to the next level. Don't miss out: the early bird discount ends soon. Learn more at odsc.com slash boston.
Very cool. I want to dig into that in-person work thing there for a sec, because I badly miss it. Up until the pandemic, I had always been working in person, and I miss it so much now.
Something that you and I were talking about before we started recording, I think it was before we started recording, sometimes it's hard to remember, was how when you're using Slack, for example, or you're using Zoom and working completely remotely, you have colleagues. But when you're going and meeting with people in person, you really know what's going on in people's lives. You figure out who in the group you're like, oh, we should be grabbing a beer after work with. And that just kind of organically happens. And you end up having...
I mean, a huge proportion of my friends through my life have come through work: either they are the person I'm working with or their friends.
Yeah, it's a huge social experience. It makes work fun. And since the pandemic, I probably laugh like 10% of the amount that I used to. Because you're working around people that might have similar backgrounds, similar interests, and are often really smart and funny. And so work can be hilarious. Yeah, we...
I remember when we first started this remote work thing, I mean, we were a Slack competitor, so we were full-on remote when we were doing that in the very beginning. We kind of had to be, given the mission of what we were doing. But really, it was about human connection. And when me and Aaron started this journey, we told ourselves, I don't care how much money we make out of this, if we hate each other, that's an L. That is the worst-case outcome. And when we hired our first person, it was very similar. We just don't want them to hate us. Ideally, they'd like us. But...
I think in person is the way to go. Not only for our tech specifically, where we need to be in person just because of the amount of bandwidth you need in any conversation about syntax, you cannot do that digitally, but it's just fun. You get to do weird things. You have your own office. It feels like home, as horrible as your office is as a tiny company. It just feels like home.
And it's just fun. I remember one of our colleagues has kids and they brought them into the office. The kids were just excited. And you get to know them. You get to know their partners, and you become friends. I think I feel the same way as you, where almost all of my closest friends, except the ones from college, are all through work, 100%. And I know them 10 years plus now. And I love that. I would not trade that. And the college thing isn't even really different. It was you and these other people showing up
during daytime hours and sometimes grinding it out late and grabbing a beer after. It's really kind of the same. Yeah. I don't know, I think there's this thing about work that a lot of people have, which is they do a job because that's the job they have. But I think every now and then, if you're able to find a group of people that you truly like working with, even if it's the most boring thing in the world, or maybe it's the most exciting thing in the world, but if you have a group of people that you really like,
I think going in is amazing. For sure. Going in is amazing. It can be so much fun.
And yeah, hopefully somehow I figure that out again someday. We should have you up in Seattle sometime, coming out with us. Yeah, for sure. I'd love to. Yeah, I mean, I do. So, you know, I love recording in person when I have the chance to do it with guests. And something recent for me: in 2024, I was the host of six television ads for NVIDIA, Dell, and AT&T.
And with all of those shoots, it was so awesome, because there's 20, 25 people on a shoot, and you get to know each of them to some extent. And I did all six of those ads with Bloomberg TV, so there was a lot of overlap in who was showing up from Bloomberg. And, you know, we'd be shooting in San Francisco. That's why I was in San Francisco to meet you; I was actually doing a shoot for an NVIDIA ad.
And so, yeah, so, you know, there were people
in San Francisco with me there that I had now seen in five other cities in the past year, and you go for dinner before and after. Yeah, totally. It's really cool. So hopefully there's more of that in my future. The podcast is probably going to stay a remote workforce. Nice, man. Well, it's been so great chatting. Before I let my guests go, I always ask for a book recommendation, and you told me that that would be easy.
Yeah, I have only two types of things that I really, really like reading about. More recently, it's been the Rust manual. Please go read it. If you're a developer of any kind, I think it just changes the way you think about code, like the way it does exception handling and everything else and how everything is a result type. Highly recommend it.
And this isn't so much reading, but if anyone really enjoys it, CppCon puts out great lectures and great talks. Some of the best talks I've seen are all from CppCon.
I highly, highly recommend watching those and listening to those as well. Very cool. The Rust manual and CppCon lectures. Very nice. Or talks, I guess. They're long enough that they feel like lectures, but in the best way possible, not in a... Yeah. You know, I don't even really distinguish between those two terms, but I do know what you mean. Yeah, some people don't like lectures. Thank you for disambiguating like your BAML language does. I had an ambiguous prompt.
Nice. All right. And then very last thing, you've already offered your email address, which we'll have for our listeners in the show notes. What are other ways that people can reach out to you or follow you after the episode? So my LinkedIn is probably the best way to get in touch with me. My co-founder is a lot more active on Twitter, though, for most people. And sometimes I'll type on his Twitter when he lets me. But my Twitter is dead.
And then Discord. Honestly, if you need any sort of prompt engineering help, I love seeing new problems. I've seen problems like this one company that was trying to parse 100-page PDFs of bank statements, and we worked with them, and now they're able to do that 100 pages reliably with zero errors. And that was fun. I learned a lot, just weird things, throughout that problem.
So our Discord is a great place to ask those questions, and I will do my best to learn and to help. Awesome. Yeah, we'll be sure to follow up with you to get that Discord link so that it's in our show notes as well. Maybe you already even provided it to us and I just didn't notice. Awesome.
Vaibhav, I've really enjoyed this conversation. You didn't disappoint. You were entertaining, as I anticipated you would be. Thank you so much for taking the time with us as a busy founder of an early-stage startup. It means a lot to us that you give us that time. And yeah, I learned a lot. No, thanks for having me, Jon. This was really fun. It was a great way to spend a Wednesday.
Vaibhav Gupta is terrifyingly clever, yet remarkably approachable and fun to speak to. I loved having him on the show today. In it, he detailed how BAML, Basic Ass Machine Learning, is a programming language specifically designed for natural language generation interactions with AI models, offering things like a hot reload loop that enables testing 240 prompts in 20 minutes versus, say, only five prompts without BAML;
token efficiency improvements of 20-30% through schema-aligned parsing, which intelligently handles model outputs without requiring explicit JSON formatting; compatibility with models that don't natively support function calling, like DeepSeek R1 and OpenAI o1; and built-in type safety and error handling for more reliable AI applications.
Separately from BAML, another highlight for me in the episode was learning about Boundary's unique hiring approach that bypasses technical interviews in favor of candidates sharing three things that make them awesome, in-depth reference interviews, and a week-long trial period. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Vaibhav's social media profiles, as well as my own, at superdatascience.com slash 859.
And if you'd like to connect in real life as opposed to online, I'll be giving the opening keynote at the RVA Tech Data and AI Summit in Richmond, Virginia on March 19th. Tickets are quite reasonable and there's a ton of great speakers. So this could be a great conference to check out, especially if you live anywhere in the Richmond area. It'd be awesome to meet you there.
Thanks, of course, to everyone on the Super Data Science Podcast team: our podcast manager, Sonja Brajovic; our media editor, Mario Pombo; our partnerships manager, Natalie Ziajski; our researcher, Serg Masís; and our writers, Dr. Zara Karschay and Sylvia Ogweng. And of course...
the man himself, our founder, Kirill Eremenko. Thanks to all of them for producing another great episode for us today, and to Kirill for enabling that super team to create this free podcast for you. We are deeply grateful to our sponsors. You can support the show by checking out our sponsors' links, which are in the show notes. And if you'd ever like to sponsor the podcast yourself, sponsor an episode, get your message out through us, you can find out how to do that at jonkrohn.com slash podcast. Otherwise,
Share this episode with people who'd love to learn about BAML. Review the episode on your favorite podcasting app or on YouTube. Subscribe, obviously, if you're not already a subscriber. Feel free to edit our video content into shorts or whatever to your heart's content. Just refer to us, but mostly BAML.
You know, it doesn't really matter to me if you don't do any of those things. I just hope you'll keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.