The Development Basics of Managed Inference and Agents

2025/7/2
Code[ish]

People
Hillary Sanders
John Dodson
Topics
Hillary Sanders: I'm an AI engineer on the Heroku AI team, primarily responsible for building products that help Heroku customers integrate AI into their applications. Our team also uses AI to improve Heroku's existing products, for example by building custom internal tools and pipelines. I'm personally very interested in statistics and neural networks, and I think combining statistics with big data and compute produces powerful machine learning capabilities. My interest in neural networks came from curiosity about how they work, and I focus on solving practical problems, such as neural network optimization.

Transcript

Hello and welcome to Code[ish], an exploration of the lives of modern developers. Join us as we dive into topics like languages and frameworks, data and event-driven architectures, artificial intelligence, and individual and team productivity. Tailored to developers and engineering leaders, this episode is part of our deeply technical series.

Hello everyone, my name is John Dodson and I work for Heroku on the builds team. I've been programming ever since my parents bought a VIC-20 and I wanted to make my own software and if I'm being honest it was mostly just weird games. I'm a huge Heroku super fan and I'm excited to talk to you about what's awesome in Heroku. And today to talk about what's awesome at Heroku I'm joined by Heroku's own Hillary Sanders. Hello Hillary.

Hey there. So Hillary, we're just going to jump right into it. So I wonder if you could tell us a bit about yourself and what you do at Heroku. Yeah, I am an AI engineer and researcher with a background in statistics and neural networks. And I work on the Heroku AI team, where we build products to help Heroku customers use AI and integrate AI into their applications.

And we also use AI to sort of build custom internal tools and pipelines and improve existing Heroku products. That sounds fun. So your journey as a software engineer, what's it been like up to this point? Any tips for developers just starting out in the game? Yeah, my journey has been super fun and lucky. I think it started out because I was in college and I fell in love with statistics, obviously.

Common path. Yeah, common path. I was puttering around taking way too many classes and took a stats one. And it maybe was not exactly like a religious calling, but something like it. I think stats is like the study of how to optimally evaluate evidence about the world, which I find to be very beautiful and important and really speaks to how my mind works and how I felt the world should make decisions to maximize good.

I realize that statement might sound horrifying to some people. If it does, then they should talk to me. But that's what got me interested in stats.

What I realized is that if you combine that with the power of big data and compute, stats becomes not only beautiful, but very powerful, i.e. machine learning. And in this day and age, and certainly in the next couple of decades, perhaps scary powerful. And so that's how I got into machine learning. I just fell in love with stats and started doing research with some professors and then went to the Bay Area.

I got into neural networks, honestly, because I was confused by descriptions of them because no one could explain them back in the day. I actually didn't study that in college. It was all about hierarchical Bayesian networks and Markov chains. Amazing things.

But I ended up getting into neural networks because I was just burningly curious why no one could explain how they worked, which made me really want to figure out how they worked. And that helped me end up getting some jobs for quite a few years doing neural network optimization and eventually led me to Heroku. So that was sort of my journey to become an AI software engineer.

Insofar as tips, I would comment on the job market today, which I feel somewhat unqualified to do because I haven't experienced a tech job market this bad before. Agree. It's rough out there. And I'm in a better position than most. Yeah. Yeah. Because like, I think if you're a junior entry-level dev, it's worse than the experience I'm going to have. But I've also made lots of mistakes, so might have some tips there.

My first ignorable tip would be to not write cover letters. I don't know if this is correct, but I think they take a long time. Often people are just going to assume that you're writing them with an LLM.

You can apply to a lot of jobs in the time it takes for you to write one. So unless you have something really meaningful and specific to say, and the question is pretty unique, I think just skip it. And it's not just for the time trade-off. I think it's to avoid depression because I am friends with people who have interviewed for a year and have gotten really depressed. And it's like hundreds and hundreds of applications.

And it feels so depressing to write all these cover letters and kind of pour your heart out, even though you're trying not to. And then essentially be ghosted by all these companies.

So you know what? Just skip it. I think it's too depressing and not very useful. That's my first tip that is not really informed by data. Right. No cover letters. Check. Yeah. Another one is if bad companies reject you, don't pay too much heed because that is an incredibly noisy signal. Absolutely.

I and others I know have gotten job offers for like twice the salary in the same week that other companies paying half as much and that sounded way less cool rejected us. And that's like very standard. And in fact, I would argue I've noticed a positive relationship between the job and company quality and the simplicity of the questions being asked during the interview process.

for maybe reasons that I won't get into. So if you get rejected from places, don't take it too personally. Maybe take it as very, very noisy data on what to focus on.

My third tip is the most uncomfortable one. I can't wait. Yeah, it's just awful. A big part of interviews, you know, once you get in and get an interview, you have a much higher probability of getting a job. It's pretty important. You know, the code you write and how you do, but how you present yourself and how you communicate is also very important. And maybe often underestimated because how you communicate and present yourself is really important to the job.

So I recommend doing mock interviews and videotaping yourself and then gasp,

watching them back over, which may cause you significant nausea and emotional discomfort. But I think like per unit time investment, it's super, super effective at making you get better at interviews. And I really recommend it, even though it's just the worst and terrible. I agree with you there. I think doing mock interviews is one of the most important things you could do. Maybe practice with a friend. You've got a job.

There's oftentimes an employment office around that you can do that with, or you could just, like you said, record yourself. There's plenty of interview questions online that you can practice with. So yeah, I think that's great. That's great advice. So Hillary, you're on the Heroku AI team, which is still a pretty new team at Heroku. And I'm wondering if you can tell us a bit about why it was created and what problems the team's trying to solve.

Yeah, I think we're trying to solve lots of problems. Essentially, Heroku was incredibly cool 15 years ago, very like hot. I think it's still very cool. And we're trying to do a great job of keeping up with the times and making sure we do a very good job of doing that.

And so that leads us to focusing on two main areas when it comes to AI. Like one, making it really easy and seamless to incorporate AI into your Heroku applications and making sure those AI components can easily interact with your databases and other components in your Heroku space. And then also just using AI to make our existing products better.

So if you have databases on Heroku or apps on Heroku, or if you want to be vibe coding with cursor or VS Code, we should be helping to make sure that AI is making that experience really, really good.

Cursor should have an extension that makes it easy to have the LLM understand how to deploy your app as a Heroku app, that kind of thing. So kind of enabling all of that is the main goal. And I think that leads to lots of really interesting, fun products and features. So Hillary, this is a really important question. My son's eight.

He enjoys watching various brain rot videos on YouTube, you know, as one does. And one such piece of nonsense are these versus videos such as 10,000 Harry Potters versus a million predators. Like I said, real important. So we're going to apply this to AI. Which team do you think would make a better application faster in six months? 10 principal developers with no AI or 100 fresh bootcamp graduates with all current AI tools at their disposal?

Woo. Okay. That I have complicated feelings here. There's a lot of trade-offs. There are absolutely. I will first raise a concern or discussion topic with the implicit hypothesis that a hundred engineers on a single application makes things better.

Unless you're organizing it very well. I feel like that's implied by the question. It's the first trap of the question. Absolutely. That is really hard. Right. It is. If they all have to work together on one app and one code base and have six months, in this specific situation, I would bet on the 10 principal developers. Yeah.

However, there are many permutations to this that I think would lead me to bet on the fresh boot camp grads for sure. Which permutation? Okay, so if they're allowed to like break up into groups of five or 10.

Grab the eight smartest people you know that do different things out of the hundred people and put them in a room. If they can all go be siloed and work together, and then maybe you take like the best thing that they've built of all those groups after six months, that I could see beating out the 10 principal developers for sure. I agree. I agree on that. Yeah.

Because if you're a fresh bootcamp grad with a good product vision and you're using AI, sure, a lot of the time you might end up going in bad directions because you just don't have all of the lovely mistakes and learnings that the experienced developers have made along their career paths. You might go in the wrong direction, but one or two of the teams will probably go in a really good direction. So I would bet on them in that circumstance. Another permutation is if you shorten the time period.

If you have one day to build a thing or a week or even a month, then maybe I'd bet on the bootcamp grads because you can do so much so fast with AI. And that is just a lot harder without it. So that's like a thing. And additionally, if you have six months when you're not integrating into existing complicated services, bureaucracy and et cetera, you can build a lot. And that means your project gets pretty complex and your code base gets fairly big.

especially if you're using too much AI. And existing LLM models struggle with various things. They're like amazing, but they also struggle with common sense reasoning and they struggle with understanding a really complex code base and then making changes on just a little bit.

There's a lot of really cool tools that people are working on and that Heroku is interested in to make that easier, like shoving a whole repo into the context window of a bigger and bigger model or trying to have good ways to look up the relevant parts of your code to put into your context window for your LLMs so that it can edit your code or adjust things.

But that still doesn't work amazingly on pretty complex code bases. So on a six month time span, I think there are decreasing marginal returns to like AI, especially when if you made wrong turns in the beginning, that's not going to be great. But if it's a short time period and you have a clear vision, like, yeah, junior devs with AI, they can build a lot.

Awesome. So a different track here, Ruby and Rails, Ruby and the Rails framework are the language and platform of choice for Heroku, at least historically. So we support much more than that, including .NET, which we just added, which I'm really excited about. But historically, Ruby and Rails are the bread and butter of what we do. So for you, like what language would you consider to be the language of AI if there even is one?

I mean, Python, for sure. It's super popular. Yeah. If you're trying to like develop on neural networks, you're typically almost always using Python. It's great. It's very popular amongst AI enthusiasts. It's very easy to use. Lots of support and lots of packages. So yeah.

All right. So moving to talking a little bit about Mia, which is our newly released product here. So question for you, what is Mia? What does it do and why should customers use it? Mia, yay. Mia is Heroku's managed inference and agents add-on.

And it does a lot of things and it will do even more things. But essentially, it is an add-on that makes it really easy to integrate AI, specifically large foundational models, like an LLM, like Claude Sonnet 3.7, into your Heroku apps. So it lets you kind of just add the add-on, attach a model like Claude directly to your app without the hassle of

external account setups, or sending your data out of Heroku, or managing API keys for data security, and inference calls will just work. That is very nice and avoids a lot of hassle, but there's also a lot of kind of special sauce features that we have been adding to make the experience nicer and to take away a lot of boilerplate code that you often have to write in certain situations.
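
Conceptually, an inference call through an attached model is just a chat completion request against an app config var, rather than an external account. Here's a minimal Python sketch; the config var names (`INFERENCE_URL`, `INFERENCE_KEY`, `INFERENCE_MODEL_ID`), the endpoint path, and the payload schema are assumptions for illustration, not Mia's documented interface:

```python
import os

def build_chat_request(prompt: str, model: str) -> dict:
    """Build a chat-completion payload in the common OpenAI-style shape.
    The schema here is a widespread convention, assumed for illustration."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Model ID would come from the add-on's config vars; default is hypothetical.
payload = build_chat_request(
    "Summarize this support ticket in one sentence.",
    os.environ.get("INFERENCE_MODEL_ID", "claude-3-7-sonnet"),
)

# Sending it would be a single POST to the attached model's endpoint, e.g.:
#   requests.post(os.environ["INFERENCE_URL"] + "/v1/chat/completions",
#                 headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
#                 json=payload)
print(payload["messages"][0]["content"])
```

The point of the sketch is the shape of the integration: credentials and routing live in the app's config vars, so there is no external account or key exchange to manage.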

So we have really cool features relating to like agentic automatic tool execution, including some tools that just come built into Mia that will just work off the bat automatically without you having to deploy your own tool servers. We have nice dashboards, including dashboards for the tools you've attached to your models. And we're also providing a lot of really cool features relating to MCP.

MCP is an open source protocol; it stands for Model Context Protocol. And it basically helps define how AI applications should interact with tools and databases and resources in like a nice standardized way.
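
For a sense of how standardized it is: MCP messages are plain JSON-RPC 2.0. A client listing a server's tools and then invoking one looks roughly like this; the tool name and its arguments below are hypothetical, but `tools/list` and `tools/call` are the protocol's actual method names:

```python
import json

# First request: ask the MCP server what tools it exposes.
list_tools = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Second request: invoke one of those tools by name.
# "query_database" and its arguments are made up for illustration.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}

print(json.dumps(call_tool, indent=2))
```

Because every server speaks this same shape, a host application can discover and call tools without knowing anything about them in advance, which is what makes the standardization useful.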

And we're betting pretty heavy on MCP, and I'm very happy that we're doing that. So a lot of our features are built around making sure deploying your own MCP servers to Heroku is really easy and adding in yet more special sauce around that. Great. I like Mia because it's as easy to add AI to your application as adding a database or adding Redis. And it's going to be first class. When I was originally looking at the API that the team designed,

I love the simplicity of it and the extensibility of it. It's really great work. Yay. Yeah, yay indeed. So what's the biggest problem Mia solves for Heroku customers? And where do you think developers should absolutely consider adding Mia to their applications?

Yeah. Okay. So the biggest problem Mia solves, Mia solves like a lot of tiny annoying problems. Right. Similar to what Heroku does really well. Like we solve a lot of tiny annoying problems for you. So it's not a frustrating experience to deploy apps.

So if I think about the biggest problem you solve, it's probably pretty boring. You don't have to go make an external account with like OpenAI and give them your credit card and worry that your data is being transferred to them. That's pretty simple, but it is nice.

And then you have a very easy to set up LLM or image model or embedding model that can easily connect to your other Heroku apps or databases if you so choose. So maybe that's the biggest problem solved. It's perhaps not the most interesting one, but it's relevant to everyone who uses it. So it's kind of high impact. Yeah, absolutely. What was the other question? Where do you think developers should be adding it?

If they have an application where they were going to call out to an external third-party large foundational model,

I would strongly consider using Mia because you're getting often the same thing, but it's just going to work really nicely on Heroku because you get all of these features built in for free, like dashboards and token consumption and all those really nice features relating to MCP. And if you want like an MCP server, oh, all of a sudden it's pretty easy to add OAuth authentication and do server registration that works really well with Mia.

And that kind of stuff is just going to be totally doable, but harder if you use an external third-party LLM. So I would say if you want to use large foundational models like these, APIs in your apps, that is awesome and consider using Mia.

So this is really, really important. In the Star Wars film Attack of the Clones, Padme and Anakin travel to Naboo via the H-type Nubian yacht, which is a luxury vessel, which is part of Naboo's fleet and known for its sleek design and strong deflector shields, obviously. According to a Tumblr blog post that did the math, the total travel time to Naboo was 10 days and 10 hours. So my question for you, Hillary, is...

How many movies or Disney Plus seasons do we need to cover this journey? 10 days, 10 hours, really important. Yeah.

I think that's a really excellent question. Oh, good. And I have an answer. And that answer is it's Andor season two. Oh, my gosh. I'm watching that right now. It's so great. It's a great show. Yeah. Ten out of ten. Yeah. I'm literally watching it for movie night tonight with friends. So no spoilers. Oh, yeah. It's fantastic. Yeah. I agree. Yeah. If you're listening now, like, what are you doing? Go watch it. Oops. After. Yeah, it's really good.

So what kinds of technologies did you use and the team use in the development of Mia?

Oh, that's a fun question. So insofar as like Mia as it exists today, we are mostly using technologies that Heroku provides. So we're dogfooding Heroku, which I think is just typically an excellent thing to do. So we're using Heroku Dynos and Heroku Private Spaces and Heroku Postgres and Heroku Redis. And despite what I said earlier about Python, a lot of our code is written in Go because we wanted routing to be fast. Right. And we're doing less neural network development.

We want API routing to work really well. The actual models are hosted by Amazon in secure AWS accounts. We might add more models in the future, without the might part, actually. So that's sort of like the rough tech stack we're using today. I will say that a while back, we were playing with the idea of hosting our own models, which I think...

It was definitely the right decision to move away from that, but we got to play with a really fun set of technologies. So like Triton and TensorRT and vLLM and PyTorch, and doing a bunch of cost performance analyses on different EC2 instances, like Inferentia and Trainium, and classic GPU-powered EC2s.

That was super fun. I think that was just fun. But those pieces of technology we're not using in the current Mia. We are, though, using a bit of Python. Like, we're publishing various, you know, open source MCP repos so people can use our first-party tools. If that floats their boat, they can deploy their own MCP servers and have that work with Mia. And they can also kind of just clone some of our getting started repos to do that kind of thing even more easily.

Can you walk me through what you think is the coolest feature of Mia? Yeah, I don't know if this feature will be out for a general audience by the time this podcast comes out. If it's not, it will be very soon. Probably the coolest feature for me is what we're calling first-party automatic tool execution.

So Mia offers like an agents endpoint, which allows you to tell your model like, hey, you can call XYZ tools. And normally with an inference provider, the model would select a tool and then call back to you. And you, the client, would have to be like, oh, the model wants to call this tool. Now I have to like handle that and call out to some server I've deployed or do something and then give it back a response. But if you use the agents endpoint, you have the option of us just doing all of that control loop nonsense for you.
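
That control loop, the one an agents endpoint would run for you, can be sketched in a few lines of Python. The "model" below is a stub so the loop is runnable; the message shapes and the `get_time` tool are made up for illustration, not any provider's real API:

```python
# Sketch of the client-side control loop that automatic tool execution
# replaces: call the model, run any tool it asks for, feed the result
# back, and repeat until the model produces a final answer.

def fake_model(messages):
    """Stand-in for an LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_time", "arguments": {}}}
    return {"content": "It is noon."}

# Registry of tools the client knows how to execute.
TOOLS = {"get_time": lambda **kwargs: "12:00"}

def run_agent(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # model is done; return final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Append the tool result so the model can continue reasoning.
        messages.append({"role": "tool", "content": result})

print(run_agent("What time is it?"))  # -> It is noon.
```

Every turn of that `while` loop is a round trip the client would otherwise have to implement and operate itself, which is the boilerplate the agents endpoint absorbs.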

And what is extra cool, and I think is maybe the coolest feature, is we are also offering tools that just come working built in natively with Mia.

So the idea is you can create an app on Heroku and attach it to Mia and maybe attach Claude Sonnet 3.7 and say like, hey, Claude, you have all these tools available that I know like Heroku makes available. And I want you to like write Python code and it's going to run on a one-off dyno in my Heroku account. Or like, I want you to like be able to search Google or I want you to be able to look at one of my read-only databases and tell me about it. Or I want you to be able to like parse this random PDF and talk about it.

There's various tools that we're just thinking are very useful and we want to just offer natively for free. And what I think is really cool is that that's something that people can just use in the first three minutes of using Mia. And it just doesn't take a lot of boilerplate and a lot of work to get that working. Because again, like that's something you can build yourself, but it is really nice if you don't have to build it yourself and it just works beautifully in three minutes.

of writing a couple lines of code. That's my favorite kind of development. It's pretty nice. It's incredible.

So oftentimes when teams get together to mix up the stuff that comes out in terms of products, oftentimes we have technical disagreements. It happens. People disagree. I was wondering when it came to building Mia, were there any disagreements that the team had or did the product just sort of naturally evolve without any? I guess we'll have to define disagreements, right? Because I will say, honestly, the engineering team and the hierarchy team is the most pro social set of engineers I've ever worked with. Yeah.

And I've been doing this like 12 years. So that should be a significant statement. So if I define disagreements as like high uncertainty, high impact, hot topic decisions that we waffle on as a team, we've totally had those, because there are a lot of decisions where there's not a clear answer and we really need to work through it together. Like we're still waffling over some things like this. If you want the hot gossip that may or may not be deleted from this

One really interesting problem that we're working through is what is the best API schema for future models that we release or future endpoints that release? What's the best API format? And there are just so many conflicting incentives here that are really important.

I don't think there is a clear, perfect answer. There's just a lot of different trade-offs. You don't want to overwhelm customers with choice. That's exhausting. I don't want to evaluate 250 types of jam. I just want like a delicious jam and I want to eat a snack. Like that's what I want. Same with Heroku.

On the other hand, though, there's trade-offs between different API schemas. Like you can take your underlying model providers and use their API schema, but then you have like a different one per model. You can convert everything to like a really popular schema like OpenAI, but then you'll kind of fall short on the edge cases. And you can do a lot of work to fix that in many situations, but at the end of the day, you'll still have little edge cases that might make the experience less elegant.

when you want to use a feature that our model supports but OpenAI doesn't or vice versa, or want to be explicit that we're actually ignoring a feature, but you also don't want to break things when people are running the code through OpenAI SDKs. There's so many benefits with doing that too. It's so nice to just use all of the example code online and all the packages online that really know OpenAI really well. And then to add to that party, you also have all of the custom stuff we're building

that needs to be like a superset on whatever API format we decide on that relates to the agentic capabilities that we're offering with like the automatic tool execution and that kind of thing. And so that's a really fun, maybe quote unquote disagreement, but I think more just like super interesting problem that we will continue to think about solving really well as we release more and more things.

And honestly, if you're listening and have a strong opinion, email me because I'm curious to hear what you think because it's a super tough problem and people have a lot of good opinions.

Well, inbox flooded, I'm sure. I hope. That would be great. So finally, I wonder, to wrap things up here, thank you, Hilary, for talking to me today. Oh, thank you. Yeah, you're welcome. What's coming next with Mia? What can we look forward to? I know it just came out, I know. And everyone's looking to see what's next, but I'm just curious, what's next for Mia? So much. I mean, mostly the things that we wanted to squeeze into our initial release that didn't get squeezed in.

The biggest category of stuff is a lot of really cool features relating to MCP, the Model Context Protocol I talked about. So we're offering different types of models, like an embedding model for RAG, Retrieval-Augmented Generation, an image model, and a couple chat models. But you can really supercharge chat models with tools. And so we're making that big bet on MCP and releasing a lot of super cool features relating to that pretty soon.

So I'm excited about that. Me too. And thank you again, Hilary, for talking to me today. Thank you very much. Thanks for joining us for this episode of the Code[ish] podcast. Code[ish] is produced by Heroku, the easiest way to deploy, manage, and scale your applications in the cloud. If you'd like to learn more about Code[ish] or any of Heroku's podcasts, please visit heroku.com slash podcasts.