
#112: Better Developer Experience for Event-Driven Architectures | ft. Alex Bouchard, co-founder of Hookdeck

2025/1/29

Real World Serverless with theburningmonk

People
Alex Bouchard
Topics
I created Hookdeck to solve the difficulty of handling webhooks in an e-commerce business. As an event gateway, Hookdeck is similar to AWS EventBridge but more focused on the needs of engineering teams: it offers more flexible server-side operation and uses an HTTP push queue rather than a pull queue, which lets it run on any cloud platform or runtime environment. Hookdeck addresses the full software development lifecycle, and in particular improves the local development experience: it allows multiple tunnels to be connected at the same time for handling asynchronous events, and provides full visibility into the local development process, including each event's request data, response, and retry capability. As an HTTP proxy, Hookdeck does not enforce a specific event envelope, and supports custom data formats through its transformation feature. Hookdeck provides automatic retries, including exponential backoff and custom retry schedules, and allows manual or condition-based bulk retries. Hookdeck replaces dead-letter queues with an "Issues" feature that offers detailed categorization, analysis, and bulk retry of failed events. Hookdeck provides a single model for controlling concurrency, simplifying the complexity of managing concurrency across different runtime environments. Hookdeck's transformations run pure JavaScript functions in isolated virtual machines, which is efficient and secure. Hookdeck enables capabilities such as schema discovery through integrations with other tools (for example, Event Catalog). Hookdeck's architecture is split into two independent components, an ingestion (API) layer and an event lifecycle management layer (the delivery queue), each built with different technologies. Hookdeck maps event architecture topology by declaring consumers, and supports callbacks for event chaining. Hookdeck uses access tokens for integration with AWS services such as Lambda, and supports IAM roles.

Deep Dive

Chapters
This chapter introduces Hookdeck, a serverless event gateway service built on Cloudflare Workers, and compares it to AWS EventBridge. It highlights Hookdeck's focus on developer experience and its operation as an HTTP push queue.
  • Hookdeck is a serverless event gateway service.
  • It's built using Cloudflare Workers and other technologies.
  • It differs from AWS EventBridge by operating as an HTTP push queue instead of a pull queue.
  • Hookdeck focuses on developer experience and the full software development lifecycle.

Transcript


Hi, welcome back to another episode of Real World Serverless. Today we're joined by Alex Bouchard. Hey man, how are you doing? I'm doing great. Thanks for having me, man.

So Alex, you are the CEO and the co-founder of Hookdeck, a serverless event gateway service. And I guess we did some partnership a little while back on some sponsorships for my newsletter, and I had a look at your service. It looks quite interesting. And I think from our email conversation earlier, you have also built Hookdeck on top of

serverless technologies as well, using Cloudflare Workers and things like that. So I'm really quite interested in hearing about how you guys have built Hookdeck under the hood and how you've found Cloudflare Workers, because I, and I think a lot of the other people listening to this podcast, are coming more from the AWS side of things. So really, just to hear how Hookdeck differentiates from, say, something like AWS EventBridge or SNS or what have you. Before we get into it, yeah, tell us a little bit about yourself and...

And Hookdeck.

Yeah, I mean, totally. We first decided to sponsor your podcast a couple of months ago, I want to say, mainly because of your AWS audience. Like, we were seeing more and more kind of heavy AWS users of EventBridge, Lambda, and those sorts of services migrating over to Hookdeck. And so it kind of made sense for us to try to sponsor in that space. But I'm also happy to bring a different perspective on, you know, what serverless looks like with Hookdeck, with Cloudflare, with other services that we use. So yeah,

It might be like a bit of a branching out, but hopefully that's a constructive one. A bit about me: I kind of started early as a product designer. I worked on all sorts of products, from B2C to B2B, and eventually ended up in e-commerce. We grew this fairly big subscription business in Montreal.

And as part of that, we built a ton of the original kind of stack to power that because Shopify really wasn't what it is today. And behind that, there was just like so many webhooks, like webhooks from Stripe and Shopify and Intercom and all those platforms. And, you know, webhooks for some businesses, and I think it's more and more the case are kind of like those...

critical integration events that you require to be able to operate your business. And it was just so difficult to be successful with them. And so that's where Hookdeck was born. And so I'm kind of like this mix of designer and product engineer, technologist, whatever you want to call it. I find that's actually a pretty great mix for building in the dev tools, dev infrastructure space. So yeah, it's been great.

Yeah, because in this particular space, there's actually quite a lot of, I guess, business opportunities, because you've seen companies like Segment and a few others who are also in this kind of space, where you're essentially taking in interesting events from third-party vendors like Shopify and other e-commerce platforms, and making it easier for you to just route them

to the right application code that's relevant for your business, as opposed to the kind of the plumbing. And that's kind of the serverless promise, right? That you get to focus on the business problems as opposed to the underlying infrastructure.

So I guess in that case, are you in a similar space as something like a Segment, where you're kind of sitting in the middle between your application and, say, third-party events? But do you also kind of overlap with EventBridge? Because you mentioned people are moving to Hookdeck from EventBridge, where it's not just about ingesting the third-party events, but also your own business events between different services in, say, a microservices environment.

Yeah, yeah, totally. So ultimately, I think we compare ourselves to EventBridge a lot more. We've coined this term, and we're still kind of trying to get people rallied behind the concept, of the event gateway. And really, it's kind of inspired by what API gateways have been for bridging the outside world with your own

endpoint business logic and that sort of stuff, but specifically for asynchronous event-driven applications, right? And we think of EventBridge as an event gateway. I mean, it's not that far of a stretch, right? EventBridge, event gateway, it's like more or less synonyms.

But really, we're kind of just trying to say event gateways are this new category, the way of thinking about your integration events that are coming from outside your system, right? So integrations in terms of like Shopify or Stripe or Twilio or GitHub, you name it.

And even in some cases, like we're looking at SDKs, libraries, IoT devices, scanners, like all those sorts of things, right? Like events are coming from outside your network topology, they're coming from outside your cloud and so on.

And then bridging that with your business logic. Because I think in many cases when you're looking at Segment, or even, I'm going to go a bit further, even Zapier and that sort of stuff, those were tools that were built for business analytics or business operations type people, right? It wasn't built for engineers. And so the proxy is interesting and there's definitely a lot in common, but I think the main difference is that this is built for engineering teams and DevOps teams that need to operate services at scale and run

thousands of function invocations a second and so on, right? And so it's much closer to EventBridge. One of the big distinctions that I draw, specifically through our approach compared to EventBridge, is that Hookdeck operates as a queue. So it operates as an HTTP push queue instead of a pull queue, which is surprisingly fairly unique. Like, you do see others doing that; GCP Pub/Sub, for instance, has a push mode.

But that opens up a ton of flexibility when it comes to serverless, because you don't need workers to be pulling off those queues. And so there's a ton of opportunity in terms of where you can run your business logic. So it doesn't have to be AWS. It can be any cloud. It can be any runtime, but still bring the benefit of serverless to it,

or at least some of the serverless benefits. And so that's really how we're thinking about this bridge, this connector, between your integration events from all the services and devices and so on, and your core business logic.
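The push-versus-pull distinction described here can be sketched as two consumer shapes. This is an illustrative sketch in plain JavaScript, not any vendor's actual API:

```javascript
// Pull model: you run a long-lived worker that polls the queue
// (an SQS-style receive/delete loop, simplified to an in-memory array).
async function pullWorker(queue, handle) {
  const handled = [];
  while (queue.length > 0) {
    const msg = queue.shift(); // stands in for ReceiveMessage
    await handle(msg);         // delete/ack happens after success
    handled.push(msg);
  }
  return handled;
}

// Push model: the queue calls you over HTTP. You only implement the
// handler, so the consumer can live on any cloud or runtime with a URL.
function pushEndpoint(handle) {
  return async (request) => {
    await handle(request.body);
    return { status: 200 }; // a 2xx response acknowledges the delivery
  };
}
```

With push delivery there is no polling process to host, which is the flexibility being described: the consumer is just an endpoint.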

Okay. Is there anything that you can show us, give us a quick demo of what that looks like? Because I have to admit, I haven't quite seen it myself. I've gone through your website and looked at some of the information that you have on there, as well as some videos that you've published, but not quite seen the product in action myself. So I'm quite interested to hear and learn a little bit more about it myself as well.

Yeah, I know. Totally. Let's dive in. I recognize demos might be a bit more difficult for kind of like audio only listeners. And so I'm going to invite you to follow through the YouTube link that's going to be in the podcast notes. Yes, yes, it will be. Yeah, the show notes. Thanks. And then I'll kind of like try to do my best to give like, you know, more context than I would on just a normal share screen just so that people listening by audio can follow along.

So let me get that going. All right, we're on. So I guess the first place I want to start is Hookdeck's responsibility: your lifecycle starts much sooner than it would with, say, EventBridge. The reason why I say that is we're really trying to focus on the full software development lifecycle. And so really, where your experience with Hookdeck starts is in development.

So in this specific context of a Shopify demo, I'm just going to be using Shopify as an example here. When you say the full development lifecycle and starting with development, are you talking more specifically about local development experience?

Yes, thank you for clarifying. I'm talking about the local development experience. One of the common problems when events are coming from outside your own system is that they're generated by external systems, right? And therefore they don't have a way of reaching, let's say, your local environment.

There's tools that have been built in the past that kind of like do this tunneling, but all those tunnel services are built on synchronous assumptions. And so that means you have a single tunnel that's open toward your computer or your VM or whatever. And then there's only a single connection possible at any given point in time because of this kind of like one-to-one relationship.

But webhook delivery is fundamentally asynchronous. And so what that allows us to do is break this assumption and let you have multiple, kind of quote-unquote, tunnels connected at the same time for the same URL. And that's very, very good when you're working in teams, right? Like, one thing that you will see often, when development teams are integrating with, you know, Shopify or whatever, is that they're going to have to either have a ton of different URLs, or

they're going to sync up between each other, where it's like, oh, this developer is using this one and I'm using this one and I can't connect to this, and so on, right? And so, yes, we start from local development. But when I talk about the SDLC, it's the local development process, it's the testing, it's the integration tests, the deployment, and production operation.

Okay, yeah, those are really common problems that people run into, certainly on the EventBridge side of things. And I've kind of developed some practices to make local development a bit easier, but it's never going to be quite the same as, say, if you want to check that your EventBridge rules are set up correctly, that you're catching the events and getting them directed to your Lambda function correctly.

There's always some things you can't quite catch easily without deploying to AWS and just trying it out. So I'm really quite curious how you guys are doing that here.

Yeah, no, absolutely. I'm going to get into it, but it's going to be a recurring theme, I think, here. And the reason why I'm saying that is because really, we just paid a ton more attention to the end-to-end developer experience, right? And so there's a lot of things that are doable in either system. But then it also boils down to:

How easy is it to set up? How accessible is it? What's the developer experience, and that sort of stuff, right? And I think in many ways we've paid a lot of attention to that. So I expect that's the type of comment I'm going to make a couple of times through this demo. But yeah, let's have a look. I just kind of put something simple here. So we're receiving, in this case, webhooks from Shopify. So, events from Shopify. I'm subscribed to two types of events: the order created and the product updated.

And then that's targeted to go to two of my local services. So on my local computer, I currently have a product service running and the order service running.

And I've set up filters on each one of those, what we call connections, to be able to tell which events I want to go to which service, right? So in this case, for instance, on the out-of-stock case, I have a filter on the body property where the variant inventory quantity is less than or equal to zero, right? And then in the case of the orders, I have a filter on the header property where the

X-Shopify-Topic starts with orders, right? And so in this case, I'm kind of telling Hookdeck what the routing logic is between all the events that are coming into my system and the services they need to get to.
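Hookdeck's filter syntax is open source and documented separately; as a rough illustration of value-based filtering like the two filters just described, here is a minimal sketch (the operator names are made up for the sketch, not Hookdeck's exact syntax):

```javascript
// Minimal value-evaluator sketch: a filter matches when every leaf
// condition holds against the corresponding field of the event.
function matchesFilter(value, filter) {
  if (filter !== null && typeof filter === "object") {
    return Object.entries(filter).every(([key, expected]) => {
      if (key === "$lte") return value <= expected;
      if (key === "$gte") return value >= expected;
      if (key === "$startsWith") return String(value).startsWith(expected);
      // Otherwise it's a nested field: recurse into the property.
      return matchesFilter(value ? value[key] : undefined, expected);
    });
  }
  return value === filter; // plain values compare by equality
}

// "Out of stock": body.variant.inventory_quantity <= 0
const outOfStock = { variant: { inventory_quantity: { $lte: 0 } } };
// "All orders": the x-shopify-topic header starts with "orders"
const allOrders = { "x-shopify-topic": { $startsWith: "orders" } };

matchesFilter({ variant: { inventory_quantity: 0 } }, outOfStock); // true
matchesFilter({ "x-shopify-topic": "orders/create" }, allOrders);  // true
```

The point of the design, as described in the conversation, is that it evaluates values rather than matching a JSON schema.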

Now the next step to that is I need to actually connect to this. On my terminal, I'll do a... Sorry, can we go back to the previous screen just quickly? Absolutely. Because in that previous screen here, I'm looking at Shopify being the event source. I'm guessing those are webhook events.

And then that's why you've got the filters on the header. And then you're able to... So that syntax you got there earlier, the less than or equal to, I guess that is a unique syntax that you guys offer that's specific to Hookdeck?

Yeah, that's correct. We have our own filter syntax. It's really meant to be super straightforward. And really, instead of being a JSON schema match evaluator, it's a value evaluator. And there's actually no great standard around that. So we built our own syntax, which is open source, by the way. Okay.

And I like the fact that you can show the last event that's been captured side by side. Oh, yeah, absolutely. And you can test your filters and all that sort of stuff, right? Okay, that's cool.

Okay. All right. So, by the way, I love that you're jumping in; this is the right way to make a demo in my mind, rather than me just rambling on for half an hour, which is not that interesting. So, we offer the Hookdeck CLI. The CLI can be started from your terminal. So in this case, I'm already logged in to my account and all that, but you can use it without an account. We support what we call guest mode, so you don't actually need to give us any information about you to be able to use this.

So what I'm going to do is run hookdeck listen on 9001. 9001 is the current port that the services are running on. They're kind of running inside the same service for now, but they could be on different ports.

And then if I start this command, it's going to ask me which source I want to be listening to. So I could have, you know, my Shopify, my GitHub, my Twilio, and I can selectively decide: do I only want to listen to a specific subset of those? Do I want to listen to all of them? So in this case, I'm going to go with Shopify. And then what it's going to tell me is, okay, great, you're now listening on those two connections, right?

So what we'll do next is we'll add in Shopify. Which one was it? Okay, right here. So I kind of prepared an order in Shopify. I'm kind of going to be buying this real world serverless mug and place the order. And so as I place the order, this is now going to generate a webhook.

Any second now, we can see both requests came in on my localhost. We can see there was a 200 HTTP request that was made to localhost:9001 and then to both of my services. In this case, there's /web/orders and then /web/product.

And so that's really kind of like a bare bones, just like showing that amount of flexibility that you have for local development. But we also give you like the full visibility around that. And so you can see all the events are associated with yourself or any other of your team members.

And there's a kind of full story of those events. You can inspect what the request data is, so the headers, the body, and then also the response for that data. And you can, obviously, retry it. And so if I were to send this back against my service, you know, I'm developing, I made some kind of mistake, I'm sending it back. And now we can see there's a third request coming in, right?

Without going into too much detail, there's a lot more you can do around development. We have this concept of bookmarks, where you can create Postman-like collections of all the events that you're expected to receive. You can do integration tests and development tests against those bookmarks. But I'm going to spare you some of the details, or else we're going to be here for a while.

Sure. So in that case, earlier in that first screen, I saw you have two subscriptions already to a local service. And then when you go to the CLI, you start another one to say you want to listen to those events. So are those two different things, the ones that I saw in the console and the one that you started in the CLI?

No, those are the same. And so really, it's kind of like the relationship where you can see the status of it in your console, or in the dashboard itself. And so, yeah, those are the same. So how come those local services were already in the Hookdeck dashboard before you started listening in the CLI?

Oh, sorry. I kind of prepped that at the time for the demo. But we could go together and add another one. If we were to add another one, then we would. So you just need to add the connections before you can start listening. Correct.

Okay, gotcha. So if you're familiar with AWS EventBridge or other services, the way to think about it really is that the source, what we call a source, in this case Shopify, is kind of equivalent to a topic in most queuing systems, right? And then the connection itself is the subscription. And then lastly, the destination is the consumer. So in this case, if we're looking at the screen, your topic is Shopify events, right?

We have two subscriptions, out of stock and all orders, and those are going to two destinations, or consumers, and those consumers are the product service and the order service. Okay. And in this case, do you have to write anything? How do you write the consumer? And also, what do the events coming in look like? Do they follow some kind of envelope, so that the events are wrapped with some metadata and stuff like that?

So ultimately, Hookdeck acts as an HTTP proxy. And so that means the data that we get is just using the HTTP definition of those events, right? So the headers, the body, and so on. And we don't wrap it inside an explicit envelope, and that's mainly for compatibility across the board. But, and we're going to get into this a little bit later, we have the concept of transformations.

And what some folks do is format that data into an envelope using transformations. And so with those transformations, you can basically run a post-processing function on all the data that comes in. And so Shopify obviously is not going to send it in a specific envelope, but you can wrap that into your own envelope. And what people will also do is that, for instance, if you get events from Shopify, from WooCommerce, from BigCommerce or Commerce Layer and so on, you might also have a standardized envelope around, like,

an order definition, right? And so you can have different transformations for the different order schemas coming in and format them into a common format, for instance.
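The common-envelope idea can be sketched as a transformation function. The field names and function shape here are illustrative, not Hookdeck's actual transformation API:

```javascript
// Wrap provider-specific order payloads into one common envelope so
// downstream consumers only deal with a single order shape.
function toCommonOrder(source, body) {
  if (source === "shopify") {
    // Shopify-style order fields (illustrative subset)
    return {
      source,
      type: "order.created",
      order: { id: body.id, total: body.total_price, currency: body.currency },
    };
  }
  if (source === "woocommerce") {
    // WooCommerce-style payloads name the total differently
    return {
      source,
      type: "order.created",
      order: { id: body.id, total: body.total, currency: body.currency },
    };
  }
  return { source, type: "unknown", order: body };
}

const fromShopify = toCommonOrder("shopify", {
  id: 1001, total_price: "24.00", currency: "USD",
});
const fromWoo = toCommonOrder("woocommerce", {
  id: 2002, total: "24.00", currency: "USD",
});
// Both now expose order.total regardless of the upstream field name.
```

This is exactly the anti-corruption-layer idea the interviewer raises next: provider schemas stay at the edge, and the rest of the system sees one shape.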

Yeah, because one of the things that people in the DDD space talk about a lot is the idea of the anti-corruption layer, where you don't want your business domain to be tightly coupled to the contract of, say, a Shopify webhook. You want to have some layer in there so that if there was a change, something in the webhook

payload, you don't have to rewrite all of your business logic because some field has changed. You want to centralize that and basically make that change just a layer at the ingestion point, so that in your event handler you would transform that into a payload that your application understands. I guess going to that point,

obviously, EventBridge doesn't speak HTTP. All the payloads and stuff come in its own envelope, and it's very much geared towards that exchange of events within your services. So you are speaking more just HTTP as,

I guess, the language that you're talking in terms of the events: everything looks like HTTP requests, with payloads, headers, and so on. In that case, how would you use it, say, within your... Because you said it's built for developers, it's built for people that are building their own services. Let's say between my services, I don't really want to speak HTTP.

It's like talking to another HTTP API, whereas I just want to send an event that an order has been placed, or something like that. How do I translate that? Do I need to translate that at all? Or does it not matter what I send to Hookdeck?

Yeah, because there's not a structured envelope, it doesn't really matter, right? And so in a sense, the envelope is user-defined. And so if you want to have an envelope, then you can do it. But it's not something that's strictly enforced. What we see most often is that,

well, first of all, webhooks are commonly used. And even if, specifically in the case of Shopify, you receive them through EventBridge, that's all well and good. But the problem is that you're probably also receiving BigCommerce or WooCommerce or whatever, and those don't have direct-to-EventBridge envelope support, right? And so that means you're going to have to go through an

API gateway; you're going to have to format it into your own envelope and that sort of stuff. And so what we see is that most people already have an HTTP endpoint handler. And so the huge benefit of that is that you don't actually have to change anything about how your system works or any of the assumptions that it makes. Because if you have an endpoint that's going to be receiving webhooks already, possibly queuing them to SQS or putting them into EventBridge, uploading that data to S3, that sort of stuff,

that endpoint already exists and already has knowledge of the business logic that it needs to run. And so really, the way that we think about Hookdeck is that it can come in and insert itself in between, bringing you the benefits of an event-based platform, but without having to completely restructure your architecture around the fact that now you're receiving,

you know, events that have a specific envelope and specific protocols and all that sort of stuff, right? And so the beauty of HTTP in this case is that it's very transparent to the operations that you're already doing. And it's very rare that you're going to see a team whose very first implemented version is with EventBridge and all the complexity that comes with it. It tends to be like, oh, I'm going to spin up a Lambda

that's exposed to HTTP with API Gateway. And, you know, for the first couple thousand webhooks we receive, that's probably going to be fine. And at some point you get some success and you're like, oh crap, I need to rethink this, right? And so that's what we tend to see a lot more. And that's also the adoption path that we're trying to very much simplify. Yeah.
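The "endpoint that already exists" pattern looks something like the handler below. The header name and dedup approach are illustrative; the point is that a retrying proxy in front implies at-least-once delivery, so the existing handler should tolerate duplicates:

```javascript
// A plain webhook handler that a gateway/proxy can sit in front of
// unchanged. Because retries mean at-least-once delivery, it dedupes
// on the provider's delivery id before running business logic.
const seen = new Set();

function handleWebhook(headers, body) {
  const id = headers["x-shopify-webhook-id"]; // provider's delivery id
  if (id && seen.has(id)) {
    return { status: 200, note: "duplicate ignored" }; // still ack it
  }
  if (id) seen.add(id);
  // ...queue to SQS, write to S3, or run business logic here...
  return { status: 200, note: "accepted" };
}
```

Note that duplicates are still acknowledged with a 2xx; returning an error would only trigger another redelivery of an event that was already processed.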

Okay. I mean, I guess the difference there is that you're looking at this very much with an e-commerce lens because obviously in that space, Shopify and I guess many of the other vendors in that space are going to be using webhooks to talk to your systems.

I guess from where I'm sort of coming from, where it's kind of the enterprise environments I'm working in, there's zero webhook. There's literally no webhook. It's all system to system within your, I don't know, a bank or what have you and everything. And so that's where for me, looking at this, looking at your events and the events looking like

HTTP messages, throws me off in that regard. Where I'm coming from is much closer to the SOA world, in terms of event messages and envelopes and some of the more enterprise integration patterns you saw before,

less this kind of webhook-centric, I guess, kind of view. I guess that's where, for me, it looks kind of weird that, okay, these are just HTTP messages in JSON. I know you guys don't enforce any, well, you don't enforce this particular format, so I can just come up with my own messaging. But at first glance, it's kind of weird that everything kind of speaks HTTP, even though it's all building events and stuff. That's where, I guess, the disconnect is coming from for me.

Yeah, no, I mean, and that's really reasonable. I think that comes back to the type of environment that you're building in, right? Because for a lot of software development teams nowadays, webhooks are all the rage because really building software is integrating software for a big subset of people. And so whether or not that's built on Twilio or Salesforce is another really big one, right?

what you tend or at least what we tend to see is that there's a lot of adoption around those integrations and third parties and offloading a lot of the system complexity that you're building. And then as part of that, the

need to integrate with them and integrate with their events becomes really important. Another system, for instance, that is usually popular on Hookdeck is Stripe, and all the payment platforms. So Paddle, Checkout.com, all those sorts of providers. And so really, I think more and more, especially in

companies and startups that have been built in, let's call it the last five, ten years or so, there's a strong bias toward buying systems or using third-party systems instead of building your own. And then really what you're adding on top of that is what's unique for your business, right? What's unique for your business logic, or for the product or the service that you provide. And when you look at those types of companies, webhooks are everywhere.

And so the assumptions around HTTP that come with webhooks are very widely spread. And it's kind of hard to find a platform now that doesn't support some form of outbound HTTP webhook.

And so really, this bridge is meant to speak that language and be as simple as possible for those sorts of use cases where really those events are not events that you generated yourself. They're not domain generated events. They're not events that you're responsible for. They're events that you're... Ooh, my computer's going to sleep. They're events that you're...

almost a victim of, right, in some contexts. Because we've had a ton of cases where, for instance, Shopify will, you know, send a million, two million events to you over a very short time span, and people are unable to process those, and all that sort of stuff, right? And so really, the thing that's very interesting about this particularity is that ultimately,

This is a messaging system where you have no control over the producer, right? The producer is going to send whatever it wants to you in whatever shape it wants at whatever speed it wants. And so that's part of kind of like the challenge that we're trying to tackle as well, right?

Sure. I think that's the same with EventBridge as well. Also, EventBridge has got its own third-party partner event sources. They basically do the ingestion of the webhooks for you, and then they convert them to a format that doesn't look like HTTP, whereas you guys kind of do the opposite. And I guess one thing I also want to find out about, in terms of building asynchronous processes, is that one of the key things is handling errors and retries, because everything is failing behind the scenes, asynchronously. So in terms of detection, in terms of the built-in retries, what do you...

What do you do? Do you do something similar to, I guess, EventBridge, in the sense that you get some built-in retries and exponential backoff and stuff like that? Yeah, absolutely. So let me maybe skip to what I'd prepared from a production standpoint. And I'm going to get into all those details, because right here I actually have two environments set up: my development environment and my production environment.

And in the production environment, those same services that we were looking at earlier, right? The product service and the order service, in this case, are instead targeting Lambda functions, right? And so earlier we were looking at localhost, kind of like through an HTTP server locally. But in this context, we're targeting those Lambda functions. Yeah, that's not sharing on screen, by the way. Oh, really? Okay. Yeah.

Sorry, I think my computer went to sleep and then the screen share died in the process. So coming back to it. Okay. So now we're kind of like in a similar screen as earlier, but I'm in the production project instead of the development one.

- Okay, right. - And those services here we can see are targeting Lambda URLs instead of specific localhost and so on that we were looking at earlier. - So in this case, you're expecting your Lambda functions to use a function URL so that you've got, okay, right.

Although we're building support for directly triggering Lambda functions, and that's going to be coming up in about a month now. We're also building kind of like built-in support for S3, SQS, and a bunch of other services like that. And not just in the AWS environment, right? The same is true for other cloud providers.

But yes, for all the same services, this time targeting those Lambda URLs. And then if we have a quick look at my Lambda function here, just for the purpose of the demo, I have this function that returns an HTTP 500 status code, right?

And so to get to your point about errors, let's look at my events here. So earlier, when we did this order together in Shopify, we actually generated events on both production and the development environment, because I had Shopify targeting both.

And then what we can see is that I have some HTTP 500 errors, and I also have a "next attempt at". And so really what the system is telling me is that there's going to be a retry happening at a specific scheduled time. And that's based on the retry logic that you can configure in Hookdeck. So we support exponential backoff and linear backoff. We also support custom

retry schedules that you can define through the Retry-After HTTP header from the responding server. And so really, you have full control over when retries happen and the scheduling around those retries.
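That scheduling logic can be sketched roughly as follows; the constants are made up for the sketch, and the actual delays are whatever the configured retry policy says:

```javascript
// Compute the delay before the next retry attempt (attempt is 0-based):
// exponential backoff with a cap, overridden when the consumer responds
// with a Retry-After header (interpreted here as seconds).
function nextRetryDelaySeconds(attempt, retryAfterHeader) {
  // A Retry-After header from the responding server wins outright.
  if (retryAfterHeader !== undefined) {
    const requested = Number(retryAfterHeader);
    if (Number.isFinite(requested) && requested >= 0) return requested;
  }
  const base = 30;          // first retry 30s after the failure
  const cap = 6 * 60 * 60;  // never wait longer than 6 hours
  return Math.min(base * 2 ** attempt, cap);
}

nextRetryDelaySeconds(0);        // 30
nextRetryDelaySeconds(3);        // 240
nextRetryDelaySeconds(5, "120"); // 120 (server-requested schedule)
```

Letting the consumer's Retry-After response drive the schedule is what gives the "full control" described: the service being retried can tell the queue when to come back.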

And that's for everything that's automated. But you can also do the same kind of thing manually. So for instance, I could filter on all my failed events, in which case I have five of them. And then I can either manually retry them, or I could also bulk retry based on conditions, right? And so I could say: retry all events that match specific conditions. And I can go beyond that, right? I could say that I have specific data in the body, like maybe a customer ID, right? Sorry, sorry.

Yeah, so it could be like a customer ID, for instance, where it's one through three or whatever. And then you... Oops, customer ID. Yeah.

And then I actually checked what the customer ID was. So this is not going to work right now. But the point is like, you can have those kind of complex conditions and do bulk retries towards that, right? So instead of building an arbitrary archive of everything that exists, the idea is that the archive is kind of persistent really. And at any given point in time, you can filter it, segment it, kind of slice and dice it the way that it makes sense. And then you can retry

those specific segments of events that you define, right? And so that gives you a lot of flexibility around this. Actually, there's nothing with customer ID one, two, three, so this is not going to work. So that's very much kind of like one part of the story. But I think the other part of the story is the observability. So if you have a look here at issues, we have this concept of issues, which really for us is a way of kind of replacing dead letter queues.
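The conditional bulk retry demoed above — filter failed events by status code and by fields in the body — boils down to evaluating a predicate over the archived events. Here is a hypothetical sketch; the event shape, condition format, and helper name are made up for illustration:

```javascript
// Hypothetical sketch of conditional bulk retry: the event shape and
// condition format are illustrative, not Hookdeck's actual API.
const events = [
  { status: 500, body: { customer: { id: "123" } } },
  { status: 500, body: { customer: { id: "456" } } },
  { status: 200, body: { customer: { id: "123" } } },
];

function matchesConditions(event, conditions) {
  if (conditions.status !== undefined && event.status !== conditions.status) {
    return false;
  }
  for (const [path, expected] of Object.entries(conditions.body || {})) {
    // Walk dotted paths like "customer.id" into the event body.
    const actual = path
      .split(".")
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), event.body);
    if (actual !== expected) return false;
  }
  return true;
}

// "Retry all events that failed with a 500 for customer 123."
const toRetry = events.filter((e) =>
  matchesConditions(e, { status: 500, body: { "customer.id": "123" } })
);
```

Because the archive is persistent, the same predicate can be re-run at any time against any slice of history.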

And so issues will group together events that fail for similar conditions, right? So in this case, you can see we have HTTP 500 on out of stock and then on all orders. And if I were to drill through on one of those, I can see a timeline of when those errors happened. What is the response from my server? In this case it's just returning a "hello world". What is the data, like sample data that's associated with those failure conditions? And then as I'm resolving the problem, maybe a service went down, I brought it back up, right?

and so on, I can then submit a bulk retry associated with that issue, right? So we can see like, okay, I want to retry all events between those dates where it's HTTP 500 on this connection and blah, blah, blah. And so instead of having kind of like dead letter queues where everything that fails ends up bunched up together, where you can't really tell apart what's what,

we give you a very specific breakdown and drill-through abilities on the specific things that failed, and also a resolution step for that. So that's something I'm really, I think, proud of, because as someone that works with queues all the time, dead letter queues are basically the bane of my existence. And so that's our approach to it.

Yeah, dealing with dead letter queues, and also the whole process of investigating them and then replaying them, is quite a process. It's not very nice. And what you've shown me just now, being able to see all the individual failures and be able to replay from there, but also being able to see them already

categorized and grouped together by error type and then be able to reprocess them as a batch. This is super, super nice from the operational point of view.

Yeah, no, absolutely. And we have like a ton of different issue types for different types of failure conditions. We have triggers for, I guess, perhaps one of the most interesting ones is for back pressure as well. And so if you accumulate some amount of back pressure when you're processing, we'll

alert you about it and then also give you the specific metrics around what's causing back pressure. So is it your response time? What's your concurrency? And then give you a resolution step, as in like increase your concurrency and that sort of stuff, right? And so issues have been like a super powerful system, and it's definitely making the operational process of managing all those inbound events so much simpler. Yeah.

You mentioned the observability and I see there's a metrics there. So I guess that's one of the things that you can see in terms of just how many messages are received, sent, failed, and maybe break down by path perhaps?

Yeah, actually, let's have a look here at all my delivery attempts. And so you can see here, those are all the events that I got on my specific consumers or destinations, right? Delivery rate, there's nothing pending at this moment. Response latency, so the time that it took, so it looks like we're looking at about 400, 300 milliseconds right now.

What is that response latency? Is that end-to-end in terms of the time between you receiving the event and when you call the destination? So we do have metrics about end-to-end. This one specifically is like the Lambda response time. And so that would measure the time from when it's leaving a Hookdeck server until the time that we got a response from the Lambda function. Right, gotcha. Okay.

Yeah. And this is relevant because you can configure delivery rate basically two ways, so per second or concurrent. And so for instance, if you have a five concurrent delivery rate on your destination consumption, then obviously the speed at which you deliver those is now impacted by the responsiveness of your server. And so being able to see this relationship of like, oh yeah, okay, I'm accumulating back pressure because my server is now slower and therefore not consuming as fast,

is pretty clear to look at through those metrics, right? Right. Yeah, this is also quite nice, because I probably get asked about 50% of the time how to maximize concurrency and throughput. The other 50% is how do I restrict the concurrency for my Lambda, because Lambda is super scalable, but my downstream system is not. It's running some legacy database

Yeah.

that can only handle 5 concurrent requests or 5 requests per second, which are slightly different depending on how things go. So this is quite useful.
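The relationship between concurrency, response time, and back pressure that comes up here is essentially Little's law: with a fixed concurrency cap, the maximum sustainable delivery rate is concurrency divided by response time. A small illustrative sketch (the function names are made up):

```javascript
// Little's law sketch: with a concurrency cap and a given consumer
// response time, the maximum sustainable delivery rate is bounded.
function maxDeliveryRatePerSec(concurrency, responseTimeMs) {
  return concurrency / (responseTimeMs / 1000);
}

// Back pressure accumulates whenever events arrive faster than that bound.
function backlogGrowthPerSec(arrivalRatePerSec, concurrency, responseTimeMs) {
  return Math.max(
    0,
    arrivalRatePerSec - maxDeliveryRatePerSec(concurrency, responseTimeMs)
  );
}
```

So, for example, 5 concurrent deliveries against a 500 ms consumer caps out at 10 events per second; anything arriving above that becomes backlog, which is exactly the metric-to-resolution link the issues feature surfaces.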

Yeah, I know. And I think that's something that's interesting with the kind of like standard model right now is that it's a destination or the consumer responsibility to control its concurrency. But there's kind of like two core problems that make that difficult. The first one is that if you're scaling your number of consumers, then inevitably you're also scaling your concurrency and your aggregated concurrency across all your consumers is very difficult to control.

In the case of Lambda, AWS specifically gives you the provision concurrency concept. But really what it's just doing, it's telling AWS, "Okay, here's the amount of Lambda you want to execute in parallel." But if you're using something like Fargate or running Kubernetes containers or what have you,

controlling your aggregate concurrency is very difficult, because now you need to be able to scale the number of workers. But for each new worker that you add, you're also increasing the concurrency. And so we're flipping it a little bit. And the other thing that's nice is that regardless of what's the underlying method that's used to run the service,

So right now, a lot of the conversation we've had has been Lambda specific, but really the reality is that we're seeing all sorts of runtimes sitting as kind of like destinations, right? And by having a single model for how you control concurrency, instead of each of the individual runtimes having their own different ways of controlling concurrency, it really simplifies kind of like the mental model for it.

Because if you're using Hookdeck, it's like, regardless of what's processing it, I can control my concurrency in Hookdeck itself. And it's the same way of controlling it regardless of the runtime. Yeah. And also with Lambda, they've got two different things: they have the reserved concurrency, which I think is what you were actually talking about when you said the provisioned concurrency. They also have provisioned concurrency, which is a different thing. Yeah.
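The "single model for concurrency" idea — cap in-flight deliveries on the push side, regardless of what runtime the consumer uses — can be sketched as a simple limiter. This is an illustration of the concept only, not Hookdeck's implementation:

```javascript
// Illustrative push-side concurrency limiter (a sketch of the concept,
// not Hookdeck's implementation): whatever runtime sits behind the
// endpoint, at most `limit` deliveries are ever in flight.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  const release = () => {
    const next = waiting.shift();
    if (next) next(); // hand the slot directly to a waiting delivery
    else active--;    // or free the slot
  };

  return async function deliver(task) {
    if (active < limit) {
      active++;
    } else {
      // Wait until a finished delivery hands us its slot.
      await new Promise((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

The point of the design is that the cap lives in one place, on the producer side, instead of being re-implemented per runtime (Lambda reserved concurrency, Fargate task counts, Kubernetes replicas, and so on).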

This is kind of confusing. Provisioned concurrency is more on the pricing side of things, really, right? It's about keeping certain instances of your Lambda function warm so you don't see cold starts. Oh, for the cold start, right. Yeah, it's more for the cold start than for limiting concurrency. And reserved concurrency really is the maximum concurrency, but they call it reserved because you're reserving some units of your

maximum region-wide concurrency for this one function. So it's like, again, that confusion is an amazing one. And that's just Lambda. Plus it's also, like you said, EC2, there's Fargate, ECS, what have you, all with different mechanisms for controlling concurrency. So just dealing with all of that. And plus, as you start to add more and more

integration targets, each target can have its own model for managing concurrency. So it's just much easier to have one model for managing concurrency and delivery rate. Yeah, absolutely. And when we're looking at a demo and stuff, you know, obviously it's a simplified version of the problem. But the reality is, when we look at a lot of our customers, they're going to have 20, 50 different destinations or consumers. And those are not necessarily going to be all running on the same platform,

underlying runtime, hardware, or anything like that, right? And so it's easy to say, oh, here's how you solve the problem with Lambda. Okay, all well and good. But now, as an engineer working at that specific company, I have to figure out how to solve that problem in like five different places instead of a single place, right? And it's very rare that you see it where it's all neat and pretty. People have accumulated technical debt, their services that were built years ago, and all that sort of stuff, right? And so,

I think that greatly simplifies things. So in our production project here, I also added another S3 destination, just because I wanted to share, for instance, what that might look like when it's not a simple one-to-one relationship of, "Oh, you get your event, it goes to some consumer." So in this case, I'm actually setting an S3 bucket as my destination.

And the way that Hookdeck will work is that it will kind of fan those events out, right? It's going to basically create copies of them. So as it comes in from Shopify, we're going to get all data sent to S3. But then for those that are also out of stock or are orders, we're then going to get a copy of those going to those specific connections.

And so on the S3 one, I have a transformation here, which is what I was referring to earlier. And that transformation is going to basically remove some PII. So I'm trimming user emails and so on from the body. And then I'm setting a file name for S3. So I'm using the Shopify webhook ID in this case as the file name. And what that does now is that for every event that we get, if we head over to the S3 bucket,

now I'm getting a JSON file that I can keep, for instance, for my archive or potentially trigger other events in the AWS ecosystem after that from there. And right now, all of this is very AWS-centric, but again, often people then send it to Segment and Datadog and blah, blah, blah, you name it. So it's rarely just kind of like a cookie cutter, everything going to one place. Okay, so question about that transformation. Is that...

So I guess you are hosting that code, so you are running that transformation code. So I guess is this just JavaScript or is this some subset of JavaScript that you can use? Because I imagine it's running in some secure environment that you don't want people to start running stuff on your machine, doing stuff that they're not supposed to.

Yeah, yeah, totally. So those are running in what we call isolated VMs that we spawn for each one of the transformation functions, and they're fully isolated execution contexts from one another. One thing that's interesting about transformations: right now they're just JavaScript, but we're actually looking at WebAssembly-based solutions to support multiple languages.

But those functions are pure functions. And so what that means is that you can't really do I/O. So you can't do HTTP requests or read files or that sort of stuff. That's why they're called transformations, really. But the flip side is that they're highly efficient. They're meant to basically not be able to fail, although there's still some logic around error handling.

But the core idea for those is basically they're lightweight, they're free. We do not charge for those executions. Right. And it lets you format that data, that HTTP data that, you know, you weren't so much of a fan of earlier, into those envelopes or formats or standards, and apply, you know,

any sort of logic on that data. And so we have, for instance, people that will get webhooks from Wix, which are JWT-encoded, and then they'll use a transformation to decode the JWT into raw JSON, right?

Right. Or the data might come in gzipped and you need to unzip it, like decompress it. And there's a ton of different use cases, and really the breadth and depth of what people need to do with them is basically why we decided to go the route of: let's just let users run arbitrary code to be able to format the data into what they need.
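For instance, the Wix case — decoding a JWT-encoded webhook body into raw JSON — could look roughly like this inside a transformation. Note this sketch only unpacks the payload segment, deliberately skips signature verification, and assumes a Node-style `Buffer` is available in the sandbox:

```javascript
// Sketch of the Wix use case: decode a JWT-encoded webhook body into raw
// JSON. Signature verification is deliberately omitted; this only unpacks
// the base64url payload segment. Assumes a Node-style Buffer is available.
function decodeJwtPayload(token) {
  const payloadSegment = token.split(".")[1]; // header.payload.signature
  const json = Buffer.from(payloadSegment, "base64url").toString("utf8");
  return JSON.parse(json);
}
```

Because transformations are pure functions, this kind of decode/decompress step fits naturally: data in, reshaped data out, no I/O.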

Okay, sure. That makes sense. And I saw this, it says no output yet. Does that mean that you can actually test this transformation function in the console here?

Yeah, absolutely. So if we were to, for instance, console.log and then run this, we're basically going to get log output. We can see the diff in between the data that we got and the data that we wrote. So in this case, the path got overridden and things like that. Yeah.
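Pulling together the transformation from the demo — trimming customer emails and deriving an S3 file name from the Shopify webhook ID — a hypothetical version might look like this. The `transform(request)` shape returning the modified request is assumed for illustration and may not match Hookdeck's exact handler signature:

```javascript
// Hypothetical transformation in the spirit of the demo: strip PII from a
// Shopify order payload and derive an S3 object key from the webhook ID.
// The transform(request) shape is assumed, not Hookdeck's exact signature.
function transform(request) {
  const body = { ...request.body };
  if (body.customer) {
    // Drop the customer email rather than forwarding it to the archive.
    const { email, ...customer } = body.customer;
    body.customer = customer;
  }
  // Use the Shopify webhook ID as the S3 file name.
  const path = `orders/${request.headers["x-shopify-webhook-id"]}.json`;
  return { ...request, body, path };
}
```

Overriding the path is what produces the diff shown in the console test: same event in, PII removed and a new destination path out.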

Okay, cool. That's really... this is actually very nice. You've got some really nice touches. Things that are small, but I wish, you know, EventBridge had some of these things, which would make life so much easier.

Yeah, yeah, no, absolutely. And it's really interesting kind of like coming back to your comment around the HTTP envelope. And I think that for some people, like the way that we're building it is such a no brainer because they actually don't have that much exposure to EventBridge and kind of like more classic like

event-driven architecture development inside like big enterprise with like big internal systems and that sort of stuff, right? And like really the way that they think about events or I've been like calling Webhooks the gateway drug to event-driven architecture for some years. I think for

a lot of developers, you know, building for instance a Shopify app, and they hit their stride and it becomes super popular. It's their first exposure to, okay, how am I going to deal with, you know, 500, 1,000 events per second coming from Shopify?

And for folks coming from this kind of background, the HTTP envelope makes a ton of sense, because that's already how they're thinking about it. But it's interesting to hear your perspective about that. And I think ultimately, it's probably one of the bigger distinctions beyond the developer experience and all of it. And I'm interested to see how over time we can bridge that gap and how we can maybe support both mental models together.

Yeah, I think you already support both because you're not forcing the HTTP syntax, you're just reflecting them as they come in. You don't force anything, so I can still use the kind of format that I'm used to in terms of sending, like I said, more traditional, event-driven architecture style messages as opposed to more webhook style HTTP messages.

So I think you can really support that. And also, I think that difference is probably more cosmetic in the grand scheme of things. It's the thing that jumps out at first, but probably not the most important thing. I think things like what you showed me earlier in terms of handling the failures and replaying them, those have far more operational value than anything else.

What about in terms of other things that, say, EventBridge offers, things like schema discovery, stuff like that? Yeah. Are you familiar with David Boyne's Event Catalog? Yeah.

So actually, I think it makes a ton of sense to say that other services are going to do this very well. And so Phil from our team recently built an exporter from Hookdeck to Event Catalog. And I think that's the type of way that we think about it, really: there are going to be products that are very good at doing that kind of schema documentation, event documentation, that sort of stuff.

And so let's integrate with them. That's not to say that longer term we might not have a schema registry type product and so on, like EventBridge. But right now, we kind of defer to integrations with other tools.

Okay, that makes sense. Yeah, I guess I just spoke with David on this podcast. So by the time you listen to this, that episode is probably out already, where we show a little bit more about the Event Catalog. Yeah, it's awesome. Yeah, he's done some really interesting stuff in that space as well. Again, a very, very common problem people run into. And so, yeah, interesting to hear your thoughts about that in terms of integrating as opposed to trying to implement this feature yourself.

Okay, so I think that's quite a nice overview of Hookdeck. Let's talk a little bit more about how it's implemented under the hood, because you talked about the fact that it's a queue, and you mentioned previously that you're running on Cloudflare Workers for ingestion and then Google Pub/Sub. Is that right?

Yeah, partially. There's actually a mix of Kafka plus Redis, you name it. We're using the right tool for the job and depending on the different context. But yeah, for the gist of it, our architecture is segmented into two components that are completely independent from each other. So we have the ingestion, which is, think of it as kind of like the API layer. And that API layer is where all the events are produced and

and consumed, right? And then the second part is what we call our event lifecycle, or really it's like the delivery queue. And so those are like the subsystems that are going to determine like which event need to go where, execute transformation, do delivery, handle the retries, the retry logic, the operations for bulk retrying, that sort of stuff, right? And so that first part is kind of like fully serverless. That second part is

Partially serverless; it's mostly Kubernetes-type clusters that are working on the actual operation side of things. But then we do use Pub/Sub, Kafka, Redis as kind of like queuing for various different situations. So yeah, those are kind of like two distinct parts that are both using serverless in their own way.

Okay, so in this case, you've got your Cloudflare workers running on the edge, doing the ingestion, and then they get sent into Google Cloud, into something. And again, I guess on the way out to the target, you are using a combination of PubSub and a few other things to handle the various constraints like the concurrency control, retries, and things like that.

Actually, I forgot to ask as well. In terms of, for example, one of the common things that you see with event-driven architecture is that it's not just about one event coming in and then goes to a target, but certainly that's more so, I guess, having quoted the classic event-driven architecture, is the fact that the event consumers quickly turn into publishers and then you have these

chain of events happening, how can HookDeck help us visualize that? Because typically that's something that you do on your side with observability tooling, you instrument your code and use other tools to visualize that. But is there something in HookDeck that can potentially help

in terms of discovering who your subscribers are and what your topology looks like when it comes to your event architecture?

Yeah, totally. Getting back into that topic, I think what's nice about the idea of being able to declare your producers and your consumers is that it also lets you map it out because in the AWS context, your consumers are self-declared. As a consumer, you want to listen to a specific queue or react to a specific event.

But there's a decoupling because of the consumer declaring it. While in Hookdeck, it's the other way around, where you declare where your consumers are, who they are, what endpoint they are, right? And so when you look at this kind of dashboard in Hookdeck and you look at those graphs that are kind of made out for you, like those node networks that we were just looking at earlier, it makes it super clear what's what, right?

And then we also have this concept of callback. And so coming back to consumers turning into publishers and so on, the concept of callback is basically this idea that in response to an event,

can generate a new event, right? And then also go through its own lifecycle. And so if your consumer returns, you know, specific data, you can then say, okay, for that specific data, I want that to now become a new event. And that new event can now go to

a new destination, a new consumer and so on. And so you have this relationship where if you have, for instance, an event that essentially turns into an action, that action can now be the result of the original integration event or whatever that got to your endpoint. And so that's how we're thinking about it. I think there's a lot more that we can do in that front. It's still early in the sense of how native the support for that sort of stuff is.

But there's already a ton of people using it and making the best of it. Okay, that's very interesting. I guess that's all the questions I can think of. Anything else that we haven't talked about that you'd like to mention before we go? Well, I think specifically when it comes to, you know, we talked a lot about the AWS environment and all that.

One thing that I kind of mentioned earlier is that we're building direct integrations with a lot of those services in such a way that you don't necessarily have to go through HTTP, right? And so if as part of the audience, that's something that you're working on or there's like compatibility that you'd like to see and so on,

we're kind of talking with a lot of users in those environments to understand what the needs are, and what the different integration points are that we can build that are the most useful. And so I'd definitely love to hear from you on that, if you have any thoughts on where you would see that kind of fit in.

But yeah, no, I mean, that covers it. Like always happy to get nerdy about the whole Cloudflare and GCP stuff and all that. But I'm going to leave that up to you.

Actually, that brings up a good point about the integration with, say, for example, Lambda right now using function URLs. What's the security model? I guess, do I have to leave my function URL public for you to call it, or can I use IAM roles for that?

Yeah, you can use IAM roles through service accounts. And so you provide Hookdeck basically with a service account to use, right? That you can then grant the permissions to the individual functions or buckets or whatever resources in the AWS environment.

Okay, so are you saying that I create an IAM role and then I let you assume that role? So that's something that I need to know your... How are you going to assume that role? Are you going to assume that role from your own AWS account or from... It's through access tokens.

And so, yeah, you would generate an access token that we can assume that role through. Yeah. Okay. Okay. Right. I see. Okay. All right. Thank you very much. Yeah. So I guess if anyone's got any questions, where do they go? If they want to get started, do they just go to hookdeck.com and then just try it out for themselves?

Yeah, I mean, it's self-serve, and we have a generous free plan. You can build a lot with it. And you can also reach out to me directly, either Alex Bouchard on Twitter, or Bluesky now since yesterday, or really reach out on our Slack developer community. And we'll be happy to jump in and have that conversation with you.

Okay, sounds good. And your pricing is based on usage levels, number of events that you're sending and... Okay, all right, gotcha. Yeah, very much kind of like usage-based pricing that you would expect from most cloud infrastructure products.

Okay, sounds good. All right. And yeah, I'll put those links in the description below so that anyone who's interested in checking out HookDeck, they can go and try it out for themselves. And yeah, thank you so much, Alex, for coming on the show and showing us what you've been building. And I have to say, some of the things that people... Some of the things that I'm most impressed by from what I've seen are things that, you know,

are the small cuts that get you on an everyday basis, things like how to handle errors, how to replay a failed event, and how to test your transformation logic. But probably the one that's the most interesting from the development point of view is just the local development, being able to easily replay an event against my local environment.

But again, for me, I'm always responsible for running the thing that I'm building as opposed to just writing the code. So for me, dealing with errors and the dead letter queue side of things, that was the highlight. It's super, super interesting.

Listen, I've yet to meet someone that doesn't get their gears ground by all of that. So I totally get it. I appreciate the chance to come on and share more about it. And thanks for jumping in and challenging some of the ideas and all that sort of stuff. I think that's how we're going to make a better product. So I appreciate it.

No worries. Nice talking to you and everyone else. I guess I'll see you guys next time. Take care now. Okay, bye-bye. Sweet. Take care, yeah. So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production-ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.