SQL databases were built for data consistency and vertical scalability. They did this very well for the long era of monolithic applications running in dedicated, single-server environments. However, their design presented a problem when the paradigm changed to distributed applications in the cloud. The shift eventually ushered in the rise of distributed SQL databases. One of the most prominent is CockroachDB, which uses a distributed architecture inspired by Google Spanner.
But what were the engineering approaches that made this architecture possible? Jordan Lewis is a Senior Director of Engineering at CockroachDB Cloud. He joins the show to talk about the design of CockroachDB and how it works under the hood. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments.
Lee is the host of the podcast Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com and see all his content at leeatchison.com.
Jordan Lewis is a senior director of engineering at CockroachDB Cloud, and he's my guest today. Jordan, welcome to Software Engineering Daily. Hi, it's nice to be here on the show with you. Thank you for being here. So what is it that makes CockroachDB special? Why are you able to solve the scalable and distributed SQL problem?
you know, without simply ignoring SQL? What makes CockroachDB special is that the architecture we've chosen for the database allows us to take familiar, regular, old, traditional SQL, or "sequel" as you like to say it, and distribute it across a cluster of
homogeneous nodes. So we can scale your database horizontally in a way that is more familiar to someone who's used to DynamoDB than to, you know, an Oracle or a Postgres. How does it work under the hood? The idea is really quite simple conceptually: we take your data and we split it up into ranges,
chunks of contiguous data. Just like in a sharded SQL system, where you're manually deciding where those shards live, CockroachDB is doing that shard computation for you automatically under the hood, abstracting away the fact that you might have to separate your data into shards. It allows you to use SQL just like you're used to. The database manages deciding when to ask a particular shard, or a particular range, as we say in Cockroach, for a bit of data, and it decides when you're going to write to a particular range of data in Cockroach, without having to expose that complexity to the user. That's the fundamental idea.
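To make the range idea concrete, here's a hedged sketch using CockroachDB's SHOW RANGES statement; the table name is hypothetical.

```sql
-- Inspect how a table's data is split into ranges (table name is
-- hypothetical; the output columns vary by version):
SHOW RANGES FROM TABLE users;
-- Each result row describes one range: its start and end keys, the
-- replicas that hold it, and which node currently holds the lease.
```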
We're going to get deep into how it works as we go along, and that's great, I want to know more. But first of all, tomato, tomahto: I always get this debate, too. I'm sure you do all the time. Is it "S-Q-L" or "sequel"? You say "S-Q-L" mostly. I say it both ways, but I think I usually do "sequel." But it's amazing what we can debate about sometimes. Exactly, exactly. I think with that one, I'm just copying what our founders do. I joined the company back in 2016, and everyone was saying "S-Q-L" then, so it kind of stuck in my head. That's what I say now. Makes perfect sense, yeah.
So, as you know, and full disclosure to everybody who's listening, I focus personally on architecting for scalability. I wrote the book Architecting for Scale, published by O'Reilly Media, and in fact, CockroachDB is a sponsor of that book. So I know some of the things that Cockroach goes through, and you're aware of some of the things I talk about. Our interests are very well aligned; we both focus on scalable architectures. But let's get into some of the more salient details of how this works.
Many people have tried, or let's put it this way, there are lots of different approaches to trying to scale SQL databases. But most of them involve some form of limitation on the way you access the data. You might be talking about requiring a single writer with multiple readers as a way of scaling the read-only part of the data solution. There are other solutions as well. But in your approach, all nodes can service both reads and writes simultaneously in a scalable solution. So how do you synchronize your writes, for instance, to make sure that you scale, yet maintain the consistency that SQL demands?
It's a great question. Ultimately, it comes back to this decision in the architecture to use ranges, as we call them. So shards or ranges, it's the same idea. Let's pretend you're a shopping cart application, which is a really good and classic use case for a system like CockroachDB. You need that consistency.
Because, you know, you've got two people trying to check out the same item at the same time. It's a classic interview question, probably. And you want to make sure that you're only selling one of them. You don't want to commit a transaction that says, okay, user A got to buy the widget and user B also got to buy the widget, but there's only one widget.
Ultimately, it's key that you can satisfy these consistency requirements while still allowing that shopping cart application to scale to the millions of users and thousands or millions of transactions per second that your application is trying to serve. So how does it work? If you think about what you might have to do as a database, when two users are touching the same shopping cart data, you're going to have to think, as the database, about how you're going to resolve that conflict. Whereas if you have two users who are touching two completely different pieces of data, different items in that shopping cart or in your store, you won't have to worry about having those two transactions talk to each other. So what does CockroachDB, or a horizontally scalable system generally, do to make sure the system only has to worry about the real conflicts? It's about taking the data and
making sure that the potentially conflicting bits are close to each other. So in CockroachDB's case, it means that you've chosen your schema in a way that says, you know, shopping cart item IDs live in a table, and your primary key is organized by item ID, something like this. If two people are touching that same item ID, it's ultimately going to be processed inside of the system by the same machine.
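To make the conflict concrete, here's a hedged sketch of the checkout transaction; the schema and names are hypothetical, not taken from the conversation. Two sessions running this at once for the same item ID will conflict on the same range.

```sql
-- Hypothetical schema: the primary key is organized by item ID, so both
-- shoppers' writes land in the same range, on the same machine.
CREATE TABLE items (item_id INT PRIMARY KEY, quantity INT NOT NULL);

BEGIN;
-- Lock the row for this item (SELECT FOR UPDATE is supported in recent
-- CockroachDB versions), making the two concurrent checkouts conflict.
SELECT quantity FROM items WHERE item_id = 42 FOR UPDATE;
-- Decrement stock only if a widget is left; only one of the two
-- concurrent transactions will succeed in taking the last one.
UPDATE items SET quantity = quantity - 1
  WHERE item_id = 42 AND quantity > 0;
COMMIT;
```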
Once you're inside of that same-machine world, the way CockroachDB does consistency is it uses Raft. Raft is the consensus protocol that's used to adjudicate between replicas of the same range. And so if two people are touching that same item in the shopping cart at the same time, Raft is going to deal with adjudicating those conflicting transactions. In other words, you use traditional sharding, now automated sharding, so it's invisible to the customer, but you use traditional sharding to limit the scope significantly, so that it's rational for both writes to go to the same machine and be dealt with there. Exactly. Yeah, that's great. It's probably safe to say that works well for about 99% of the problems you run into; 99% of all synchronization issues can probably be resolved with things like that. But there are a few things that can't be. You know, if you look at
full transactional awareness, with everything SQL allows, you can't localize data like that and still get full transactional awareness. So how do you handle that? Do you allow full SQL transactions? And if so, what do you do to make sure that a full transaction that's touching, say, 50 pieces of data on 40 different machines, alongside one piece of data that's being touched by another transaction on another 50 machines by someone else, all coordinates together? Yeah.
So I love the way that you took the micro and then zoomed out to the macro. The micro is Raft consensus, exactly, and the macro is a traditional two-phase-commit style of transaction protocol. Any kind of transaction that you can throw at SQL is supported inside of CockroachDB. If that transaction ultimately needs to touch more than one consensus group, or range, then the system notices that, inserts two-phase-commit-style transaction records in the places they need to be, and orchestrates the transaction across as many consensus groups as are required to make the transaction go through. You write the journal records necessary to do the two-phase commit, and then you ultimately have an event that's broadcast that does the actual commit.
And the only thing that has to be transactional at the lower level is that two-phase commit signal that says commit? Yeah, exactly. We take those transaction records, or journal events, whatever you want to call them; we call them transaction records. Those records are really just data items like any other data item, as far as CockroachDB is concerned. So you're inserting this transaction record, and to CockroachDB, it could be just another user record. But it's got a special bit that lets the system know: this is the thing I need to go to to resolve the two-phase commit, or to check whether it's been committed. And there are all sorts of fun edge cases. What if a writer goes away, or the machine dies? How does all this stuff get cleaned up? There's a protocol. I think one cool thing we've done with this particular part of the system is that we've verified it, I believe, with one of the protocol verification languages; I'm forgetting the name of it right now. Yes, a formal verification language, yeah. Exactly. So that's a cool thing.
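A conceptual sketch of what that looks like from the SQL side; the comments only narrate the protocol as described above, and the table names are hypothetical.

```sql
-- Hypothetical tables whose rows live in different ranges, i.e. in two
-- different consensus groups:
CREATE TABLE inventory (item_id INT PRIMARY KEY, quantity INT);
CREATE TABLE orders (id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
                     user_id INT, item_id INT);

BEGIN;
UPDATE inventory SET quantity = quantity - 1 WHERE item_id = 42;  -- group 1
INSERT INTO orders (user_id, item_id) VALUES (7, 42);             -- group 2
-- On COMMIT, the system writes a transaction record, coordinates the
-- two-phase commit across both consensus groups, and flips that record
-- to committed as the single atomic commit event.
COMMIT;
```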
Cool, cool. So you're fully ACID compliant, is that correct? Yeah, yeah, exactly. At a macro level. At a macro level. And we support a serializable level of isolation, which is something that we're pretty proud of. We think that for most developers, it's actually an easier concept to make all the transactions serializable, instead of using a lower level of isolation like read committed, for example.
So, actually, for many years, that was the only isolation level that CockroachDB supported. An unusual choice, I would say. We kind of thought, in the early days, that we were going to revolutionize databases by making sure everybody was using serializable, the best consistency. Developers were going to have an easier time.
As it turns out, I think the market made us realize that there is demand for a looser level of isolation as well. So we are introducing read committed in an upcoming release, and we're excited about that as well. Cool, cool, good. I'm anxious to see how that works and how it all fits together.
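For reference, a hedged sketch of the two isolation levels in Postgres-style syntax, which CockroachDB generally follows; read committed was described as upcoming at the time, so its availability depends on the release.

```sql
-- Serializable is CockroachDB's default:
SHOW transaction_isolation;

-- Opting a transaction into the looser level discussed above:
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- ... statements ...
COMMIT;
```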
I remember when I was working at AWS, there was a well-renowned, traditional database expert, I don't remember his name, nor would I say it, who happened to be working at AWS. He'd worked at Microsoft and other places as well. And he said: scalability and SQL do not go together, period. You cannot solve the scalability problem with a SQL-compliant database. And what he meant by a SQL-compliant database was fully ACID, you know, everything that goes with that. I never believed him. So I'd love to see companies like Cockroach prove that this actually works. Are there compromises that you do make to SQL or to ACID compliance? ACID compliance, that's not the best example, but are there compromises you make? Maybe it's slower than a normal SQL database would be in certain circumstances, or whatever. Are there compromises you have to make in order to make this actually work in a distributed fashion?
There are a few kinds of compromises that we've had to make. Certainly, you mentioned latency. It's hard for a distributed database of any nature to compete with a single-node database on latency. Just doing the extra network hops, doing the consistency round trips, that's going to introduce an unavoidable amount of extra latency for sure. I think another interesting one for us to navigate has been schema changes. One thing that a lot of the old-school SQL databases offer you, along with their ordinary ACID compliance for data transactions, is fully transactional schema changes.
And that's something that's very tough for us to offer, because we also want to offer online schema changes, which is something our customers really like. If you're operating on large-scale data, you have an online shopping cart or a gaming application or a financial transaction log that's happening in real time, 24-7, and you don't want your system to be interrupted by a schema change. You don't want to have to take your application down. Obviously, you can't do that.
Over the years, many people have come up with workarounds for this, for different systems. We've decided to build that online schema change capability directly into the schema change system, which is great. The trouble is that if you want that system to also be fully transactional, in other words, if you want to be able to create an index at the same time as editing a row, or add a column and edit an enum all at the same time, we don't have the ability to roll that back atomically right now. That's a limitation, and our customers have made it known to us that it can be trouble.
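A hypothetical migration that illustrates the limitation: it mixes schema changes and data changes in one transaction, and atomically rolling back the whole bundle is exactly the hard part described above. Table and column names are illustrative.

```sql
-- Hypothetical starting point:
CREATE TABLE users (id INT PRIMARY KEY, email STRING);

BEGIN;
ALTER TABLE users ADD COLUMN loyalty_tier STRING;
CREATE INDEX users_email_idx ON users (email);
UPDATE users SET loyalty_tier = 'basic';  -- backfill alongside the DDL
COMMIT;  -- rolling all of this back as one atomic unit is the hard part
```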
Maybe we'll find a way to solve that limitation one day, but for now, that's one of the limitations that comes to mind for me. It's a good one to bring up, because you're right, it truly is a limitation relative to standard, stock SQL, and it is something that's valuable to have. On the other hand, and I might get myself in trouble here with some of my customers or clients, it's bad protocol to depend on that capability in a production system. You should design your schema changes in such a way that the impact is a micro, step-by-step change, where you don't have to transactionally roll back ten levels of schema changes at the first sign of trouble. It's possible in almost all cases to design your schema changes so that you change your database a small piece at a time, non-transactionally, without worrying about downtime. I'll get in trouble with some people for that one, I know. But what are your thoughts on that?
Exactly. I think you're so right. You know, there are a lot of best practices in the SQL world that you would really love all of your customers to follow, I guess is the way I would put it. One thing I've learned throughout my time at Cockroach Labs, my evolution as a technologist, I guess you could say: for a long time, I really thought that we could convince customers to do it right, so to speak. Maybe that's using the right type of schema changes, always using serializable, never making foreign keys that don't make sense. Over time, I realized that there are so many reasons.
It's almost a Chesterton's Fence kind of situation. There are so many reasons why people are using SQL in ways that you didn't think of. Maybe this is actually what your friend at AWS was getting at: you can never make SQL scale because there is so much variety in the ways people use SQL. There are valid reasons for all of it. You may not understand them all yet.
But people use SQL in really, incredibly creative ways. And I think as a service provider, you'll be limiting yourself, and you won't be doing your job of solving those customer problems, if you end up thinking: ah, if only they could stop using stored procedures, if only they would use the serializable isolation level, things like that. I think it's probably safe to say, I guess I don't know for sure that this is a true statement, but I think it is, that SQL is either the oldest or one of the oldest programming languages still heavily in use today.
And so it's got a long history, meaning that people were using SQL for radically different purposes when it first came out. That's when they were learning SQL, and that's when the restrictions and the meaning of SQL were put into place. So if you look at how SQL was used 50 years ago versus how it's used today, it's not a surprise that there are lots of databases doing things not the way we would do them if we built from scratch today, but doing them in very important ways. Not only because the data already exists and has to be accessed that way, but also because there are programmers with decades of experience who understand how to program that way. That's what they're used to doing.
That's one of the problems, I think, with NoSQL databases in general: yeah, they may be, quote-unquote, better, but they're just different enough that the way we think about building applications has to change, and that's hard for a lot of companies to do. You just can't do that. So that's why things like Cockroach are so important, and why I imagine full compliance with the SQL standard, which is a nebulous term, is very important for your business. You can't be 90% there; you've got to be 100% there. That's exactly right. So true. And we've learned so much about exactly that word. What does SQL compliance really mean? Compliance with which dialect? It may be different depending on who you ask.
Are we talking about Postgres? Are we talking about Oracle? Are we talking about SQL Server? I think we as an industry have not done the best job just yet of finding that single standard people can conform to. I don't think we're really quite there. Maybe one day. Maybe Postgres will become the standard. That would be a dream for me personally. Yeah, unfortunately, there will probably end up being a couple of standards, right? If history works out, yes, there will be standards, but it'll be, you know, this one or that one, they'll be radically different, and the compatibility won't be perfect. It'll be just like the browser wars. Actually, it already is. What am I saying? Well, we talked about transactions, but the other thing that makes scaling SQL hard is joins, and how joins work. You know, the fact that you can join anything to anything, in any combination, in highly complex ways, quote-unquote efficiently, relatively speaking.
How do you do that? Does Cockroach's architecture make joins easier, or does it make them harder or more challenging for you? How do you do joins?
We've taken a lot of best practices from other distributed SQL systems that have been developed over the years. There are several of them. I think the one I've spent the most time learning about is a system called F1 that Google developed. You can think of F1 as a distribution layer, sitting on top of a distributed SQL database or any other kind of database under the hood, that you can implement joins on top of.
Inside of CockroachDB, the SQL system has a similar layeredness to it. There's an underlying, sort of standard, execution-engine type of thing that you can think of as the part that runs on the nodes that have the data. So let's say I'm doing a distinct operation, or a select, or a filter, or even a join between two tables on the same system. You can think of these operators as just running on the node that you're asking for the data from. But what happens if you ask to do a join that needs data from more than one range, and those ranges are not co-located?
What does the system actually do? You could imagine sending one of these operators out. Let's say it's a join operator, and it wants a little bit of table A and a little bit of table B. The table A bits are located on the node that you're running the join on, but the table B bits are somewhere else. So you could imagine that node sending an RPC off and starting to pull in all of the data from the other table. The other node streams all of that data over; maybe you did a little filter push-down, maybe you didn't, but in any case it's maybe a bit expensive, since you're having to send all this data over the network. So the insight of F1 is that you can actually distribute these processing operators across the whole cluster if you want. You could make a
hash join or a merge join that is composed of many little sub-hash-joins or sub-merge-joins, or a distinct that's composed of many little distincts, or you're taking filters and asking them to run on different nodes. That's fundamentally what our system does; it's similar to this F1 idea. We call it DistSQL; that's just an internal name. The node that's doing the SQL planning can create a distributed plan and ask the remote nodes to run subcomponents of the plan. So it's probably not too different from what other distributed SQL systems do, but this is the way ours works.
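To see this in practice, CockroachDB can render the distributed plan with EXPLAIN (DISTSQL); the tables here are hypothetical.

```sql
-- Hypothetical tables for a cross-range join:
CREATE TABLE customers (id INT PRIMARY KEY, name STRING);
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT, total DECIMAL);

-- EXPLAIN (DISTSQL) renders the distributed plan, showing the scan,
-- filter, and join sub-operators placed on the nodes that hold the data.
EXPLAIN (DISTSQL)
SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.total > 100;
```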
Cool, cool. So there's a lot of chatter that goes between nodes, is that correct? Absolutely, yes. So how do you handle geo-distribution with so much chatter going on? Or is there a different algorithm for geo-distributed nodes versus local nodes? How do you do that?
So geo-distribution, what is that about, really, at a high level, zooming out? The way that I think about geo-distribution: we can have a distributed SQL database that lives inside of three data centers that are only a few milliseconds from each other, or we can decide to say, we need replicas that live not only in US East, but also in US West and also in Europe. When we have a system like that, that's the geo-distribution part kicking in. So how do we deal with it? Under the hood, if a user of CockroachDB were to just send us all of their data, without much instruction about which piece of data is supposed to live where, I think we would struggle as a system. Just as you say, there would be a lot of that chatter, and a lot of it would have to go over the cross-region links, which are expensive and slow and this and that.
What a system like Cockroach allows you to do, the way we've chosen to let you organize your data by region, is that you can configure a table to be regional by row, as we call it. What that means is that the database can look at a component of the row, let's say a user ID or a region code, and decide, based on an algorithm that you pick, whether that data should be homed in one particular region or another. And not only in that region: it's not like a traditional sharding system where the data lives only in that region, unless you want it to. By default, it's going to be homed in that region and replicated across other regions for redundancy.
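A minimal sketch of that configuration in CockroachDB's multi-region SQL; the database, table, and region names are hypothetical.

```sql
-- Declare the database's regions (names are hypothetical):
ALTER DATABASE store SET PRIMARY REGION "us-east1";
ALTER DATABASE store ADD REGION "us-west1";
ALTER DATABASE store ADD REGION "europe-west1";

-- Home each row in a region chosen per row:
CREATE TABLE carts (id UUID PRIMARY KEY, user_id INT);
ALTER TABLE carts SET LOCALITY REGIONAL BY ROW;
-- Each row is homed in the region recorded in its hidden crdb_region
-- column and replicated to the other regions for redundancy.
```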
So once you've got all that as a basis, I'm a user, I've configured my system to be regional by row in the right places, then the system won't even need to reach across regions if it doesn't have to. The nodes serving the data for any particular region will also be next to each other. And most of the time, when you organize data like this, you don't end up doing any cross-region joins. If you do, you do, and then there's going to be some chatter, to your point. But that's the high-level idea. And that's great. That actually also helps with durability, which was going to be one of my other questions, about redundant nodes and all that sort of thing. It sounds like the answer is yes, at least across regions, and maybe otherwise.
But that also comes back to the question about transactions and transactional integrity. When you have two transactions, one in Europe and one in the US, touching the quote-unquote same data, even though it's not the same nodes, because the data is replicated, how do you handle transactional integrity in those cases? Yeah.
Yeah, in those cases, it's going to be similar to what we were discussing before. There's going to be that two-phase commit; there's going to be that transaction record. And ultimately, if you're trying to touch that same bit of shopping cart data across the US and Europe, we're both trying to buy the same item, it's going to be slower. There's really no way to avoid that, unfortunately. We had a little phrase early on at the company: the speed of light is the fundamental limit for CockroachDB.
And that makes perfect sense. You know, that's what's nice about what you're dealing with. When you talk about adding capabilities like geo-replicated transactions, you're moving from "you can't ever do this" to "you can do this," so the performance of it tends not to be as critical a question. It's understood that a transaction that has to go across four continents is going to take longer than one that goes across four millimeters of a wafer. People accept that that makes a lot of sense. But the choice you're making is that ACID compliance is more important than performance in those situations. You don't break integrity; you may suffer on performance, but you don't break integrity. Is that, in general, a good description of the philosophy of CockroachDB? Yeah.
I think that's a good description. We think of CockroachDB as a database that's a fit for your tier-zero, system-of-record, transactional workloads. These are the ones we're talking about: the shopping cart that can never be inconsistent, or, a better example would probably be, a financial system ledger. That's a system where you cannot afford an inconsistency. So that's the prioritization. If you're thinking about the CAP theorem, it's not exactly a CAP theorem thing, but it's the same kind of idea: you want to prioritize that CP, that consistency, over the availability if you're working on some kind of transactional finance system, for example. So let's get back to the durability aspect, which we touched on a little bit with geo-distribution. Obviously, geo-distribution gives you durability.
Do you also offer durability options within a single region, where you can replicate data and make it look essentially multi-region from the durability standpoint? Is that a capability you offer? You're shaking your head, but for the audio, is that something that you do? Absolutely, yeah. We do provide a configurable replication factor, like you're used to in a NoSQL kind of system. So as a user, you can decide to say, I would like to have three replicas balanced anywhere. That's maybe a default: if you're using a single-region CockroachDB, you can say, ah, my range is going to be replicated three ways somewhere. Or you can say, I want five-way replication because I'm interested in extra redundancy, or seven-way, or whatever. And if you're multi-region, with those constraints that I described, the regional-by-row kind of thing, you can ask the system to do anything, really. It's quite configurable.
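A hedged sketch of that configurability, using CockroachDB's zone configuration syntax; the table and region names are hypothetical.

```sql
-- Cluster-wide default: five-way instead of three-way replication.
ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;

-- Per-table placement, roughly the "five in Europe and three in the US"
-- shape mentioned below (region names are hypothetical):
ALTER TABLE carts CONFIGURE ZONE USING
  num_replicas = 8,
  constraints = '{"+region=europe-west1": 5, "+region=us-east1": 3}';
```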
And I think early on, we found that that level of configurability may be a bit too much for most users, so we do provide some nice, sane defaults for most people. But if you really want, you can say, I want five replicas in Europe and three in the US, or whatever other combination your application needs. Yeah, that makes a lot of sense. A lot of people talk about the durability of databases versus backups, and durability as a replacement for backups.
What's your philosophy in general about database backups versus a durable database?
Yeah, so it's a deep question. We think of Cockroach as a system that gives you more peace of mind than a standard distributed database or a standard single-node database, because of the extra durability provided by replication, cross-data-center replication, cross-region replication. But to me, and to us, it doesn't at all obviate the need for disaster recovery, for backups. Organizations use backups in a lot of different ways. They use them as mistake-rollback machines: let's say someone comes and deletes your whole database. You can use disaster recovery for that, even if your system has already replicated that mistake to all of the copies of the data. Or your organization probably needs backups for compliance reasons; there are lots of good reasons why there are laws requiring organizations to take these backups, even if you never end up using them. Or your organization might use them for annual
risk exercises: what happens if there's an internal attacker inside your organization who takes down that database and deletes all the copies, or something like this? So by no means does the extra durability and replication take away the need for good old-fashioned, standard backup and disaster recovery. Or, for that matter, vendor issues. If you accidentally delete a record, whether it was the customer's fault or your fault, it doesn't really matter: if that record is completely and totally removed, there's nothing you can do about it anymore, and only a backup that's offline and separate from the system would solve that. Yeah, exactly. That's that defense in depth. I should probably mention: test your backups. This is just general advice for all the listeners out there, please test the backups. One, two, three. Exactly, yes, exactly.
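For completeness, a hedged sketch of taking and testing a backup with CockroachDB's BACKUP and RESTORE statements; the storage URI is a placeholder, and the exact syntax varies by version.

```sql
-- Back up to external storage, offline and separate from the cluster
-- (the URI is a placeholder):
BACKUP INTO 's3://my-backups/crdb?AUTH=implicit';

-- And, per the advice above, actually test it:
RESTORE FROM LATEST IN 's3://my-backups/crdb?AUTH=implicit';
```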
So let's talk a little bit more about the sharding mechanism. You say you use sharding as a technique for doing the distribution, which is a standard, age-old approach, except, as you also mentioned, most sharding solutions are customer-managed: customers have to decide what the shard key is, and then deal with the consequences of having to reshard, and all those sorts of things. Now, you do all that automatically for the customer, which is fantastic. But how do you do that? How do you decide how and what to shard by, how much control do you give your customers, and how exactly does that mechanism work? Under the hood, you can think of
the CockroachDB architecture as another layered architecture. I was talking about how the SQL processing is layered; so is the storage. You can think of the SQL system as sitting on top of an underlying giant key-value system, like Dynamo. Under the hood, that key-value system is a giant ordered map of all of your data, and we've devised a careful scheme for translating a SQL row into key-value pairs. So for example, say you have a table for users where the primary key is their first name and last name or something, and you have a secondary index on their email addresses. You've got an ID, you've got an email, whatever. You're actually going to have separate underlying key-value sections of that giant KV store: the primary index of first name, last name, which is kind of a bad primary index, by the way, that just kind of came out of my mouth, and a second one for email. They have separate spaces in that key space.
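A sketch of that SQL-to-KV translation; the key layout shown in the comments is illustrative, not the exact on-disk encoding.

```sql
CREATE TABLE users (
  first_name STRING,
  last_name  STRING,
  email      STRING,
  PRIMARY KEY (first_name, last_name),
  INDEX users_email_idx (email)
);
-- Conceptually, rows land in one giant ordered key space, roughly:
--   /users/primary/<first_name>/<last_name> -> (email, ...)
--   /users/users_email_idx/<email>          -> (first_name, last_name)
```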
Ultimately, when it comes to the sharding strategy, ranges, shards, whatever you call them: by default, each of those tables and each of those indexes is going to get its own little partition inside of CockroachDB, so they're all going to be, by default, their own replication groups. When a replication group becomes large enough, and there's a heuristic about the size of the range in terms of megabytes, I forget what the number is now,
the system will automatically split that range up into multiple ranges. A similar heuristic kicks in around QPS: if a particular range is getting a lot of traffic, the system might decide, I'm going to split this up to make sure the system can actually uphold that horizontal scalability we were talking about at the beginning. If you only have one big consistency group, it's just going to be hard for the system to scale out.
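The splits happen automatically, but for illustration, a hedged sketch of forcing one by hand; the split key is arbitrary.

```sql
-- Assuming the users table sketched above, force a range boundary so
-- keys before and after 'm' land in different replication groups:
ALTER TABLE users SPLIT AT VALUES ('m');
```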
So you're actually making sharding decisions based on the defined SQL indexes? Absolutely. Yeah, that's a super key property. And so, to your point, it's a little bit misleading, I guess, to say it's fully automatic and has nothing to do with user decisions. That's not really true. As the user, you have a lot of ability to change how the system does the sharding, and it's quite important that you keep this stuff in mind.
You don't have to worry so much about things like: okay, I've got 128 shards and I've got to split them up into 256, but I'm worried about X, Y, Z. I've lived that life at my laptop. But you do still have to worry about things like having an index on an ascending key, for example, where you're writing an increasing ID: just one, two, three, four, five. Systems like Cockroach tend to struggle with that, since it's most likely that you're going to be writing to the end of that index all the time, and the system just won't be able to scale unless it can shuffle that hot area of the range out to multiple replication groups. You can avoid that using other techniques, like adding a hash at the beginning, or using a UUID, or whatever the case may be.
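Two hedged sketches of those techniques for avoiding the ascending-key hotspot; the names are hypothetical, and the hash-sharded index syntax varies a bit across CockroachDB versions.

```sql
-- 1. A hash-sharded index spreads sequential timestamps across buckets:
CREATE TABLE events (ts TIMESTAMPTZ NOT NULL, payload STRING);
CREATE INDEX events_ts_idx ON events (ts) USING HASH WITH (bucket_count = 8);

-- 2. A UUID primary key instead of an increasing ID:
CREATE TABLE orders (
  id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
  created_at TIMESTAMPTZ DEFAULT now()
);
```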
Got it, this is great. So let's talk about the cloud approach to CockroachDB. Now, you specifically are the director of Cockroach Cloud. You have both an on-premise version of Cockroach and a cloud version. Talk about the differences. Yeah, yeah, sure. So we offer CockroachDB in three different configurations right now. We offer it as a self-hosted product, so you can run it on-prem or in your own cloud and manage it yourself.
Or you can purchase it through Cockroach Cloud. If you go to cockroachlabs.cloud, you can buy two types of CockroachDB. There's dedicated CockroachDB, which is just dedicated hardware, you can think of it that way, but it's got multi-region, and you can have it live in Azure, in Google Cloud, or in Amazon. Or you can purchase it as a serverless deployment. In serverless, you're using shared hardware, and you're purchasing request units instead of things like provisioned capacity. That's nice for bursty workloads, it's nice for getting started, and it's also really nice for a workload that's a bit less predictable, I'll put it that way. So those are the three ways we offer Cockroach right now.
So the whole category of distributed databases in the cloud has been growing dramatically, right? Databases like DynamoDB have been growing in popularity, and Aurora, when you talk about SQL databases. And obviously, databases like CockroachDB are really designed for the cloud, because of the scalability aspects. Why do you think there's been such growth? This is almost too much of a leading question, I guess, but let me ask it anyway. Why do you think there's been so much growth in distributed databases for cloud offerings in recent years, and do you see databases like CockroachDB being able to fill those needs as they grow?
Yeah, so I think the growth in the need for distributed databases follows the growth of the internet. People want to build internet-scale applications more and more; that's almost the default, from my perspective. If you're a startup, you don't want to aim your sights at a system that only serves your local city or something like that. You want it to be internet-scale from the get-go. And so I think, because of that, people increasingly expect that their database will be a system that scales along with their user base. Now, there's some wisdom that says you should start with simple technology that's well known, something that's really rock-standard, and I think that's fine. I think it does make a lot of sense.
My dream for our industry would be that, over time, the distributed database becomes that standard first component you reach for, simply because it's a system that allows you to scale to whatever your end goal is, right? You can keep it small, you can get really big, and you don't have to worry about doing a migration halfway through, when everything's on fire and you're worried about keeping up with demand.
So you absolutely hit head-on where I was trying to go with that series of awkward questions I couldn't get out. And this really is the idea: I've built several systems, SaaS-based applications that do blah, blah, blah. I've built them for companies that have been really successful, like New Relic and AWS, plus some for my own companies and other things I've done. And every time you do it, you hope this is going to grow to be the giant thing that saves the world. And 99% of the time it doesn't; it stays small, and everything's fine. When you're running small-scale in the cloud, any database works just fine. It doesn't matter what you use; put up a Postgres on an EC2 server and you're good to go, and you don't need anything more than that. But, CockroachDB aside, if you expect to grow, then the question comes up:
maybe I shouldn't use SQL and Postgres from the beginning. Maybe I should instead go to something like DynamoDB, which is so much different, with a whole new learning curve and all that sort of stuff, plus all the restrictions of tying you to the AWS cloud, or to GCP with Bigtable, whatever. And maybe I should do that now, so that when I scale, I can handle the scaling, even though it's harder for me to build it now. Yeah.
But really, what Cockroach allows you to do is say: just build, forgetting about scale. Really. It's not good advice in general, but you can almost say: build without thinking about scale, use SQL just like you normally would, because Cockroach will be there when you need to scale up to something bigger and better. And moving from, let's say, Postgres to Cockroach is so much easier than moving from Postgres to DynamoDB. Yeah, that's a great way of phrasing it. And we've had a few nice successes from businesses that start small with Cockroach serverless, actually.
They're trying things out, they're in dev and test, they go into prod; it's a slow thing. But then they hit some virality. In the AI world right now, that's a big cause of virality, I would say. You've got a little idea, and it's pretty cool, and all of a sudden, boom, it hits Hacker News. Who are all the people trying to use your application when it hits Hacker News? And is your database going to scale with you, or is it going to get in your way? That's the fit I see for serverless in particular and CockroachDB in general.
So you even see the case where, instead of starting with Postgres, you start with CockroachDB serverless, which is probably priced similarly to a Postgres instance, and you do that at small scale, and it works great. Then you don't have to do any migration at all as you grow; you can just scale from there. The only time you'd want to move is if you needed dedicated hardware, for cost reasons or whatever. Is that a fair statement? Do you really see that as a use case? Totally. We have a great free starter plan, and serverless is pretty affordable at lower scales. And from my perspective, if I'm anticipating a system, well, of course, this is what you said earlier, right? You are anticipating your startup to grow large, no matter who you are. So if that's me, then yeah, I would definitely want to use a scalable system from day one, like CockroachDB, for sure.
Can you easily move between the different levels, like from a serverless platform to a server-based platform? Can you easily make that migration as you grow? Today, that's definitely a bit more of a work in progress for us. It's something our team is hard at work on right now. We actually just came from our yearly planning week last week, and one of the big focuses we've got for next year is unifying those two deployment models. You know, I've been talking about dedicated as one product and serverless as a different product. Really, we're trying to take off the technology hat and put on the consumer hat. What you want to be consuming is the database; whether it's serverless-deployed or dedicated-deployed, you don't really care. You want a connection string. You want a console that shows you the slow statements. You want to be able to do schema changes. And you really don't want to worry about the deployment model at all. So our focus is going to be unifying those deployment models and offering just CockroachDB, in different tiers, for developers,
whether that's a tier-zero, high-scale production deployment, or something you're just getting started with. That's our next year's worth of goals. Cool, good, good. So, Cockroach Cloud. You obviously have a huge infrastructure that I believe you and your group are responsible for managing. It's really a database as a service: a multi-tenant database, at scale, run as a service. What's the biggest challenge you have in keeping that infrastructure operating?
So today, we have to manage infrastructure across all of the clouds that we support. And for me personally, as a leader in this department, the biggest challenge is just managing all of that infrastructure: understanding where it's living and why it's costing what it costs. This is more of a cloud FinOps kind of problem that we have.
But it's something we're wrestling with, I would say, as we become a bigger and bigger company. We've got one deployment model that we use in Azure, a different one in AWS, and a different one in GCP. Ultimately, what I've learned is that all the clouds do things a little bit differently. Sometimes that's good, sometimes that's not so good; sometimes they have strengths in one department and weaknesses in another. But whatever it is, the infrastructure is subtly different on each of them.
And one challenge for me has just been: how do I actually manage all of these costs, charge them back to the teams that are using them, and just manage all the money? Really, that's been a fun challenge for me as of late. You know, that brings up the magic word I love to talk about, which is complexity, right? How do you manage this complexity and deal with the minor deviations from platform to platform, which are critically important but hard to keep track of because there are so many of them?
So how do you deal with drift in your infrastructure? In a very simplistic viewpoint, your infrastructure is a series of identical nodes distributed worldwide. That's a very simplistic view, but I imagine node drift, and you can define the word node however you want: it might be a single computer, it might be a whole network. That's not really the important part. The point is the configuration of that node itself.
How do you avoid drift between all the different deployments of those nodes, over time and over distance? Yeah, so I guess the way I would define the unit here, you were talking about nodes, but the unit we think about when it comes to managing our own internal cloud is really the cluster. We use Kubernetes to deploy CockroachDB for CockroachDB Cloud, so we've got all of these different Kubernetes clusters. We've got all this important but fiddly networking stuff that you need to set up to make sure the multiple regions can talk to each other properly, or VPC peering. We even do this for our customers: if you want your Cockroach database to talk to your infrastructure securely, and not over the public internet, you've got to set up PrivateLink in AWS or Private Service Connect in Google. There are all sorts of fiddly bits. And what's tricky is that, depending on what the customer wants to do,
the configuration we end up deploying for that customer is different. So when it comes to drift, we have this problem of figuring out the desired state, per customer, per cluster, versus the actual state. The way we deal with that is with a system called Pulumi, which helps manage some of this stuff, and we've gotten a good amount of value out of the way we use it. We have a record of what's supposed to be deployed, and we can compare that record with what's actually deployed, and ultimately come up with, yeah, a set of drifted configurations, and manage dealing with them in a variety of different ways. I think one challenge for us is that we sometimes have customers that we've got to snowflake, so to speak. You're taking a customer who needs
maybe something fresh deployed for them. You're doing a private preview for them of, for example, a different type of networking; you're working on a new capability. How do you tell the system that's managing this drift that, well, it's going to be a little bit different in this case? We've got methods to do that. But ultimately, as I'm sure you and the listeners know, the more snowflake clusters or snowflake situations you get, the more pain you're going to undergo when you have to do things like upgrade the whole system or change everybody's Kubernetes version.
This all leads to some fun challenges. I wouldn't say we've got all this stuff figured out; these are just some of the challenges we're going through right now. Yeah, I think it's a moving-target challenge, is really what it is. Yeah. But it is interesting that the word customer entered the conversation there. That implies to me that, probably especially for your larger customers, even in the cloud offering of your product, you're single-tenant. Tenancy in which sphere? So, an application that runs a hundred customers on a single
instance of an application is multi-tenant. If you deploy a separate instance of that application for each of your hundred customers, that's single tenancy, in that context. We have a blend, and it depends on the cloud. I mentioned dedicated and serverless: of course, in dedicated, we do offer our customers a single-tenant model where they're getting dedicated hardware, and in serverless they're getting shared hardware. And the different ways we use each of the clouds play into this as well. In some of them, we've configured more multi-tenancy, keeping the hardware single-tenant but more of the infrastructure multi-tenant. It's a bunch of details, but at a high level, I'd say we have a mixture of both right now. Yeah. So the answer is, you have customers that are single-tenant and customers that are multi-tenant. Yeah.
And the bigger they are, the more single-tenant you tend to make them. Exactly. That makes perfect sense. Good. Okay, great. So that's all the questions I had for you. Is there anything else you think we should have talked about that we haven't?
Good question. Probably a lot. It's just so fun to talk about database stuff; it's such a rich and interesting field. There are a million places you could go, and we haven't covered them all just yet. That's a valid point. What I hope is that maybe we can have you back and do another episode with some follow-on topics as we go along. I just want to say, I really love what CockroachDB in general is trying to accomplish, and what they are accomplishing. I'm very excited to be watching what you're doing and to keep up with you, and I appreciate you coming on the podcast. So, Jordan Lewis is a senior director of engineering at CockroachDB Cloud, and he's been my guest today. Jordan, thank you very much for talking to me. This has been very informative. Thank you for coming on Software Engineering Daily. And thank you so much for having me. It's been such a pleasure. We had a great conversation today.
Thank you.