Vitess was created to address the scaling issues YouTube faced with MySQL. YouTube was running out of resources and pushing the limits of MySQL, particularly with replicas storing more data than MySQL could handle. Vitess was designed to handle these scaling issues by enabling horizontal scaling and distribution of load across multiple MySQL nodes.
PlanetScale is a hosted, MySQL-compatible, serverless database platform. It offers features like database branching, zero-downtime migrations, data caching, an API, a CLI, and integrations with other services. Unlike traditional MySQL, PlanetScale uses Vitess to enable horizontal scaling, making it easier to manage large amounts of data and traffic.
Horizontal scaling is more cost-effective and reliable because adding more machines (nodes) to a cluster distributes the load, improving performance and reliability. Vertical scaling, on the other hand, involves adding more resources to a single machine, which can become expensive and eventually hits a performance ceiling. Horizontal scaling also provides redundancy, ensuring that if one node fails, others can continue to serve requests.
Vitess is a layer on top of MySQL that acts as a gateway or load balancer. It distributes the load across multiple MySQL nodes, enabling horizontal scaling. Vitess was originally created to scale YouTube's MySQL databases by breaking up data across multiple nodes and ensuring efficient load distribution.
The main differences between MySQL and Postgres come down to data types and specific SQL syntax. Postgres supports more data types and has a native UUID type, while MySQL requires using a BINARY(16) column or a raw string for UUIDs. Pagination syntax can also trip people up when coming from SQL Server, which uses TOP where MySQL uses LIMIT. For most developers, the transition is relatively smooth, with about 95% of knowledge transferring directly.
PlanetScale sets up read-only replicas to handle read operations, while writes are directed to a primary node. This setup optimizes for read-heavy applications, which is common for most web apps. While it is possible to configure nodes for both reads and writes, the complexity and potential consistency issues often outweigh the benefits for most use cases.
Geographic data distribution with PlanetScale allows you to place read replicas closer to your users, reducing latency and improving application performance. For example, if you have a large user base in Europe, you can place a read replica in an AWS region in Europe, ensuring that read operations are faster and more efficient for those users.
Database branching in PlanetScale allows developers to create isolated copies of the database schema for testing and feature development. It works similarly to Git branching, with features like deploy requests for reviewing and merging changes. This ensures that changes can be made and tested without impacting the production database, and deployments can be done with zero downtime.
Data branching in PlanetScale involves restoring the most recent backup of the source branch to a new branch, creating a complete replica of the production database. This feature is useful for testing and development, allowing developers to work with real data in an isolated environment. It helps in accurately simulating production conditions and testing complex changes.
Developers can visit planetscale.com/docs for detailed documentation on features and usage. The PlanetScale blog offers tutorials and in-depth articles, and the YouTube channel provides video tutorials. The hobby plan is free and includes a 5GB database with generous read and write limits, making it ideal for learning and small projects. Users can also reach out via the PlanetScale Twitter account for support and updates.
Vitess was originally created to scale YouTube way back in the early 2010s. YouTube was running out of resources. They were pushing the absolute limits of MySQL to the point where even their replicas, the amount of data that was being stored in a replica was exceeding what MySQL was able to handle. So that's where Vitess was birthed. It was created in order to handle the scaling issues of YouTube back in that timeframe.
Brian, I'm so excited to have you on the show. Do you want to take just a second and introduce yourself to the listeners?
Yeah, my name is Brian Morrison. I'm a full stack developer. I've worked in a lot of different industries. I've done front end. I've done back end. Professionally, I've done React and Angular. Back end, I do Go and C#. And currently, I'm a developer educator at PlanetScale. So a lot of what I do is managing the blog, managing the YouTube channel, and then also working on the documentation and making sure that's all up to date based on feedback we get from our customers, as well as new things we might be building internally.
I actually didn't know until just this moment that you are a Go developer. That's always really exciting to me. At BootDev, we mostly do Python and Go, but with a really big emphasis on Go. It's like a 30-70 split there. Nice. Very cool.
Cool. So I obviously brought you on primarily to talk about PlanetScale because PlanetScale is a key part of this CICD course that I just finished writing and will be released by the time this podcast episode is released. Tell us what PlanetScale is, like 50,000 foot overview.
Yeah, so 50,000 foot. We are a hosted MySQL compatible serverless database platform. That's kind of it in a nutshell. Key emphasis on the platform bit because we offer a lot more than just MySQL, even though that's like the main offering. We do offer database branching that's very similar to something like you'd work with in a Git environment, which most developers I believe are familiar with.
We have features that enable zero downtime database migration. So if you're making changes to your database in that branching setup, you can merge your changes without having to, like, take down production, take down the database itself.
We offer data caching. We have an API. We have a CLI. We offer integrations with a bunch of different other partners as well. So that's why the emphasis on the platform. MySQL is the tip of the iceberg, and then it goes so much deeper. The core of what your users are interacting with is MySQL, but all of the features that PlanetScale builds are like stuff that you would have to do manually if you deployed bare-bones MySQL, is the way I'm taking that.
Yep, pretty much. I want to give some context. I recently wrote this Learn CICD course for the Boot Dev Platform. It's going to be one of the last courses in this like backend learning path.
And throughout the course, students are using GitHub Actions to test their code, right? They're doing automated testing and automated formatting, all very much within GitHub, GitHub Actions. And kind of the last step of the course is to connect the web server that they've been doing all of this CI and CD with to a production database. And as I was shopping around different database solutions, obviously I've used many different databases over the course of my career, but I've never specifically shopped for a database before.
That would be great for students to spin up a development instance of in the cloud, because usually you're spinning up local databases to test with, and that's what students would typically do.
But the whole point of a continuous integration and continuous deployment course is that at some point there is a deployment. And I was just really impressed with, A, how easy it was to get started with PlanetScale, but also that you have a free trial that doesn't even require a credit card. You can get this very ephemeral, very small database that to me is just perfect for education purposes. And I don't know why anyone else
isn't doing this. Can you speak to how PlanetScale thinks about scaling databases, or I guess what the value proposition of PlanetScale is when it comes to scaling databases? Because I was impressed both with how large it can go, because that's like the main thing, right? PlanetScale, you can scale up your database to very large sizes. But also how small it would scale.
Yeah. So just reiterating the question to make sure I understand this correctly, you just want some insight into how we scale at PlanetScale or kind of the philosophy around it. Is that correct? Well, I guess for my listeners, like, what does it mean to scale a database? A lot of people listening to this podcast might be very new to databases in general and kind of just thinking of a database as somewhere to put data. Like,
What does it even mean to scale up or down a database? And what business ramifications does that have for your project? I see.
Okay, so I'm going to go with the very academic answer and say that there's two primary ways of scaling, I think, in architecture in general. This is completely agnostic from databases, but you can either scale out or you could scale up. Most databases support the scaling up method, which means if you're running your database on a server, you just kind of like toss more resources at it, more hard drive space, more memory, more RAM, and then it gets faster because the database has those additional resources to utilize.
But there's also the scale out method or horizontal scaling. And that's where you can create additional nodes, we'll call them. And then those nodes, depending on however many you have, can all kind of work in tandem together to make the overall environment operate more efficiently and more smoothly. So at PlanetScale, we really hone in on the scale out aspect. We use a platform underneath everything called Vitess.
And Vitess's kind of claim to fame is that it's an open source, horizontally scaling MySQL platform. So that's really kind of what we do now. That's not to say if you're on like one of our enterprise tiers that we can't fine tune some of the underlying architecture to scale upwards and out at the same time. But for most people that are going to be using the platform, we're looking at more of a scale out implementation. Got it. So scaling up or scaling vertically, it sounds like, is when you just add more resources to the same machine.
So you've got like one physical server and we're going to shove more sticks of RAM in it if it's running out of memory, or we're going to add more CPU cores if it's processing slowly, or we're going to add more disk space if we've filled up the database with data. Right. And that's scaling vertically. Why can't I just like do that? That seems easy.
Because it gets expensive. And at some point you run out of the ability to bump your servers as the pocketbook starts diminishing as well. In theory, you can toss as many resources towards a physical machine as possible until you can't. And then at some point, if you imagine kind of like a hockey stick kind of a graph, the cost for the amount of resources you're going to apply to a physical machine is going to start scaling exponentially over the actual return that you're going to get on it.
So the performance benefits that I get for scaling vertically start to fall off even as I'm spending more money. Like I'm spending more money and I'm getting less return on performance is what I'm hearing. Yep. Whereas scaling horizontally is adding more machines. So rather than having one machine, now I have two machines.
Basically, yeah. So within a PlanetScale database, anytime you spin up a production-grade database, you automatically get at least one additional node for that database. And that primarily acts as a method of keeping your database online because nobody's perfect. Things happen, especially when it comes to working with computers. I'm sure anybody who's worked on a computer can tell you that. Sure.
Stuff happens. So the additional nodes that we spin up for you give you that high availability, but also give you an additional replica for you to read off of, which in turn increases the performance of your application. Because instead of reading and writing from one of the nodes internally, you have multiples available to you to kind of like split and even out that load.
One benefit is that, okay, if I'm all on one machine and I just scale it up vertically, I have, say, a $50,000 computer, right? But if something goes wrong, my whole system is now unavailable and borked. Whereas if I have three nodes in a cluster and one node goes down, my users can still get the data they need through the other two nodes. So reliability is better when you scale horizontally. The other thing is the cost. So
Am I correct in estimating that by adding new nodes to the cluster, the performance gains kind of scale linearly? Like I add a second node, I get twice as much read performance, roughly speaking, and add a third node three times as much read performance. Again, like I know this is like super hand wavy, but does it tend to work that way?
For the most part, yeah. Obviously, it depends on your code and your configuration and how things are configured. But like you said, roughly speaking, yes, it scales more linearly. Cool. I remember reading a paper when I was in school. I think Google had published it. I'd love to cite it, but this was many years ago, so I can't remember. I just remember reading the paper. But the paper went something like this. In the early days of Google, the way tech companies scaled up their systems was to...
buy, like, really expensive, state-of-the-art supercomputers, right? So, like, the other big tech companies at the time were just spending gobs of money on these large, kind of vertically scaled-up systems.
And Google just started buying like commodity PCs that were like discarded and connecting them all together and building distributed system software, right? So they could take advantage of all of these commodity machines and put them to work, like indexing the internet. And it ended up just being way more cost effective again, because they're utilizing all this hardware that like in a lot of cases, people didn't even want it anymore, right? Nobody wants a PC that's seven years out of date. But if you connect it all, then like all of a sudden it starts to get really powerful.
Yeah, yeah, for sure. When you start doing stuff like that, you run into more complexities because now you have to figure out how to make all the machines talk the right way. I think that's why a lot of companies traditionally will scale up. Well, I think that the tides are turning at this point. But back in my day, people used to just throw money at hardware to scale up because it's easy to think about. Right. More power, more faster.
But trying to network all these machines together and make them kind of work cohesively can be a challenge. And, just bringing it back down to PlanetScale, that's one of the cool things that we do: all of this complexity is thought through and handled under the hood for you. It's not something you really need to think about. Right. I mean, that's something that like...
As a small tech startup, you have access to these days. The reason Google was able to do this back in the day was they had amazing engineering resources, right? They had some of the best engineers in the world that could write that really complex software that handled all of those network connections and distributed load properly. Like those are not easy or trivial algorithms to write, right?
So generally speaking, that's why like small software startups have always opted to like scale vertically until they can't anymore because it's simple, right? You can just kind of deploy your code on a monolith and add more hardware and like your problems are solved.
But these days, like we do have tools. I mean, PlanetScale is one, right? For databases. Kubernetes is one that I use more on like the server, the application server side of things. But there are definitely ways to scale up your tech horizontally now without having to write all of the distributed systems algorithms like from scratch every single time. Let's talk a little bit more about Vitess. You mentioned it a little, very quickly, offhandedly. What is Vitess? Is it part of MySQL? Is it outside of it?
It's outside of it. It is a layer on top of MySQL. So first off, it's interesting that you're mentioning papers from Google and their approach to it, because Vitess was originally created to scale YouTube way back in the early 2010s. YouTube was running out of resources. They were pushing the absolute limits of MySQL to the point where even their replicas, the amount of data that was being stored in a replica was exceeding what MySQL was able to handle.
So that's where Vitess was birthed. It was created in order to handle the scaling issues of YouTube back in that timeframe. And essentially what Vitess is, and I'm not a Vitess expert, so I may butcher some of these terms. Vitess.io is where you can go to get all this information if you really want to dive deep into it. But essentially it is a gateway or load balancer that sits in front of
multiple MySQL nodes, multiple MySQL instances, the actual daemon itself that's running on virtual machines or containers of some sort. And then each one of those instances of MySQL has some kind of sidecar process that communicates with that central load balancer in order to distribute the load evenly across all of these different nodes. And it's an abstraction layer on top of MySQL.
I want to dive in a little bit more into how...
Like how it scales out with Vitess. So let me give some examples. MySQL and Postgres are the poster children in my mind of like open source relational databases. Right. And so specifically talking about MySQL, Vitess, as you mentioned, is like this orchestration layer that allows us to like add multiple nodes to a cluster. That's great. I always think about MySQL and Postgres as like where you would start for most web apps, like
CRUD apps, right? Create, read, update, delete. If you have like very traditional kind of a user's table and a posts table, right? If your application fits this like fairly standard model, then these relational databases are a great place to start. Where looking outside of the traditional relational databases starts to make sense is when you have different data. So like
An example might be website analytics. So you're tracking like every click on a page. You're just trying to like dump as much data as you can into some place on disk. With website analytics, you might have an insane amount of writes, right? Every time someone's doing stuff on your website, you're just writing all of these event logs to some data store. And you would need some like very special database to be able to efficiently do all those writes and then do some big like aggregation query on the data.
What I'm interested in is, as we say scale up MySQL using Vitess, are there certain use cases
that you want to stay within when you decide to use like MySQL and Vitess and PlanetScale and certain use cases where you'd want to look to a Redis or an Elasticsearch or some of these other like more domain specific databases? Ah, that is a very, very good question. Let's tackle that last bit you mentioned about Redis or I guess an in-memory key value store. If you have data that is being, that is relatively consistent at being,
predictably read back. I think that's where something like these caching tools make sense, like Redis or even our own Boost. We have our own internal kind of data caching mechanism that's still in beta, but it more or less will front load that data into an in-memory store and make accessing it significantly quicker for you. If you had asked me this question, say two years ago, I would have said that
Typically, if you have unstructured data where you have very common read patterns on your data and you know how you're going to access it, then some of the NoSQL databases would make sense. Whereas if you have highly structured data and you know exactly how you're going to query it and whatnot, then your typical relational database system makes sense.
I think in the last couple of years, those lines have begun to blur a little bit because tools like MySQL now do support like JSON data structures in columns. So you can very easily dump data, unstructured data, into a MySQL database. Where in a traditional configuration you might run into issues is the amount of rows or amount of data that's being stored in an individual table. One of the cool things about Vitess, because we support horizontal sharding, is you can actually break up the data
from, say, one large table that you're storing all of your analytics data in, right? And it's just growing exponentially and it's eventually going to hit a hard limit where the engine from MySQL simply cannot handle reading back that data. With horizontal sharding implemented, Vitess can actually create a logical table that spans across multiple MySQL instances. And it will know, based on its knowledge of the topology and configuration of everything,
which tables to access in order to grab the specific data you're asking it to pull back. Oh, okay. So, when you start adding different layers on top of the traditional implementations of things, these lines start to blur a little bit. I guess that's probably my answer now. Yeah. MySQL for everything.
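To make the JSON-column point above a bit more concrete, here's a minimal sketch, assuming Go with the standard database/sql package and the go-sql-driver/mysql driver. The events table, its columns, and the connection string are hypothetical placeholders, not anything PlanetScale-specific.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver with database/sql
)

func main() {
	// Placeholder DSN; a real connection string would come from your provider.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/appdb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A table with a JSON column for semi-structured event payloads (hypothetical schema).
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS events (
		id      BIGINT AUTO_INCREMENT PRIMARY KEY,
		payload JSON NOT NULL
	)`); err != nil {
		log.Fatal(err)
	}

	// Dump an unstructured blob into the JSON column.
	if _, err := db.Exec(`INSERT INTO events (payload) VALUES (?)`,
		`{"type": "click", "path": "/pricing", "ms_on_page": 4120}`); err != nil {
		log.Fatal(err)
	}

	// Query a single field back out of the JSON document.
	var eventType string
	err = db.QueryRow(
		`SELECT JSON_UNQUOTE(JSON_EXTRACT(payload, '$.type')) FROM events ORDER BY id DESC LIMIT 1`,
	).Scan(&eventType)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("latest event type:", eventType)
}
```

JSON_EXTRACT and JSON_UNQUOTE are standard MySQL functions (5.7+), which is the "lines have begun to blur" point being made here.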
That's pretty cool. Let me, so let me like read that back because we may have used a term that might lose some people. We talked about sharding, talked about sharding the data. So let's use like a really concrete example. Let's say you're a really bad person and you log every keystroke that someone makes while they're visiting your site.
Right. So as someone is like typing on your website, you're logging a new record in the database for every single key press on their keyboard. So you can imagine you have a thousand concurrent users. They're all typing. You've got like immediately millions of rows of data that you need to write to your database. And if I'm hearing you properly, it sounds like an individual node of MySQL has some limit. I don't know what that limit is. A billion records maybe for an individual node, 10 billion, something like that.
Off the top of my head, I couldn't answer that, honestly. Okay, that's fine. Let's pretend it's a billion. It's some crazy, crazy big number to us, but I'm sure to a database that's getting that much data, you could hit it pretty quickly depending on how fast the data is being pumped into it. Okay, but there's some finite limit on any individual node. And what you're telling me is that Vitesse, acting as like a load balancer that you're actually sending the data to first...
will split that data up. Let's say it round robins it between five nodes in the background. So you've got like five actual MySQL instances, each with their own tables. And it's saying, okay, you get one and now you get one and now you get one. And they're each now storing like a row at a time so that as you're writing data, they're all like filling up, uh,
Like synchronized swimming. We're getting a little bit into the nitty gritty of Vitess that I don't, I'm not even confident in my answers for it. I believe that configuration is possible, but I know there are several sharding configurations that are available to Vitess that, depending on what your use case is, you might be able to take advantage of.
Sure. And sorry, I didn't mean to say that's exactly how it works, but that's kind of general idea that you're splitting the data up between nodes in the background, whether the algorithm is literally one row here, one row there, or maybe something different. Yeah, for sure. There's tons of different sharding algorithms out there. But okay. So just so we understand from a high level, Vitesse is allowing us to split that data up
across multiple nodes. And the reason I think this matters is if you go like Google MySQL scaling problems, like maybe you're trying to decide on a database and you go Google a little bit about MySQL, you might read that it has all these limits and so you don't want to use it. But it's useful to know that a tool like PlanetScale uses Vitess under the hood, so you have to understand that there's additional capabilities added when you stack technologies on top of each other.
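As a toy illustration of the splitting idea being described here, the sketch below routes rows to shards by hashing a key. To be clear, this is not how Vitess itself routes queries; Vitess has its own sharding schemes and configuration, as Brian notes above. The shard DSNs are made up and the hash choice is arbitrary; it only shows the concept of one logical table spread across several MySQL instances.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardDSNs are made-up connection strings standing in for separate MySQL instances.
var shardDSNs = []string{
	"mysql://shard-0.example.internal/appdb",
	"mysql://shard-1.example.internal/appdb",
	"mysql://shard-2.example.internal/appdb",
}

// shardFor picks a shard by hashing the sharding key, so rows for the same user
// always land on (and are read back from) the same node.
func shardFor(userID string) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return shardDSNs[h.Sum32()%uint32(len(shardDSNs))]
}

func main() {
	for _, id := range []string{"user-42", "user-1337", "user-9001"} {
		fmt.Printf("%s -> %s\n", id, shardFor(id))
	}
}
```

The important property is that a given key always maps to the same shard, so no single node ever has to hold or scan the whole table.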
Yeah, and I also think it's worth mentioning that if you are on PlanetScale and you're getting to the level where you need to worry about sharding and application performance and how your data is being split across multiple nodes, like we literally have a whole team dedicated to helping people with that. That is not something that
we would expect anybody who's just logging into the hobby tier of PlanetScale to pull off and be able to set up and stuff. Unless you already have a database where you have those concerns, that's definitely a little bit of a down-the-road consideration.
Yeah, I completely agree. The thing that's interesting is the listeners to this podcast, in order to get their own hobby database up and running and connected to it, they won't need to know how to do this stuff. But listeners to this podcast are interested in getting back-end development jobs at large companies. And being able to at least conceptually understand the sorts of limits that larger projects start to run into, I think it can be super useful. Let's talk about MySQL...
versus Postgres. So most of my career has been in using Postgres. I think I used MySQL at one of my very first jobs. And then I quickly started using Postgres because my next job used Postgres and have just used Postgres ever since. And to be honest, for a long time, didn't even really understand the differences because they are so similar in many ways. The differences tend to be quite subtle.
I understand that the primary reason you guys use MySQL is because of Vitesse. It's built specifically for MySQL. But what are some of the issues that a student who's maybe done a bunch of projects in Postgres might run into when migrating to MySQL for the first time?
That is an excellent question that I don't think I have a great answer to because, prior to PlanetScale, my professional path was actually in the Microsoft stack. So the vast majority of my knowledge when it comes to databases is in SQL Server. Now, that said, I do know that Postgres has a number of additional data types on top of what you would typically consider, like a VARCHAR or an integer or many of the standard data types that are common across all databases.
And I imagine they're stored somewhat differently under the hood, depending on what data you're throwing at it. However, just speaking from my personal experience going from something like SQL Server over to MySQL, since this is really my first
job where I'm professionally working with MySQL on a regular basis, the amount of knowledge that translated from SQL Server to MySQL was something like, ballpark, 95%. One of the main differences that has tripped me up, and still does to this day after working with SQL Server for so long, is the difference between LIMIT and TOP to paginate data. But beyond that, I would say anyone who's interested in migrating from Postgres to MySQL,
Just check the data types that they're using inside their database. If there's overlap, great. If there's something different, there are plenty of strategies in order to either convert that data into a data type that MySQL can handle. You know, it might not even be necessary to store it in a specialized data type. So there's definitely avenues in order to do that. It's just going to be unique for everybody, I suppose. Yeah. Yeah.
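For anyone who hasn't hit the LIMIT-versus-TOP difference mentioned above: MySQL and Postgres both paginate with LIMIT/OFFSET, while SQL Server historically uses TOP (or OFFSET ... FETCH). A minimal Go sketch, assuming a *sql.DB opened with the MySQL driver; the posts table and its columns are hypothetical.

```go
package store

import "database/sql"

// fetchPage paginates with LIMIT/OFFSET, which reads the same way in MySQL and
// Postgres. The SQL Server equivalent would use TOP or OFFSET ... FETCH NEXT.
func fetchPage(db *sql.DB, pageSize, page int) (*sql.Rows, error) {
	return db.Query(
		`SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?`,
		pageSize, page*pageSize,
	)
}
```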
One analogy I like to think about is, for anyone, any of the listeners familiar with JavaScript, JavaScript is this language that like technically is one language, but depending on where you run your JavaScript, you get access to different things, right? If you run your JavaScript in the browser, you'll have access to certain DOM APIs. If you run it in Node, you'll have other APIs. You run it in Deno, whatever, like it changes a bit depending on where you run it. And I think SQL is basically the same way, right? SQL is a language and,
By and large, if a database supports SQL, almost everything's going to work. But there are certain APIs that different databases support. So for example, the only thing that I really had issues with when migrating from Postgres to MySQL as I was writing this course was Postgres has a native UUID type. So it stores under the hood the binary format of a UUID.
universally unique identifier, for anyone who's not familiar with that, that I often use for like primary keys and IDs within a database. MySQL doesn't have that built in natively. So you store it as like a BINARY(16) or something like that, right? Kind of a raw string. So like there's definitely a corollary. You can do both things in both databases. The syntax changes just a little bit depending on which database you're using. Yeah, yeah, very true. I just, I don't think it's,
It's not enough where it's like you're going from language to language, certainly. Like going from Postgres to MySQL, I'm sure 95% of even that knowledge is going to transfer over just the same, which is great, that we have one common language that spans across most of our relational database platforms, honestly.
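A minimal sketch of that UUID point from the Go side: in Postgres the column could simply be a native UUID type, while a common MySQL pattern is a BINARY(16) column holding the raw bytes. This assumes the github.com/google/uuid package and the go-sql-driver/mysql driver; the users table and the connection string are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
	"github.com/google/uuid"
)

func main() {
	// Placeholder DSN.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/appdb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// In Postgres this column could be `id UUID PRIMARY KEY`;
	// in MySQL, BINARY(16) holds the raw 16 bytes of the UUID.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS users (
		id   BINARY(16) PRIMARY KEY,
		name VARCHAR(255) NOT NULL
	)`); err != nil {
		log.Fatal(err)
	}

	id := uuid.New()
	// id[:] is the 16-byte binary form of the UUID.
	if _, err := db.Exec(`INSERT INTO users (id, name) VALUES (?, ?)`, id[:], "lane"); err != nil {
		log.Fatal(err)
	}

	// Read the bytes back and convert them into a uuid.UUID again.
	var raw []byte
	var name string
	if err := db.QueryRow(`SELECT id, name FROM users WHERE id = ?`, id[:]).Scan(&raw, &name); err != nil {
		log.Fatal(err)
	}
	parsed, err := uuid.FromBytes(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(parsed, name)
}
```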
And to be clear, when I was doing this migration, I was writing raw SQL and I still only had to change the types of a couple fields to get it to work. If you're using like an ORM that's mapping like your programming languages code, so like Go or Python or whatever into SQL for you.
it's very likely you won't have to change anything because under the hood, the ORM will make those transitions. Now you might have to manually, like if you're actually migrating a production database, you might have to like do some changes on the database side, but it's unlikely you'll have to change your code, I guess is the way that I would phrase that. One more question I want to talk about regarding the scale of MySQL and Vitesse and PlanetScale is how do you think about
Writes versus reads. Do they both scale up equally well as you add nodes to a PlanetScale cluster? Or is the horizontal scaling mechanism within Vitess optimized for many reads over many writes, for example?
Again, I would certainly double check the docs, but it's based on configuration. The different nodes within, they're called tablets in Vitess lingo, can be set up with different attributes that will flag that specific tablet as read-only, as a write node, or even as solely dedicated to backup. It's pretty cool what they've put together. That said, I think the vast majority of applications are pretty heavy into the read and a little bit less so into the write.
Which is why when you spin up a database in PlanetScale, we'll set up these replicas that are flagged as read-only. So this way you can use them for your reads, and then hit the main node for writes if you need to. And this is definitely not necessary for everyone, but you have that capability to bounce back and forth between the two.
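A minimal sketch of that read/write split from the application's side: writes go to a primary connection, reads go to a replica connection. This assumes Go's database/sql with the go-sql-driver/mysql driver; the DSNs, the Store type, and the posts table are all made up for illustration, and a real PlanetScale setup would hand you its own connection strings for the primary and any read-only replicas or regions.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// Hypothetical hostnames standing in for the primary and a read replica.
const (
	primaryDSN = "user:pass@tcp(primary.db.example.internal:3306)/appdb"
	replicaDSN = "user:pass@tcp(replica.db.example.internal:3306)/appdb"
)

type Store struct {
	primary *sql.DB // all writes go here
	replica *sql.DB // read-heavy traffic goes here
}

func NewStore() (*Store, error) {
	p, err := sql.Open("mysql", primaryDSN)
	if err != nil {
		return nil, err
	}
	r, err := sql.Open("mysql", replicaDSN)
	if err != nil {
		return nil, err
	}
	return &Store{primary: p, replica: r}, nil
}

// CreatePost is a write, so it uses the primary.
func (s *Store) CreatePost(title string) error {
	_, err := s.primary.Exec(`INSERT INTO posts (title) VALUES (?)`, title)
	return err
}

// CountPosts is a read, so it can be served by the replica.
func (s *Store) CountPosts() (int, error) {
	var n int
	err := s.replica.QueryRow(`SELECT COUNT(*) FROM posts`).Scan(&n)
	return n, err
}

func main() {
	store, err := NewStore()
	if err != nil {
		log.Fatal(err)
	}
	if err := store.CreatePost("hello"); err != nil {
		log.Fatal(err)
	}
	n, err := store.CountPosts()
	if err != nil {
		log.Fatal(err)
	}
	log.Println("posts:", n)
}
```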
Now we also offer read-only regions, which if you wanted to bring your data closer to your users, that's another functionality where you can get a completely separate cluster of your database in a different geographical region anywhere you want, basically around the world. We support a number of regions in AWS and GCP at this time. Cool. So it sounds...
Now, I want to put in all the caveats that people should really go check the docs on what I'm about to say. But from a high level, I do like discussing this stuff. And we'll just make that disclaimer for anyone listening. There are databases I'm familiar with where the distributed architecture of the database is...
Essentially, the idea is that there is not necessarily a master node. You can read or you can write to any node, and then the database becomes, quote unquote, eventually consistent. So you kind of give up some consistency in your database in the sense that if you write to one node and then read from another at basically the same time, they may not be perfectly in sync at that moment. So you give up some of that consistency. But it means you can scale up better in both directions in the sense that you can read and write better.
to any node as you add nodes to the cluster. My understanding is that Vitess might not take that approach. It takes a more consistent approach in the sense that you won't have this consistency issue, but that you have to run all of your writes through one node, maybe, and then read from other nodes. Am I accurate in guessing that?
Yep, that's pretty accurate. And this is before my time at PlanetScale, so I'm just speaking some things that I've heard. But when it was explored on whether we wanted to create these configurations where there would be multiple read and write nodes, the trade-off of complexities didn't really match what the benefits were. Most
use cases would realistically only need one node that you would need to write to. And then the read-only nodes would essentially act as failovers. So if something did happen to that write node, another one would come up and pick up the slack. Yeah. So one piece of advice I would give to anyone listening to this podcast, if you're kind of new to backend development, is I think there's a tendency, especially among new backend developers, to like,
use very generic terms. Oh, Mongo scales really well. And Postgres doesn't scale very well. Those two statements are not true. Like, you can't think about it in those vague general terms. You need to think about your application and how it accesses data. So like you mentioned, most applications are read heavy. I think if you think of any website, it becomes very clear why that is, right? Right when you load the page, you're probably reading a bunch of different rows from the database. Anytime you navigate between pages, you're doing reads.
Right. Anytime you open a dropdown, you might be doing another read from the database. Whereas really the only time you write something is if you like create a new thing on the website. So imagine Twitter. If you log onto Twitter and doom scroll for an hour, you've just done probably thousands of reads to the database. And if you don't tweet anything, you haven't even done a single write.
Right. So thinking about that kind of stuff whenever you're confronted with the problem of scale, I think is helpful. And it makes sense why PlanetScale with the use case you guys have, I'm guessing primarily web apps and websites, why you would optimize really heavily for reads. Yeah. And I definitely think advice to some of your listeners who are beginner devs, I would recommend
Avoid becoming overly. What's the word I'm looking for? Over-optimizing too early on, right? If you're starting a new project, this is probably not something you need to be concerned with. And even if you get a job at a big enterprise company, having the general knowledge that a lot of these, these, this functionality and this technology exists is going to be enough to get you in the door and, and,
There's probably going to be another senior engineer there who's going to be able to like take you through the ropes. I'm a fully self-taught developer. I've never, I don't have a professional degree or anything. And all of my experiences just come from, from mentors in the field that have walked me through the way that things work. And I've just kind of like accumulated knowledge over time of a lot of this stuff too. So just something else to keep in the back of your head as, as you kind of like move into the tech or dev space.
I think that's really good advice. In my experience, as a senior developer, you might get some of these like more hairy, like scalability type questions on interviews. But as a junior developer, I think it's much more likely that you'll just get questions like, have you used X technology before? The questions will be simpler, but I do think...
Yeah, don't get the deer in the headlights. Yeah, yeah.
You mentioned edge or geographic data distribution, and I just like glossed over it until now. I think what you said was that you can basically have your database cluster somewhere geographically. Let's just pick a location, say New York City, and you can have a read replica elsewhere in the world. Why would you do that?
It depends. The reason you would do that is if you have plenty of users in a very specific part of the world. So like taking your example, if you have, if your main, I'm going to go, I'm going to go with Virginia because I know that Virginia is like the main hub for AWS. Let's say you have your main hub.
Let's say you have your main database cluster in US East 1, which is on AWS in Virginia. And all of a sudden you notice that in Europe, you're getting a huge spike of users. In that scenario, everyone who's trying to access your application or website is actually going across the globe to hit wherever your application and database is hosted in Virginia.
Now, in order to optimize this, what you'd want to do is ideally set up your architecture or your application, your database and whatnot closer to wherever your users are. So if you're
I think there's a data center in Ireland or somewhere over there. AWS has so many data centers. I don't even know where all of them are right now, but let's just assume Ireland. You can put your application in Ireland and store it there and then put your PlanetScale database, the read-only version of your database, in that same area. So now in that scenario, your writes are still going to take a little bit longer than for the people who are accessing them from the States, right?
But at least if your application is configured like most are where the vast majority of them is going to be reads, that's going to take care of 90% of people who are trying to access your application from Europe as opposed to having to come across the entire world. They can access the data more locally, which in turn makes your application a little snappier.
Yeah, that makes a lot of sense. I've loaded... So, boot dev as a web application. I've done the boot dev experience through a VPN in India. And it's a lot slower than connecting to the data center. I think, actually, I host my website in Salt Lake. And I'm just outside of Salt Lake. So, it's always super snappy for me. But...
At some point, as we scale up the company and the project, we'll probably want to do some sort of geographic distribution. Like right now, we have 40,000 total registered users. We probably don't have enough users in any given geographic area to warrant that complexity. But I think it definitely makes sense at some point for us to explore those sorts of things. Yeah, it's like with...
What, what a lot of people realize, especially as they get into dev is it's a lot of, it's just this evolving beast that you just kind of like things come up and you just kind of figure out how to tackle them and then knock them back down. And I mean, if you ask me, that's really, that's the fun part of this field is like these little things kind of pop up that you now have to go and figure out a good solution for to engineer in order to address those users issues. Yeah.
Yeah, like for most applications, that latency isn't going to be a deal breaker. I have users in India using boot dev and they're happily using boot dev. And like it's a little slower than they would have if they were using it in the States right next to the data center. But it's not like unbearably slow or anything like that. It's probably like a second of latency rather than 100 milliseconds.
give or take. But like, I could definitely think of, there are applications right out of the gate where you might need to think more about geographic distribution. Like in gaming, it could be really bad if you have a ping time of like a second and a half. So it just depends on what you're building. Earlier, you mentioned that PlanetScale, in addition to like all these scaling things that it does for you automatically, there's some additional features that you've added. So I think you mentioned like CICD or branching. Could you, could you speak to that a little bit?
Yeah. And I really, this is one of the coolest features of PlanetScale. And when I was going through the interview process, it made me super excited for the future of the company. So using the power of Vitess, we can create isolated copies of your database schema, which in the user interface we call branching. And because these branches are isolated,
you can create a branch, create a connection string to that branch, and then experiment with building features or testing things on that branch with zero impact on your production database, which is really neat. Now, we've developed the feature very similar to the way you would do code branching and code merges in Git and GitHub. So a branch in PlanetScale is akin to a branch in GitHub, but also we have this concept of deploy requests.
So once you're finished making your changes to your development branch, you can open up a deploy request, which actually lets other developers on your team comment on the changes, review the changes, just like you would a pull request inside of GitHub.
And then once everything is checked off and ready to go, you can merge those changes in. And then we do some cool magic behind the scenes, which lets you merge those changes in without having any downtime to your application. So you can merge them in immediately once the changes are done, or you can actually hold off on the merge and say, hey, I don't, if you kick off a deploy request and start merging, say at 5 p.m., right? Right when you're getting ready to leave the office for the day.
Like you don't want all of a sudden things to go into production after hours. So you can actually say, Hey, hold on, let's just wait until the morning. We'll come back and check it and make sure there's no issues and all that whatnot. And then cut over. And the cutover is near real time, which is pretty cool. We also, on top of that, offer a backout feature too, which,
You have a 30-minute window once those changes have been merged in to effectively just say, never mind, something went south. Making changes to databases in general is hard and we try to make those as simple as possible, but it's inevitable that things might go wrong. So we have another feature which lets you quickly undo those changes so you can get back to your application running as quickly as possible. Nice little Ctrl+Z for your database. Yes. Cool.
Okay, I have a question about the branching because it sounds really, really neat. A big problem that I had at a previous company was that we had different staging environments. So obviously, we weren't just making changes and rolling straight to production. We would roll out to staging environments. Then we had QA teams that would check the application on the staging environments. And one of the big pain points was that...
I mean, schema, as you mentioned, was one thing, but another big pain point was the data itself. Like you need like copies of the data in the database. Does the branching address the data as well as the schema or is it just schema changes?
In our base tiers, it's just the schema changes. So you're kind of left to yourself to seed it with data if you want to, which can easily be done using our CLI. You can actually connect to any branch using our CLI and get yourself a MySQL CLI connection to it, to run some of the same commands you would if you were using the MySQL CLI on a regular MySQL database.
Now, once you start getting into some of the higher tiers of our offerings, we offer a feature called data branching, which essentially takes your most recent backup of the source branch and just restores it directly to the database. Takes a little bit longer because obviously you're piping data to it. So it depends on how big your database is.
But using the combination of branching and then the data branching portion of it, you can actually get a complete replica of your production database that's completely isolated from your production database in order to build on and test and bang away at and do all the things that most developers do when they're extending a database.
Yeah, that sounds exciting to me because it sounds like basically an out-of-the-box solution for the kind of manual crap that we built at that company to get things working, right? Like we had all these scripts that would like clone the database, spin up a new one. And it's just a lot of work. I haven't tried the feature yet. Sounds pretty cool. I'm very excited for that.
All right. Where can people find more about PlanetScale? Or, let's start with PlanetScale, and then I also want to plug your stuff because I know you do a lot of stuff on Twitter and YouTube. But where can people find PlanetScale? What are the resources they should go look at first? Yeah. To get started, go to planetscale.com/docs. That's where a lot of my work lives. We document all of our features
pretty thoroughly. So this way, if you're looking to understand how certain features work, that's the first place to check. Our blog is also an excellent resource for finding out how to build certain applications on top of PlanetScale. Like, just a sneak preview of something that's coming out, well, I guess by the time this podcast launches it will be out, but like how to build a Laravel application on top of PlanetScale. We have a new blog post that's coming out that's going to show you how to do that and build something that
You can, it's not just a simple hello world application. You can click around, you can do some things within the application. And we have several of those on top of just our normal, highly in-depth technical blog articles that we have. I would say those are some of the best resources. Our YouTube channel, go look at that too. You'll, you'll see my face a lot sprinkled all over there where I'm trying to show you how to use certain features of the, of the platform as well.
Our hobby plan is free. You get a five gigabyte database. Our tiers are all usage based now. So you get, if memory serves me, 1 billion rows read and 10 million rows written in a given month for the free tier, which is super generous, you know, in the grand landscape of database offerings. Yeah. And then
Hit us up, let us know. Submit a contact request, or @PlanetScale on Twitter is one of the best ways to get in touch with us. I'm one of the people that actually manage that Twitter communication too. So you might end up even chatting with me behind the scenes and not know it. That's awesome. And where can people find you specifically on Twitter or anywhere else that you hang out?
I'm a little bit less on Twitter these days, to be entirely honest. If you want to follow me, I'm at BrianMMDev. If you want to follow any of my other work at BrianMMDev, I'm pretty much that everywhere, but I'm also that on YouTube. My website is BrianMorrison.me. I blog on there. You can find all my past work and everything I've done in my career on there as well. I like to write about my past projects. I've worked on some what I think are pretty interesting things along the way.
Yeah, I think that kind of sums it up. Oh, and then one other personal plug. Next month, I will be at that conference up in Wisconsin Dells. I'll actually be giving my first presentation. I'll be talking about breaking in. This actually lines up really well with your CICD course. I didn't even do this on purpose, but I'll be breaking down kind of a full pipeline and mimicking something like the...
like Netlify does where you push, where it pushes your code to production. I'll be deconstructing that and showing you how you can build your own pipeline using a bunch of different kinds of tools and give you some starting points and whatnot. So yeah,
Come say hi. That's awesome. We'd love to meet people. Congratulations on your first presentation at a conference. If any listeners are at the conference, obviously go watch the talk. I'm guessing the talk will probably also go up on YouTube. Conferences usually do that. I don't know. It's not one of the main talks, but I'm not entirely sure. That conference runs twice a year. There's one in the summer. It's in Wisconsin. In January, it's in Texas. And they didn't record some of the smaller sessions in Texas. So...
Okay. I don't know. It'll eventually be up there, even if I got to record it myself in front of my computer. Cool. Sounds great, man. And in case anyone's confused, it's actually called THAT Conference. Like, that's the name of the conference. We're not just being facetious, like everyone should know the conference we're talking about. Yeah.
It's creative, but can be a little confusing. That.us is the website though. It's run, the guy who runs it is really great. It's a fantastic conference. It was more fun than I've had at any other, any previous conference I've ever been to. That's cool. I need to go to more conferences. Maybe that's going to be one of the next ones I hit up. Thanks so much for coming on the show, man. Talk to you later. Yeah, it was great chatting with you. Thanks for having me, Lane. Bye.