
AWS re:Invent Special: CircleCI with Rob Zuber

2024/1/10

Cloud Engineering Archives - Software Engineering Daily


People

Rob Zuber

Topics

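Rob Zuber: I am fascinated by the innovation that container technology brought to shipping and to software development. At CircleCI, we are committed to helping customers get early feedback and deliver software quickly. We now offer support all the way from the developer's IDE to production, including deploy and release. The core of software development is constant change and iteration; only once you have delivered software to customers and observed how it behaves can you truly feel good about its quality. The CircleCI name may need updating, but that is not our top priority right now. We are investing in chain of custody, but customers also have to do their part to ensure security. We make sure that what was pulled from source control matches what is ultimately delivered. On security, you need to start from a threat-modeling perspective and identify the biggest risks. CircleCI is working to reduce the access it needs to other systems in order to improve security. In the past, we had to hold AWS credentials to help users deploy, but there is now the more secure OIDC approach. By providing better tooling, we let customers use frameworks such as OIDC and avoid dangerous approaches. Security should be addressed from the foundation up, rather than by focusing only on the problems you already know.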


This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Rob Zuber, who is the CTO at CircleCI. This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him.

Okay, so Rob, welcome to Software Engineering Daily. Thanks for having me. I'm excited to be here. I'm sure you've shipped applications that have been containerized, that were part of a container, but have you yourself been containerized ever? Because we are here at AWS re:Invent. It's a very silly joke.

But yeah, we are effectively recording this in a container. Have you ever been containerized? Well, I have not, that I can recall, been in a container. But I will say, we were just talking about Oakland. I live in Oakland, and Oakland is a massive shipping port. And I see container ships all the time, and I'm fascinated by them. And I think I'm fascinated as an engineer by the technology and the innovation of containers and what it did for transportation and shipping and all these other things. So I love the metaphor in software, even with all the dumb jokes that we make.

Now I wonder, because Docker's founder, Solomon Hykes, I think he came up with the idea of packaging containers in the nifty way that Docker did when he was living in San Francisco. I think he was involved in a company that he also founded called dotCloud. You probably remember that better. But maybe he was inspired by the containers at the Oakland port or the San Francisco port, if they have any. Maybe, yeah, they would. I think Oakland's a little bigger, but they're on both sides of the bay and you see the ships just drifting around waiting for spots or whatever they're called. Also, the tugboats are super cool, but now we're way off topic. I don't know, but at least the designers were, because a lot of their early logos, and I think some of them still, had kind of this mashup of whales and container ships and stuff. There's certainly a, at least in the design...

And the communication style, there's a San Francisco landscape and ecosystem inspiration for sure, whether it's economic or nature-wise. So tell us a bit about yourself, because you've been the CTO of CircleCI for a long time, correct? Yeah, I've been at the company over nine years, most of that time as the CTO. I came in through a small acquisition. The company was 14 people. We were three people, so small on both sides. And I had been working on CI/CD for mobile companies, again with two other folks, and then we joined CircleCI to combine that together with everything that CircleCI had built at the time to build a complete solution across the board.

Because what is CircleCI right now, by the way? It's acquired a few other companies along the way? Yeah, we did acquire two other companies in the last few years. And if you're asking overall as a company, we're between 400 and 500 in terms of people. So that's a lot of growth, a lot of growth personally, you know, running a team going from 14 or whatever that was at the time to that size.

You know, it's been an exciting journey. We were just talking about containers. A lot has changed in how we deliver software in that time. So we did our own containerization in the early days. Actually, even at the time that I joined in 2014, we were using LXC, which was an early sort of container support built into Linux. I think some of the early Docker stuff was built on top of that. We don't need to get into the technical details, but

Thinking about Docker, about Kubernetes, about how we deliver software, all of these things have changed so much. Over the course of that time, our core purpose has stayed the same, right? Help our customers get early feedback, learn, deliver software quickly, feel confident in that. But how we've done that over that time has changed significantly. I got into doing that mobile CI/CD in 2014 because we were shifting as a software community, I think, from Rails first. In 2011, I was building a Rails app. And then by 2014, we were basically mobile first, right? Everyone was building iOS apps and Android apps. And the tooling just didn't really exist for that. So that was a big shift. And then the Docker shift and containerization and microservices. But at the end of the day, you're trying to build a product for your customer and get that out in front of your customer. And so I guess to round out about myself, I'm a software engineer by background, not by schooling, but ever since school, and I love software, delivering great products. I love starting companies. I love thinking about what customers want. And so supporting our customers to do that for their customers is just a really, really fun place to be.

So what does CircleCI... this will be the end of the roundup of your professional description. What is the whole extent of CircleCI's current portfolio?

Yeah, so we talked briefly about acquiring a couple companies. And basically, we think about building and delivering software. And we think about the lifecycle, where that's happening, and getting feedback as quickly as possible. So it starts on the developer's desktop or laptop or whatever the machine is they're using, right? I'm working in my IDE, and I want very fast feedback. And so we're now providing that right back down into your IDE.

All the way out, so part of that came through one of those acquisitions, and then all the way out into production, which is the other end, where we're now including deploy and release. So not just CI, which we've been known for for a long time, and CD, but release, which is going from I have the software in an environment to I'm exposing it to specific customers, right? Because it starts with the developer and the work they're doing. We think a lot about the fact that everything in software is change, right? We just made analogies to other engineering disciplines, but in a lot of other engineering disciplines, you make a big plan and you execute on the big plan, right? You build a bridge, you don't iterate and sort of figure it out as you go. The waterfall works well for those civil engineering projects, right? Exactly. But we start with something tiny and then we change and change and change. And so we think about change flowing, again, all the way from the developer out into production. And until you've released that change to your customers and seen how it behaves, you can't truly feel good about it, right? So a green CI build, my tests pass, isn't really that green if I then put it out in production and I cause an incident or customers can't use it or even business metrics change, right? Like not as many people are signing up. Well, we obviously changed something. It does what we expected it to do, but it doesn't perform the way we hoped it would perform in front of customers. And so we want people to have that full confidence again, right from the idea to the point where they're delivering value to their customers.

Yeah, I think that's something that, although I've been monitoring CircleCI for a long time, I don't think everyone is aware of: the offering, the product itself, covers the CD, let's call it, part of the software lifecycle. It has done so for a while. I just think that people tend to think of it as limited to a CI scope, and that's not completely true, as you just described. I'm sure there's a good lesson in naming somewhere in there. Have you considered changing it, updating it? I would say for the nine plus years I've been at the company, we've been having that conversation, but it's a hard job. And at some point you end up... Yes, your brand might say something about it, but ultimately it's a brand where people just recognize the name and then they go and use the tool. And so that's a trade-off. It's not the first problem we would try to solve right now. I wouldn't think it would change anything dramatically.

So one of the things that the industry as a whole is talking about, and I've been having these conversations here at re:Invent for the two days that I've been around, revolves around security. CI systems and build systems, if there's any difference between them at all, and I'm happy to hear your take on that if it's worth it, are under the spotlight as the tool, the space, the moment in the software delivery lifecycle in which most of the security-critical measures should be in place. What are your thoughts on that? Do you think that that is a burden that should be put on the people managing CI? And if so, how does CircleCI approach that? Yeah, so it's a rich and deep... It's deep, it's broad, it's a very big subject. So there are a couple places that I think about a lot from a security perspective. One is...

what I often refer to as chain of custody, non-repudiation, basically knowing that the thing that went into the build system is the thing that came out. And that is an area where we are continuing to invest for our customers, but our customers have to do a lot of the work. And I think one of the things that's interesting, challenging, but also helpful that we can solve for our customers is that there is a line that is the CI system platform that you use. And then there's what you bring as a software team, software developer, right? So there are many points that we don't own where you need to secure the system. So we need to be aware of that. But what we can do is say the thing that you put in, meaning what we pulled out of source control, is the thing that you sent out. And so some integrations around outside tools, like not all of that happens inside the system. But that is a key area in software delivery that, you know, you referenced SolarWinds earlier. It's an area that many folks are concerned with.

Would you say that build SBOMs, the SBOM that describes the result of a build, would actually solve that completely, partially, a big amount of it? Or are they, as many think, potentially a waste of time?

Yeah, I think that there are many facets. And I think that sort of like law of constraints in many different places, like if you sort of flip it around and think about a bottleneck, you're going to solve a problem and just move the problem somewhere else.
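The chain-of-custody idea raised a moment ago, that what was pulled from source control should be what gets shipped, can be sketched as a digest comparison. This is only an illustration and not a description of CircleCI's mechanism: the function names `record_digest` and `verify_digest` and the sample bytes are assumptions, and a real pipeline would also need to sign and protect the recorded digest.

```python
import hashlib
import hmac


def record_digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of the bytes produced by the build."""
    return hashlib.sha256(artifact).hexdigest()


def verify_digest(artifact: bytes, expected_hex: str) -> bool:
    """Check that the bytes about to ship match the digest recorded earlier."""
    actual = hashlib.sha256(artifact).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, expected_hex)


# Hypothetical build output, and the same bytes after tampering.
built = b"example build output"
recorded = record_digest(built)
tampered = built + b"\x00injected"

print(verify_digest(built, recorded))     # True: bytes are unchanged
print(verify_digest(tampered, recorded))  # False: bytes were modified
```

A check like this only says the bytes did not change between two points; it says nothing about whether the input was trustworthy to begin with, which is the "bigger risk" question taken up next.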

And so ultimately, if I go all the way out and look at like a threat modeling perspective of security and step outside of CI/CD for a second, then the question would be, where are the biggest risks? Right. And so I think people who are saying this is pointless are probably identifying bigger risks. And that may be 100% true, right? And so do I have any control over what's even going into my source control system? I can't say that for all software development shops. That's a problem that you have to solve, right? Like if you look at SolarWinds as an example, right? Which is the sort of... The watershed moment. Yeah, it was the key point that raised to everyone, hey, this could be a real problem. The amount of work and investment that went into making that happen, if you look over the detailed timeline, it's multiple years.

Right. It is a very, very long game. And in many shops, you can walk in through the front door. So why would I put all of the time and energy into, you know, that attack? Like, I don't know how many people think about it this way. I definitely think about it this way. I used to work in email. And so I always thought about this with spam. Like, why do people send spam?

Clearly, no one reads this. The answer is enough people read it to make it worthwhile. So it's always an ROI cost. Like we don't think of attackers as MBA grads, but they're smart. They only invest the money when it's going to pay off, right? So the question would be what is the level of attack and then what is the return on that, right? So a lot of this depends on what you have, right?

Right. What you have that's of value. And so I said earlier, I think about a couple of different points of view and the way that I would bring those together is as a CI platform, we have a lot of access to other people's systems. So the place that we're really investing right now is getting rid of that. So historically, when CircleCI was created, we talked about systems changing and approaches changing.

The only thing that we could do in order to help you deploy was hold on to tokens. I mean, we're sitting under an AWS sign. So I'll say AWS credentials, right? And AWS credentials back in 2011 were all powerful, you know, no termination, like no expiry, access to everything.

Those are super dangerous to have around. Now, in 2023, you have OIDC. You have the ability to create a short-lived token, hand that off, use it for 15 minutes, and have all of its privileges removed. So by creating better tooling to allow our customers to use some of these frameworks, such as OIDC, we're allowing them to get away from these dangerous approaches.
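To make the contrast with long-lived credentials concrete, here is a minimal sketch of the checks a system receiving a short-lived OIDC token might apply to its claims. The claim names `iss`, `aud`, `iat`, and `exp` are standard JWT claims; the issuer URL, audience value, helper names, and the 15-minute cap are illustrative assumptions. Signature verification, which any real verifier must perform before trusting the claims, is deliberately left out.

```python
import base64
import json


def make_unsigned_token(claims: dict) -> str:
    """Build a JWT-shaped string for demonstration only (no real signature)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = b64({"alg": "none"})
    return header + "." + b64(claims) + "."


def decode_claims(token: str) -> dict:
    """Decode the payload segment. This does NOT verify the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_acceptable(claims, issuer, audience, now, max_lifetime=15 * 60):
    """Accept only unexpired, short-lived tokens from the expected issuer."""
    if claims.get("iss") != issuer:
        return False
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        return False
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or not isinstance(exp, (int, float)):
        return False
    if now >= exp:
        return False  # token has expired
    return exp - iat <= max_lifetime  # reject tokens issued for too long


# Hypothetical issuer and audience values.
ISSUER = "https://ci.example.com/org/hypothetical-org"
AUDIENCE = "hypothetical-deploy-target"
issued_at = 1_700_000_000
token = make_unsigned_token(
    {"iss": ISSUER, "aud": AUDIENCE, "iat": issued_at, "exp": issued_at + 900}
)
claims = decode_claims(token)

print(claims_acceptable(claims, ISSUER, AUDIENCE, now=issued_at + 60))   # True
print(claims_acceptable(claims, ISSUER, AUDIENCE, now=issued_at + 901))  # False
```

The point of the expiry check is the one made in the conversation: a leaked token of this kind is only useful for minutes, whereas a leaked long-lived key stays useful until someone notices and rotates it.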

That goes for CI/CD in the first place, right? Or sorry, any integration, I think, is the best way of thinking about it. It's something that we've had to do in order to get to that deployment, but it's shifting rapidly. And so again, if you look for the sort of lowest hanging fruit or the first constraint, it's going to be something like that, because if I can get access to another system, I don't need all the complications of injecting undetectable binaries into your build system, kind of thing, right? So I think starting from that foundation and working our way up is the right way to think of it versus, oh, here's a problem that I know how to solve, therefore let's go solve that one instead of what is the biggest risk to us right now.

It seems like, as we mentioned before, SolarWinds was a watershed moment and that in general...

Public organizations, governments have reacted in different ways. It seems that the EU follows a philosophy of, let's say, rule of law that is regulating the

And I personally like more the American approach, which is the American government will only buy software, and it's probably the biggest buyer of software probably in the world, but definitely in the US, or one of the biggest. And the American government will describe how it buys software, what kind of software it buys, or what kind of requirements that software will... And that will create enough demand for then that type of adaptation from that software to expand elsewhere. So this gives me...

the segue to ask you about FedRAMP and what kind of approach CircleCI has to it, and whether those requirements are good enough. Or are they a good incentive for the industry at large? Yeah, I think it's a great question. I mean, we were the first CI/CD platform to be FedRAMP Tailored certified. And so we do take these things seriously. We see value in them. But I think it's important to recognize you described that as the federal government saying, this is the kind of software that we want to buy. And I think that's good for the government to use that weight in that way. It's important to realize, though, that any kind of compliance framework is typically quite open, I guess is the best way of describing it. It's often what are the measures that you are putting in place to account for this risk? And then are you executing those measures effectively? And it's possible to kind of end up on sort of different points on that spectrum from...

We think about delivering software quickly and effectively, right? And it's possible to put in a bunch of sort of unnecessary mitigations or unhelpful mitigations to try to meet those requirements and then impede your ability to deliver. On the other end, I think it's possible to describe things that aren't that effective and follow them but not actually be improving your security, right? And so I think it's important –

to drive from the perspective of security. And I talked about asking yourself the question, what's our biggest risk? What are the risks that we have in the way that we deliver software? And how can we mitigate those risks? And I think if you do a good job of that, getting to compliance in many of these frameworks is fairly straightforward. If you try to drive from the problem of compliance, you often end up with a lot of complexity, and complexity actually breeds security gaps. It breeds quality gaps, security gaps. When your system is hard to understand, it's very easy to get it wrong. And so striving for simplicity and security, I think, ultimately gets you to the place where you could say, oh, no problem. We do meet all these requirements. I think there are some specific things, specific technology choices, standards, et cetera, that are included in FedRAMP that are good to say, yes, that's great that you do it this way, but you must use algorithms that have met these standards. You know, basic stuff like that. But anybody that's writing their own algorithms for encryption is off to a terrible start and is definitely not on a pathway to security anyway. So I think that's probably best to leave to someone else to say, yeah, these are the ones that we currently consider acceptable.

One sort of philosophical doubt that I've always had with CI systems and build systems is the difference between those two. Would you agree that a build system is, well, the process that runs locally only to compile, transpile, or interpret the source code into a binary, and that CI would be that same process but hosted remotely? And whether you agree with that or not, I would love it if you could chip in your opinion about which one is better, or what you would recommend to anyone out there.

Yeah, I think you're serving different goals. So to your point, I think about build systems. I don't spend a lot of time thinking about build systems versus CI, but I think about build systems as the tools that we use to bring all the pieces together and compile the package. It's not always compilation these days, but construct the artifact, right, that we're going to... Would you say CircleCI is a build system? Well, in that definition, I'm thinking of things like... Bazel...

Make, Bazel, Meson, like there's piles of different tools. And for us, it doesn't matter, right? We want to create an environment in which you can use those tools collectively, right? With your colleagues, with your teammates, whoever you're integrating with. You know, you're integrating your changes and other people's changes. And sometimes now those changes aren't even coming from anyone inside your organization.

Yeah.

or Bazel, you mentioned, any of these, ultimately have some runners inside of them. What we're providing is...

One, all of the automation and tooling to allow that to happen regardless of who's making the change. And then two, the scale to make it happen so you don't have to worry about managing any of those systems, right? Whether it's, again, under the AWS, the bright AWS light here, just managing your spend inside of your cloud provider, right? Like spinning up capacity at the peak of your day, turning it all off at the end of the day is just a problem that most people don't want to work on. They're trying to build product for their customer. We take care of all that for you. So that's a little beyond just scaling

CI in general, but when I think about what a CI provider is doing, that's what we're doing is one, making sure that it's happening every time anyone makes a change and that that's happening constantly, many, many times per day. So feedback is provided as immediately as possible, especially when it's negative, when there's a conflict. Right, exactly. You want to make a small change and I make a small change and we find out our changes don't work together. Let's find out in an hour instead of in six months.

Right? Because in six months, it won't just be that one tiny change. It'll be this mess that we'll never figure out how to untangle to get back to the place where we're shipping. So we were talking about before starting the recording that it would be great if we could, I mean, not great, but you and I are having conversations at the forefront of software engineering, sort of like landscape. And we talk a lot about AI lately, and we probably are a bit saturated with it.

But it's unavoidable. So I guess among your customers, are you seeing any change of patterns in the way they build software with CircleCI that has been provoked? So I'm asking actually about causation, not correlation, but although probably it's difficult to find the first one. Have you seen any effect of...

LLMs, incorporating LLMs into builds and so forth, that has changed the pattern in which your customers, CircleCI's customers, are building software? Yes. There's actually three ways that I think about it. One is the tools that they use to write software. So we're thinking about AI coding assistants and stuff like that, which are generally producing more code, right? So you're integrating more often. There's people trying to build products that include

supported capabilities, right? Features in their product that now depend on an LLM and so trying to test that. And that's a new and novel thing. People are used to testing this input creates this output and now it's this input creates an output roughly in this range. Maybe. Right? And so helping them with things like understanding the concept of

evals and how do I do non-deterministic testing and know that it's good enough and what am I looking for? So that's number two. And then the third is we're actually putting capabilities in our product to help you get feedback faster, right? So we launched this error summarizer, and if you get a stack trace, we can summarize it for you and tell you probably what the fix is. Because often, maybe not the more senior engineers who have seen it a million times before, but a more junior engineer might look at that and copy and paste it, put it in Stack Overflow, try to find out what caused that error, and then try to figure out how to fix it. If we can just tell you, go do this thing, and we've even tried just doing it for you, which, you know, we haven't

nailed at high scale, but we've made it work to just say, we know what the test is. The test failed. We'll fix the software and make it pass for you because we know all the details of the input, right? So I guess our customers are changing the way they work and then we're trying to help them work faster, get more feedback about what they're doing so they can continue to deliver faster. Going a step down in the stack since we're at AWS, are you seeing any changes in two areas in terms of speed, of power, of compute?
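The second of the three points above, testing a feature where an input produces an output "roughly in this range" rather than one exact value, can be sketched as a pass-rate check over repeated runs. This is a minimal illustration only: `fake_model` is a hypothetical stand-in for an LLM-backed feature, and the scoring rule, sample count, and threshold are assumptions rather than anything CircleCI prescribes.

```python
import random


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a non-deterministic LLM-backed feature."""
    phrasings = [
        "Restart the service to apply the change.",
        "You should restart the service.",
        "Try clearing the cache.",  # an off-target answer
    ]
    return rng.choice(phrasings)


def passes(output: str) -> bool:
    """Score one output by a property, not by exact string equality."""
    return "restart" in output.lower()


def pass_rate(prompt: str, samples: int, seed: int) -> float:
    """Run the feature several times and return the fraction that pass."""
    rng = random.Random(seed)  # seeded so a CI run is repeatable
    results = [passes(fake_model(prompt, rng)) for _ in range(samples)]
    return sum(results) / samples


rate = pass_rate("How do I apply the config change?", samples=200, seed=7)
THRESHOLD = 0.5  # the "good enough" bar; an assumption for this sketch
print(f"pass rate: {rate:.2f}, meets threshold: {rate >= THRESHOLD}")
```

The design choice is that the test asserts on an aggregate over many samples instead of on a single output, so one off-target answer does not fail the build but a real regression in quality does.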

That is, in the hardware that your clients, CircleCI's clients, are using for the build, for the CI process. And in terms of security, are they looking for...

The infrastructure that runs the build system, the servers, the CI servers, are they looking for a particularly secure, more secure infrastructure than they had before? Are you seeing any of those trends be out there? I'll start with the security one. I think that that trend has been there for a long time. Yes, people are becoming more aware, but I think it's been consistent. The expectation that, you know,

how we manage our systems with their stuff inside. I think, as I mentioned, using OIDC as an example, that notion of getting away from storing secrets with us as much as possible, which is something we're trying to help them do, that's better for everybody. Makes them ephemeral for those. Yeah, exactly. On the

power and compute front, one of the areas where we've always invested is making available to our customers as many different options as possible. So if you're doing a very small thing, use a small, cheap container to do it. Why pay us for more? Why would you want to spend more on that? But we are seeing larger and larger compute demands, including, unsurprisingly, a lot of GPU access, which is something that we started offering a few years ago. We continue to expand with new classes of hardware that AWS provides, access to new GPUs so that more and more of our customers can use those to build, because ultimately, maybe not everything is running on a high-end machine with a GPU, because that's expensive for everybody, but they can use the small cheap machines to do much of the work. But if they are integrating with a custom model that they've built and they want to validate the model and then validate the integration, they need each of those pieces. And so we try to make a large sort of offering or menu available to customers so that they can choose, like mix and match what they need for the purpose while managing costs for their overall capacity.

I don't know if the role of chief architect or the responsibilities attached to an architect fall under the charter of a CTO, in this case, the CTO of CircleCI, who is my guest on the show today.

But if not, you're probably really familiar with this. So the question is, how do you architect an application, a product like CircleCI, to be able to adapt to the changes that we just described and others, probably unexpected, because clients always come with really weird and surprising requirements? How do you architect a product like CircleCI? How have you been doing that for the last years, so that it evolves and reduces the amount of friction that architectures naturally have to change?

Yeah, I think it's an excellent question. It is something that I spend a lot of time thinking about, have for many years and continue to. I think actually the answer is kind of in your question, which is designing for change, like thinking about change more as a core design principle. I already mentioned simplicity earlier, right? The more complex you make things, the harder they are to change. I think clear domain boundaries are really critical. So pulling together things that tend to change together, right? When change spans many domains, therefore probably many teams, maybe many services, depending on how your system is built, it gets a lot harder than when it is sort of co-located or cohesive, right? And so I don't think the architectural principles are particularly surprising. I think, for example, AI and, you know, the mad scramble that everyone is going through of how do I integrate this thing that we never thought about before into my product is helping a few more people recognize the value of what folks have said in the past about designing for change, right? Like, I think many of the great writers in the world of software, the people whose books you will have read if you read anything about architecture, talk about change constantly. They talk about simplicity and they talk about change, right? And so I think that was known phrasing in people's minds, but you really need one of these moments where it's like, no, next week we need to have something totally different, and you start to see the places where you're being impeded, for you to recognize, oh, okay, that's why we've been talking about this all this time. So I think, you know, great domain boundaries. I'm a big fan of domain-driven design. Clear ownership and then cohesion, co-location of change. Those things are kind of the underpinnings that allow you to then, in a moment like this, say, oh, no problem. Now we recombine these components in this other way.

And we've added this capability.

I don't want you to give away any secrets, any particular competitive advantage or whatever. But if you could give us a lay of the land of the architecture of CircleCI, it would be great. But the actual question is, what specific areas of the architecture, of the way you and your team have designed this throughout the years, do you feel particularly proud of? Because precisely the existence of those features, of those design decisions, rather, has allowed you to adapt over these years to make a product that is highly sought after. And from what I have seen and collected in the feedback of friends and clients that have used it, it's really highly valued. So again, if you could describe for us in the simplest way how it is architected, and what decisions you feel particularly proud of throughout the years.

Yeah, I think there's two things that I would call out. I mean, I'll try not to go through the entire architecture, especially just by waving my hands in the air. I don't think that would be particularly helpful.

But a couple of things that are probably not super surprising. One is an area that we call execution internally. I mean, it's effectively that, at the point we know we need to run a job, a piece of work, and that might be many, many different jobs inside of a workflow or an overall build for someone, we hand that off to an area that's responsible for finding capacity, determining the machine type that it's going to run on, finding capacity in that kind of machine, and then getting it scheduled and run, and doing all of that very quickly. We have high expectations for throughput and speed. We talk a lot about fast feedback. It's not fast feedback if you're waiting for your build to run. And the group, the set of teams responsible for that, have taken a lot of time over the last few years to effectively cleanly partition that, meaning make a really simple API over it.
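The hand-off described here (determine the machine type, find capacity of that type, schedule the job, all behind a small API) might look roughly like the following. It is a sketch under assumptions, not CircleCI's design: the `Executor` class, its method names, the return strings, and the resource-class labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Capacity for one machine type; callers never see how it is provisioned."""
    free_slots: int
    running: list = field(default_factory=list)


class Executor:
    """Hypothetical facade: the rest of the system only calls submit()."""

    def __init__(self):
        self._pools = {}

    def register(self, resource_class: str, slots: int) -> None:
        # Adding a new machine type touches only this registry.
        self._pools[resource_class] = Pool(free_slots=slots)

    def submit(self, job_id: str, resource_class: str) -> str:
        pool = self._pools.get(resource_class)
        if pool is None:
            return "rejected: unknown resource class"
        if pool.free_slots == 0:
            return "queued: no capacity"
        pool.free_slots -= 1
        pool.running.append(job_id)
        return "scheduled"


executor = Executor()
executor.register("docker-small", slots=1)
print(executor.submit("job-1", "docker-small"))  # scheduled
print(executor.submit("job-2", "docker-small"))  # queued: no capacity
print(executor.submit("job-3", "macos"))         # rejected: unknown resource class
```

Because callers depend only on `submit`, supporting another kind of machine is a matter of one more `register` call rather than a change to every caller, which is the information-hiding benefit the answer goes on to describe.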

So it's not as aware of how the rest of our system functions. And effectively simplify, I mean, talking about clean APIs, domains, and simplification, simplify how we manage capacity, how we manage availability of VMs, of Docker containers, of whatever it is, how we connect into... we have a data center full of Macs so we can do iOS builds, that kind of stuff. Do all that routing. That is something that then, when we say, oh, you know what we really need is GPUs, it's sort of plug and play versus we never thought of this before, you know, we don't have the extensibility for this. So once things are simple and clearly marked off, sort of information hiding, if you think about lower level design concepts. So that team has done an amazing job of sort of carving that out and making it really clear to allow them to make fast change in any of those areas. So that's one. Another area that, again, would not be surprising to people who know anything about our product, about CI/CD, is how we talk to the providers of source code, right? So in the early days of CircleCI...

There was one place online to put your source code, which was GitHub. You mostly had source code. That's how you were driving change. And we talked about like Rails monoliths. It was a very different world. Wait a minute. Did CircleCI never work with SourceForge? This is a joke. I'm sorry. We did not. We did not. It comes up every once in a while. Oh, really? Yeah. I mean, the number of sort of

Of that era technologies that we did support is – it's a long list. We can talk about it another time. But, you know – Everything was in GitHub, yes. It was absolutely GitHub and we were able to get to market faster by leveraging a lot of what GitHub had done. But then people started to move things into Bitbucket and GitLab. And now it's like they're on Hugging Face and they're on –

Like everyone's got a prompt hub of some kind. So there's these different places that people are storing assets and the changes to those assets are driving builds.
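As an illustration of the idea of builds driven by changes from several kinds of places, here is a minimal sketch of normalizing provider-specific events into one neutral shape. It is not CircleCI's implementation. The names `ChangeEvent`, `from_git_push`, `from_model_update`, and `plan_builds`, as well as the payload fields, are hypothetical and do not correspond to any real provider's webhook format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Provider-neutral record that something changed (hypothetical fields)."""
    source: str    # where the change came from, e.g. "git" or "model-hub"
    asset: str     # repository, model, or prompt identifier
    revision: str  # commit SHA, model version, and so on


# An adapter turns one provider's raw payload into a ChangeEvent.
Adapter = Callable[[dict], ChangeEvent]


def from_git_push(payload: dict) -> ChangeEvent:
    # Assumed payload shape: {"repo": ..., "sha": ...}
    return ChangeEvent("git", payload["repo"], payload["sha"])


def from_model_update(payload: dict) -> ChangeEvent:
    # Assumed payload shape: {"model": ..., "version": ...}
    return ChangeEvent("model-hub", payload["model"], payload["version"])


# Supporting a new kind of change source means registering one more adapter;
# the build planning code below does not need to change.
ADAPTERS: Dict[str, Adapter] = {
    "git_push": from_git_push,
    "model_update": from_model_update,
}


def plan_builds(raw_events: List[Tuple[str, dict]]) -> List[str]:
    """Normalize (kind, payload) pairs and describe one build per change."""
    builds = []
    for kind, payload in raw_events:
        event = ADAPTERS[kind](payload)
        builds.append(
            f"build {event.asset}@{event.revision} (from {event.source})"
        )
    return builds


if __name__ == "__main__":
    sample = [
        ("git_push", {"repo": "web-app", "sha": "abc123"}),
        ("model_update", {"model": "ranker", "version": "v7"}),
    ]
    for line in plan_builds(sample):
        print(line)
```

The point of the sketch is only that the part that schedules builds sees `ChangeEvent` values and never a provider-specific payload, which is one way to stay independent of whatever is driving the change.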

And we had done the work over the last few years, and it was harder than we wanted it to be, to really separate ourselves away from that so that we could be independent of what was driving the change to allow builds to run under any conditions. And that is something that I would say we're proud of, but particularly we're mostly excited about because in this moment where we've had to make those changes, we've already done the work instead of scrambling now. That actually would have been for us –

The harder thing, I think, than the thing I was describing from the execution side to be like, wait a second, we need to be able to totally fundamentally change what we're based off of, what we're integrated with. But we had already done a lot of the work. I really don't know what changes you did for that to happen. But had I been asked, and I'm obviously not a CTO, nothing close to that or an architect person.

But had I been asked a year ago, not five years ago, a year ago, that if I was the decision maker for the architecture of a CI system to incorporate other things that were not source code and were not Git-based, that I should invest in...

making my system, my product agnostic of the source of the changes to allow, to accommodate for those changes coming from elsewhere, that if I should invest in that, I would say no. So I find that that insight that you guys had is extremely, at least uncommon to say the least, but certainly now I'm sure it's paying off, quite insightful and quite intelligent because you guys are then ready for this change

You know, this new way of building applications that is cooperating with source code, of course, from GitHub, GitLab, the Git gang, as I call them, probably other source code management systems, but also these new repos in which other things related to software, because I'm always hesitant to call neural networks and AI in general software per se, to inform the builds of your clients. So that's, I would feel proud of that. Yeah, I would agree with you.

It's quite insightful. How did you come across that? What were you thinking when you were saying, yeah, we need to be agnostic of this and be as flexible as possible, make our product as flexible as possible with this? Yeah, I mean, I want to claim...

It was all your idea, right? No, no, not even that. So I will say a couple of things organizationally. Our CEO, Jim, came from the same acquisition as me. So we had worked together for many years before and he was always the product person and I was the tech person in the earlier days. And so just to say he's a very strong product person, is equally insightful about where things are going. And so that's very helpful when you're making decisions about how to make big investments that you know are going to be expensive. Yeah.

For me, part of it was, okay, we're seeing, I don't know what it's going to be. I certainly didn't say, oh, on November 30th of 2022, the world's going to change. And the way that we think about building software is going to change. But we could see the signs, right? Even again, third-party libraries, third-party systems, all of like AWS is changing something about their EKS structure. That's going to impact me, right? So it wasn't just the source, right?

And I think for a while we've been trying to cram all of our belief about our systems into source, like GitOps, infrastructure as code. I'm 100% in favor of those concepts, but they're limited, right? At some point, systems are changing in a way that we can't always reflect in source. They're changing whether we like it or not and whether we decided to change them or not. And so seeing that happen and seeing that we were really still focused on the source code,

was a leading indicator. Again, it wasn't the, and then generative AI is going to drop in our lap, but rather this isn't the only thing. And we were sort of like, we see all of these other like signs of things changing in the world and we're trying to fit it into this

You know, round hole, this square peg, whatever. So how do we step out of that? And then, you know, we did a bunch of the work, partly just honestly, it just made it easier for us to integrate with all the other Git providers. But as we did it, we did it with the belief that there was going to be a lot more than that. And then this all landed and we were like, oh, fuck.

Thank goodness. Thank goodness we got ourselves here. So how would this work in practice? Because bringing up GitOps, or a declarative representation of your system, whether it's the complete system, the infrastructure, the application layer, the data layer, everything in a data language, let's call it YAML for the sake of the example, store it in Git and have something reproduce that, reconcile that.

take it to production. Sounds ideal. I've worked at companies that advocate that. In fact, I worked at Weaveworks, the company that coined that term, if I'm not wrong. I've also, like you, been a proponent of that system. But yeah, you pointed out a very strong shortcoming of that system, which is you need the description of the system to be kept up to date with the underlying system itself. So what is CircleCI then proposing? That CircleCI would be listening, in a way, to the...

the parts of the system and updating itself with the events that come from them? Or how does this work? I probably butchered the description, but could you describe the difference between a GitOps approach, an everything-as-code approach, and the approach CircleCI suggests? Yeah, I think that ultimately...

You need a place where you can see all these things, right? And we've tried to do that. I mean, as an engineer, as someone who wants to know exactly what's in my system, I've tried, right? To say, great. And I'm still a big fan of infrastructure as code, you know, Terraform, CloudFormation, whatever you choose to use there. But it's describing a subset of the system, right? And we've seen a lot. I don't know how this plays out. And that's the other thing is about a

time like this is so many new approaches are coming out that it's impossible to make a bet. So you really want to be flexible. And so we're seeing, as I said, like prompt hubs and places to store models and, you know, people making modifications to open source models and pushing them back to places. They're doing their own custom training, their own, you know, fine tuning, whatever it might be.

And each of those is independently stored. Like people aren't taking their entire model and all of their data sets and putting that in source control. Some are trying, but it's not well suited to the task. And so what we're ultimately getting is that, yes, listening to all of those sources of change as we call them, and then being able to reflect to you, here's what changed in your system. Here's the current state of your system, right? And this went well, this went poorly, you know, whatever that is, so that you have a view that represents

what's happening in your production environment that came from multiple different places. I mean, again, under the, you know, sort of trying to take a simple AWS example, you might decide and put in your, you know, infrastructure as code in your repo, we're using RDS. But at some point, AWS is going to say, cool, we're upgrading Postgres. Like we're just upgrading the version, right? But that matters a lot to you when it turns out that there's some minor version that actually breaks what you do, right? And so you need to know that

And you aren't necessarily consciously making that choice, right? Especially, you know, you're getting an email notification or something, right? AWS is not coming along and updating your repo for you. So knowing that these things are changing around you and being able to at least react and say, oh, okay, well, let's just rerun all of our tests against that version, right?

and know that we're still going to be good. Let's rerun them against a live RDS instance, which is the one that's been upgraded, and know that it's still good. Oh, that's brilliant. I really like that approach, to be honest. We're getting closer to the end of the interview. Again, a reminder that we are at AWS reInvent 2023. Looking a bit to the future, are you guys going to announce anything here? Have you already announced here? Or what is for...

coming to CircleCI in the next, let's say, year or so. Yeah, well, I mean, you can tell what's top of mind a lot. Yes, indeed already. Yes, we've announced some stuff this morning. We're here with the booth. So what we're, I mean, more access to instances with GPUs, et cetera, so that folks can build and train. We've done some integration. I don't know if we announced it, but we're demoing it with SageMaker, showing how you would use models that you've trained in SageMaker, introduce them and release them confidently as part of a bigger product.

Right. So model training is one thing, but ultimately you're integrating that into your product to deliver features. So how do you push that out through our deploy and release capabilities? Know that those things are good. And then that view that we, so you have all of these change sources, right?

you know, in multiple different places, that view that shows this is what is happening in your environment. And then this is how you can trace that back to where those changes come from, if you need to give that feedback or fix something. So bringing all of those pieces together is really what we're,

what we're focusing on here and helping people, you know, as we see those folks or many of our customers and new customers evolving to building product on top of AI capabilities, not just ML ops, but I have a product and I want to enrich that product with AI enablement.

helping them understand how to test that, how to know that it's good, and then how to roll it slowly into a production environment, release it to customers and still feel good about it. I'm quite excited that AWS is actually investing in physical and hardware, right? They are, they announced, I think, in

before re:Invent that they will be doing much like Apple, they will be providing the entire stack at some point. I think Microsoft has announced the same. So yeah, new hardware is coming, new very powerful hardware is coming. So I'm happy to know that CircleCI is actually incorporating more and more of those.

I had a conversation this morning at breakfast with a guy who, in his previous role, was struggling to build computer vision pipelines to SageMaker that in turn could update the models he was delivering to the client. So I'm fairly sure that companies would benefit from what you just said, the deployment pipelines and the better integration with SageMaker. So yeah, it sounds like CircleCI is still at the bleeding edge of...

you know, fast software delivery, which is, and I quote your colleague, Jim, that it seems like in the US, and we are in Las Vegas in the US, the key driver is speed. Move faster. And this is from a RedMonk article that Kate Holterhoff did in 2022. Is this still the case? Because I think CircleCI has always been

perceived as a fast delivery platform. Do you see all this demand still out there and that you guys are serving that purpose? Yeah, I mean, I think that, again, at a time like this where so many things are changing, people are clamoring for speed. But more generally, the thing I love about speed is that, as a software developer, I just get frustrated when things take time.

Right. If you completely ignore my customer, it's just frustrating to me. Now, as a product person, someone trying to build product and put it in the hands of customers, I have exactly the same need. I want to get feedback. I want to build something small, put it in the hands of customers and learn if it's working or not. If not, why not? How do I fix it? And then ship the next thing. Right. So everyone benefits from.

and, you know, take it down to just flow. Like I'm staying in flow and I'm getting fast feedback and I'm working and this is really exhilarating or I'm delivering great product and I'm doing it faster than my competitors. I'm doing it in a way that's really finding the right fit in the market.

It wins for everybody, and that's why we focus on it so much. And part of it is just the best hardware we can get, but part of it is how do we get you feedback quickly? How do we tell you right away this is what's wrong so you can fix it and get on to the next thing? And so across the stack, we absolutely continue to think about speed and I think always will. That is developer experience, which you just described. There's one...

Piers, I think that's a phrase that I like to usually highlight, which is from Charity Majors, who says, and she means it with a different meaning, that speed is security. This is not a direct quote. And she means it because adding more changes or fast changes to your product will eventually correct the bug, the fault, and so forth, and you will eventually make the product. But I also think that if you provide the developer the ability to go fast and remain in deep work, in that mentality that you just described,

This person will make software secure because he or she is at the sweet spot of concentration that will allow him or her to be aware of many things. And so I really like this approach that you guys have to developer experience and how you provide that. So with that, Rob, thanks for being with us and enjoy the rest of the week.
