We're sunsetting PodQuest on 2025-07-28. Thank you for your support!

Rajarshi Gupta: Artificial Intelligence and Crypto at Coinbase

2025/2/6

Generative Now | AI Builders on Creating the Future

AI Deep Dive AI Chapters Transcript

People

Rajarshi Gupta

Topics

Rajarshi Gupta: 我在Coinbase领导AI团队，负责保护每笔交易和登录安全，并通过个性化内容和通知提升用户体验。我们还利用生成式AI技术开发了员工助手和客户支持聊天机器人等应用，并积极采用Copilot、Sourcegraph Kodi和Cursor等工具来提高工程师效率。在模型部署方面，我们使用多云、多LLM平台，并通过多种方法评估模型性能，但LLM的置信区间评估仍然是一个挑战。GPU的可用性是部署AI应用面临的最大挑战之一。未来一年，我们将专注于应对潜在的交易量激增和利用AI优化公司内部运营，特别是在合规和效率方面。此外，我认为在LLM评估、解决企业级问题以及利用生成式AI优化企业运营方面存在巨大的创业机会。区块链技术为代理AI提供了一种独特的优势，因为它能够实现可验证、不可变的交易，并支持微支付。我们构建了多层次的防护措施，以控制模型输出、访问权限和用户行为，并保护人工客服人员。目前缺乏对LLM进行严格评估的方法，这与以往基于精确率和召回率的机器学习模型评估方法不同。 Anand Iyer: 作为Lightspeed的投资合伙人，我对Coinbase如何利用机器学习技术印象深刻，特别是其在生成式AI领域的投入和创新。 Michael Magnano: 作为Lightspeed的合伙人，我对Rajarshi Gupta在Coinbase的AI工作表示赞赏，并认为这次对话对于了解Coinbase如何利用机器学习技术非常有价值。 Eugene Chung: 作为Coinbase的长期用户，我对Coinbase在应对市场趋势方面的能力印象深刻，并对代理AI在加密货币领域的应用前景充满期待。 Josh: 作为AI职业培训平台AUX的创始人，我对LLM的置信区间评估问题以及如何将以往机器学习模型评估的严格性应用于LLM表示关注。

Deep Dive

Shownotes Transcript

Translations:

中文

Hey, everyone, and welcome to Generative Now. I am Michael Magnano, and I am a partner at Lightspeed. This week on the podcast, I'm sharing a conversation with Rajarshi Gupta, the head of machine learning at Coinbase. He joined us on the Generative San Francisco stage last year with my colleague Anand Iyer, a Lightspeed venture partner focused on crypto. This

This is a really interesting conversation about how Coinbase is incorporating machine learning into every part of the company. Rajarshi also answers audience questions and offers some great advice for founders. Enjoy.

So, welcome to Generative SF. My name is Anand. I work on crypto here at Lightspeed. And I'm excited to have Rajesh here. I'll have him introduce himself. And just as a reminder, this is a session that's focused on meme coin tradings. Rajesh is an expert. I'm totally joking. The AI folks here are genuinely freaked out. So I'm going to retract that statement right away. But why don't I turn it over to you, Rajesh. Maybe you can give a little background on yourself and

and everything from Genesis Block to how we got here today. Yeah, sure. All right, so I think going forward, I did my PhD across the bay at Berkeley, and then I spent 10 years at Qualcomm.

And at Qualcomm Research, I guess my career, the funnest thing I did was that we built the industry's first on-device machine learning engine. So this was way back and we launched it in 2015. It was the first on-device machine learning engine meant to catch malware on Android devices by looking at everything that was going on on the phone.

And so, you know, I was really excited about it. This was my zero to one project. It was my idea. I was the only engineer on the project at the beginning. And then we kind of built it and shipped it and so on. So then after my Qualcomm journey, I went and did a couple of startups. Both startups were in security. The first was a small startup called Balbix, which is a Mayfield startup down in San Jose. And then I joined a much bigger startup called Avast, which was actually the world's largest consumer security startup.

There were 500 million active users. And then we went IPO, and then since then we merged with Norton for like $8 billion. And now the new company is called Gen. Then I joined AWS, had worked at a very big company, joined AWS as a GM in the SageMaker team for a few years. And now for the last three, I've been leading AI at Coinbase.

No, that's perfect. Thank you for being here. I don't think we talked about this Android experience. I'm curious because that was pre-generative AI. Oh, totally. And so, and then you had to, I mean, malware and Android has been a thing forever. So yeah, maybe like, I'm just curious, like maybe let's unpack that because I don't think we've talked about this before. So like, what was that like? Yeah, I... And it's like, was it...

Was this model happening on device? Oh, yeah. The model was happening on device. In fact, the entire model was written in C. And we had to write our own training algorithms because there was no tools at the time. And we wrote our own training algorithms. The model was written in C. And it had to be written in C because you couldn't put the model on the regular Linux stack. It had to put it in the internal secure stack of the phone so that the model itself could not be hacked, the code. So that was really funny. We were learning how to do

training because nobody really was doing it well at the time right this is like 2011 12 13 right but to do how to do training we had to learn write the code we had to write the training well the training was happening offline but the the inference code had to be written in a very optimized way it was looking at everything happening in the linux layer all the way to the app layer on the android devices and it could literally catch a ton of malware so we shipped

from between 2050, I don't know, most people here probably have iPhones, but if you had Android, so we've shipped it from 2015 to 2019. It shipped in all the high-end Huawei and Samsung and LG phones. So eventually it shipped over a billion chips.

That was a really fun experience that we had. What was the learning process like? Because how do you know when you were successful and how did that kind of feedback loop go back into the models? So when we first started doing it, you know, I remember, so Qualcomm is a nice company that, you know, as I think at the time I was like a staff engineer. I mean, as a staff engineer, you have a cool idea and you could literally go up to the CTO and present the idea. He was coming to our office and I got a slot to present the idea to him.

And Matt Grob, and he basically looked at the idea and said, this is a crazy idea. It's never going to work. But if you think this is going to work, I'm happy to let you try. So go ahead and try it. I mean, this is what we're supposed to be doing, right? You know, I won't be doing my job if I didn't encourage you to go try this thing. And so don't believe me, just go and try it.

And then I tried it. I was the only person that I got a couple more people and a couple of interns to help me. And then we built the prototype. And the prototype was we'd taken, there used to be a version of Android called

Ah, I'm forgetting the name. So basically you could take that version of Android and make changes to it on your own. And then we made it work on a real phone. There was real malware and it was actually stopping and catching it. And then basically the power of the demo. The demo was, hey, here's the real phone. This is the real malware. We had two phones. One was not catching it. One was catching it. And then people said, heck, this works. So let's try and build it together. And we did. Yeah.

Amazing. Obviously, it's been a while since then, and you're now at Coinbase working on machine learning. And yeah, it'd be great for everyone here to understand, like, what is your role in Tail? What do you do there? And, you know, I know you have a fairly large-sized team as well that's pretty, you know, geographically dispersed. So the first thing we do, which is probably the most well-known, is, of course, our

my team makes sure that every transaction, every login is protected. Like as you might imagine, it's not just Coinbase, right? If you have a PayPal account, if you have a Visa account, every transaction is protected by a series of machine learning models. And in crypto in particular, there's a lot of people trying to attack your account and we protect it. Literally, every login, every transaction. So that's one piece of it. The second fun way of thinking about it is, you know, you open your app, we get to decide what you see.

Honestly, I mean, you know, there's so many apps, so many assets, so many things. At every stage, you do a transaction, you see a bunch of other assets that you could buy, or you get notifications. Naturally, this is like Facebook or Instagram, right? Or LinkedIn. The notifications we send are targeted towards you. We decide what can we get to see. So these are very traditional. Like one is traditional finance application, one is traditional web two applications. With the app, we have to do these things.

I have so many questions for you. Maybe we'll start with, because you've been there for three years and I feel like generative AI. Oh, sorry. Yeah. I mean, I knew you were going to ask. I just told you about the rest of it. And then I have a whole big gen AI team, but let's talk, let's get there. Yeah. I'm curious, like what, I mean, maybe if you could share a little bit about as much as you can about the kind of models you've developed, how much of it is off the shelf? Are you using any, any known flagship,

foundation models or so on and so forth. I mean, one of the things we did was that when the generative AI explosion started, I would say early, like end of 22, well, basically 3.5 came out, which is Thanksgiving 22, right? So very soon after that, we made a strategic bet that this is going to be a game changer and we are going to invest heavily into it.

Now, this was not a very good time for Coinbase because this was at the bottom of the crypto doldrums and we had just had a big layoff and stuff. But I was like, all right, this is the technology. This is what is going to really get us, help us. So we are going to focus. I mean, OK, we're not hiring, but we're going to focus everything we can to do this. And I can tell you that we released our employee assistant in fall of 23.

It was like pretty early. So, an all company employee assistant was released just after Thanksgiving 23, actually. And the employee assistant is, what does that do? Okay. So, I'll tell you. So, what are the things we are doing with JII? So, there's two halves to the projects. So, one half is to essentially build something for all our employees. And the other side is to build an assistant for all our customers.

Now, naturally, the employee side is easier because, you know, you're talking to your own employees. You don't have to worry about the dangers and so on. The first big release was there. We're doing two things. So first one is very straightforward. We have an employee assistant persona where every one of them has an integration with Glean to be able to get access to your data so that it can give you answers based on what you know. Enormously useful. A lot of people use it.

Then there's a whole series of other things that my team initially built, which was to help different people like our designers, things that help our finance people. So we built several of these ourselves. One of the most popular in the company is that we just went through our performance review time.

So we have a performance review assistant, which you basically take your notes of your bullet points and you drop it and it writes your performance self-review for you in the Coinbase format with the correct structure, with the thing, with the correct word limit. And then you need to edit it and do it. So this time we had out of, I think, 3,200 people, 2,000 some people used it.

So everybody across the company, all the way to the C-suite uses it. We can't see what they're doing with it, but we can see that a number of people in our exec team are using it to do this.

So that's like one example. So then what we did is that we really wanted to build it as a platform. So we released much of the functionality as an API such that other people could build on it. So like not the entire company doesn't have AI expertise, but we built it such that other people can use it as an API. So like somebody went ahead and built a really nice one, which we built what is called an incident bot.

which is that when there's an incident going on, you get into the Slack channel. And basically if you look at the history of Slack channel, there's basically somebody comes in, asks what the heck's going on. Somebody answers. 10 minutes later, somebody else comes in and asks what the heck's going on. And then things have changed and stuff. So now every person comes in and you just ask the bot, say, "Hey, what's going on?" And it'll DM you back exactly what's going on. And this was built by that team.

We have our data science team who built a text-to-SQL bot for it. And then, you know, there's all these things. So that's the other one. But now this is for internal employees, right? Now for external customers, we released the first few releases where, of course, you know, we are planning to do an assistant for all your user journeys. The first user journey that we tackled, it was the obvious one, which is the support user journey.

So we released our LLM-based chatbot in November. So we're now handling whatever, like tens of millions of use requests with that. We took over the search.

on our site. So now if you go to do a search on either on your phone or on the website, you'll get a Gemini like answer first from AI. And now we are doing other things like, and this hasn't released yet, but we are working on things where when a customer tries to do some research, customer is trying to do, when we provide insights on things, customer is trying to look and find out about certain types of crypto assets, which is pretty complex. We will help you. And then this is where we are going to expand. Got it.

How do you evaluate and manage deployments of these kinds of experiences, assistance agents, models? What does that process look like? So the way we do that is that, so this is called CBGPT, Coinbase GPT, and this is a platform. So it's truly a multi-cloud, multi-LLM platform. So we literally use models from Azure, GCP, and Amazon. You asked the question earlier. So the

Two biggest use cases, which is the chatbot and the help are both on cloud, but it's load shared between AWS and GCP. Now, the way we do it is that we have, this is one of my biggest problems with the whole LLM space, right? So, you know, I grew up with machine learning. I've been working on it for, I don't know, now 15 years on machine learning. And then before that, I worked on statistics. Entire life, we've dealt with, here's a prediction, here's the confidence interval.

All of a sudden, you're in this space where here's a prediction and I don't know the confidence interval. So it's a very uncomfortable situation, right? That for LLMs, you don't know how good the answer is. So you're having to do all these weird thing as a different LLM, as a judge and all these things. And it's a constantly moving space, right? Like you had an LLM, you're using somebody else as a judge. Now this LLM is better than the judge all of a sudden. So now you need a better judge, right?

So this gets very hard. I don't have a solution. We're doing the same thing as everybody else is doing. We have an evaluation portal where people can try their own ground truth sets, which suck because people just don't have a sense of... You're a normal user of the app. How do you know what's a good ground truth set is to figure out whether you're doing this? And then we are doing LLM as a judge and

And then we are doing human evaluation and we're doing curated data sets. Nothing fancy. It's just what everybody else is doing because I haven't seen any good answer. So lots of startups, you know, evaluation of LLMs,

Truly great, great, great problem to solve. We've gotten plugs for a couple of Lightspeed portfolio companies already. We've got Glean, got Anthropic. Okay. We'll keep it going. Patronus, we'll keep it going for evaluations. But that's really helpful. By the way, for folks who have questions, it's a very small, intimate crowd. So if you have anything that's coming up that's in the moment, feel free to fire away. But I'll park that for now. I'm on Anthropic's customer advisory board and they've been a really good partner.

So for our chatbot, right, when we were releasing our chatbot, one of the biggest scared thing was we were releasing it in June of 2024. If you just think back, not that many companies like Uber didn't have their chatbot release at the time. So we're like, wow, we're really pushing the edge and we were scared. And naturally, the fear was about the guardrails, right? What if you get your chatbot broken?

What if somebody, I mean, the New York Times front page scare, right? And they were very nice and they proposed to do a joint venture with us. So we built the guardrails model, a separate guardrail model in collaboration. And they were super helpful because we are, I mean, we're one of their early customers, big customers. And we did a really nice thing and it was, it kind of saved us so much time. Yeah. What's keeping you up at night these days?

I think, I mean, honestly speaking, you know, I tell people that most of your life at work, the general feeling is that you're pushing a boulder uphill and there's gravity and it's pushing it down and you're fighting something, right? Like once in a time in your life, the boulder is rolling downhill and you're chasing after it.

So right now we are in that phase and I have been since for the last two years. So it's awesome. My fear is, you know, I've been doing this head of AI kind of role for a while. And most of the time you have to tell people and people are used to the way they do things, right? You have to go and convince people that, hey,

This way is better. You're going to do this and stuff and so on. And now all of a sudden the whole thing switches and people just come and say, can you please do this for me? And so on. So I think that's the thing. So the part that keeps me up at night is the fact that our people don't plan for the amount of effort and the amount of sophistication it takes to

to make these solutions real. So I'm not talking about the hype bubble or anything, but it's just that people are, people's expectation, we can match the expectation, but that doesn't come for free. I mean, I've started telling people that, you know, AI is like magic, but before you can do magic, you have to go to Hogwarts for seven years.

So it's that seven years which is hard. And people don't get the fact that you can't just put an LLM and it'll do the work. You have to do a lot of work in getting the plumbing right, doing the testing, doing the measurement, doing the analysis.

and do seven iterations of it, and then it becomes really good. That's a really interesting point. But do you use generative AI in like, you have a pretty large team. Are you using tools like Cursor or vZero or anything like that? How is the eng team starting to adopt? That's a great point. And that is one of the great cases where we did a straightaway buy over build decision.

Like as we were beginning to look at generative AI, that was right when, I mean, if you remember, GitHub came up with Copilot within like two months of the 3.5 being released. And I just analyzed it with some of my team and said, whoa, this is a good product.

And we adopted it. So right now in our company, we have rolled out for everybody, which we've rolled out Copilot. And then we did Sourcegraph Kodi. And just today, we rolled out Cursor to everybody in the company, all engineers. So it's like, we think these are great. You know, they're doing great.

extremely good thing, all the developers love it and we are adopting it. The funny thing though is that, so you read, and I don't know how well it happens, I'm sure every company is beginning to measure this, right? So you say, okay, 25%, like Sundar Pichai is there, right? 25% of code is being written by things. So if you're the CEO, you think 25% of code is being written by AI. Awesome, I can have 25% fewer developers. Or we're going to have 25% more time

But it turns out that developers don't code for eight hours a day.

they only code for like two hours a day. The rest of the time, they're trying to find data, find what happens, do a debugging and so on. So the total, even if 25 percent of code, it's like 25 percent of two hours. So you're saving like half an hour a day. I think the bigger advantage is going to come from these systems or these agents, which not just can predict the next three lines of code, but actually understands the problem.

It's a much harder problem. If you think about it, most of us have been software developers in our life. You don't just sit and write code all day. You spend a lot more time figuring out how to solve the problem. And then the actual coding of it doesn't take that long. Yeah.

I was talking to a friend of mine who works on Gemini, and he was saying that 20%, 25% stat was literally about like auto-completion of code. Yes. Not about, you know, writing code itself. So that's kind of a misleading statistic. You know, when we talk to some folks about some of the issues they're facing when it comes to the adoption of AI, usually like GPU shortage comes up.

quality assurance comes up. You know, is that something that's been on your mind too? Oh, so like, I was giving a talk at GCP, the Google Cloud Next a few months ago. And one of the things they gave me was, like prompts they gave me was like, what keeps you awake at night? And I

I wrote down in my slide, you guys not giving me enough GPUs is what keeps me up at night. Then the person who was looking at it, she was like, "I need to get that reviewed by somebody." But thankfully they did. They were completely fine with it. Whoever reviewed it said, "No, this is the right problem."

So to my big surprise, getting the available GPUs was the biggest problem we faced. Like by far, like entire last year, that was by far the one that caused me the most grief. And it's because, so, I mean, at the beginning, we get our employee assistant and these assistive things for our agents and our developers, great.

you know, like whatever, 8,000 people are, like 600 people are using it. You don't hit any bandwidth, right? Then you suddenly switch from 6,000 people to 6 million people. And then you realize that there's bursts are coming and there just aren't any GPUs.

So we had a couple of instances early in the year when it went down and we really struggled with it and literally had to go and escalate it to both GCP. That was actually the reason why our main solutions are across both AWS and GCP. Literally, that's the only reason. There's no reason to do it otherwise. It's the same model.

There's load balancing and the ability to get capacity throughput in those places. And I don't blame them. I mean, I used to be a GM at AWS. I know Atul, who's the GM for Bedrock at AWS very well. Their problem is that

these new models are showing up every month. And when a new model comes, you don't know whether to run that model on 1,000 GPUs or 10,000 GPUs and how the demand across models are going to switch around and which ones are going to go to Lama and so on. So there's not enough predictive ability on it. And just to unpack that a little bit more, can you tell us about what is the workflow? Because you're using specific hosted instances and putting these models. Are they like...

specific weights or specific kinds of models that you're hosting on these GPUs? - We have many different use cases. For some of our use cases, we are having our own trained models based on the LAMA family. These are hosted internally, but these are not high bandwidth ones.

These are typically the ones where we have some legal security reasons that we don't want the data to go out. But these are small use cases. No problems, right? The big use cases are hitting for us. There may be other companies who are doing differently. But for us, the big use cases are hitting these models particularly. Actually, both Claude and now we have Gemini also. Both Claude and Gemini.

Now, what happens is that even if you think of it as something as simple as a chatbot, which is the most common use case. So it's not like the user says something and we send it to a chatbot. It's a chain with between five to nine LLM calls within that chain because we have to figure out is the user saying something bad? What does the user mean? Because somebody they says can't send crypto.

Almost all right, you need to get more information. You need to get context. You have to figure, do a rag call. You have to get information. Then you make a call. Then you have to change the system so that it sounds empathetic.

Depending on what the answer is. So it's actually depending on which side of the flow chart you're on, it takes between five to nine calls. So that's what we're doing. And when we hit these many, like every customer, a million customers, you take these calls, you hit them with nine calls, it adds up pretty quickly. Is there a requirement for

specific kinds of GPUs? Like, is there homogeneity that's needed for the kinds of models or like, is there a specific kind of instance that you always need from GCP or AWS? I mean, we try to get the biggest one we can get. So it's not, I don't think, I mean, you know, we don't really specify because we are not doing it. I mean, we're not big enough to demand an isolated cloud instance. That's too big. So we are shared instance. So we basically work off latency.

and bandwidth. I see. Got it. You know, if you had to look out maybe a year from now, you know, as you're starting to build up your expertise and obviously Coinbase is on a roll, crypto is doing well. So there's a lot more added pressure on your team. I guess, what is your team going to be doing a year from now? What are the deliverables that you want to hit over the course of the next year? I think there's two axes where we're really trying to make and the two axes are at conflict with each other, right? So one axis is that we always worry about the bull run.

So, you know, a bull run is happening right now. We don't worry about a bull run. Just to be clear. We worry about a bull run all the time. I mean, in fact, honestly, I don't know if you know, there's this for those of you who are in crypto, the forever, the meme has been crypto goes up, Coinbase goes down because the Coinbase side, whenever we, this is the first time, like literally, we are super proud of it that the election time and all the crypto run, our side stayed completely up with no problems because of the enormous amount of

investments we've had to do in the platform to make it all work smoothly. But of course, there is a capacity issue, right? I mean, we are handling what's going on right now just fine. But what if it goes up 10x? And so that's one side of it, right? So it's not just me. Everybody in the platform is worried about the fact that we don't want that to happen. And there's a lot of work that goes in and we have some headroom.

but I don't know how much. I mean, we're estimating how much, but these things are very hard to estimate because when that many users come, there's so many, we do load tests and do everything.

So that's one axis. The other axis is, but now naturally that sucks up a lot of resources and thinking time and so on. The other axis is that, you know, there's so many new features and new capabilities that we are trying to build. There are so many new spaces that we are trying to do. As you might imagine, Coinbase is a very regulated company.

because we end up getting hit with two sets of regulations. We get hit with the regular financial regulations because we have people's money, but then crypto tends to have their own sets of regulations in many, many jurisdictions. So we have a lot of people in the company, humans, whose job is to make sure that we stay compliant and we follow the regulations and so on.

And naturally, these processes are not efficient. Laws change. Laws are written in many languages and so on.

These are all use cases for AI. Earlier, for example, let's say a new law is written in Philippines. It's published in the Filipino language. So you see there's a law that's written. Somebody in the Philippines tells you that the new law was published. So you get hold of the law. You hire someone who can translate it. You wait three weeks. You pay them some fairly large sum of money. And three weeks later, you have the version.

Now, literally, we can do this in no time. So there are all these great use cases and we are trying to, there are many operations in the company and we're trying to optimize many of them. But that's a lot of work because of the Hogwarts seven years problem, because most of these things are not geared for computers. They are geared for human beings. And we have to do a lot of software work to

to make sure that these work. So those are the two axis of problems. - That's super helpful. We would love to hear from you folks. I'm sure you have questions for Rajeshree. So please tee them up. I'll have just one thing I'll ask you and then please start to fire away.

There are folks here who are excited to either start something or, you know, I'm sure there's an opportunity to squeeze in like a request for startups or something you want to see get built. What does that look like? Yeah, I love that. So I think what is happening is that, I'll give you a quick answer. Like we talked about a few little things here and there. Like I said, for example, a fundamental startup that does real LLM evaluation in a scientific way would be super useful for the industry. And I'm sure you guys would get enormous valuation.

But a broader set, which I think is, and you're reading this in the news right now, right? That what is plateauing is the fact that the training gains are plateauing. And this was known because as early as GPT-4,

We're pretty much given it all the written knowledge and the net was already there. So you knew that it was only the training gains that was coming, not the data gain anymore. But I think the enormous gap is from the capability of the crypto to really solving the real customer problem, especially in enterprises.

There is such an enormous amount of money in enterprise problems that can be solved with Gen AI. It is unbelievable. It's not easy though, because every company, the processes are different. The tooling is different. The data pipelines are different. But companies that are going to be able to solve this, I mean, just take, we talked about Glean, right?

Enterprise search, I mean, you would think like internet search, which is a 100x bigger problem, was solved in 2005.

It's like, why did it take 17 more years to solve enterprise? Enterprise search used to suck, right? It was so bad. And why did it take that long to do enterprise search? Because the enterprise plumbing and the things are so difficult. And somehow this is a great company and they managed to do a bunch of cool stuff. In the space is that there is so many problems. I mean, you look at anything. You look at HR, you look at finance, you look at legal, you look at finance.

All these operational functions in a company are ripe for improvement, but they don't know AI, they don't have the data pipelines. So this gap between what the AI can do and what the real problem solutions are there is huge. So that's my advice. We're wanting a startup, figure out the startup that is using these tools

But that is solving the problem. And the AI is not the hard part of the problem. That part has already been solved. But how can you use the power of the Gen AI models to solve this problem? That's where the big space lies. Awesome. Thank you.

Based on what you just said, I think that, isn't that like what SAP is kind of positioning themselves because they're being, having, they have all the data of all the enterprises, right? Every company is saying they're doing AI today. It doesn't matter what you are, you know, like a tire shop on the street is also using AI to change your tires. So it's,

That's not the point. Sure, SAP has the advantage and Salesforce has a ton of data and they are doing AI and they are doing a lot of AI. But yeah, I mean, if they can solve it, great for them and they'll be even more valuable, but I am not seeing the solutions yet.

So that's my real question. So then what is the gap, right? Because I don't really know. I don't know any solutions either, but we don't have the visibility because it's all within the corporation as well. That is exactly the problem. Like for a startup, the fact that the data isn't easily accessible is a problem. So here's what's happening. So every startup or every company that is in any space is saying we're doing AI.

But in order to do AI, you have to build an AI team. So we had this situation where just as an internal trial, we wanted to pick one of our non-technical teams and do a project with them. So a couple of my guys went and built a thing. And these guys loved it. They were like, oh, this is such a useful thing. But, you know, we want an app for it. We're like, okay, we're an app building team.

Now, at the same time, this group is getting hit with many startups. We're all saying we're doing AI and we're doing this thing, right? So we encourage them, then they decided to do a bake-off. The bake-off was there are these five companies and they were going to use us as the benchmark and say, okay, you know, we're the benchmark and we'll pick the best.

So it turned out that we were way better than all five of them. So these guys were like, well, this is free because you guys have already built it. So we're just going to pay this internal team to build an integration with our external tool. And so we're just going to use it now.

So it's that if the fundamental knowledge exists, the people don't. There is a big skills gap. And if you are a startup building databases or you're a startup building sales recommendation, I mean, if you're big enough at Salesforce, you can build that team.

and you certainly have an advantage, but I think there is a lot of space for a startup to come in. In fact, if you are building on top of a platform like Salesforce, that's actually good for you because Salesforce already has all the data. So you don't have to do integration with 20 things. You just do integration with Salesforce, you do the optimization thing, and it works.

It's when you have to integrate with like, you know, 17 different enterprise tools, like in security, right? If you're going to analyze security logs, it's such a big challenge because, you know, there are 50 different types of logs. But, you know, that doesn't change my fact that there's a big gap between the technology and the solutions, and there's a lot of money to be made here. Right?

Hi, thanks very much. I'm Eugene Chung. I am, since 2013, I've been a Bitcoiner and mostly happy Coinbase customer. If you held on to your Bitcoin, you're really happy. Yeah, I'm not much of a degen. But over the years, I've been lucky to see the various cycles, the various hype.

Um, so there's, uh, you know, there's obviously the DeFi summer, there's ICO boom, NFT craze, all these other things. Now, of course we seem to be in a boom of agentic AI. We have things like Mark Andreessen giving AI bot truth terminal 50 K of Bitcoin, and then having it quote invest. Uh, and now we have these meme coins like Gautius Maximus, uh, and

AI16z, unrelated to A16z, touching about a billion dollars, I think, in market cap as of today. So I'm curious, given Coinbase's history of reacting to market trends, is this a trend that's interesting to you all? And if so, where do you predict some of the integrations could be with agentic AI? Well, I'll call them agentic AI-themed coins, meme coins, because...

a lot of them don't have much in the way of sophisticated AI. - Yeah, so I think agentic AI on crypto, on blockchains is very important because if you just tear away all the hype and everything, blockchain brings certain very, very interesting characteristics. I mean, honestly, when I was interviewing at Coinbase, one of the questions was like, what interests you? And my answer was all about blockchain, not about crypto.

as an investment vehicle. Because the fact that blockchain is the first technology that really makes distributed computing possible, because it provides the ability to do provably,

immutably and anonymously. The anonymous is not as important, but provably and immutably. And then it also has incentive trading mechanism, which is Bitcoins, right? Okay, that's a lot of technical jargon, but in reality, one of my colleagues at Coinbase said it very, very nicely. He said, hey, an AI agent cannot own a wallet with cash, but they can own a crypto wallet. So I think that is a space that we love.

because it allows, I mean, crypto wallets are big for us and the agents allow a wonderful mechanism for, well, this is about the only mechanism available for agents. So I was just talking to people over snacks a little bit earlier that today in the internet, if you want to share, exchange $20, you know, you can Venmo and there's all these mechanisms. But if you want to send 0.2 cents,

There's no mechanism. You can't really send 0.2 cents between each other. And micropayments are such an important tool for these kinds of agentic distribution, and we can make that available. Especially if you come to Splug for Base, if you come to things like Base, which is very, very low transaction fees. So we think we love this methodology. They're two separate questions, right? Like as an AI person, what I think of agentic AI, and then

Is crypto going to make agentic AI? And yes, not crypto necessarily, but blockchains and the ability to exchange crypto payments solves a huge problem for agentic AI, which we absolutely adore. And then the other answer is like, do I really feel that agentic AI? Yes, I think once again, I mean, maybe the hype is expanding on like what agents can really do, but agents can really solve problems. And the ability to take a problem, break it up into smaller things,

put the answers back together, that's quite powerful. Quite powerful and that's, I mean, to go back to my previous answer, that's a great way by which these complex enterprise problems are gonna get solved.

Awesome. Thank you. Hi, how's it going? And thanks so much. I was curious to hear more about the guardrail product and also just how you think about guardrails from a framework perspective, right? So you're in a world where like, I want your framework, but like hypothetically, it could be, you know, bots, LLM experience could be informational. Information could be generic or it could be personal or it could also be authentic and go take actions. And in your particular product, those actions can be quite expensive if done wrong. Yes. So anyway, I'm just curious to hear a little bit

a little bit more about the guardrails and how you think about the structure. - You're absolutely correct. Actually, in fact, honestly speaking, we can take actions. I mean, even forget LLMs, even before that, you could take action on your bot, right? You could talk to a chat bot, which was pre-LLM days, right? Which was like, you know, you click saying, "Which of these seven choices you want? I want to send and I want to do this."

Like, we forget chatbot, right? I just, you know, booking, we're traveling, doing a vacation. I had to make some hotel changes. I went to Expedia. Nice chatbot, very controlled old school chatbot, but says, are you trying to do one of these things? Yes, I want to change dates. All right, which one? I pick, it gives me the three options of my hotels I have. I said, this one. He said, well, your dates are from 23rd to 25th. What do you want? I said, I want to change it to 26 to 29. Okay, this is the price I have. Do you want to do it? I said, yes, and it does it. So,

So you can take action, which can be reasonably expensive. I mean, thousands of dollars, and they'll do it for you. And we can do that too. It's not a stretch. Even pre-LLM chatbots were perfectly capable of doing a set of actions. And that's okay because we have other ML models that are looking to say, if it says send 75 Bitcoins to something, something else will trigger.

and that'll stop your transaction, but not the chat bot-ness of it. To answer your broader question, which is how do you do guardrails? So guardrails are hard. They are easier for internal facing. I mean, I would say guardrailing is the main reason why if you look at the slew of products we are releasing,

Most of them are internal and a few of them are external. But of course, the external ones are the largest scale, the largest money and so on. Now, in order to do guardrails, you have to have different levers of guardrail, right? So one lever of guardrail is to make sure that you're not giving any information extra or you're not saying a particular type of tone that you're not supposed to.

Another level of guardrail is the fact that it actually looks at what's coming in. So as you give it more capability, you have to keep upping your guardrail. Because the first version we released, you were absolutely bang on target. The first version we released was only informational. It would only look at generic information. The second version is one that looks at information on your account.

So you have to have the guardrails to make sure that I cannot look at your account and give you information about Dylan's account. And then the third version is the one which can actually take action on your account. And for every one of them, you need to have the guardrails go hand in hand and do it. So no, there is no simple answer. It's testing, it's mechanisms. We are actually, there's one very interesting use case of the guardrails that we hadn't planned on at all.

we are using the guardrails to protect our human agents. Because human agents get hit with intimidation, threats, things that a human, I mean, an agent really shouldn't be having to deal with, like abusive language and so on, which we hadn't planned on it. But then we realized that, heck, we have this guardrail,

And it was just completely for tweeters, right? We were in a meeting, it's the same CX team, right? We're meeting with them, they were talking about someone, one of them had visited the center and had seen the kind of things that they had shown. And then we're like, wow. And then as they were talking, my product manager was just on 15 or 20 and our guardrail engine caught every one of them and said, no. And then we said, wait, we already have a solution. We should just use this to, so we of course had to change the,

Then it's just a software work, right? Because the protocol was you did the chatbot and if the chatbot didn't answer, it was going to the agents. Now, every time we went to the agent, the agent had to make separate LLM calls. So a bunch of software work, no machine learning work. So we're doing a lot of guardrails and sometimes you get freebies.

Really appreciate the discussion. I'm Josh, founder of AUX. It's an AI job training platform and enablement for businesses. I want to touch on how important the confidence interval is and evaluating LLMs makes me think back to a previous life when precision and recall mattered and used to

measure, you know, each model iteration as it came out. Can you dream of any way to kind of bring back some of that rigor to modern LLMs? How to even start on that problem? So I don't have an answer. If I had the answer, I would really do it. And it's probably because if you really think about it, right? So just

I'm aging myself here, but my whole machine learning background is pre-LLM days. So the LLMs happened in the 2014, 15, 17 timeframe when I was already in companies and not doing a lot of original work anymore.

So I don't have an answer here, but I'm not the best person or qualified person to do it. There are a lot of very smart people who are working on this problem in both academia and university. And I am honestly a little bit surprised that this rigor is not coming out. And someone, I asked this question to someone in academia, actually, and the answer they gave is kind of there. And their answer was that

LLM for the first time is basically mimicking humans' speech. And all the mechanisms that we had developed about accuracy were pre-days, where they were trying to look at math. And all of a sudden, here is something that essentially mimics humans, and we don't know how to measure that.

So, which is probably a good representation of the problem. But to me, it's not an answer. I mean, we should, as an industry, be able to figure it out. And that's my ask. But I'm not qualified enough anymore. It feels like maybe, you know, a lot of the focus has been on LLMs, right? And I think maybe looking at 2025, we have some...

we will go from probabilistic to deterministic SLMs that are more niche, more nuanced, that can avoid hallucinations, can understand how to work on guardrails, evaluations become easier, more math-driven. So maybe this was the impetus we need to go from- Yeah, I mean, honestly speaking, look at how fast the thing changes, right? So even in June, like now in December, like in June when we were working on it and releasing it, the biggest problem everybody was worried about was hallucinations.

But especially for enterprise use cases and small language models and RAG,

We don't see hallucinations. It's not really, I mean, you know, we never really solved the problem, but it kind of went away. But just by changing the constraint parameters, like once you put constraint, the hallucination doesn't happen. Like really, we rarely see hallucinations in our thing. It's like, all right, I mean, it says no. But yeah, I mean, I think you're right that SLMs are more deterministic. They're more accurate. I'm not sure if it's more deterministic. It's definitely more accurate. Sure, more accurate.

Thank you so much for being here and spending some time with us. We really genuinely and sincerely appreciate it. No, we love the questions, love the engagement. Thank you for inviting me. Yeah, maybe a round of applause for Rajesh Shree. Thank you. Thank you.

Thank you for listening to Generative Now. If you like this episode, please rate and review the show. And of course, subscribe. It really does help. And if you want to learn more, follow Lightspeed at Lightspeed VP on X, YouTube or LinkedIn. Generative Now is produced by Lightspeed in partnership with Pod People. I am Michael McNano and we will be back next week. See you then.

Rajarshi Gupta: Artificial Intelligence and Crypto at Coinbase 43:30 Share

Generative Now | AI Builders on Creating the Future

Deep Dive

Shownotes Transcript

Rajarshi Gupta: Artificial Intelligence and Crypto at Coinbase