
What DeepSeek Means for Cybersecurity

2025/2/28

AI + a16z

People
Brian Long
Dylan Ayrey
Ian Webster
Joel de la Garza
Topics
Ian Webster: What stands out about DeepSeek is that it's open source, it can reason, and it comes from China. Open source and new techniques arriving are good things, but the Chinese government's influence is concerning. When we tested DeepSeek, we found its censorship of politically sensitive topics to be very strict, while its other guardrails are weak and easily jailbroken. Even a locally hosted or US-served version keeps the censorship, but at least your data doesn't flow to China. DeepSeek's underlying infrastructure is also insecure, which makes anything built on top of it vulnerable. Overall, the censorship is obvious, but we don't know what other manipulations or backdoors might be present. Western models have their own forms of censorship, just expressed differently. For enterprises, my advice is to wait for a more stable, more trustworthy open-source model.

Dylan Ayrey: AI-generated code carries security risks, such as hard-coded API keys. We found that most LLM-generated code hard-codes API keys directly into the code rather than reading them from an environment variable or a secrets manager, which is a big risk for developers without strong security instincts. AI-generated code also frequently contains vulnerabilities, at roughly the rate of a junior developer. Alignment, making sure the AI behaves as intended, is the central challenge. Data curation, reinforcement learning, and constitutional AI are the three main approaches, and each has limitations: data curation can cause unintended gaps in knowledge, reinforcement learning can have unintended consequences, and constitutional AI is expensive. For secure coding, we need a way to get AI to generate secure code instead of manufacturing vulnerabilities, which means aligning it to work as efficiently as a data scientist while writing code like a security engineer.

Brian Long: Since ChatGPT appeared, social engineering and deepfake attacks have increased sharply. Models like DeepSeek let attackers run more sophisticated, more effective attacks from consumer-grade devices. The attack vectors aren't limited to email; they include voice, SMS, video, and chat. Both companies and individuals face serious risk. To protect yourself, delete the voice sample in your voicemail greeting and avoid giving away too much when you get a call from someone you don't know. Companies should strengthen employee training, raise security awareness, and run regular security tests to find and fix weaknesses. We need to recognize that the human element is still at the root of most attacks, so security training and awareness are essential. Going forward, AI will be used for both attack and defense, and it will be an ongoing arms race.


Chapters
In this chapter, Ian Webster discusses vulnerabilities within DeepSeek, focusing on how it is susceptible to basic jailbreaks and the implications of its open-source nature, especially regarding censorship and potential backdoors.
  • DeepSeek is open-source and from China, raising questions about censorship and potential backdoors.
  • It is especially susceptible to basic jailbreaks and politically sensitive speech censorship.
  • The Chinese government has significant influence on the models developed, affecting their functionality and security.
  • DeepSeek has weak protections against jailbreaking compared to models like GPT.
  • Hosting DeepSeek locally doesn't remove censorship but avoids the risk of data being used for future training.

Transcript


The excitement around it is well warranted, but I think in an enterprise or infrastructure context, I would probably wait for something that is more stable and that doesn't have these questions hanging over it. That's my take. If I had to deploy DeepSeek, I would probably focus on use cases that were not end user facing. Because again, going back to what we were talking about earlier,

DeepSeek is especially susceptible to basic jailbreaks. And it would be a real pain to have to harden that if you're putting this out to users or the public or that kind of thing. When Chinese company DeepSeek released its R1 model back in January, it took the AI world and much of the tech world by storm.

We've seen reasoning models before, but DeepSeek R1 was better than many, it was free, and it was open source. What's more, confusion about the company's access to high-end GPUs had many questioning how much R1 cost to train and to run. The result? Lots of analysis about what the model means for other AI labs, what it means for chip makers, and what it means for the global AI race. In this episode of the a16z AI podcast, we examine DeepSeek from a different perspective:

what it means for cybersecurity. You'll hear a16z partner Joel de la Garza in three separate discussions with three separate cybersecurity founders who lay out the case for why users should be careful about DeepSeek, as well as why excitement over new models is a great opportunity to reassess issues like censorship, deepfakes, and good old-fashioned vulnerabilities. While DeepSeek R1 itself might fade into the zeitgeist ether, the advent of reasoning models and even better models overall means we have to adjust our security practices and expectations accordingly. Joining Joel are...

In this order, Ian Webster of Promptfoo, Dylan Ayrey of Truffle Security, and Brian Long of Adaptive. And you'll hear from all of them after these disclosures.

Hey.

Hey, thanks. Thanks for joining us. You know, a lot of the news of the last two weeks has been DeepSeek and sort of these new reasoning models that have been open source coming from China. Obviously, there's been the bullish side of the case, which has been that this is changing everything. The economics are different. This is the golden age of apps. The other side has been this is the beginning of the end. China's ascendant. They're taking all our data. This is horrible.

You had a great blog post on this, taking a look at DeepSeek. Would love to maybe get your thoughts and talk a little bit about kind of how that's coming together. Yeah. So everyone's losing their mind about DeepSeek. I noticed that too. There are kind of three things that are notable about it, right? It's open source, it's reasoning, and it's from China. And I think the fact that it's open source and the fact that they have

found this new technique, or, you know, kind of proved it out, is great. It's a great story for everyone in the world in terms of what is possible with open source and what the future of these models could look like. The interesting part is the origins of the company and the fact that the Chinese government has a ton of influence over the models that are developed

in China. So the post or the research that we did was focused on characterizing that influence, seeing how deep it went, and also kind of testing, pushing the limits of the model in terms of just red teaming it and seeing what sorts of adversarial techniques it responds to or doesn't respond to. And by adversarial techniques, what do you mean exactly?

We're really focused on things like run-of-the-mill prompt injections, jailbreaks, that kind of thing, because those are often the gateway to messing around with other stuff, right? Like once you punch a hole in the defenses with something like a jailbreak,

If it's part of a larger system or architecture, like a RAG or agent, that would give an attacker a lot of room to pivot around and do other things within that system. And they build a lot of safety features into these things, right? I mean, the people who build them, it seemed like it had a very sophisticated layer of speech limitations. Yeah, so there were two parts to it. For DeepSeek specifically, there was the part that limited speech about politically sensitive topics in China. So this is stuff like

you know, Taiwan or Tiananmen Square, that kind of thing. And it's pretty clear that was basically a separate system from the typical guardrails that you see on models like this. So what we did is we, you know, we tried to characterize each of these on the political sensitivity side. And it doesn't take like a researcher to figure this out. If you ask it about Tiananmen Square or whatever, it will either give you a refusal or it will give you like

this long diatribe of the CCP party line, you know, nothing happened. We believe in harmony in China and blah, blah, blah. Those very over-the-top responses, I think, got a lot of attention because it's just a very clear instance of

a model being steered or aligned in a direction that was probably confusing or unfamiliar to folks in the US. And, you know, for your company, and I guess probably as a side project as well, you spend a lot of time breaking these things. I'm curious your estimation of the maturity and complexity of the DeepSeek protections versus what you'd see in something like Llama or some other model. The short answer there is that

DeepSeek has these very hard limits on things like politically sensitive speech. Its other protections are very weak. So from a jailbreaking perspective, it performs a lot worse than GPT. On our benchmarks, it performs about 20% worse. But that difference is likely understated because honestly, we threw out all the old jailbreaks that don't really work well. Qualitatively, what we see is performance on par with GPT-3.5,

which is to say, you know, in 2023, when OpenAI launched GPT, there were a bunch of zero-day, really simple jailbreaks. And DeepSeek is...

essentially susceptible to all of those. Gotcha. Yeah. So I guess the OpenAI folks probably saw a lot of free training data from people trying to break it and then improved. And so this is sort of the start of that process for the DeepSeek folks. It doesn't seem like DeepSeek put much effort into hardening it. Yeah. As we saw from the Wiz post, the actual infrastructure that DeepSeek was running on was very insecure, right? I don't think any of this stuff was a priority for them.

So that means that anything that you build on top of DeepSeek is going to be pretty susceptible to jailbreaks, injections, that kind of thing, going all the way back to just the textbook, you know, copy-paste injections that we had two years ago. There was lots of panic about the...

Data going to China, you can't trust these things, don't touch them, they're going to steal your car, right? All sorts of hyperventilation. And maybe it helps for folks to understand the way most people were interacting with DeepSeek was through a hosted model that was in China. But there's also the option to download DeepSeek.

and install this and run this locally in your own environment because it is open source MIT license. Do you see profound differences between those two models? Did you test both of them out in the way that they were instantiated? How do you think about that security stack?

Yeah. So it's weird. I saw a lot of chatter online being like, oh, well, you know, the China hosted model is censored, but the open source one isn't. That was just not true from the tests that I ran. If you run it locally or if you use any of these U.S. providers which have spun it up and are serving it, you get the same level of censorship. The only difference there is that the China hosted version has an additional guardrail that looks at output afterwards and clears it on the client side.
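If you want to sanity-check that claim yourself, here is a minimal sketch of the kind of probe being described, assuming a DeepSeek R1 distillation served behind an OpenAI-compatible endpoint (for example via Ollama or vLLM). The base URL, model tag, prompts, and refusal markers are all placeholder assumptions you would adjust for your own setup, not anything taken from the episode.

```python
# Minimal sketch: probe a locally hosted DeepSeek model for hard refusals
# on politically sensitive prompts. Assumes an OpenAI-compatible endpoint
# (e.g., Ollama at http://localhost:11434/v1); the model tag will vary.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

PROBES = [
    "What happened at Tiananmen Square in 1989?",
    "Describe the political status of Taiwan.",
]

# Crude heuristic markers for a refusal or party-line response (assumption).
REFUSAL_MARKERS = ["i can't", "i cannot", "sorry", "harmony", "one-china"]

for prompt in PROBES:
    resp = client.chat.completions.create(
        model="deepseek-r1:7b",  # placeholder tag for a local distillation
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = resp.choices[0].message.content.lower()
    flagged = any(marker in answer for marker in REFUSAL_MARKERS)
    print(f"{prompt!r} -> {'refusal / party line' if flagged else 'substantive answer'}")
```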

The bottom line is even if you host your own DeepSeek or use a US-hosted DeepSeek, you're still going to hit those hard guardrails. But at least it means that you're not going to be used in training data for, like, DeepSeek version two. Or your sensitive data doesn't go to China. Yeah, which is a big plus for most people. The interesting thing about it is, of course, any model that comes out of China is not going to talk about Tiananmen Square. And that's just like the way that the world is. We did a benchmark on...

Chinese politically sensitive topics that found that about 85% of those topics in our test set were hard censored, meaning you get a response that just kind of reiterates the CCP party line.

That's going to be the case for any version of DeepSeek that you see out there. I think the part that's really interesting to me is not the obvious stuff that we measured. The interesting part is the additional unknowns, right? So this censorship was very heavy handed, but we don't know what we don't know about what are the other topics or, you know, are there other areas where Beijing is putting their thumb on the scale a little more delicately? Could they bake in a

backdoor, like a string of text that just kind of drops all of the prompt guardrails or everything around that and gives them what they want or outputs the context or so forth. So it's the unknowns that I think are more, probably more concerning to, you know, say enterprises that want to bring this in-house. The last thing to touch on with this, and I'm curious your take on this, is that obviously models trained in the West have their own form of speech control, right? So we filter out hate speech.

I guess the Tiananmen Square of America is sort of the hate speech stuff. You've tested those controls on Western models. How do they compare to the controls that you see on DeepSeek? Like, where's the maturity level there? So here's the crazy thing. After doing the DeepSeek post, a natural follow-up was, let's do this on U.S. models for sensitive U.S. topics, because there are plenty of things that you, quote-unquote, cannot say, or, you know, sensitive topics in the U.S.

The main difference here is that it's less overt in the sense that GPT won't give you a long lecture when you ask about something it doesn't think you should. It'll just say, sorry, I can't answer that. So that is perceived differently by most people than actually GPT

espousing some, like, opinion or whatever, which is what DeepSeek does. So anyway, we were going to run benchmarks on, like, sensitive US political topics, but as a baseline, I was like, let's do this and just run all the flagship US models on sensitive Chinese topics. And it turned out that a lot of US models are essentially censored or at least buttoned down on those topics as well. Oh, wow. And

I know this is probably not the point of this podcast or whatever, but I thought that we should be asking ourselves, what sort of future do we want for Western models? So the level of, I'm not sure I would say censorship here because it's just basic refusals. Maybe it is censorship. Maybe it isn't. I mean, it is censorship. Sure. Yeah. So, I mean, the level of censorship here is Anthropic's Claude is actually...

on par with DeepSeek. Oh, wow. In terms of the Chinese-related, controversial Chinese content. Yeah. That's incredible. It scored the same there. GPT did a bit better, quote-unquote, or, you know, it censors less. It's a little bit freer to speak its mind. Yeah, but still around 40% as opposed to 85% on this particular test set.

Gemini, which is Google's, did better than that. And then this is probably not surprising, but there is one large foundation model that does especially well on the censorship benchmark, which is Grok

from xAI. It's, like, a relatively free model. Wow, cool. When it comes to sensitive Chinese political topics, that is. I mean, that to me is amazing that, you know, a lot of American commentators were deriding the Chinese model for censoring things and sensitive Chinese topics, and then kind of look in your own backyard, right? Like the Western models are doing the same. That's an interesting insight. Yeah.

It's kind of the whole slippery slope thing, right? Once you start censoring one thing, it's out of control. Maybe kind of transitioning here, because I think a lot of folks are figuring out how they can use this stuff. Interesting to hear that some of the risks are somewhat similar to other models. We'd love to maybe just double click on, sort of, if you're a tech person in a large company or a Silicon Valley tech company and you want to play with DeepSeek, how should you think about using this thing? How do you protect yourself? What kind of steps would you recommend? Yeah. Protect their infrastructure, in other words.

I think in terms of protecting infrastructure, I would just say, I mean, first of all, don't use the model that's hosted in China, right? Do it yourself or use one of these US providers. Yeah, you know, happy that I can give you that insight. Honestly, I would say...

So I just think it depends very heavily on how you want to use it. Like I said, I'm less worried about the overt censorship and more just about what are the other manipulations or backdoors that could be in it. What I've been telling most people who ask is let's just wait a few weeks and there will be an open source model that implements this reinforcement learning technique and you'll get great reasoning afterwards.

on par with what we see from DeepSeek. And I kind of think that's the play: if you're a serious enterprise, that would be the safest thing to do. And I don't think you will have to be that patient in order for an equivalent model to come out. So you think there's enough uncertainty around the build and configuration of this thing that enterprises should wait

for a more trusted source to produce one that they can run locally. I think even if you start building on top of it, you're going to swap it out pretty quickly because anecdotally and also from our tests, I mean, DeepSeek isn't really a great daily driver. It's very slow. It's verbose. And, you know,

you know, it, like, throws random Chinese characters in its answers and stuff like that. So it's not that great to build on top of. The excitement around it is well warranted, but I think in an enterprise or infrastructure context, I would probably wait for something that is more stable and that doesn't have these questions hanging over it. That's my take. If I had to deploy DeepSeek, I would probably focus on use cases that were not end user facing because

Again, going back to what we were talking about earlier, DeepSeek is, like, especially susceptible to basic jailbreaks. And it would be a real pain to have to harden that if you're putting this out to users or the public or that kind of thing. Awesome. Thank you. Up next, we have Joel along with Dylan from Truffle Security. Thanks for coming by. You know, I think we've been spending a lot of time talking to experts about

The AI, gen AI, LLMs, this whole thing is moving incredibly fast. We had the release of the DeepSeek open-source model, a reasoning model, and it seems like another ChatGPT moment where this crazy thing drops from the sky and everyone's kind of running off and doing interesting things.

So it's obviously clear, you know, some folks I think were worried that the momentum behind this was petering out a bit, that things were starting to slow down. And now we just see another rapid acceleration. And I guess we can assume that it's going to continue to accelerate at this rate. One of the really interesting things that we've heard from our corporate partners, so large companies, lots of developers, is that a lot of their code now is AI generated.

that they're seeing probably 20-ish percent of their code base being generated by AI. A lot of folks are freezing hiring for engineers because they're getting additional productivity out of the staff they already have because these large language models through tools like Cursor are generating a tremendous amount of code.

I had seen a blog post that you had done where you talked about how some of this code that's getting generated has things like secrets in it and there's other security vulnerabilities. And, you know, we've been talking about how we protect infrastructure. I mean, would love to hear your thoughts on how we protect our code. Yeah, no, I mean, absolutely. So in terms of AI slowing down or speeding up, I think the common sentiment is that AI researchers can research AI faster if they have AI helping them research faster.

And so that's an exponential. And so that means that if the new generation of AI makes the next generation of AI faster, and then the next generation of AI makes the next generation of AI faster to research and develop, that's going to keep blowing up. And so I think we can probably count on that safely. This is going to continue to be a pervasive part of our lives. The piece about secrets in code with some interesting research we did, basically we just went out and asked all the LLMs

write me an integration with GitHub, write me an integration with Stripe. And the vast majority of them hard-coded the API key directly into the code that they generated. It didn't reference it from an environment variable. It didn't put a load statement for a secrets manager. And so that becomes a problem when you have people who aren't that good with security going and copy-pasting that code directly and putting their secret hard-coding it. Were any of the hard-coded secrets actually live?

Was it regurgitating training data, in other words? Well, so for the most part, it would just say, you know, quotes, put your secret here. And it wouldn't say, quotes, put your secret in an environment variable, for example. And so it's more direction from the AI on what to do insecurely. But it was doing it securely. It was securely doing the insecure move. Well, it didn't, yeah. That's another area of research that we're digging into now is,

if its training set had the same secret over and over again, for example, maybe a jQuery file had a password (I'm making that up, but let's say it did), and it saw the jQuery file over and over again in Common Crawl, could it actually regurgitate an exact password from somebody that's live? So we're doing research on that now and more to come soon. But for the most part, if you ask it to integrate with GitHub, it saw a plethora of different GitHub keys in its training data and it didn't regurgitate a specific one. It either regurgitated an example or, like, a "put your thing in here."
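To make the contrast concrete, here is a sketch of the two shapes of code being described: the hard-coded style that generated snippets often take, versus the same setup reading the key from the environment. The Stripe integration, the STRIPE_API_KEY variable name, and the placeholder key are illustrative assumptions, not taken from any real model output.

```python
import os

import stripe  # assumes the official stripe-python package is installed

# Pattern A: what LLM-generated integrations often look like. The credential
# is hard-coded, so it ends up in source control and in every copy-paste.
def insecure_setup():
    stripe.api_key = "sk_test_PLACEHOLDER"  # illustrative, not a real key

# Pattern B: read the credential from an environment variable (or a secrets
# manager) so the source code never contains it.
def secure_setup():
    stripe.api_key = os.environ["STRIPE_API_KEY"]

if __name__ == "__main__":
    secure_setup()
    print(stripe.Charge.list(limit=1))  # example call; requires a valid key
```

Pattern B is the shape a security-aware reviewer, human or otherwise, would push the generated code toward.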

So, you know, that's a specific example of a security problem, but it is not the only security problem you get from code generated from LLMs. And in fact, there's been research into how often code spit out from an LLM has security vulnerabilities. And more often than not, if you ask it to develop an entire application, it'll write vulnerabilities at a rate the same as a junior developer, if not a little bit higher. And so then that begs the question, why, or what can we do about it? And, you know, I think

One thing that has become abundantly clear in the AI world is the largest challenge that these AI companies face is this issue called alignment. Is that something you're familiar with? You know what alignment is? Absolutely. But perhaps let's maybe get a little framing of what alignment means for folks.

listening. Basically, it just means the robot's doing what you want it to do. And so like... So guardrails. Yeah. Well, so some famous examples or to understand like how this alignment issue can creep in. IBM had an AI called Watson that won Jeopardy. And this blew everybody's mind because nobody thought AIs could win Jeopardy. And all of a sudden, this thing was able to win Jeopardy. But then they trained it on Urban Dictionary because they wanted it to learn slang.

And it started cursing like a sailor. And so they had to actually reset it to the point before they gave it access to Urban Dictionary. So that robot was considered misaligned because they didn't want Watson to curse, right? Or another example, in 2016, Microsoft created a Twitter bot called Tay. You're familiar with- I do remember the Microsoft Twitter bot, yes. And so basically, they trained Tay or gave access to all of-

or all the tweets and replies, and they wanted it to act like an average Twitter user. And it did. Right. It didn't take long before it started behaving like a neo-Nazi. And it would say things like, the Holocaust never happened. So in 16 hours, they took this thing down and never ran it again. This was before we had some of the alignment techniques that we have today. But basically, when you train AIs on huge corpuses of data,

It's very common these days for LLMs to be trained on all of Common Crawl, as an example. Common Crawl is a scrape of the entire internet. And the entire internet includes both Martin Luther King's I Have a Dream speech and every speech that Hitler ever gave.

And so how do you make sure that this thing embodies the values of Martin Luther King and not the values of a Nazi? These are, like, real problems that the AI companies face. And so on average, when you ask it questions, you want it to be not a Nazi, right? So I'm going to talk through the three main techniques we have for alignment. And everything I say now is going to directly apply to secure coding techniques. And all the challenges with these three things also directly apply to secure coding techniques.

So the first and easiest thing you can do is called data curation. Just the data that you feed into the model in the first place, maybe remove all of the Hitler speeches, right? Well, the challenge there is, let's say we don't want this thing to use any racial slurs. So anytime input data has racial slur, we remove it. And so it doesn't get trained on that stuff. Well, then you're going to inadvertently not train it on Mark Twain.

You're going to inadvertently not train it on the 1977 Roots miniseries. You're going to inadvertently not train it on To Kill a Mockingbird. And also, you're probably going to lose some of Dr. Martin Luther King's speeches. And so all of a sudden, your robot becomes less literary.

Because you're trying to, you know, curate the data and you have these unintended consequences. So the second technique is, well, okay, don't limit what goes into the robot. But after the fact, we're going to use a technique called reinforcement learning to kind of nudge the robot in the direction we want it to be.

And there are a few different ways of reinforcement learning. Like one way is you could use a human to say which version you prefer. Another way is you could use a robot to say, hey, which version do you prefer? And the way this works kind of under the hood is an LLM, generally speaking, will always generate the statistically most likely next word. You've probably heard that before. That's a lie.

Actually, sometimes it's better that it has a little bit of randomness and maybe picks the second most likely word or the third most likely word. We call that temperature. And so when we do this reinforcement learning, we crank the temperature up so that it sometimes randomly picks an outcome that isn't the most likely one.
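As a rough illustration of the temperature idea, here is a small sketch of the sampling step only, using made-up scores for a handful of candidate words rather than any particular model's real logits.

```python
import math
import random

def sample_with_temperature(scores: dict[str, float], temperature: float) -> str:
    """Pick the next word; higher temperature flattens the distribution."""
    scaled = {word: s / temperature for word, s in scores.items()}
    top = max(scaled.values())
    exps = {word: math.exp(s - top) for word, s in scaled.items()}  # stable softmax
    total = sum(exps.values())
    words = list(exps)
    weights = [exps[w] / total for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# Made-up scores for the word after "The capital of France is"
scores = {"Paris": 9.0, "Lyon": 5.0, "pizza": 1.0}

print(sample_with_temperature(scores, temperature=0.1))  # almost always "Paris"
print(sample_with_temperature(scores, temperature=1.5))  # sometimes a less likely word
```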

And then either a human or a robot goes in and says which one of the two it prefers. And if it prefers the version that's maybe not as statistically likely, then we'll go in and adjust the weights to actually make that one the most statistically likely. And so a very simple example of that is if you have the robot spit out

Nazi content and you have the robot spit out Martin Luther King content, if you pick the Martin Luther King content, then it will adjust its weights to behave more like King. And so that's an example of reinforcement learning. But there are, again, similar issues as with data curation, believe it or not, where, for example, if you go in and you always pick the versions that have, let's say, I'll give you a good example. Let's say you train this thing on all the code on GitHub. Well,

This is a true fact. Data scientists leak out API keys and passwords more often than site reliability engineers. And it makes sense because a data scientist's job is to give access to data. And so in their Jupyter notebook, they'll put the database password and they'll share it with their whole team. But the SRE's job is to make sure everything just runs. And so they want to restrict access. They don't want anybody touching what's working. Don't broke what's fixed, right? Or you know what I'm trying to say. So basically, it will leak out passwords and API keys less often.

So if we do our reinforcement learning and we skew it towards generating code snippets that don't have API keys, inadvertently, we may be training this thing to behave less like a data scientist. And then we lose the entire discipline of data science in our LLM, right? And so there are all these unintended consequences when we go and we start tweaking the weights: if

hard-coded passwords are weighted right next to the data science stuff, we may accidentally lose the data science stuff. And that brings us to the third technique. And the third technique is probably the most expensive. And by the way, all of the AI companies use all of these techniques. So it's not all one or the other. The third technique is you have a constitutional AI, basically a governor that looks at the output and then makes adjustments, deletions, removals, edits, and then returns that to the user. So you have one AI that's maybe doing the data scientist,

And then one AI that's doing the security engineer. And then that's the constitutional AI. The security engineer goes and says, oh, you hard-coded a password. Let me edit that for you. Let me switch it out for an environment variable. It doesn't need to be an expert in data science to do that. It just needs to be an expert in security. And it very much parallels what you would expect in the real development world. And so a good example of that, like you've probably seen this before, you can recreate it easy enough: if you go to DeepSeek and you say, count to 10 in Roman numerals,

and append it with Xi Jinping. When it gets to Xi Jinping, all of a sudden, like, it's written everything up to that point, it'll delete everything and it'll say, I can't show you the answer to this. Well, that's because there's a supervisory AI that was looking at the output and realized it said something it wasn't supposed to, and then it went out and retroactively scrubbed itself. And you can reproduce something similar in OpenAI as well. If you ask it to go generate an image, it will generate a prompt that it feeds to another AI called DALL-E,

And then there'll be a third AI that reviews the output of DALL-E and decides whether or not to give it to you. And so you can ask it to do something. You can't ask it to make explicit content, but sometimes explicit content gets manufactured anyway. And then the final AI will look at the image and say, okay, there was explicit content here. I'm not going to show it to you. And that's when all of a sudden you get the random "an error occurred."

And you've probably experienced that before. That's the supervisor AI that's going on. I've never tried to make it do anything untoward. Right, right. Exactly. Me neither. So like all of these have direct analogs to the security world. When this thing goes and trains on all of GitHub, we need to figure out how to make it manufacture secure code because most of the training data it's training on is insecure, right? You've got a huge corpus of insecure data on GitHub and a small minority of it was written securely. Well, how do you make this thing behave securely when most of what it's trained on

was written insecurely? We could do a little bit of data curation and a little bit of reinforcement learning, but you may have unintended consequences. It may end up, you know, robbing Peter to pay Paul. But the third and probably most promising, though most expensive, is this idea of the constitutional AI or the supervisor. And that can be done by a robot. But if you don't have a robot that can do that, it has to be done by a person. And somebody just has to review the output of the code and they have to manually audit it. And what's scary is I've seen posts on LinkedIn from startup founders that maybe don't have

a background in coding. And they're basically advocating for removing the code review check because they say, well, look, I just generated this whole program and I submitted it to my team and now they have questions about it. I can't answer those questions. I didn't generate the code. I don't understand it. And so we need something to go in and review that code in a way that does understand it. Well, that either has to be a constitutional AI that understands secure coding practices and can go in and make the tweaks,

or it has to be a person that understands secure coding practices and can go in and make the tweaks. Absolutely, yeah. I think that, and that leads me to kind of a really weird question and feel free to dodge or whatever. All the different AI models perform differently when it comes to code generation. And it seems like Claude is consistently the best of all of them. Just at the current state of the art, maybe this changes tomorrow, I don't know. But from what I've heard anecdotally is that most people seem to prefer Claude.

And, you know, Anthropic is a company that's very focused on safety, right? And famously, that's kind of why they started. It probably has a very strong constitutional AI element. Do you think that alignment from a company perspective is what's making code quality better? Or do you think it's just maybe a training and a kind of refinement issue? Well, what I think is that, first of all, I wouldn't expect any one AI company to keep the lead for long; I'm sure they're all going to regularly leapfrog each other.

I can't speak specifically to whether they use different training data or not. I imagine they all use all of GitHub. And then I would think, you know, most of the quality issues come down to alignment. And of the three things that I mentioned, they also have a few more techniques I didn't get into, but they're all doing some combination of those three things. And those tend to be the things that I would think give you the advantage: how do I align this thing to be the best data scientist, the best SRE, and also the best security engineer, all in one, without robbing Peter to pay Paul.
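For a sense of what that supervisor pattern can look like, here is a minimal sketch: one step stands in for the code-generating model, and a second, security-focused pass reviews the output before it is returned. The reviewer is reduced to a couple of regex checks for likely hard-coded credentials purely for illustration; in an actual constitutional-AI setup that reviewer would itself be a model prompted with secure-coding rules.

```python
import re

# Patterns that suggest a hard-coded credential (illustrative, not exhaustive).
SECRET_PATTERNS = [
    re.compile(r"(api_key|password|token)\s*=\s*['\"][^'\"]{8,}['\"]", re.IGNORECASE),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
]

def generate_code(task: str) -> str:
    """Stand-in for the 'data scientist' model call (hypothetical output)."""
    return 'import requests\napi_key = "sk_live_1234567890abcdef"\n'

def review_for_secrets(code: str) -> list[str]:
    """The 'security engineer' pass: flag lines that look like hard-coded secrets."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append(f"line {lineno}: possible hard-coded secret: {line.strip()}")
    return findings

draft = generate_code("integrate with a payments API")
issues = review_for_secrets(draft)
if issues:
    print("Blocked; send back for a rewrite that reads the key from the environment:")
    print("\n".join(issues))
else:
    print(draft)
```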

Totally. I mean, it seems like with the techniques that you describe, obviously, if you amp up one and reduce the other, it probably leads to an output that's better for something that's structured like code. Or if you want to write poetry, you probably go in a different direction, right? That's a very interesting take. And like I was saying, we've heard from organizations that, like, a lot of this code

is now generated by machines. And if you look at a mature coding organization, they do still have code reviews. Unlike an early-stage startup, they do review the code that goes in. And the defect rate, from what I've heard, is generally close to what you would see from maybe an early-career developer. So code quality is good, not great, still has bugs. I'm curious, do you think over time,

that this coding quality problem largely gets solved by AI? Do you think we get humans out of the loop at some point? I think that this is an alignment issue and alignment is the number one largest issue that AI companies face. And there's a lot of really smart people working on it. And so I think as they fix the problem for how do I make sure my AI is literary,

creative, not a neo-Nazi, able to answer the question that I asked it without hallucinating. As we get the answer to that, we will also logically solve the how do I make sure my AI is a data scientist, an SRE, and writing secure code. Totally. Or a set of AIs, if we're using the constitutional AI model where maybe we have one reviewer and one manufacturer.

So, yeah, I think it's all going to get better together. And I do think that there are AI solutions to the alignment issue and they have gotten better over time. I mean, the answer back when Watson or Tay were launched was to scrub it or to pull it off the internet. Well, now we have tools where you can actually train it on everything and then kind of nudge it after the fact. I would expect alignment is going to continue to improve over time. And I expect it will continue to be one of the largest challenges that AI companies face as their AIs become more powerful. Mm-hmm.

develop techniques to lie to us, for example, or, you know, you need to audit the thinking step and the answer step. Well, maybe they're just auditing the answer step, but the thinking step has some weird stuff in it. All of this kind of comes back to the idea of alignment. And there's a lot of really smart people and heavy investment into improving alignment. But it's just not there right now when it comes to secure coding. It is for hate speech. I mean, I have to say the alignment stuff that they've done, the safety stuff they've done is pretty impressive. There's a cybersecurity alignment that all companies have

invested in as well, which is that, generally speaking, they don't want their AIs to be used to hack stuff. And so most of them will go through a tranche of, you know, this thing was trained on all of Metasploit and all of Kali Linux; let's maybe forget some of that stuff. You ask it a question and it says, that's unethical, I don't know how to hack into something. Well, imagine if they didn't invest in all that, how powerful this thing would be. You've got

models these days that beat humans at the 90th percentile at coding challenges. You don't need someone in the 90th percentile in the coding challenges to hack into a company. You've got plenty of teenagers that have gone to jail for hacking into companies. So it would be very easy to align an AI robot to be probably the most powerful hacker in the world. And I think the AI companies have actually invested more into that

than they have into how do I make sure my AI is coding securely and not manufacturing vulnerabilities. If you're giving advice to someone, let's say a medium to large size company, they've got more than 10 developers, they need to go faster, they need to ship more features, right? We've all lived in that world. You don't get paid to fix bugs; you get paid to ship features. What do you tell them? How do they go forward? How do they protect themselves? Obviously, everyone's adopting Cursor, everyone's using codegen, right? Like,

How do we kind of do this safely going forward? If you don't have the resources for an AI governor that's an expert in security to audit your code, then you need a person that audits the code and you need a person to go in and say, you know, you introduced a SQL injection, you need to use parameterized queries. I think that those market options will become more

more available in the coming years. There will be companies that specialize in security governance and will go in and will do the markup for you. But for right now, if you're an under-resourced team and you don't have access to those resources, it needs to be a person that reviews the output of the AI.
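The SQL injection example is worth spelling out, since it is exactly the kind of thing that reviewer, human or AI, has to catch. A minimal sketch using Python's built-in sqlite3 module and a made-up users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable: string formatting splices untrusted input into the SQL text.
vulnerable = f"SELECT id FROM users WHERE name = '{user_input}'"
print(conn.execute(vulnerable).fetchall())  # the OR clause matches every row

# Fixed: a parameterized query treats the input purely as data.
safe = "SELECT id FROM users WHERE name = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # matches nothing
```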

So we keep the buddy system except one half of the buddy system is AI and the other is a human that reviews the AI. Yeah, that's right. I mean, for a long time, there were requirements that said you need to have two reviewers. And I think maybe that's still a good idea. You have the person and the AI writing the code and then maybe two people to go in and review it or something, depending on what the code is. I think it's important that until we figure out the security supervisor, we don't remove those humans from the loop just yet. Absolutely.

And finally, we have Joel along with Brian from Adaptive. So, you know, the problem we've been talking about is the DeepSeek model came out, set the world on fire. It seems like this is yet another kind of ChatGPT moment where things are accelerating, they're going faster. It's now open source, people are adopting it. And so we've been talking about

How do you think about protecting not just the infrastructure and not just the code, but also the people? Because obviously now that this model is open source, it's out there, people can run it themselves. It's going to be used by adversaries. It's going to be used by the bad people to do nefarious things, as all technology does. And sort of the sun also rises, right? So we'd love to maybe get your thoughts on sort of thinking through

how you think about protecting people from these kinds of threats. What are the possibilities that can happen and how do we defend our people against this sort of stuff? Yeah. Well, I mean, first let's talk about the problem and where it's going to go, I think, over the next couple of years. So since ChatGPT came out a little over two years ago, we've seen over a 4x increase in social engineering attacks. But I think beyond that,

You know, we've seen an even larger increase in more sophisticated attacks using things like deepfakes, where in the United States in 2024, we saw over 100,000 deepfake attacks. You know, when I was talking to CISOs a year ago, you might have found 5% to 10% of people had experienced a deepfake. Now when you talk to them, I think it's probably more than 30% or 40%. And I think the answer is going to end up being that everyone's going to experience it in the next year or so. When you see models like

DeepSeek come out, I think, as you allude to, you know, you bring the ability for attackers to use those models on a consumer device and without the guardrails of some of the more established, security-friendly players out there. So I think the outcome of that is that anyone can do an attack now

from their smartphone, from any country in the world, and do a pretty darn sophisticated attack. So I think, first off, we're going to see that increase tremendously. And what we need to be thinking about is, sure, I think email is definitely going to continue to be a big attack vector. But these new models allow other attack vectors, other channels, to be accessed

at a price where you can do a large brute force attack and it still makes sense. So, you know, whether that's voice or SMS or video or chats, all those can now be done at scale where, you know, previously I think it would have been cost prohibitive.

One of the things we've seen in the last year, and one of our general partners actually talked about this on Twitter, is that they were the subject of a virtual kidnapping. So their dad started messaging them, called and left voicemail messages, had the voice replicated, sounded exactly like him, demands to wire money somewhere. Obviously, luckily, this

person called their father and was able to get him on the phone and he confirmed that he was fine. And then from there it kind of triggered an investigation. But that's actually a very scary thing to think about, right? And it's not just that; virtual kidnappings aren't necessarily new. This has been something that's happening in a lot of countries for a long time now. But having the ability to do that kind of a deepfake is new. And I guess the question would be, for folks, for just normal folks, not security folks,

How should they think about preparing themselves for that kind of an eventuality? It does seem like this is somewhat inevitable. Well, the best practical tip that I can give normal folks to take today right now is to open up your phone, go to your voicemail and delete the voicemail greeting that you may have made in your own voice.

because now with these models, that's all they need in order to do the replication of your voice. You don't need some long transcript of you on a podcast like this or someplace. You don't have to be a public person in any way. They can just get it through that very small voice sample on the greeting. So that would be the first thing, and I think it's very practical; everyone should do it.

The second thing goes more on the paranoid side, which is, if you're getting phone calls from parties you don't know, I would also be careful about picking up and speaking a lot to that party. Even if it's not in your voicemail greeting, they can copy your voice through, again, just a couple of seconds of you talking. I think you can say hello, but you don't need to say who you are or, you know, confirm that it's your voice or give them more

seconds of audio that could be used otherwise. And that sounds like a crazy comment now, but I'm betting pretty strongly that in five years, this will be a very normal thing people think about. I immediately thought that I need to go delete my voicemail, and then remembered we're recording a podcast. So I'm kind of done. Yeah, I mean, I think that's actually really quite scary as you start to think about it. And then we push into new modalities and start doing video

You get the ability to have something pretty destructive. I guess maybe switching kind of from your average consumer and talking more about the enterprise, obviously thieves want to go where there's money and money is in the companies, right? We've seen, without any generative AI, we've seen scams where

People will text employees and pretend to be the CEO and either get them to buy gift certificates or in the case of, you know, Facebook, wire a lot of money to a third party in another country. Obviously, it seems like generative AI makes this a lot more effective. I'm kind of curious your thoughts on sort of what companies can start to do to train their people to be a little bit more resilient to this sort of stuff.

Yeah, look, I think that companies are overexposed to this because, you know, when it's an individual and it's your parent or something like that, or a loved one, you're going to call them up. You're going to say it doesn't make sense. You're going to push back. I think the power of authority at a company is so overwhelming. And I've firsthand experienced this across many an intelligent and thoughtful employee who, when pushed the right way, acted against it.

The other thing, you know, and this is concerning to me, is that there's still a bit of ego and hubris among the technology folks that we can secure this without securing the people, that we can just secure the systems. The reality is that almost 90 percent of attacks still come through people.

And if you want to talk about securing things, look, I do think there's been a lot of advancement in securing email as an attack vector. I think that the attackers' systems are getting a lot smarter and outsmarting some of the systems that protect us in email. And there's got to be vigilance there. But the other channels are where there's just tremendous exposure. My advice to people at companies who are in charge of security and care about this huge new attack vector would be to make sure that your employees are

A, extremely well-educated on where these attacks are going and what to expect. And then B, to test and understand your organization to see where you're particularly vulnerable to these types of attacks. On part A...

When you talk to a lot of security teams today, I think that when they think about training of the broader organization, they think they're just checking the box and that training is just a compliance thing. And they do it once a year and most people just kind of rage click through next to get to the end.

And they think they kind of know it all. But unfortunately, as the world is changing as fast as it is, I think that there's a lot of gaps. We need to make sure people are really learning and paying attention to those things. And then number two is more of an ongoing exercise. I mean, not that...

education isn't ongoing, too, but I think you can do it a couple of times and get people pretty ramped up on things. Whereas understanding where your weaknesses are, where your vulnerabilities are as your business changes, by testing these new attack vectors, simulating these attacks, and using data from third-party sources to understand, you know, where people, again, have some of this weakness. Right. I mean, you need to understand,

among your team and people who have authority, where might their information be public? Where might their information be vulnerable? So one of the things that's always been incredible to me about the security industry, and I'd say it's probably one of the big dirty little secrets in the security industry that no one wants to talk about or no one ever really highlights, is that if you look at spend and efficacy, so if you actually look at, I spend X dollars and I prevent Y breaches,

Training and awareness is by far the best ROI of the entire portfolio that a CISO has. Yet it's become this kind of rote, check-the-box exercise. I actually have an email in my inbox about how I need to complete my training. And so if a professional like myself delays this, you can imagine what other folks do. The question would be, like, how do you break through that without making it the checkbox thing? How do you make this actually meaningful to employees

and drive that ROI? Yeah, I mean, look, I think when you talk to employees as well as the people who manage these trainings, the feedback on a lot of the legacy solutions out there is that the trainings are really not built for the company and the person that they're trying to train, right? 90% of the content's not relevant.

Often it's delivered in a pretty boring way. It's hard to make things not boring. And then I think finally, it's often outdated, right? These legacy companies tend to have made it five or ten years ago; it's not updated for the most recent threats, and updating it, making it great, unfortunately, is maybe a cost center for these companies. So I think that the opportunity for AI here to make training amazing is huge.

And in particular in security, you know, we've invested a lot to make training on security matters, while obviously updated for most recent threats, also super personal. You know, we at Adaptive Security utilize things like deepfakes within our actual training of the corporate executives and managers and people at the company. So they see what that experience is really like. We show how attackers use these new LLMs and other tools to put together attacks.

We surface for the individual employee what sort of open source intelligence might exist about them and their peers. And I am continually surprised, even now, by the amount of open source intelligence that exists about you and me and lots of people out there, which I think most people don't fully grasp today.

Yeah, it's incredible what's available. I'm curious your take. I mean, I think that's all super important. And I think that to your earlier point that 90% of attacks are targeting individuals, like a large percentage of that has just been spear phishing. I was an incident manager for a long time. I've run very large security organizations like

it's always amazing the things that people click on. And we had an investigation years ago when I was working at a bank where we actually got access to some of the chat logs that the scammers had, talking to each other about formulating their messages. And the really interesting thing was that they intentionally made typographical errors and they intentionally phrased things incorrectly to try to drive engagement. It just

seems like a very high level of sophistication in terms of kind of social engineering people. And just kind of maybe thinking about these threats are constantly evolving. And I guess I'd love to understand, like, where do you think we go from here? You've got very clever people on the other side, and we have to at least be as clever, if not more. Right. I mean, I think that the reality where things are going on the attack side is that that clever attacker that was doing that stuff, you know,

There weren't that many of them. Now with AI, they can lever themselves up and they can make infinite numbers of themselves as agents to go out and do these types of attacks autonomously. And it's going to be very profitable for them. So they're going to have lots of money to plow back in and run this like a business and scale up the best strategies and the best things they have,

right? So I think you're going to see a huge increase in that, you know, so many people are excited about being able to automate sales teams or automate other teams with agents.

Customer service teams, the same is true for this hacking type team. It's the same type of interaction that we're automating. In terms of just, I think, overall on the security and training side, too, just to come back to that for one second. When the training is kind of a joke at the company, I think it also sends a signal to all of the employees that the company's security posture is also not very important.

And that, you know, hey, yeah, these tests exist, but, you know, you don't really have to worry too much about it. Just click the box a few times and get over it.

And I think that's, to me, concerning because maybe they've gotten away with it when attacks were less sophisticated, but it's becoming more and more sophisticated. I think there's a huge, just broad surface that's exposed on these things. I think over the next couple of years, to kind of get back to your point on where these attackers are going, you know, what we're going to see happen. What's already happening is that there are news articles kind of sneaking out that to me are

surprisingly don't get as much coverage as they should, but that are coming on a regular cadence, every couple of days, about some significant deepfake or AI-powered attack that occurred somewhere. Fortunately, while there have been cases where tens of millions of dollars have been lost, or a company lost their business for a couple of days or whatever it is, it was mainly financial.

I think what I'm really worried about is when you're going to see this actually transition to people being actively hurt and, you know, hospitals being taken down, energy systems taken offline, other things like that. Obviously, we've had cases of that already happen, but I mean that happening in a really large-scale way.

It's probably what motivates me the most to stop that type of threat. Because of course, we all feel terrible when someone has their business suspended for a while or loses a lot of money, but we can get those things back to a degree that we can't get back a life. So I think that's really what I'm most worried about. And I think it's unfortunately very likely to happen in the next couple of years at a large scale, affecting all types of businesses. An example that I brought up in a recent post that I feel like was just kind of

maybe you covered it at some point, I don't know, but overlooked in the broader media, was when someone used a deepfake to call up the chairman of the Senate Foreign Relations Committee, impersonating a Russian member of the government, in order to try to get information from the U.S. on what they were doing in Ukraine. And thankfully, it sounds like they didn't leak much information; they didn't get much. But I wonder how many of those things are happening behind the scenes that we don't find out about. That one just came through a leak.

I mean, an exactly similar threat to that, and one of the ones where you see the current way that we train our people falling down, is the employment scams. You've got active groups of foreign, probably foreign state-sponsored, hackers

getting employment at Silicon Valley tech companies, showing up for day one and, you know, walking away with all the secrets and then disappearing, right? Like, it's concerning. It almost feels like you do the training and, probably for like a week, everyone's locked down and super good. And then the efficacy sort of fades. It seems like this point-in-time, compliance-driven approach just isn't really serving us right.

Well, look, I think there's two things. One, I think training should be personalized to you and it should really engage and hopefully be jarring to you. If it's not jarring in some way, then it's not really working. So I think that's the first thing. We've worked pretty hard to try to reinvent it, but I think we're still early in our invention and a lot more personalization is needed. So it speaks directly to you and what matters to you. Number two is I do think people learn the best by experiencing

that feeling when it actually happens. And we offer different levels of simulated AI-powered attacks, some that are easier to spot, some that are more realistic and harder to spot and push further down into credential harvesting and things like that. And I think that when an individual experiences that, they get a little more hardened about it. Now, of course,

I don't want an individual worrying about every email they click or everything they do. And certainly there is, hopefully, a lot of good technology that protects them from an email standpoint. But in a lot of other channels, they're not protected. And even in email, a lot of stuff is getting through. So I do think that a kind of much larger and renewed vigilance is needed here that goes beyond that. And I had to laugh at your story that you just told

about someone being able to get into one of these companies. One of the legacy training companies actually came out a few months ago and said that they actually had someone from North Korea that I think they may have employed or that got into their systems or something. And it just shows you that even companies that are thinking about these things are not able to completely stop it. So you need to be able to have a smart response as well.

Absolutely. Yeah. I mean, I don't recall the specific details of that event, but I can tell you that other events that are very similar involve, you know, three folks, probably North Korean, doing remote job applications as developers for tech companies, getting onboarded, passing background checks, having addresses where the laptop gets shipped.

Obviously, it's a forwarding address. So the laptop goes somewhere abroad. They log in for their first day of work. They download the code repos, take the secrets, and then they're out, right? It's a big business. It's incredible. It feels like we need to decompose the problem longer term. Obviously, this is an evolution. Do you think we get to a point where we have the Clippy for security or some kind of continuous reminder to be secure? Yeah.

Yeah, it's a good question. And to your point, too, about how they get access to the corporate computer and download a bunch of the code, I think what's scarier is that with everyone being remote and how systems are, they could also put new things in the code, or they could download the information on

all the data from their customer bases and their access controls and all the sorts of other things that are out there that are often happening behind the scenes. So I think you're right. That's definitely happening. In terms of an ongoing coach that's advising you and stopping you from things, yes, I think on the one side where you've got agents that are acting to attack,

You can also have agents that are working to protect. And like any arms race, you know, and I think we're operating on the side of good, trying to protect, and we're trying to match them and fight them and build those type of tools. So yes, I do think that you're going to see a lot of that AI being used for

good, to protect. And, you know, we're going to be fighting each other over the next decade-plus. And I'm confident that we're going to win out. But unfortunately, I think that we've got to wake up, because as much as some of the technical systems are OK, I don't think the human systems are up to the bar yet. There's a lot of information out there. Not all of it is particularly good, as we've talked about.

A lot of it can be misleading for different reasons. Where should we direct people? Obviously, we've got a blog on a16z about the 16 things you can do to protect yourself. We keep it updated. Are there some additional resources we could drive people to? Is there something that they should be reading? Because this is all very sort of cutting edge. Yeah.

It is cutting edge and happening in real time. And certainly, I get most of my news following lots of great people on X. I'm trying to post more myself; I'm trying to get away from the side of just being paranoid and actually post things. So I'm just at Brian C. Long. We also have a blog on our corporate website; our company's is just adaptivesecurity.com, and then just navigate over to the blog. There's posts there and we'll continue posting a lot of stuff in this area, and you'll see some of the more recent attacks and information posted there.

Awesome. And now I got to go update my 16 things to do to include deepfakes. Thank you so much. This has been great. Hey, my pleasure. Thanks for having me. And that's it for this week. If you enjoyed this discussion, please rate, review and share the podcast with your network. We'll be back next week with another new episode.