
Hacking Security Camera AI

2024/12/2

Hacked


People

Kasimir Schulz

Host

A podcast host and content creator focused on electric vehicles and energy.

Topics

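Kasimir Schulz: We studied the Wyze camera as an Edge AI device and found security weaknesses in its AI model. By reverse engineering the firmware, we worked out how the AI model runs, and we used a flaw in the QR code scan during Wi-Fi setup to get root on the newer camera. We also found that the AI model writes its detection results, including confidence scores, to log files, which let us measure how effective our adversarial examples were. Using existing adversarial example tools and techniques, along with some handcrafted examples, we fooled the Wyze camera's AI model into identifying a person as another object, bypassing its person detection. We also found that the model relies mainly on features such as object outlines, so changing posture or covering part of the body lowers recognition accuracy. Our attack has limitations, but it demonstrates the security risks in Edge AI systems and points to directions for future research. We should keep using AI-driven security cameras, because the benefit usually outweighs any risk. But companies deploying Edge AI systems should consider the worst case when the AI fails and put appropriate safeguards in place.

Host: This episode discusses how Kasimir Schulz and his team used reverse engineering and QR-code-like images to trick the Wyze camera's AI model into identifying an intruder as a bird. The researchers showed the camera specific images so that it mistook them for a bird, bypassing its person detection. They wanted a subtle way to fool the camera, such as carrying an inconspicuous object. Wyze cameras originally sent all processing to a server, but because of privacy concerns the AI model was moved onto the device, which is why it is called Edge AI. The researchers used a flaw in the QR code scan during Wi-Fi setup to get root on the newer camera. The goal of the adversarial example is for the camera to fail to detect a person even when the person is right in front of it. By editing the AI model's configuration file and raising the "person" detection threshold, the researchers stopped alerts from being sent. They created a custom socket to send images directly to the AI model instead of going through the camera. About 20% of the adversarial examples transferred from a YOLO model to the Wyze camera's AI model. The model relies mainly on features such as object outlines, so changing posture or covering part of the body lowers recognition accuracy.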

Transcript

As somebody who has hacked one of these models now, I still think it's great that people are actually deploying them. I am firmly in the belief that, yes, we should still keep using them, because the benefit usually outweighs anything else.

If you picture your computer network and you're looking into the network, you can think of your device as being on the edge of the network. It calls into and receives information from devices and servers that are deeper inside. For this reason, some people call computing done locally on that device edge computing.

This is a whole category of thing. And if you do artificial intelligence tasks that way, locally on the device, on the edge of the network, instead of calling a server deeper inside, people have started calling that Edge AI. There are a lot of devices that do some version of this.

Maybe the cheapest, most successful is something like a Wyze camera. A Wyze camera is a security camera, so it relies on AI and machine vision.

For one thing in particular: a working modern security camera needs to be programmable to send an alert to the user if it sees a person stalking around whatever the camera is pointed at. But it can't send an alert any time it senses motion, because birds and cars. So the model needs to be able to tell the difference between a person and not a person.

You can do this in one of two ways. The camera can send the video to a server, which runs an AI model capable of distinguishing: that's a guy, that's a duck, that's a burglar, that's a goose. It can run the duck-versus-guy model on a server, or it can try to do it locally on the device.

And there are some real security and privacy reasons why this is preferable. It's a video feed of your house. How much do you want to send to some server you don't really know about? But the question we like to ask is: can you hack it without touching it? Because if you get close enough to a security camera to plug something into it to hack it, it's going to see you.

It's going to say, that's not a bird. Our subject this episode is Kasimir Schulz, principal security researcher at HiddenLayer, who took a run at this problem. He was busy all DEF CON, giving a bunch of interesting talks.

But I wanted to understand what he and his team did to crack the Wyze camera, not by walking up to it and plugging something in, but by figuring out what the AI model running on it is doing to distinguish a person from not a person, and reverse engineering something that you could show that camera that will make it think, oh, that's a bird, when in reality that's a burglar. It's like a QR code, but instead of bringing up a menu, it tricks a security camera into thinking you're not a person.

If the person is in the camera with whatever bad thing they have in there, the patch, the camera does not detect the person even though a person is there. And then ideally, we wanted to have it set up in a way so that you're just, you know, carrying a bush, holding a tray in front of you, something that people around you might ignore. We wanted something really subtle, so somebody could come up, steal a package off your porch, and you would never notice.

So I called him up. This one veers a little technical on occasion. You know that I am not that technical, and I found this process fascinating. This is hacking Edge AI, with our guest Kasimir Schulz, here on Hacked.

So thank you for joining me.

Thank you for having me.

Okay. So we're here talking about the Wyze cam and Edge AI. Before we get into the story itself, what led you to want to research this?

Yes. So we were trying to see if there were any actual devices out there trying to use AI on device. So rather than calling a cloud server, with the model being in the cloud, having it on device, because that way we can actually attack it, see, you know, what people are actually using, and try to see if there is a way to really utilize these models in a malicious way or to bypass them, especially with the new advancements in hardware. So people are using NPUs, which allow, you know, really small devices to run low-power AI models over a long course of time.

And just so I understand, that's what Edge AI is: AI being run locally on the device versus offloading it to the cloud.

Yeah, so Edge AI is a term that we were actually kind of originally looking for. We had actually defined the term Edge AI ourselves. And then while looking around, trying to find if there was any AI being run on the edge, we saw that Wyze was marketing Edge AI. So Wyze had actually marketed it, named it as Edge AI, which worked out well.

What's the, like, the etymology there? Why call it Edge AI? Are all the devices somehow defined as edge devices? Like, what does that mean?

Yeah. So the way that the Wyze system used to work is that when a Wyze camera detects any sort of motion, it triggers an event. And that event used to send a photo off to one of the Wyze servers.

So in that case the camera was the edge device and the Wyze server was the main server where everything was being sent, and then the AI model would run on the server, see, you know, is there a person in the photo, is there a package or a pet, and then send the detections back to the camera. So all of the processing was done off device. However, some people had privacy concerns and didn't want their photos being sent to a server. So instead the AI was actually put onto the device. That's why they called it Edge AI.

And then, for anyone that doesn't know, it's kind of intuitive at this point, but what is a Wyze cam?

What is this device? Yeah. So they're little budget cameras. They are fairly popular, I believe. Don't quote me, but I believe that one of the cameras has upwards of seventy million sales on Amazon, and they were originally meant as indoor cameras.

So watching your pet while you're gone, you know, nanny-cam type work. However, since then, some outdoor cameras have been developed, there are cameras for doorbells, and they have built out the line of their products. But the ones that we were looking at normally fall in the thirty to fifty dollar range. So it's a camera that's accessible to a lot of different people.

We talked a little bit about what got you into this, but broadly speaking, why look at this device? Was there something that you thought you were looking for, something you were trying to do when you started peeling this thing apart? Or what got you looking into this?

Yeah. So first off, it was somebody actually marketing Edge AI. There are a lot of AI models run on lots of devices; however, most of them use the cloud as well. And what was actually a fun turn of events is that I'd actually hacked the Wyze cameras a few years previously, so I had experience with them. A lot of other hackers and reverse engineers out there, I know, have looked at Wyze cameras before, because they publish the firmware online and because they're on the cheaper side to buy. You can actually have a device, and if you brick one, you can always get another one. And instead of having to try to extract the firmware yourself, you can just download it and start doing firmware reverse engineering. So the prior knowledge of the device, having reverse engineered it, plus the marketing of Edge AI, made it a really good use case.

So they were cheap and the firmware was publicly available. Why do you think they publish the firmware? Why do you think they give you that toehold?

Yes. So I think it's actually worked out fairly well for them, just because they've gotten so many reports over the years. And the other year they were in Pwn2Own as well. But yeah, it's just a choice some companies make. I don't think it actually makes them more insecure or anything. There are just more researchers looking at them.

Got it. For anyone that does not know Pwn2Own, what's that?

Pwn2Own is an event that happens, I believe, once a year, and these companies go out and say, hey, we have this device, we want you to hack it. Everything from routers to even a Tesla a few years ago was part of Pwn2Own. And then you get a big prize if you are actually able to exploit the vulnerability the day of the competition.

This is a total aside, but I know you talked about this at DEF CON, I believe.

Yes.

Did you see any of the similar stuff? Like, did you see the Tesla that they had on the floor there? Did you walk around any of that stuff?

No, not too much. I actually had six talks that week.

Okay, fair enough. Yeah, I did not, so I spent a lot of time walking around. It's very fascinating.

Okay, so: cheap, plentiful, firmware's available online. You had a little bit of history with them. There's almost like a culture around hacking these things.

And Wyze seems down for it. How does the investigation start? What are some of the early discoveries? What kicks this all off?

Yeah, so as I mentioned, they were part of Pwn2Own, which meant that there were publicly available exploits for older versions of the firmware. And we actually had a few older devices lying around. So we decided we would try to see if we could pwn one, get a shell on one of the older devices with older firmware, since we hadn't updated them yet.

And once we were actually on the device, our goal was to see if we could find the AI model. So that's kind of where we started our journey. The older device we had wasn't actually supposed to have the AI model, because the new devices were marketed as having AI, and on the older ones you couldn't even enable it.

But when we actually got on there, we saw there was a folder called Edge AI, and there were some binaries in there. What happened, though, was that the folder didn't exist inside of the actual firmware. We had initially reversed the firmware, and it wasn't there.

So we knew that it had to be downloaded somewhere. And this Edge AI folder was actually actively being used by the binaries of the latest cameras.

So the current camera, the older one, would not access the folder. However, the new cameras all access the folder, but we couldn't see the folder on the new cameras. So we decided to poke around and reverse engineer, and we actually found that there were AI models in that folder.

So the way we reverse engineered them is, the AI model, instead of being by itself, was built into a shared object. So it's loaded up by an executable and then run. And after a bit of reverse engineering, we saw that there were a few layer names in there.

So AI layer names are going to be things like convolution or quantize, input, output. And by reverse engineering that, we were actually able to get a model out of there. At that point, we wanted to step back a little bit, because we were concerned that maybe we were putting too much time into something that isn't actually on the new devices in the same way.

So we decided to see how that folder is being downloaded onto the device. What we did is we set up tcpdump and then deleted the folder. And we found that there is a binary on the device called sinker that just redownloaded the folder.

We just ran grep for the string Edge AI and found it that way. And with tcpdump running, we ran the command, so it redownloaded the binaries.

And then what we'd also done is we had dumped the client secrets for the HTTPS traffic. This was HTTPS, not HTTP.

And the way that you do that, on some Linux devices and other devices as well, is there's an environment variable you can set. And then every time an HTTPS request happens, it saves the secret for that.
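As an illustration of the key-logging trick described here, below is a minimal Python sketch. The guest does not name the environment variable; `SSLKEYLOGFILE` is the conventional name that many TLS stacks honor, so treat that as an assumption. Python's `ssl` module exposes the same mechanism through `SSLContext.keylog_filename`, and the file path used here is hypothetical.

```python
import os
import ssl
import tempfile


def make_keylogging_context(keylog_path: str) -> ssl.SSLContext:
    """Return a TLS client context that appends session secrets to keylog_path."""
    context = ssl.create_default_context()
    # Requires Python 3.8+ built against OpenSSL 1.1.1 or newer.
    context.keylog_filename = keylog_path
    return context


# Hypothetical location for the key log; any writable path works.
keylog_path = os.path.join(tempfile.gettempdir(), "tls_keys.log")
context = make_keylogging_context(keylog_path)

# Any HTTPS request made with this context, for example
# urllib.request.urlopen(url, context=context), writes its secrets to the file.
print("TLS secrets will be logged to", context.keylog_filename)
```

Wireshark can then decrypt a matching packet capture once the file is selected under the TLS protocol preference "(Pre)-Master-Secret log filename".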

And then you can just load that into Wireshark. We were able to see that there were multiple calls, and in those calls we could actually see where it was going to get the binaries. So it was doing a call to ask, based on this firmware version,

what should I have? You give it a firmware version and camera ID, then it would tell you where to download the files, and then there was a final call to a one-time link that expired, to actually download the content. So it wasn't just always hosted, which is pretty interesting. So the firmware version and the camera ID were something that we actually had for the newer cameras.

So instead of trying to pwn the new cameras at first, we went ahead and put in the new firmware version and new camera ID, and we were able to download the Edge AI directory for the new cameras. And what was actually really interesting was that it was completely different files. So with the older version, it was libjzdl with the binaries, and with the new one, there was a libvenus, and we did some online sleuthing.

We found one open source repo that had documentation for Ingenic, which is the chipset that these cameras are based on. So we were able to see that it was a proprietary model format for that chipset, because that chipset had the new NPU to run AI on it. So we started reversing that as well.

And at this point, we had two different models, and we were able to see that they were fairly similar and that they were actually based off of YOLO. YOLO is an image recognition model. The way that it works is that, given an image, or a number of images, a video, it will draw bounding boxes around all the items that are in there and then classify them, so you can see if there's a person or two people or a person and a pet. Right, which makes sense, since

these were the detections actually coming back out. And then from there we needed a way to actually run the model, but because it was a proprietary format and was for that specific set of chips, we actually couldn't run it locally. So we had initially tried emulating, because there was QEMU.

So you try to emulate the entire camera, operating system and all, so that you could run this AI model through that whole process. You get that running on this simulation of the camera?

Yeah, got it. So we found that it was a MIPS architecture. We got to the point that we were actually able to emulate the different binaries on the camera.

However, we couldn't emulate the AI model, because it needed a very special instruction that was only for that special CPU, the NPU, and wasn't actually in QEMU. So at this point, we were asking what to do next. At that point we needed to actually run on device. However, the newest AI model didn't run on the old cameras, because the old cameras didn't have that special chipset. So what we did is we decided to find a zero-day to get onto the new camera.

Sure.

Yeah, that's a lot easier to do once you are actually on a device. So instead of just reverse engineering and statically finding a zero-day in the new firmware, we decided to see if we could find a zero-day on the old camera and see if it still existed on the new camera. Cool.

So instead of trying to find a really complex attack vector where, you know, you're trying to send traffic to some code path or something, you know, what other people had actually looked into in the past, we decided to see if there was a simpler vulnerability that might not be as relevant to an attacker, but is really relevant to us trying to get a shell on the device. So part of the camera setup process is scanning a QR code for your Wi-Fi.

So that has your Wi-Fi SSID and your Wi-Fi password. And what's really cool is that when you have your Wi-Fi SSID, it adds that string into a command that it runs, so that it tries to find out if that SSID is available. That also means that, since they're just adding the string, if you have a semicolon or anything else at the end, you can add whatever other command you want in there as well.
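To illustrate the class of bug described here, the following is a minimal sketch that only builds command strings and never executes them. The `wifi_scan` command, its flag, and the payload are hypothetical; the episode does not give the camera's real command line.

```python
import shlex


def build_scan_command_unsafe(ssid: str) -> str:
    """Mimic the bug: paste the SSID straight into a shell command string."""
    # "wifi_scan" is a made-up command name used only for illustration.
    return f"wifi_scan --ssid {ssid}"


def build_scan_command_safe(ssid: str) -> str:
    """Quote the SSID so a shell treats it as one literal argument."""
    return f"wifi_scan --ssid {shlex.quote(ssid)}"


# An SSID as it might be encoded in a setup QR code, with a command appended.
malicious_ssid = "HomeNet; telnetd"

unsafe = build_scan_command_unsafe(malicious_ssid)
safe = build_scan_command_safe(malicious_ssid)

print(unsafe)  # the ';' ends the scan command and starts an attacker-chosen one
print(safe)    # the whole SSID stays inside one quoted argument
```

Passing arguments as a list to a process API without a shell avoids the problem entirely; quoting is shown here only because it maps most directly onto the string concatenation described in the episode.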

And when we looked into the firmware for the new camera, we were able to see that it actually did exist on the new camera as well. At that point we were finally on the new camera, which was great. We were able to see all the detections and see that all the files that we had pulled from the server were the same files on the camera, which was great because it meant we had actually done good reverse engineering and not lost a lot of work. So from there, we were able to see that detections occurred and that the files existed.

But what we needed was some way to find out what percentages were actually being returned. Right now, all we got was that if there was a person in the photo that the camera saw, it would send a message to our phones. But that's not really useful if you're trying to create an adversarial example like we did.

And just so that people understand, when you say creating an adversarial example, what would that example look like? What would the bad actor be trying to do, or even trying to create?

Yeah, so as a bad actor, the adversarial example we were trying to create was: if a person is in the camera with whatever bad thing they have in there, the patch, the camera does not detect the person even though a person is there. And then ideally, we wanted to have it set up in a way so that you're just, you know, carrying a bush or holding a tray in front of you, something that people around you might just ignore. Right, we wanted something really subtle, so somebody could come up, steal a package off your porch, and you would never notice.

So when you say patch, you're talking about like a small physical thing somewhere on their person that would cause this camera to, whatever that detection threshold is of "I think that's a human", you're not going to trigger that if you're wearing whatever this thing is. Cool, okay, please continue. Yeah, yeah, yeah.

So at this point, to create an adversarial example, it's so much easier when you actually know the percentages that are returned. Especially with YOLO, since there are multiple classes, we can say, hey, this is ninety percent person, and then if we add, you know, a small patch, all of a sudden it's eighty percent person, ten percent dog, and then from there we can slowly try to get the percentages more in our favor.
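The feedback loop described here, where each change to the patch is kept only if the returned confidence moves in the attacker's favor, can be sketched as a simple random search. This is not the team's tooling: `toy_person_score` is a made-up stand-in for the camera's "person" confidence, and the tiny grayscale image is purely illustrative.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]


def apply_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of image with patch pasted at (top, left)."""
    patched = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            patched[top + r][left + c] = value
    return patched


def toy_person_score(image: Image) -> float:
    """Toy stand-in for the model's 'person' confidence: the mean pixel value."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def search_patch(
    image: Image,
    score: Callable[[Image], float],
    size: int = 3,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[Image, float]:
    """Random search: keep a candidate patch only if it lowers the score."""
    rng = random.Random(seed)
    best_patch = [[rng.random() for _ in range(size)] for _ in range(size)]
    best_score = score(apply_patch(image, best_patch, 0, 0))
    for _ in range(steps):
        candidate = [row[:] for row in best_patch]
        r, c = rng.randrange(size), rng.randrange(size)
        candidate[r][c] = rng.random()  # change a single patch pixel
        candidate_score = score(apply_patch(image, candidate, 0, 0))
        if candidate_score < best_score:
            best_patch, best_score = candidate, candidate_score
    return best_patch, best_score


image = [[0.9] * 8 for _ in range(8)]
baseline = toy_person_score(image)
patch, patched_score = search_patch(image, toy_person_score)
print(f"baseline={baseline:.3f} patched={patched_score:.3f}")
```

The point of the sketch is the loop shape: without a numeric score coming back, there is nothing to compare candidates against, which is why the confidence values mattered so much.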

So luckily, since I'd reverse engineered the device in the past, I knew that they dumped a lot of information into their logs, sometimes more than was necessary, and I also knew that the logs were all encrypted with a key that was the same across all devices. And the reason for that is they didn't want the logs to necessarily be opened by a person. So when there's a crash, the logs get saved to the SD card, and then you send that over to them, and then it's easy to have one key that they can use to just decrypt the logs. So it ended up just being AES-CBC.

We double-checked the encryption key on the local file system, and it was all the same. So at that point we were able to take the encrypted log file, decrypt it, and see all the logs of the binaries. And we just looked for inference,

person, you know, other things like that. And we were actually really happy to see that in the logs it was logging all detection results with the percentages, which is awesome.

Cool. So now you have a way of measuring whether or not this patch is successful. You can see, I'm getting a great average here. Oh, it a hundred percent knows I'm a person.

This got it down to ninety. This got it down to eighty. You have a road map,

basically. Got it. Yes. So at this point, since we could see that, and we could see a few different files in the Edge AI folder, we decided to take a look back at the Edge AI folder and see if there were any files we could mess with.

And in there there were two files, aiparams.ini and, excuse me, model_params.ini, and INI is normally used for configuration. So we decided to look into those, and you could see that all the classes that the AI model detected were in there. So person, pet, package, and face and vehicle. And then there were thresholds as well. So we saw that person was set to fifty. And then what we did is we set the person detection so it had to be a hundred percent sure, and we started walking in front of the camera, and now we saw that the detect event was fired.

It's a person, ninety-five percent confident, but we weren't getting alerts on our phones, which meant that after it does the detection, it checks whether you are over a certain threshold before sending an alert. And even though person and face were both classes, if face was detected but person wasn't over the threshold, it would not send an alert. So that meant that we now knew that our criterion was to get person below that fifty percent threshold. So even though we could change the INI file, that's not something a regular attacker can do, since they'd have to actually be on the camera. But it let us know that that is our goal, which helps a lot.
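The alert logic described here can be sketched with Python's `configparser`. The section name, key names, and the `>=` comparison are assumptions for illustration; only the list of classes and the person threshold of fifty come from the episode, and the other threshold values are placeholders.

```python
import configparser

# Hypothetical layout. Only "person = 50" is taken from the episode;
# the remaining values are placeholders.
EXAMPLE_INI = """
[thresholds]
person = 50
pet = 50
package = 50
face = 50
vehicle = 50
"""


def load_thresholds(ini_text: str) -> dict:
    """Parse per-class thresholds (in percent) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {name: float(value) for name, value in parser["thresholds"].items()}


def should_alert(detections: dict, thresholds: dict, alert_class: str = "person") -> bool:
    """Alert only when the alert class itself reaches its threshold."""
    confidence = detections.get(alert_class, 0.0)
    return confidence >= thresholds[alert_class]


thresholds = load_thresholds(EXAMPLE_INI)
print(should_alert({"person": 95.0}, thresholds))                # default threshold
print(should_alert({"person": 40.0, "face": 90.0}, thresholds))  # face alone is not enough

stricter = dict(thresholds, person=100.0)                        # the experiment from the episode
print(should_alert({"person": 95.0}, stricter))
```

With the threshold raised to one hundred, a ninety-five percent detection still fires as an event but never becomes an alert, which matches the behavior described above and is what identified "person below fifty" as the attacker's target.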

Then we reverted that back, since now we knew what we needed to do, and we wanted to find some way to send a photo directly to the AI instead of having to walk in front of the, you know, camera. Because when you're trying to create an adversarial patch, you're sending lots of photos, and you're not always doing them in the physical space. You might, you know, try a pixel here and there and just kind of get an idea of what can happen.

And it wouldn't have been the best if, you know, we spent hours just holding up signs in different ways in front of the camera, even though it would have been funny. Yeah, pretty funny though. Yeah, it would have been really, really funny. We do have some good ones. At some point we dressed up like a package, and that ended up actually working. Yeah.

Wait, did you put like a cardboard box on, like, Metal Gear Solid style?

And it detected a package instead of a person, which was really funny. That got a laugh when we showed it at DEF CON. But yeah, so anyways, while we did do that for fun later on, that wasn't really the best way to go about things. So we needed some way to have ourselves send an image to the AI instead of the camera sending an image to the AI.

And so again, we did some reverse engineering, and we saw that there were two main binaries. So there was iCamera, which pretty much governs the entire camera. So that's all the logic, the main logic, which calls other things.

And then there was this Edge AI protocol binary, with like a really long name, in the Edge AI directory, which loaded up the model and did the rest, and they talk to each other over a local socket on the camera. So what we did then is we created our own socket and patched the really-long-name binary that actually runs the AI to go to our new socket instead of going to the originally created socket. And then we wrote a Python script that opened a port on the camera, and we sent a photo to that port.

Our Python script would add it to the socket, which would then trigger the AI and send the result back. In the end, we had to do some patching, and we had to hook into shared memory. Because the way that iCamera and the AI work is that iCamera wrote the image to shared memory and sent an alert over the socket. So we sent an alert over the socket after writing to shared memory, the AI reads shared memory, does what it does, and then sends the result back over the socket.
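The hand-off described here (write the image to shared memory, send an alert over a local socket, read the result back) can be mimicked in a few lines of Python. This is a toy sketch, not the camera's protocol: the message format, the worker, and the reply string are all hypothetical, and a thread stands in for the separate AI process.

```python
import socket
import threading
from multiprocessing import shared_memory


def fake_ai_worker(conn: socket.socket, shm_name: str) -> None:
    """Stand-in for the AI process: wait for an alert, read the frame, reply."""
    size = int(conn.recv(64).decode())        # the alert carries the frame length
    shm = shared_memory.SharedMemory(name=shm_name)
    frame = bytes(shm.buf[:size])             # copy the frame out of shared memory
    shm.close()
    # Toy "inference": report the frame size instead of real detections.
    conn.sendall(f"frame_bytes={len(frame)}".encode())
    conn.close()


def send_frame(frame: bytes) -> str:
    """Stand-in for the camera side: write shared memory, alert, await the result."""
    shm = shared_memory.SharedMemory(create=True, size=len(frame))
    camera_end, ai_end = socket.socketpair()  # a connected pair of local sockets
    worker = threading.Thread(target=fake_ai_worker, args=(ai_end, shm.name))
    worker.start()
    try:
        shm.buf[:len(frame)] = frame                   # 1. write the image
        camera_end.sendall(str(len(frame)).encode())   # 2. alert over the socket
        result = camera_end.recv(1024).decode()        # 3. read the result back
    finally:
        worker.join()
        camera_end.close()
        shm.close()
        shm.unlink()
    return result


print(send_frame(b"\x89fake image bytes"))
```

Only a short notification crosses the socket while the bulky image stays in shared memory, which is why hooking both pieces was needed to inject images without going through the camera sensor.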


So you have a mechanism by which to see how confident this AI is and what it's looking at. And you have a mechanism by which to feed an image into that AI that isn't just the camera. So you don't have to dress up like a package.

Exactly. And what was really great is that now that we were hooked directly into the AI, we didn't have to look at the log files. We were actually getting the response straight back from the AI, which is really nice, because to trigger a log file on the camera you have to get the camera to crash, which we didn't want to do every time we had an image. Yeah, right. So now comes the really fun part.

So since we knew that this was a YOLO-ish model, we had read a bunch of academic papers about attacking YOLO models, as it's one of the more common models people use. And we'd also read some papers about attack transferability between models that were pretty much the same.

So what we did is, there's a bunch of tools out there, so we used deep patching r to generate a bunch of adversarial examples, as well as handcrafting a few of our own. So for a few of them we took photos of ourselves holding up a small poster board, and then we put images on there of the other classes, so, you know, put a car or a dog. So we did a few of those, and then we did a few of the adversarial example ones.

And we saw that about twenty percent of the adversarial examples transferred from the YOLO attack to our camera, which is awesome because we didn't have to come up with a brand new technique to attack the AI model. So we were able to take academic attack techniques and apply them to a real production system, and twenty percent is a really good rate. The issue with a lot of those is that they were generated for non-physical use, so they only work in the virtual world. So if I have a photo of myself, it's not a problem if I, you know, draw a smiley face on it, but I can't just walk around with a red smiley face there.

Yeah, yeah, sure.
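The "about twenty percent transferred" figure is a transfer rate: the share of adversarial examples, built against one model, that also fool a different target model. Below is a minimal sketch of that measurement. The `transfer_rate` helper, the 0.5 threshold, and the stand-in confidence values are all hypothetical, chosen only to illustrate the arithmetic; none of them come from the camera or from the team's report.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def transfer_rate(examples: Iterable[T], detects_person: Callable[[T], bool]) -> float:
    """Fraction of adversarial examples for which the target model misses the person.

    `detects_person` is a hypothetical wrapper around the target model that
    returns True when a person is still detected in the example.
    """
    items = list(examples)
    if not items:
        raise ValueError("need at least one adversarial example")
    fooled = sum(1 for item in items if not detects_person(item))
    return fooled / len(items)


# Stand-in data (made up): the person confidence the target model reported
# for each of five adversarial images generated against a surrogate model.
person_confidences = [0.91, 0.88, 0.12, 0.79, 0.95]

# Hypothetical decision rule: a person counts as detected at 0.5 or above.
rate = transfer_rate(person_confidences, lambda conf: conf >= 0.5)
print(f"transfer rate: {rate:.0%}")
```

With these made-up numbers, one of the five examples fools the target, which matches the rough one-in-five rate described in the conversation.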

So that's why we went more with holding up the poster board. So our idea was, if we could overwrite the classes that are there and make another class more confident, then we would be able to decrease person. And it actually worked really, really well.
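To make the class-overwrite idea concrete, here is a minimal sketch, assuming the alert logic only checks whether the "person" confidence clears a threshold. The `Detection` type, the `should_alert` function, the 0.5 threshold, and every confidence value are hypothetical. The camera's real decision logic is not described in this conversation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float


def should_alert(detections: Sequence[Detection], person_threshold: float = 0.5) -> bool:
    """Hypothetical alert rule: notify the owner only when 'person' clears the threshold."""
    person_conf = max(
        (d.confidence for d in detections if d.label == "person"),
        default=0.0,  # no person detection at all means no alert
    )
    return person_conf >= person_threshold


# Made-up numbers: someone standing in front of the camera with nothing in hand.
baseline = [Detection("person", 0.91)]

# Made-up numbers: the same person holding a poster board showing a car.
# The competing class wins and the person confidence drops below the threshold.
with_poster = [Detection("car", 0.84), Detection("person", 0.32)]

print(should_alert(baseline))     # person clears the threshold
print(should_alert(with_poster))  # person is pushed below the threshold
```

Under this assumed rule, the attacker does not need the model to see nothing. It is enough that another class takes the confidence that would otherwise go to "person".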

So in the blog that we released, it's about a forty-page blog with all the tests, we have that in there; you know, we have photos up there. So for holding up a photo of a car, it will detect car, things like that. So there are a few limitations, and I always try to make sure that I list limitations, especially for things like this.

While we were able to do it, you had to kind of hold it at a specific angle. So if you're walking, you might mess that up or something.

So we were able to bypass the protections fairly easily. But, you know, it works in that type of setting; it might not always work for, like, a porch pirate. So the takeaway is not, like, oh, everyone can steal our packages. The takeaway is, like, hey, this can work against a production system, and, you know, somebody might come up with a better patch, like a T-shirt or something that is able to be moved. But it still opens a really interesting door for a lot of research.

Extremely. So at the end, the best way you found to compromise this had less to do with, like, a random cluster of pixels that causes the AI to freak out and more to do with getting the AI to think that that human being is actually a car or a package or a dog or any number of these discrete categories that it has been told about: you don't need to alert the owner in the event of dog, you only need to alert the owner in the event of person.

And that is mainly just because we were working in a physical space. So if, you know, it wasn't a camera, it was some server or something, sending those other random pixel ones would have worked great, because they were transferring really well. It's just not something you can have consistently in the physical space.

Do we see this type of AI model used in anything at a larger scale than a consumer hundred-dollar camera? Like, is this type of on-device AI being used in any other types of hardware where your research might be relevant?

Yes. So, I mean, image classification models are being used everywhere. So consumer cameras, maybe non-consumer cameras, say if you have a security system for a larger building. We see them being used in industry.

So, for example, in an industrial setting, you have these classification models where they'll try to sort out errors in parts. So then, you know, maybe if you're there, you could modify the part a little bit, and it just doesn't get detected as an error, stuff like that.

So it's really, really interesting, just what the potential is there, especially cars. The newer cars also have classification models. So they are being used quite widely.

Yeah, cars was kind of what was sitting in the back of my head. Like, obviously all modern cars are network-connected computers that are constantly reaching out to a thousand different things, but a lot of it would have to be local.

Yeah, and you actually have seen things like this in the past. So a few years ago, with Tesla, there was an issue where if you taped over the stop sign, it would, like, run through the stop sign, things like that, or, you know, change the speed limit number by, like, putting a little bit of tape on to make it look weird. Humans can say, no, no, it's not seventy-five instead of, you know, fifteen, but the computer, yeah.

The thing I find fascinating about this, between when we agreed to have this conversation and now, having read the whole report you all put out, was that it kind of changed how I understood what these models were actually seeing. As a human being, you put a single line through a stop sign, I'm inferring it's a stop sign; I can still see it, I know what that is. And it shifted how I understood what these models were actually perceiving when they look at something, that a guy wearing a package outfit can trigger it, all the way down to holding up a sign with the right depiction of a dog. Oh, they don't really have an internal model of the object they're looking at. They're looking for very specific patterns that are quite easy to disrupt.

Exactly. So what we found was that this one specifically, like a lot of image classification, as you mentioned, looks for patterns. So we found we had a lot higher chance of success if, you know, we were disrupting the shoulder outline versus holding it over your chest. So it seemed that that was one of the patterns they were looking for for person detection.

Just the shoulders and shoulder form. Interesting. Did you get any other weird little insights into what's going on inside of these? This is, like, a relatively commonly used model, or a variation on it, it sounds like. But what else did you learn about how this thing thinks?

I mean, so, like with pets, it's like the pointy ears, you know, other things like that. So, I mean, it's just the shapes that matter, whereas a person is going to use a bit more logic rather than just the technical shape.

So it's kind of reproducing the way a human being infers from a limited amount of information, to a point. It's sort of reproducing that, but it's still early days. So privacy is an obvious benefit of this; it's not constantly calling to the cloud. It can run locally, it's probably a little more efficient. Like, I can imagine some of the benefits of having these models running locally.

Yes. So, I mean, as somebody who has hacked one of these models now, I still think it's great that people are actually employing them. I am firmly in the belief that, yes, we should still keep using them, because the benefit usually outweighs anything else, and we're still in the early days. So the fact that something was hacked, that's not a bad thing.

That just means that people are doing the research, and then people are also securing the models as well, which is great. But, you know, not having your photos sent off to a server, especially if you're using the camera inside your house, that's a huge benefit. It's just that companies, when starting to employ these AIs, should think about what is the worst thing that can happen when the AI system fails, right? So in this case, you know, you might not get a detection, but if you're still saving off all the video, which most people aren't, as that's just a huge amount of storage, you know, you still have something there, or if you still have an alarm system, you still have that there as well. So it's that kind of trade-off there, but it's still definitely worth having those new systems in place.

You still prefer it. You'd like them to handle as much of this as possible locally on the device. Interesting.

I think it's intuitive, and you're certainly starting to see that a little bit more in the way that AI functionality in smartphones, say, is being marketed: that we're gonna handle as much of this locally as humanly possible. I was fascinated by that transition of, like, AI, it's everywhere now.

And then about nine minutes later it was like, we're going to do it locally on the device, don't worry. Because the privacy implications of some of this stuff are as remarkable as they are horrifying, like, that this piece of information that I've just fed into it, it's just gonna throw that off to a server somewhere and you're going to have no clue where it's going.

Exactly.

You gave a talk about this at DEF CON. How did you find people responded to it? Like, is there a lot of excitement about this right now?

Yeah, so this was actually one of my most well-received talks. So, as I mentioned, I had six talks that week, and for this one so many people came to the talk that they ran out of seating space, and people were standing around the seats to watch, which is always a really good feeling. And then, you know, we were able to put in a lot of jokes and, you know, package man and things like that. But for the rest of the conference I had people coming up to me, talking to me about, you know, what we had been able to do, people asking when we were releasing the blog post, when they could read the blog post. So there was a lot of buzz around that, which was really, really great.

What surprised you most, like, outside of really specific technical details? What about this whole process, I guess, this emerging world of edge AI? Like, what shocked you?

I was surprised by how... I come from a vulnerability research background. So my job in the past has been to find zero-days, so go out and find the CVEs before they even become CVEs, and then report them, and they become CVEs and they are patched. So all of those skill sets: reverse engineering, digging down deep into the compilers, and doing all that fun stuff. And a lot of people, especially my peers, don't think those skill sets really transfer to the AI security side of things.

Just because, you see, some people hear AI and, you know, think it's something that they don't really understand. So one of the things that we were actually trying to show with our talk, which I think we were able to, as people told us that that's what they understood from it, was that a lot of the old skill set still applies. And it's because a lot of the new AI security that's out there is a little bit too focused on just, you know, the AI model or an LLM. Meanwhile, there is all the supporting infrastructure around AI that might not always be considered.

So that was a surprise, a happy surprise. I was hoping for it, but the fact is that skill set was able to transfer over so well, which is actually how we ended up taking a chip off the device. That actually happened after we were able to get the model backup.

I don't think we are served by this stuff seeming completely inaccessible to people with even a high level of technical literacy. The idea that, like, oh, I am simply a vulnerability researcher, I could never hope to interface with, like, engage with this stuff. It's not good. I think it's useful when people can see that this is just built on more of the same technology. The knowledge that you have is still relevant towards this.

Exactly. And that's just something we've been trying to say, because the more people that are trying to hack AI, the better. I mean, hack AI for good, of course. But the more people who are out there doing that security research, the better it's gonna be for everyone.

Especially for stuff like this that's consumer-facing. Like, this is a camera in your home that you are relying on for, potentially, personal safety issues. The idea that this feels like this, like, nebulous black box that no one can ever hope to understand, like, that's not good.

We need to be figuring this out. We need this ecosystem of hackers to be tearing these things apart, figuring out how they're vulnerable. Very cool. I appreciate you taking the time to sit down and chat with me about this. It was a lot of fun.

Yeah, thanks for having me on here. It was really great to be here.
