
Criminal Networks

2025/3/17

Data Skeptic

People
Asaf
Justin
No specific information available about Justin.
Host
A podcast host and content creator focused on electric vehicles and energy.
Topics
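Host: This episode discusses the importance of data quality in criminal network analysis, and how network science techniques can be used to identify key figures and the structure of criminal groups. We also explore how missing and inaccurate data affect analysis results, and how intervention strategies can be improved.

Asaf: I think that even when data is incomplete, a small amount of data can still provide actionable intelligence. Using the "friendship paradox," you can find the key nodes in a network even when you know only 5% of it. In a criminal network, what counts as an "important" node depends on the goal of the analysis. Highly connected nodes, even if they are not the bosses, can still be important investigative leads. Community detection techniques can help us identify key figures and the structure of criminal groups.

Justin: My research shows that accurately estimating a network's size requires at least 60% to 80% of the network data. Insufficient data on criminal networks takes two forms: missing data and inaccurate data. Missing data can make it impossible to dismantle a network effectively, while inaccurate data undermines the reliability of the analysis. I used four publicly available datasets, which come from collaborative projects between academic researchers and law enforcement agencies. In a criminal network, nodes represent participants and edges represent the ways participants interact, and how interaction is defined affects the results. The small datasets I chose better reflect reality, because when we face a new criminal network we usually have only limited information. Criminal organizations are structured like other kinds of businesses, but their network structures are not all the same: some are more centralized, others more decentralized. When law enforcement targets a criminal network, it is often hard to arrest the boss directly, because bosses usually hide at the periphery of the network and control it through intermediaries. Beyond focusing on central nodes, one can study network dynamics, identify key roles, and adopt value-chain-based intervention strategies. When applying network science techniques, law enforcement need not maximize data, but should find the best balance between data volume and analytical effectiveness. The key to communicating effectively with law enforcement is understanding their practical problems rather than handing them theoretical solutions.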

Deep Dive

Transcript

You're listening to Data Skeptic: Graphs and Networks, the podcast exploring how the graph data structure has an impact in science, industry, and elsewhere.

Welcome to another installment of Data Skeptic: Graphs and Networks. Our topic today, taken from the paper we're going to discuss, is "garbage in, garbage out": how bad data can corrupt your analysis. In this specific case, that means looking for criminality in networks, that is, at the structure of networks of criminals who are connected in certain ways.

Asaf, you had a chance to look at this. Give me your overall thoughts on Justin's paper.

Justin takes a very conservative view of how to do network analysis. For example, he says in the episode that you need at least 60 to 80 percent of the network to get a proper estimate of its size. Otherwise, it's garbage in, garbage out.

I guess that's true if you're looking for ground truth or an estimate of size, but less data may still be enough for actionable intelligence, right? If you're looking at criminal networks, I have to say, I always prefer some data to no data at all.

Fair point.

It was Nicholas Christakis, a very famous social network researcher, who showed that you can get actionable insights on a network by discovering only 5% of it, using the friendship paradox.

Is this where the data tells us most people have three friends, as the mode or the mean or something like that?

It's less precise than that. It says that, on average, your friends have more friends than you do. This sounds like a riddle, but the logic behind it is that you are probably connected to a hub in the network, and that hub has many more connections than you do. So on average, your friends have more connections than you.

Following this idea, Christakis was trying to map a village, I think for vaccination purposes or a healthcare program or something like that. He wanted to find the hubs in the village, but he couldn't map the entire network of the village. So he randomly chose 5% of the people and asked each of them to name someone from the village. With this friends-of-friends technique, chances are you'll find the hubs: you set aside the randomly selected people and instead follow the people they pointed at. That way you can find the hubs in the network even though you mapped only 5% of it.

Sure, that's a very interesting case. I can see where that would help us find hubs.
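The friendship paradox Asaf describes can be checked numerically. The sketch below is a toy illustration in plain Python, not Christakis's study design: it assumes an undirected graph stored as an adjacency dict, and the helpers `star`, `degree_vs_friend_degree` and `nominate` are hypothetical names introduced here. It compares the average degree with the average degree of a node's friends, and lets sampled respondents each name one random friend.

```python
import random
from statistics import mean

# Undirected graph stored as an adjacency dict: node -> set of neighbours.
Graph = dict[int, set[int]]


def star(n_leaves: int) -> Graph:
    """Toy graph: hub 0 linked to leaves 1..n_leaves."""
    g: Graph = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        g[leaf] = {0}
    return g


def degree_vs_friend_degree(g: Graph) -> tuple[float, float]:
    """Return (mean degree, mean degree of a node's friends).

    The second number averages each node's neighbours' degrees, then averages
    that over all non-isolated nodes. The friendship paradox is the observation
    that it usually exceeds the first.
    """
    mean_degree = mean(len(nbrs) for nbrs in g.values())
    mean_friend_degree = mean(
        mean(len(g[f]) for f in nbrs) for nbrs in g.values() if nbrs
    )
    return mean_degree, mean_friend_degree


def nominate(g: Graph, respondents: list[int], rng: random.Random) -> list[int]:
    """Each sampled respondent names one friend, chosen uniformly at random."""
    return [rng.choice(sorted(g[r])) for r in respondents if g[r]]


g = star(4)
print(degree_vs_friend_degree(g))  # mean degree 1.6 vs. mean friend degree 3.4
```

In the star, every leaf that is asked can only name the hub, which is the intuition behind following nominations instead of the random respondents themselves.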

I'm thinking of the term social butterfly, you know, like a community organizer or something like that. Yeah. A community organizer knows all the people. They're the hub. Because they're the hub, they know so many people. If you ask me who I know in my area, I'm likely to name that community organizer, and through them I'm connected to everybody else. So for the analytical query, can we find the hubs? I can definitely see how sparse sampling would get us there.

When you're looking for the most important node in a network, in a criminal network, the word "important" depends on what you're looking for. Let's say I want to disrupt the network. The highest-degree node might not be the biggest boss; it might just be used for, let's say, administrative purposes. But still, he could lead us to the head of the gang. He's important.

One example Justin gave was a delivery person who was found in the dataset. Justin uses it as evidence that high-degree nodes aren't that important, right? Because we assume this delivery guy wasn't a criminal. But if this person were a criminal, it would be a great find. Even though the highest-degree nodes might not be the bosses, they are still very important in the investigation.

Sometimes it's hard to map the network, especially a criminal, clandestine one. It presumably has a weird structure.

I don't know; a network is a network. It can be a criminal network, an organizational network, the LinkedIn network. As we keep on saying, all networks share the same laws. Every network looks different, but not so different.

Okay, so I'd expect to find central nodes in a criminal network just as in a friendship network. If this person is a criminal, it's a great find. It could be a great lead to other people in the organization, right? It's a hub; it has many connections. You can use those connections to find the boss or other members of the organization. Second, by using community detection, you can easily see the context.

If we use community detection on the criminal network, we can see which nodes are important in the context of the crime network. Let's say we look at a criminal organization and see a very big hub, but it belongs to, let's call it, the more administrative community. It might interest us less than a hub that has a lower degree but is very central in, let's call it, the more physical, more aggressive community. The basic idea is that community detection helps us see nodes in context in the network, and I think that's something you should apply to criminal networks.
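The idea of judging a hub by its own community can be sketched as follows. This is a minimal illustration under stated assumptions: the community labels are taken as already computed by some community-detection step (for example Louvain) and are simply passed in, and the helpers `from_edges` and `hubs_in_context` plus the toy organisation are hypothetical. For each community it reports the member with the most links inside that community.

```python
Graph = dict[str, set[str]]


def from_edges(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dict from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def hubs_in_context(g: Graph, community: dict[str, str]) -> dict[str, tuple[str, int]]:
    """For each community, return its best-connected member and that member's
    within-community degree.

    `community` maps every node to a label from an earlier community-detection
    step (e.g. Louvain); here it is simply passed in.
    """
    best: dict[str, tuple[str, int]] = {}
    for node in sorted(g):  # sorted so ties resolve the same way every run
        label = community[node]
        internal = sum(1 for nbr in g[node] if community[nbr] == label)
        if label not in best or internal > best[label][1]:
            best[label] = (node, internal)
    return best


# Hypothetical toy organisation: a large "admin" group and a smaller "field" group.
edges = [("coordinator", f"clerk{i}") for i in range(1, 5)] + [
    ("enforcer", "runner1"), ("enforcer", "runner2"),
    ("runner1", "runner2"), ("coordinator", "enforcer"),
]
labels = {n: ("field" if n == "enforcer" or n.startswith("runner") else "admin")
          for n in from_edges(edges)}
print(hubs_in_context(from_edges(edges), labels))
```

The enforcer has a lower overall degree than the coordinator, yet it still surfaces as the hub of its own community, which a single global degree ranking would not show.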

So all this discussion has got me thinking, Asaf: should we pitch a new network TV cop drama where every week they catch the criminal using network science?

That was my dream. I actually asked ChatGPT to do it. I told it, make a drama series about people solving crimes using network analysis.

In the ChatGPT script, one of the cops said, "That's an interesting case. It looks like we need network analysis." And the other cop said, "Right, we need network analysis to solve it." And then they did some network analysis and solved it.

Sounds like a montage to me.

Montage moment, yes.

Let's jump right into the interview then.

My name is Justin. I'm currently a PhD candidate at the Network Science Institute, Northeastern University London. I'm working on modeling and inferring social behaviors online, and offline as well. As we'll discuss a little later, I'm also working a lot on criminal networks and criminal intelligence.

Currently I'm organizing a satellite conference at the Network Science Conference in the Netherlands in June 2025, on networked criminal complexity. We're inviting scholars and practitioners working on this topic to contribute their work.

I obviously want to zoom in on your specific work, but could we kick off by talking a little about what goes on at the Network Science Institute?

We're called the Network Science Institute, so we do network science, but I think we go well beyond that. In London, we have quite a few physicists, quite a lot of mathematicians, and data scientists as well. I'm a data scientist by trade, but my supervisor is a physicist. Of my other supervisors, one is kind of a mathematician, though now he's doing really number-theory-related work. And my other supervisor, from Oxford, is also a physicist, or a computational social scientist.

We do a lot of network science. There are the fundamentals of network science: graph theory, the mathematical framework of networks. There are also people working on epidemics, here in London and in Boston as well.

We also have several people working on network medicine. You've probably heard of Barabási. He's a big figure here at NetSI, and he's working on network medicine right now. Graph learning is also a big thing right now: looking at machine learning problems from a graph perspective. These are the kinds of things we're doing at the Network Science Institute. In my own lab, we are particularly focused on mobility networks and economic networks. As for me, I'm really all over the place. As you know, I've worked on criminal networks, but currently I'm working more on complex human behaviors, for example doing inference on when people respond to memes. So we're quite diverse.

How did your academic journey lead you there?

It's a really winding road, I would say, because I have a background in computational social science, so basically using computational methods to study social-scientific questions. I was particularly interested in the relationship between news and the media, and also in the use of AI in newsrooms.

I first heard of networks when I was an undergrad, years ago. I saw a research assistant position about using social network analysis to understand how people talk about technology on Twitter. I had never worked on networks, and I had never worked in the field of science and technology studies. But I thought, okay, this is really cool, and I have the computational skills. So I asked the professor, hey, can I join your lab for a few months and learn? So I started with Gephi and R.

I thought it was really cool. At that point, we were just doing really simple centrality analysis, just looking at who the most important players in the Twittersphere were. But afterwards, I figured out that network science is much bigger than visualizing stuff and doing centrality analysis. One thing I was particularly interested in was simulations on networks, so agent-based modeling.

I started doing it for some coursework in my undergrad, and then I went on to do my master's in data science at the University of Oxford. You've probably heard of the Louvain method. It's a really famous clustering method that basically everyone uses, and often the first thing people use when they study network science. One of the creators of the Louvain method is actually my supervisor. Then I really got to the forefront of network science, doing a lot of mathematical analysis and simulations. By talking to him, because he's a physicist, I learned a lot about the underpinnings of network science, which actually come from statistical mechanics and physics rather than simple data science methods. That's how I got into it, and then I did my whole thesis on networks as well.

And now I'm at the Network Science Institute.

What was the focus of your thesis?

This paper, "Garbage In, Garbage Out," was actually part of my thesis. When I first started talking to Renaud, my supervisor, my idea was to do something on link prediction. Basically, you want to understand who's connected to whom when you don't have enough data.
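Link prediction, as Justin frames it, asks which ties exist but were never observed. One of the simplest baselines is the common-neighbours score; the sketch below uses it purely as an illustration, and the episode does not say which predictors the thesis actually uses. The function name and toy graph are hypothetical.

```python
from itertools import combinations

Graph = dict[str, set[str]]


def common_neighbour_scores(g: Graph) -> list[tuple[int, str, str]]:
    """Score every unlinked pair by the number of neighbours they share.

    A high score flags a tie that may exist but was never observed.
    Pairs with no shared neighbour are left out.
    """
    scores = []
    for u, v in combinations(sorted(g), 2):
        if v not in g[u]:
            shared = len(g[u] & g[v])
            if shared:
                scores.append((shared, u, v))
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


g = {"a": {"b", "c", "e"}, "b": {"a", "d"}, "c": {"a", "d"},
     "d": {"b", "c"}, "e": {"a"}}
print(common_neighbour_scores(g))
# [(2, 'a', 'd'), (2, 'b', 'c'), (1, 'b', 'e'), (1, 'c', 'e')]
```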

But then we started thinking: instead of coming up with a generic machine learning problem, maybe we can ground it in a really particular case, which is criminal networks. That's one of the use cases where data is really insufficient and where network science is used both in practice and in theory.

At the same time, we thought network science isn't good enough at this point because it has a lot of limitations, yet there aren't many studies that really examine this problem. That was the starting point of my thesis, and that's how I came up with this whole garbage-in, garbage-out paper: we need good data to do good research and to put it into practice.

In what way is the data insufficient for studying criminal networks?

There are different ways. For example, in a corruption network, you might know that there are 50 corrupt people in the network, but you don't know who they actually are. You have an approximation of what's going on, but you don't really know. The same happens with a human trafficking network. You probably know roughly how many people are missing, but you don't know who is actually missing.

By knowing who these people are, you could trace back why they were kidnapped or trafficked and who might be doing the trafficking, because you would know the location, the time, where these people appeared, and their behavioral patterns. But we simply don't have enough information about who these people are and why they go missing. The other problem is inaccuracy. Sometimes we know who's who, but we only know the names or the IDs of the devices, not who is actually using them. Sometimes what they do is fake communication: they communicate using these devices, but actually there's nothing going on. And sometimes it's just pure human error.

It sounds trivial, but it's not, because you can fill out a form saying these are the people who were arrested, but actually you filled in the wrong name, the wrong information, the wrong ID number. You put it into the system and start building the dataset. And then you're like, who is this Kyle? Who's this Justin? I haven't seen him anywhere else. And you basically exclude them from the investigation. So there are a lot of different kinds of challenges we need to address before we can use the right techniques to tackle these networks.

Well, I imagine there's data available on missing persons, although probably not something where you just go download the CSV. I'm sure there are complications there, but criminal networks surely don't advertise on LinkedIn who reports to whom. What sort of dataset do you have to work with?

In this paper, we used four publicly available datasets, previously collected by different researchers at different times, in the early 2000s and the mid-2010s. These data usually come from research projects involving collaborations between academic researchers and law enforcement agencies. For example, the juvenile gang network you saw, I'm pretty sure, comes from one of those collaborations, and they then published the data somewhere. The terrorist network I'm using doesn't actually come from researchers. Someone collected the court decisions and court documents and looked at who contacted whom, that is, who communicated with whom, for example through memos or some sort of device. Those records come from law enforcement agencies, or from people who go into court and document the proceedings. Researchers then take this information and turn it into a dataset. So as a downstream user, what I simply did was download a CSV from an open repository, and then I could manipulate it in various ways.

So what is a node and what's an edge in your network?

A node is really easy to define. An edge is not that easy, and I'll explain why. A node is basically a person: someone involved in some sort of activity, for example participating in the Madrid bombing. Edges are something much more particular. They are how you define interactions, and the way you define an interaction will give you really different results. One really interesting thing we observed in the juvenile gang network is that the edges are constructed by looking at whether two people were arrested at the same time. They could be a dealer and a buyer of the drug, or they could both be dealers. The mafia, or mob, network is slightly different: the edges link the participants who were present at the same mob conferences, but they may not have talked to each other, right?
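Both edge definitions Justin mentions (arrested together, present at the same meeting) are projections of person-event records onto person-person links. A minimal sketch, assuming records arrive as a mapping from a hypothetical event ID to the people involved; the function name and data are illustrative, not the format of the published datasets.

```python
from itertools import combinations


def co_occurrence_edges(events: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Project person-event records onto weighted person-person edges.

    Two people are linked if they appear in the same event (arrested together,
    present at the same meeting, or similar). The weight counts shared events.
    Such an edge records co-presence only; it does not show the pair communicated.
    """
    weights: dict[tuple[str, str], int] = {}
    for people in events.values():
        for u, v in combinations(sorted(set(people)), 2):
            weights[(u, v)] = weights.get((u, v), 0) + 1
    return weights


# Hypothetical arrest records; p4 was arrested alone and gets no edge.
arrests = {"arrest_1": ["p1", "p2", "p3"], "arrest_2": ["p3", "p2"], "arrest_3": ["p4"]}
print(co_occurrence_edges(arrests))
# {('p1', 'p2'): 1, ('p1', 'p3'): 1, ('p2', 'p3'): 2}
```

Note that p4 disappears from the edge list entirely, a small example of how the choice of edge definition shapes what the network can show.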

But in the terrorist network, we're quite certain that they actually communicated in some way, because they passed memos: they wrote something down and put it in a safe place, someone would pick up the memo, and we know this because the memos were presented in court. This is the data we're working with. It's quite constraining in itself, but it's the only thing we can work with for now. Of course, there are a lot of different kinds of datasets, but these four are really well known in the research community that studies criminal networks. One thing I want to highlight is the number of nodes in these networks. They're quite small: somewhere between 20 and 150 nodes.

This is quite different from other studies you could find on Google Scholar, which usually look at something like 50,000 or 60,000 nodes for a criminal network. For a social network, it could go up to 30 million or so.

But there's a reason we did this: most of the time, when we encounter a new criminal network, we actually don't know much about it. So we wanted to constrain ourselves to a more practical setting and say, okay, now we only know 20 people. We're quite sure they are connected and interact in some way. How much can we do with these networks? That's why we chose them.

Does the data you have available let you comment on the structure of a criminal network? Does it look like a standard corporation, or does it have a different kind of network structure?

I think in the criminology community, we always say that organized crime is totally the same as any other kind of business.

There's a boss, there are managers, there are staff, there are departments, and different experts working on different problems.

One example I always give is a human trafficking network. There are people doing the actual snatching: going onto the street, or pretending to be a smuggler, and then stealing people's passports and trafficking them somewhere. There are people doing the money laundering. And there are people doing active recruitment: they're not involved in the actual trafficking, but they hire people to do it. The networks you see here do tell you something about the organizations, but something really interesting emerges when we look at their structural properties. One of the things we found is that these networks don't have a uniform shape. Some networks are more centralized than others.
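One common way to put a number on "more centralized versus more decentralized" is Freeman's degree centralization, which compares every node's degree with the top degree. The episode does not say which measure the paper uses, so treat this as an illustrative choice; the function name and toy graphs are hypothetical.

```python
Graph = dict[int, set[int]]


def degree_centralization(g: Graph) -> float:
    """Freeman degree centralization, between 0 and 1.

    1.0 for a star, where one node is linked to everyone else; 0.0 when every
    node has the same degree, as in a ring. Needs at least 3 nodes.
    """
    n = len(g)
    if n < 3:
        raise ValueError("degree centralization needs at least 3 nodes")
    degrees = [len(nbrs) for nbrs in g.values()]
    top = max(degrees)
    # A star maximises the summed gap to the top degree: (n - 1) * (n - 2).
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(degree_centralization(star), degree_centralization(ring))  # 1.0 0.0
```

The star corresponds to the "one person broadcasting orders" pattern; real networks fall somewhere between the two extremes.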

And that tells us how these networks operate. In a more centralized network, there is basically one person broadcasting information to the rest of the network; the others receive orders from that person and act on them. More decentralized networks work in a more complicated way. Some networks are much more centralized, much more dependent on one leader, while others depend more on interactive messaging between different nodes, or actors, in the network.

A criminal organization that is going to exist for a while will try to hide itself, right? They don't want to publicize their criminal network or how they're organized. So you're fighting against this hidden-data problem. To what degree do you have the full picture, and how do those limitations affect the potential for analysis?

The assumption we make in this paper is that these data are complete: we assume we only have these people and that this is the way they operate. Even so, we find that with only 80% of the nodes visible, problems already emerge. For example, we can't really dismantle the networks effectively.

Yeah, it is a big problem. I do have a statistic from the Australian Crime Agency, which looked at how many missing people we know of right now. It's only one-fourth of the true number of missing people. That gap is really large: it means three-quarters of the people who are missing are people we don't know are missing, and we don't have them on file.

And that's the conservative estimate. Using the data we chose in this paper, we can show that even with only 20% missing data, we can't do much. This goes back to the title, garbage in, garbage out: if you don't have good data, you can't do anything with it.

Well, I would hope that if you had 0.2% missing, you could do a lot. That's two orders of magnitude from 20% to 0.2%. Where is the critical threshold? How accurate must your data be to do useful analysis?

Sorry, I didn't mention this: we also used synthetic networks to verify our claims on certain hypotheses. What we found across all of these networks is that when you have only 60% of the network, it doesn't work at all. Between 60% and 80% is where things get a little weird. And anything below 50% is trash.
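The experimental idea, hiding a random share of nodes and seeing how much structure survives, can be sketched in a few lines. This is an assumption-laden illustration, not the paper's protocol: the episode does not specify the metrics, trials or datasets, so the sketch uses a toy ring of 100 nodes and the size of the largest connected component. The helpers `observe` and `largest_component_size` are hypothetical names.

```python
import random

Graph = dict[int, set[int]]


def observe(g: Graph, visible_fraction: float, rng: random.Random) -> Graph:
    """Simulate missing data: keep a random subset of nodes and the edges among them."""
    k = round(visible_fraction * len(g))
    kept = set(rng.sample(sorted(g), k))
    return {u: g[u] & kept for u in kept}


def largest_component_size(g: Graph) -> int:
    """Size of the largest connected component (iterative depth-first search)."""
    seen: set[int] = set()
    best = 0
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in g[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy network: a ring of 100 people. Compare the full view with partial views.
ring = {i: {(i - 1) % 100, (i + 1) % 100} for i in range(100)}
rng = random.Random(42)
for fraction in (1.0, 0.8, 0.6):
    seen_graph = observe(ring, fraction, rng)
    print(fraction, len(seen_graph), largest_component_size(seen_graph))
```

A ring is deliberately fragile, so it exaggerates the effect; in practice one would repeat the sampling many times per fraction and average the metric of interest.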

Of course, there are different reasons why law enforcement agencies don't always tackle criminal networks. I talked to someone who works actively with the Dutch police. He told me, you know what, we actually know the network, but we don't want to disturb it, because when we disturb it, when we arrest the people in that network, an unseen network will emerge, maybe rivals. And it's much more difficult to control unseen networks, because they don't know who the members are. This is the problem. Most of the time, they need a great deal of confidence to carry out an actual intervention, such as a joint operation between different law enforcement agencies, or even one involving Europol or the FBI, in an international effort against criminal organizations. But these are really difficult. There was one case a few years ago where they seized millions' worth of drugs and firearms and arrested a lot of people, some of whom were police officers, by the way, which also says a lot about corruption. Only a day after that operation, there were still drug dealers on the street. So this is really interesting, because we thought we knew a lot about what was going on. Actually, we don't. I think we know a little more than we did five or ten years ago.

But the problem is that these organizations are also getting smarter and evolving over time, so this is quite challenging. Maybe we can talk later about the technologies and techniques they use to hide themselves, which, in some ways, make our lives much more difficult.

I know this may get a little bit outside of your area of expertise, but do you know anything about the police investigator's point of view? Maybe a naive approach would be if we take down the big boss, if we get Al Capone or El Chapo, then the whole thing falls apart. Is it as simple as that, or is it harder to bring down such a network?

I did talk to people who are in the force or who work closely with enforcement agencies. The first problem is that these people are usually really clean, so you can't really touch them. They delegate everything to their right-hand man, left-hand man, or whoever.

But even if you do, the most important people may not have the most connections in the network. That's a problem, because some papers have found that bosses may actually have the fewest connections: they have one left-hand or right-hand man connecting them to the rest of the network. So they sit at the periphery of the network rather than at its center.

That way, they can protect themselves much better than if they were at the core of the network. The way I see criminologists and criminal network scientists tackling this is, instead of targeting the most important person, the person with the highest centrality however you want to define it, to study the dynamics of these criminal networks: see how these people evolve over time and try to infer their roles. Of course, this is not the only way to tackle a criminal network. There is another approach called value-chain-based intervention. Instead of looking at the people with the most connections, you target specific people with specific roles. If you know that a human snatcher is the most effective person to target, then you target that person. You can combine these different methods, but then you need more attributes: instead of having only degree or betweenness centrality as attributes, you can also look at the actual roles, or maybe at tenure in the network, for example.
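Combining structural and role attributes, as Justin suggests, can be sketched as a weighted score. Everything here is a labeled assumption: the `Actor` record, the role priorities and the weights are hypothetical analyst inputs, not values from the paper. Setting the weights differently switches between plain degree targeting and role-based (value-chain style) targeting.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    name: str
    degree: int
    role: str
    tenure_years: float


def rank_targets(actors: list[Actor], role_priority: dict[str, float],
                 w_role: float = 1.0, w_degree: float = 0.1,
                 w_tenure: float = 0.0) -> list[str]:
    """Order actors by a weighted mix of role priority, degree and tenure.

    All weights and priorities are analyst-supplied assumptions. With
    w_role = 0 this is plain degree targeting; with w_degree = w_tenure = 0
    it is purely role-based (value-chain style) targeting.
    """
    def score(a: Actor) -> float:
        return (w_role * role_priority.get(a.role, 0.0)
                + w_degree * a.degree
                + w_tenure * a.tenure_years)

    return [a.name for a in sorted(actors, key=score, reverse=True)]


# Hypothetical trafficking cell; priorities are made up for illustration.
cell = [Actor("recruiter", 9, "recruitment", 2.0),
        Actor("snatcher", 3, "snatching", 1.0),
        Actor("launderer", 2, "laundering", 5.0)]
priority = {"snatching": 3.0, "laundering": 2.0, "recruitment": 1.0}
print(rank_targets(cell, priority))                            # role-weighted
print(rank_targets(cell, priority, w_role=0.0, w_degree=1.0))  # degree only
```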

These are really important. There's also a lot of work right now looking at the geospatial distribution of these networks, and that works pretty well in theory. In practice too, to some extent: I know some people have collaborated with law enforcement agencies in different cities to tackle this problem. In the Netherlands, there are people working on port intelligence projects, looking at things like truck imports, working closely with the government and with the police force as well.

And I think it requires much more than networks and graphs. It also requires computer scientists, criminologists, and people actually working in the field, to understand what could be a solution to the challenges we face in these scenarios.

We were discussing that with anything below maybe 60% of the network, it's hard to really learn much. Beyond that, if you have more, what kinds of insights does that unlock?

You start to have a more complete structure. Imagine you have a skeleton and only a random 80% of it: you're quite likely to get the head, and quite likely to get the ribs. But with 60%, that likelihood drops. Imagine we're trying to estimate what a human skeleton looks like from 80% of the data, so you keep randomly sampling snapshots of different body parts. That's basically what you have with more than 80%: you have much more confidence in what it looks like, and the network metrics or statistics are much more reliable as well. There has been work on how missing data influences the errors you make in these estimates. With more than 80%, the estimation errors go down quite drastically. I don't have the exact number, but they do become quite reliable.

But once you have less than 80%, and less than 60%, it becomes much more fragile. Imagine having 60% of the body and trying to sketch out the skeleton. It may look like a dog or it may look like a human; you don't know what it will actually look like in the end. I think that is exactly what we're trying to do right now.

When I think of network science, it's a toolbox to me. It has things like centrality, the Louvain algorithm, and so on. When you open the toolbox, what are you inclined to reach for? What's most useful for work like yours?

The method we used is called percolation, and it comes from physics. We're actually doing the inverse of percolation. Imagine you have an n-by-n grid of checkerboard squares. What you're trying to understand is how many squares you need to take out to break up the largest connected component, the component of nodes that are all interconnected. That's what percolation looks at, and there are already a lot of theoretical thresholds for when this giant connected component emerges. Percolation theory also tells us that if you target the most connected people, the threshold shifts toward the more efficient side: the largest connected component disappears much more quickly than when you're not targeting the most connected people and are just removing random ones. And of course, when you open this toolbox and look at the metrics available, you can use betweenness centrality, degree centrality, and PageRank if your edges have directions. These are really important. But because we now have more compute, we can do a lot of machine-learning-related work as well. That's one of the things I looked at: a method called FINDER.
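The contrast between targeted and random removal can be seen on a toy graph by tracking the largest connected component as nodes are deleted one by one. This is a minimal sketch assuming a static degree ranking (computed once, not recomputed after each removal); it is not FINDER or any method from the paper, and the helper names and graph are hypothetical.

```python
import random

Graph = dict[int, set[int]]


def largest_component(g: Graph, removed: set[int]) -> int:
    """Largest connected component size after deleting the `removed` nodes."""
    seen = set(removed)  # treat removed nodes as already visited
    best = 0
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in g[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def dismantling_curve(g: Graph, order: list[int]) -> list[int]:
    """Remove nodes one by one in `order`; record the largest component each time."""
    removed: set[int] = set()
    curve = []
    for node in order:
        removed.add(node)
        curve.append(largest_component(g, removed))
    return curve


def by_degree(g: Graph) -> list[int]:
    """Static targeted attack: highest initial degree first (ties by node id)."""
    return sorted(g, key=lambda u: (-len(g[u]), u))


def at_random(g: Graph, rng: random.Random) -> list[int]:
    """Random failure baseline: a shuffled removal order."""
    order = sorted(g)
    rng.shuffle(order)
    return order


# Toy graph: two linked hubs (0 and 1), each with five leaves.
g: Graph = {0: {1}, 1: {0}}
for hub, leaves in ((0, range(2, 7)), (1, range(7, 12))):
    for leaf in leaves:
        g[hub].add(leaf)
        g[leaf] = {hub}

print(dismantling_curve(g, by_degree(g))[:3])  # [6, 1, 1]
print(dismantling_curve(g, at_random(g, random.Random(7)))[:3])
```

Two targeted removals shatter this graph, while a random order usually leaves a large component for much longer. Adaptive variants recompute degrees after every removal.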

This is reinforcement-learning-based targeting. I think on the back end it still uses centrality metrics, but what it does is teach an agent to play the percolation game: target the network, keep playing, over and over, until the agent becomes so efficient that it knows which nodes to remove without even reading the network beforehand. That's basically how FINDER works.

I know FINDER was perhaps the most promising of the methods you looked at, but there are a couple of others. Could you compare and contrast the ways you used these methods to look at the networks?

As you know, there are a lot of centrality metrics in network science. One of the purposes of the paper is to test the most common ones and contrast them with the latest ones, for example FINDER, CoreHD, and Collective Influence.

I just mentioned degree centrality, which is quite self-explanatory: it looks at how many connections a node has, and the most important node is the one with the most. But if we look at closeness centrality and betweenness centrality, we start to look at people who play the role of a bridge between different nodes, connecting different sides, different components, of the network.
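Degree is just a neighbour count, but betweenness needs shortest paths. A compact way to compute it for an unweighted, undirected graph is Brandes' algorithm, sketched below in plain Python. The function name and the toy "broker" graph are hypothetical; libraries such as NetworkX provide tested implementations.

```python
from collections import deque

Graph = dict[str, set[str]]


def betweenness(g: Graph) -> dict[str, float]:
    """Unnormalised betweenness centrality for an unweighted, undirected graph.

    Follows Brandes' algorithm: one BFS per source node counts shortest paths,
    then dependencies are accumulated back from the farthest nodes.
    """
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        order = []                          # nodes in the order BFS reaches them
        preds = {v: [] for v in g}          # predecessors on shortest paths
        sigma = dict.fromkeys(g, 0)         # number of shortest paths from s
        dist = dict.fromkeys(g, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        for w in reversed(order):           # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Toy graph: a broker "b" is the only bridge between two pairs.
g = {"a1": {"a2", "b"}, "a2": {"a1", "b"}, "b": {"a1", "a2", "c1", "c2"},
     "c1": {"b", "c2"}, "c2": {"b", "c1"}}
degree = {v: len(nbrs) for v, nbrs in g.items()}
print(degree["b"], betweenness(g)["b"])
```

Here the broker sits on all four shortest paths between the two pairs, so its betweenness is 4.0 while every other node scores 0.0.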

The heuristics are a little bit more complicated, but at the end of the day they are still built on degree centrality. For example, CoreHD basically tries to find the highest-degree nodes in the core. So it does some filtering first, and only then looks for the highest-degree node. It is more adaptive, in a way, because it keeps removing nodes as it goes. And what these methods have in addition is something called greedy reinsertion. Instead of keeping everything removed, you can plug nodes back in, so that you don't remove people who don't need to be removed. That way we know, okay, this is the ranking that is most efficient and that also avoids unnecessary removal of nodes.
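The "remove from the core, then reinsert" idea can be sketched roughly as follows. This is a simplified illustration in the spirit of CoreHD, not a faithful reimplementation: the published method has additional details that are approximated here by plain adaptive high-degree removal, reinsertion simply tries nodes in reverse removal order, and `dismantle` and `target` (the largest component size we are willing to leave) are hypothetical names.

```python
# Simplified sketch in the spirit of CoreHD with greedy reinsertion.
# Assumptions: undirected graph as a dict of neighbour sets; reinsertion tries
# nodes in reverse removal order. Not the published implementation.

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def two_core(adj, removed):
    """Nodes left after repeatedly stripping nodes with fewer than two live neighbours."""
    alive = set(adj) - removed
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(w in alive for w in adj[v]) < 2:
                alive.discard(v)
                changed = True
    return alive

def dismantle(adj, target):
    """Return a set of nodes whose removal leaves no component larger than `target`."""
    order, removed = [], set()
    # Phase 1: adaptively remove the highest-degree node inside the 2-core
    # until no cycles remain (ties broken alphabetically).
    while core := two_core(adj, removed):
        hub = max(sorted(core), key=lambda v: sum(w in core for w in adj[v]))
        order.append(hub)
        removed.add(hub)
    # Phase 2: while components are still too large, remove the node with the
    # highest remaining degree.
    while largest_component(adj, removed) > target:
        live = sorted(set(adj) - removed)
        hub = max(live, key=lambda v: sum(w not in removed for w in adj[v]))
        order.append(hub)
        removed.add(hub)
    # Greedy reinsertion: put a node back whenever that keeps every component
    # at or below `target`, so nobody is removed unnecessarily.
    for v in reversed(order):
        removed.discard(v)
        if largest_component(adj, removed) > target:
            removed.add(v)
    return removed


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
targets = dismantle(adj, target=2)
print(sorted(targets), largest_component(adj, targets))
```

On this toy graph the two hubs `c` and `d` are selected, and the reinsertion loop confirms that putting either one back would reconnect a component larger than the target.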

Well, if I'm thinking of this from the point of view of law enforcement, I assume their first goal is to reduce crime as much as they can. Maybe a secondary goal is not to accuse innocent people, or something along those lines. I'm not sure. If a psychic calls the police department, I hope they know to tell them to get lost, because it's not science. But your work is science, yet it has this challenge of garbage in, garbage out: if the network isn't complete enough, law enforcement might not be able to use the results effectively.

Do you have any thoughts on at what point it's most effective for law enforcement to look into techniques like yours? I think one of the anecdotes that I will mention here is that a while ago I was talking to a police representative in the UK. She was saying, hey, you know, network science, cool. I love what you're doing. I've been working with you guys maybe for a while.

But she's working on a corruption network within the police force. And she said, we don't have the data. We have nothing. In the perfect scenario, you would have all the data, but that doesn't work. First of all, if you had everything in the world, that world probably doesn't exist anyway. What we want to achieve here is that, without maximizing the amount of data, we can still make a good prediction.
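One way to probe the "how much data is enough" question is a small simulation: hide a random share of nodes and check whether the most connected people are still identified. The sketch below assumes a hypothetical boss/lieutenant/courier network, uniform random node sampling as the missing-data model, and degree as the ranking. It is a toy check, not the experimental setup of the paper.

```python
import random

# Toy missing-data experiment (hypothetical network and sampling model):
# observe a random fraction of members, then check how many of the true
# top-k degree nodes are still ranked top-k in what we observed.

def hierarchy(n_lieutenants=5, couriers_each=10):
    """Hypothetical network: a boss, lieutenants, and couriers under each lieutenant."""
    adj = {"boss": set()}
    for i in range(n_lieutenants):
        lt = f"lt{i}"
        adj.setdefault(lt, set())
        adj["boss"].add(lt)
        adj[lt].add("boss")
        for j in range(couriers_each):
            courier = f"c{i}_{j}"
            adj[courier] = {lt}
            adj[lt].add(courier)
    return adj

def observe(adj, fraction, rng):
    """Keep a random `fraction` of nodes and only the ties among kept nodes."""
    kept = set(rng.sample(sorted(adj), round(fraction * len(adj))))
    return {v: adj[v] & kept for v in kept}

def top_k(adj, k):
    # Highest degree first; ties broken by name so results are reproducible.
    return set(sorted(adj, key=lambda v: (-len(adj[v]), v))[:k])

def top_k_recovery(adj, fraction, k=5, trials=200, seed=0):
    """Average share of the true top-k nodes that stay top-k after sampling."""
    rng = random.Random(seed)
    truth = top_k(adj, k)
    hits = sum(len(truth & top_k(observe(adj, fraction, rng), k)) for _ in range(trials))
    return hits / (k * trials)


network = hierarchy()
for fraction in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"observed {fraction:.0%} of nodes -> top-5 recovery {top_k_recovery(network, fraction):.2f}")
```

Because an unobserved node can never be ranked, recovery in this toy setup is, on average, bounded by the share of nodes observed; real-world missingness is rarely this uniform.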

Because in the past decade, people have been saying, maximize everything, maximize. This is not really useful, and I can give you a reason why. It comes down to a basic machine-learning understanding of precision and recall. If you maximize the data, maximize the scope, basically what you need to do is record every single person's identity and where they have been. Maybe governments have been doing this already; look at the EU, or maybe the US as well.
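The precision/recall argument can be put into numbers. Assuming, for illustration only, that 2,000,000 people are screened, exactly one of them is an offender, and the "wide net" flags everyone, recall is perfect while precision collapses. The function name and figures are hypothetical.

```python
# Worked precision/recall arithmetic for a "flag everyone" strategy.
# Illustrative numbers: 2,000,000 people screened, one actual offender.

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

population = 2_000_000
offenders = 1
# Flag everyone: no offender is missed (no false negatives),
# and every non-offender becomes a false positive.
p, r = precision_recall(tp=offenders, fp=population - offenders, fn=0)
print(f"precision={p:.7f} recall={r:.1f}")
```

Maximizing scope buys recall at the cost of precision; the "sweet spot" is a narrower net whose precision is high enough to act on.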

They have a lot of information about third-country nationals. This is rather inefficient, because then you need to record everything about these third-country nationals. Basically, you're casting a wide net and saying, let's assume that everyone could be a criminal.

And if they find a criminal, they think they're really smart. But no, they're not, because they just assumed that everyone could be a criminal and then ran these algorithms on them: okay, now we found one person out of the 2 million people who is a criminal. So what we're trying to advocate is that, instead of maximizing the number of people we know about, we just need to find the sweet spot without overdoing it. Of course, there are a lot of ethical issues with doing this, I'm sure, because you need active surveillance. But at the same time, it's also about efficiency. How much money are we spending on implementing border controls? How much money are we spending on actually recording information about individuals? We're spending a lot of money.

Going back to my work, the most important part, or the biggest takeaway for practitioners, is that if we have some data, that's already a starting point. But before actually tackling the network, we should first understand how much we are missing. If we're quite confident, we can go on. If we're not confident, we do some estimations. So you also have this problem, right? So, before actually adopting anything, think about why this happens, why the nodes are there. And, I think most importantly, define the edges more precisely and in a more theoretically grounded way, because otherwise we have one network with co-arrests as edges and another with communications as edges. They don't necessarily make sense in all cases. They might be useful for other procedures in the investigation, but not for the network scientists who are working on criminal networks.
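The point about edge definitions can be illustrated with made-up records for the same five people. Turning arrest records into co-arrest ties, and call records into communication ties, yields two different networks with different "central" people: `eli`, who appears only in the call records, has the highest degree in one network and is absent from the other. All names and records are hypothetical.

```python
from itertools import combinations

# Same people, two edge definitions (all records hypothetical): co-arrest ties
# link people named in the same arrest record; communication ties link callers.

def co_arrest_edges(arrests):
    """Undirected tie between every pair of people named in the same arrest record."""
    return {pair for people in arrests.values() for pair in combinations(sorted(people), 2)}

def communication_edges(calls):
    """Undirected tie between caller and callee; repeat calls collapse to one edge."""
    return {tuple(sorted(call)) for call in calls}

def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


arrests = {"raid_1": {"ana", "ben", "cam"}, "raid_2": {"ben", "cam", "dev"}}
calls = [("eli", "ana"), ("eli", "ben"), ("eli", "dev"), ("ana", "eli")]

co_arrest = degrees(co_arrest_edges(arrests))
communication = degrees(communication_edges(calls))
print("co-arrest:", dict(sorted(co_arrest.items())))
print("communication:", dict(sorted(communication.items())))
```

Neither network is "wrong"; they answer different questions, which is why the edge definition has to be fixed on theoretical grounds before any centrality ranking is trusted.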

So I think there's strongly a place for network science in these types of investigations, but it's complex for some of the reasons you've laid out, where we don't want to give false positives. And I don't expect that every investigator in a legal system is well-versed in network science. Do you have any thoughts on how you can most effectively yet ethically express your findings and findings of methods like this to law enforcement groups?

To communicate really effectively with law enforcement and people who work in the field, you have to start by talking with them and understanding their problems. Instead of proposing a theoretical solution, say, hey, I know that there are a lot of challenges you're facing, but can you tell us something about the case you're working on?

That's much better than us network scientists going into the industry and saying, hey, we have a bunch of centrality metrics here. What do you think about this? Go arrest them. Yeah. Arrest them. No, it doesn't work that way. Right. So.

I think that is a great first step: really opening up these communication channels between practitioners and network scientists. And this is what we're trying to do with our conference satellite as well, which is bringing in people who are in the field, but also people who are more theoretical. So, people who are actually criminologists (I'm not a criminologist), people who know more about how these networks evolve from a more sociological, anthropological point of view. By talking to these people, we understand more about how these networks actually operate, and also how the interactions between law enforcement and these networks emerge over time.

Can you share some details on the satellite conference coming up in June 2025?

It is a joint effort by a truly international team, with scholars from different backgrounds: mathematicians, computer scientists. I'm a data scientist, but we also have physicists on the team. We're really trying to bring in people from different disciplines to research and understand the complexity of criminal networks. That's why we call it Networked Criminal Complexity. We're actually hosting this satellite on the 3rd of June 2025 in Maastricht, as part of the larger Network Science Conference. We're accepting abstracts right now. We actually only have two more days until the submission deadline, but we have already received quite a few submissions.

We also have keynote speeches on different topics, and demos from our research teams at the University of Amsterdam and TU Delft, who have been working closely with the government and with some intelligence companies on port intelligence, for example. We will also have speakers talking about mafia networks and drug-trafficking networks. So it is really diverse, really different types of organized crime cases that we're looking at.

But we are also inviting theoretical scholars to contribute to the satellite, because we feel that network science is sometimes quite disconnected from the criminology literature. It is quite easy to analyze things mathematically. Well, not easy, but when you take away all the context, you can do whatever you want. That is not the case in reality. So we also want to listen to people who may have a more critical view of network science. People with more experience in the field, and people with experience combining interdisciplinary efforts on criminal networks, would be really welcome. This is going to be a really fun satellite.

And hopefully there will be people who are interested in this topic, or who just want to, you know, come hang out. It'll be fun. Definitely. A very exciting opportunity. I wish I could make it, actually. Can you not?

Travel gets a little complicated. It's a maybe. We'll see. June's a ways off. Who knows? Maybe I'll win the lottery between now and then. Maybe. In any event, Justin, where can listeners follow you online? Yeah, so I am on LinkedIn. I'm on Bluesky as well. I have my own personal website, which is a really retro, old-school, HTML-only website. I'm on Twitter as well, but I'm trying to detach myself from it right now.

But you can always find me on these platforms. Very good. We'll have links to all the above in the show notes for listeners to follow up. Justin, thank you so much for taking the time to come on and share your work. Thank you. It's a pleasure.