Best of: Computation cracks cold cases

2024/11/1
The Future of Everything

#forensic investigation #artificial intelligence and machine learning #cybersecurity #social sciences #blockchain and cryptocurrency #biotechnology and neuroscience #legal insights #data privacy

People
Lawrence Wein
Russ Altman
Topics
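@Russ Altman: This episode discusses how mathematical models and algorithms can be used to analyze DNA data in genealogy databases and help law enforcement solve cold cases. Even when the suspect is not in a database, analyzing the DNA of the suspect's relatives can narrow the investigation.

@Lawrence Wein: Forensic genetic genealogy (FGG) helps solve long-unsolved cases by analyzing DNA data in genealogy databases. When a direct DNA comparison fails to find the suspect, third-party databases such as GEDmatch and Family Tree DNA can be used to find the suspect's relatives and narrow the investigation. Genealogists use this information to build family trees and ultimately identify the suspect. The team developed a mathematical model to optimize the tree-building process. The model divides tree building into two stages: ascending to ancestors and descending to descendants. Comparing the traditional approach with the team's proposed algorithm, the algorithm was more than 10 times as efficient. The algorithm's decisions are purely algorithmic and need no human intervention, but it can be combined with the work of genealogists, who still play an important role in FGG and bring in information such as geography and ethnicity. FGG raises ethical and privacy issues, for example target testing. Law enforcement is already doing target testing, but its ethical and privacy impact has not been assessed. The team's research aims to give ethicists and policymakers the information they need to make informed decisions.

Lawrence Wein: FGG emerged to address cold cases that traditional methods could not solve. By combining genealogy databases with mathematical models and algorithms, DNA data can be analyzed efficiently to find the suspect's relatives and ultimately identify the suspect. The team's algorithm improves efficiency by optimizing tree building: it divides the work into an ascending stage and a descending stage and computes probabilities to choose the most effective search path, avoiding the inefficiency and errors of the traditional approach. Although the algorithm can run on its own, genealogists' expertise remains essential; they can use geography, ethnicity, and similar information to improve accuracy and efficiency. FGG also raises ethical and privacy issues such as target testing, in which law enforcement directly contacts relatives of the suspect who are not in the database and asks them for a DNA sample, which touches on personal privacy and informed consent. The team has extended its model to include target testing so that its benefit can be measured, as input for policymaking.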

Deep Dive

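Solving Crimes with Math: How Forensic Genetic Genealogy Uses Algorithms to Crack Cold Cases

I recently spoke with Lawrence Wein, a professor of management science at Stanford University, about his pioneering work using mathematical models and algorithms to solve cold cases. This is not science fiction; it is a real-world application of forensic genetic genealogy (FGG).

At its core, FGG uses DNA data in public genealogy databases: even when the suspect is not in a database, analyzing the DNA of the suspect's relatives can narrow the investigation. Picture a drop of blood or a single hair. Once analyzed, that trace of DNA points to a large family network and, ultimately, to the perpetrator.

The Bottleneck in Traditional Methods

Traditional investigative methods often stall on cold cases. Even with a DNA sample from the crime scene, the investigation hits a dead end if the suspect is not in the existing criminal DNA database. FGG emerged to break through this bottleneck.

The Power of the Algorithm: Efficient Family-Tree Building

Professor Wein and his team developed a mathematical model that markedly improves the efficiency of FGG. The model divides family-tree building into two stages:

  • Ascending: starting from known individuals whose DNA partially matches the suspect's, trace their ancestors upward to build the upper part of the family tree.
  • Descending: once candidate common ancestors have been found, trace their descendants downward to look for the suspect.

This sounds simple, but in practice it means handling large amounts of data and complicated kinship relationships. The algorithm computes probabilities to choose the most effective search path, avoiding the needle-in-a-haystack searching and the wrong turns of the traditional approach. In the team's simulations, the algorithm solved cases about 10 times faster than the traditional approach, by a conservative estimate.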

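Combining the Algorithm with Human Expertise

Although the algorithm processes the data efficiently, genealogists' expertise remains indispensable. The algorithm's decisions are purely algorithmic, but genealogists carry out the actual research and use information the model does not consider, such as geography and ethnicity, to improve accuracy and efficiency. For example, if a suspect's relatives are concentrated in one region, a genealogist can use that fact to narrow the search further.

Ethical and Privacy Challenges: Target Testing

FGG also raises ethical and privacy challenges, the most prominent of which is target testing. In target testing, law enforcement directly contacts relatives of the suspect who are not in the database and asks them for a DNA sample. This touches on personal privacy and informed consent, and it needs to be handled with care.

Future Directions

The team has extended its model to include target testing as an option, so that its effect on efficiency can be measured; results are not yet available. The aim is to give ethicists and policymakers information for weighing the practice, so that FGG can help solve cold cases while protecting personal privacy and rights as far as possible.

Conclusion

Forensic genetic genealogy represents a revolution in criminal investigation. By combining mathematical models, algorithms, and human expertise, DNA data can be used more efficiently to open new paths for solving long-unsolved cases. At the same time, the ethical and privacy issues must be faced squarely so that the technology is applied responsibly.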

Key Insights

What is forensic genetic genealogy and how does it help solve cold cases?

Forensic genetic genealogy uses DNA traces to identify criminals by analyzing partial matches in third-party ancestry databases. It involves matching DNA from crime scenes to relatives in databases like GEDmatch, then using genealogists and mathematical algorithms to build family trees and identify suspects.

How does the mathematical algorithm improve the efficiency of forensic genealogy?

The algorithm optimizes the process by focusing on potential ancestors of the target, prioritizing matches that are more likely to lead to the criminal. It reduces the workload by avoiding false leads and inefficient tree expansion. In the team's simulations of 17 DNA Doe Project cases, it solved cases about 10 times faster than the benchmark approach, by a conservative estimate.

What are the ethical concerns surrounding the use of genetic data in forensic genealogy?

Ethical issues include privacy violations for individuals whose DNA is collected without consent, especially when law enforcement knocks on doors to request DNA samples. There are also concerns about the long-term storage and potential misuse of genetic data, as well as the broader implications for family members whose DNA is indirectly included in the database.

How do genealogists contribute to the process alongside the mathematical algorithm?

Genealogists use their expertise in tracing family trees, often relying on geography, ethnicity, and historical records that the algorithm doesn't consider. They act as a complement to the algorithm, helping to resolve complex cases and providing insights that the mathematical model cannot capture.

What is the role of third-party DNA databases in forensic genealogy?

Third-party databases like GEDmatch and Family Tree DNA provide lists of potential relatives based on DNA matches. These databases are crucial for identifying distant relatives of the suspect, which helps genealogists and algorithms narrow down the family tree to find the perpetrator.

Why did the Golden State Killer case become a landmark for forensic genetic genealogy?

The Golden State Killer case marked the first high-profile use of forensic genetic genealogy to solve a decades-old cold case. It demonstrated the potential of combining DNA analysis with family tree research to identify a previously untraceable suspect, even when the suspect's DNA was not directly in the database.

How does the algorithm handle the complexity of building family trees?

The algorithm uses a two-stage process: ascending to find ancestors and descending to trace descendants. It prioritizes matches that are more likely to be related to the target, minimizing unnecessary work by avoiding overshooting or undershooting in the tree construction.
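
As a rough illustration of the two stages, the sketch below ascends from one match to a set of ancestors and then descends from those ancestors to collect candidate relatives. It is a toy under stated assumptions, not the team's algorithm: the pedigree, every name in it, and the functions `ascend` and `descend` are hypothetical, and no probabilities are involved.

```python
from collections import deque

# Hypothetical toy pedigree: child -> (mother, father). All names are invented.
PARENTS = {
    "match": ("m_mom", "m_dad"),
    "m_mom": ("gm", "gf"),
    "t_dad": ("gm", "gf"),
    "target": ("t_mom", "t_dad"),
}


def ascend(person, generations):
    """Return {ancestor: depth} for ancestors of `person` up to `generations` levels."""
    found = {}
    frontier = deque([(person, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == generations:
            continue
        for parent in PARENTS.get(current, ()):
            if parent not in found:
                found[parent] = depth + 1
                frontier.append((parent, depth + 1))
    return found


def descend(ancestors, generations):
    """Return everyone within `generations` levels below any of `ancestors`."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found = set()
    frontier = deque((ancestor, 0) for ancestor in ancestors)
    while frontier:
        current, depth = frontier.popleft()
        if depth == generations:
            continue
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                frontier.append((child, depth + 1))
    return found


# Ascending stage: go up two generations from the match to its known grandparents.
grandparents = [a for a, depth in ascend("match", 2).items() if depth == 2]
# Descending stage: come back down two generations and drop the match itself.
candidates = descend(grandparents, 2) - {"match"}
```

In this toy tree the match and the target are first cousins, so descending two generations from the shared grandparents reaches the target. Going up too few generations would miss the target, and going up too many would add many irrelevant descendants, which is the undershooting and overshooting trade-off mentioned above.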

What is the significance of the centimorgan value in DNA matching?

The centimorgan value measures the amount of shared DNA between individuals, indicating how closely they are related. It helps in determining the likelihood of a match being a distant relative, which is crucial for building accurate family trees and identifying suspects.
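
To make the link between shared centimorgans and relationship distance concrete, here is a minimal sketch. It assumes a total of about 6,800 cM across a comparison (the exact figure differs between testing companies) and uses the standard expectation that full k-th cousins share a fraction (1/2)^(2k+1) of their DNA. The function names are hypothetical, and real shared amounts vary widely around these averages, so one value alone cannot pin down a relationship.

```python
import math

# Assumption: approximate total centimorgans in a comparison; varies by company.
TOTAL_CM = 6800.0


def expected_shared_cm(cousin_degree):
    """Expected shared cM for full k-th cousins: TOTAL_CM * (1/2) ** (2k + 1)."""
    return TOTAL_CM * 0.5 ** (2 * cousin_degree + 1)


def closest_cousin_degree(observed_cm, max_degree=5):
    """Return the cousin degree whose expected cM is nearest on a log scale."""
    return min(
        range(1, max_degree + 1),
        key=lambda k: abs(math.log(observed_cm) - math.log(expected_shared_cm(k))),
    )


first_cousin_cm = expected_shared_cm(1)   # 6800 / 8
second_cousin_cm = expected_shared_cm(2)  # 6800 / 32
likely_degree = closest_cousin_degree(60.0)
```

Under these assumptions, first cousins are expected to share 850 cM and second cousins 212.5 cM, and a match sharing 60 cM sits closest to the third-cousin expectation.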

What is the current status of DNA databases and law enforcement access?

Databases like GEDmatch have transitioned to an opt-in system, where users must consent to allow law enforcement access. This change was prompted by privacy concerns after the Golden State Killer case, but only about 30% of users currently opt in.

How does the algorithm interact with human genealogists in practice?

The algorithm provides recommendations for which matches to investigate next, based on mathematical optimization. Genealogists then perform the actual research, such as looking up marriage records or birth certificates, and feed the results back into the algorithm to refine the search.
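
The recommend, research, and feed-back loop described here might look like the sketch below. It is illustrative only: the action names, the benefit and cost numbers, and the `investigate` callback standing in for the genealogist are all hypothetical. The real system tracks a much larger state, and the simple benefit-to-cost ranking here is a stand-in for its optimization.

```python
def pick_next_action(actions):
    """Return the action with the highest estimated benefit-to-cost ratio."""
    return max(actions, key=lambda a: a["benefit"] / a["cost"])


def run_search(actions, investigate, budget):
    """Greedy loop: recommend an action, let a human research it, update the options.

    `investigate(action)` is a stand-in for the genealogist's work and returns
    (solved, new_actions). Cost is counted as people added to the family tree.
    """
    pending = list(actions)
    spent = 0
    order = []
    while pending:
        affordable = [a for a in pending if spent + a["cost"] <= budget]
        if not affordable:
            break
        action = pick_next_action(affordable)
        pending.remove(action)
        spent += action["cost"]
        order.append(action["name"])
        solved, new_actions = investigate(action)
        if solved:
            break
        pending.extend(new_actions)  # the feedback step: results change what is possible
    return order, spent


def fake_genealogist(action):
    """Hypothetical stand-in for human research results."""
    if action["name"] == "investigate match 37":
        follow_up = {"name": "descend from candidate ancestors", "benefit": 0.40, "cost": 10}
        return False, [follow_up]
    if action["name"].startswith("descend"):
        return True, []
    return False, []


initial_actions = [
    {"name": "investigate match 37", "benefit": 0.30, "cost": 20},
    {"name": "investigate match 12", "benefit": 0.10, "cost": 25},
]
order, spent = run_search(initial_actions, fake_genealogist, budget=100)
```

In this run the loop recommends match 37 first, and the follow-up action it uncovers then outranks match 12, so the search ends after adding 30 people to the tree.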

Chapters
This chapter explores the use of forensic genetic genealogy (FGG) in solving cold cases. It explains the process of using DNA from crime scenes, comparing it with third-party databases like GEDmatch, and employing mathematical models to build family trees and identify potential suspects.
  • Forensic genetic genealogy uses DNA from crime scenes and third-party databases to identify suspects.
  • Mathematical models help streamline the genealogy process.
  • The Golden State Killer case is used as an example of a successful FGG investigation.

Transcript

Hi, everyone. It's Russ Altman here from the Future of Everything. We're starting our new Q&A segment on the podcast. At the end of an episode, I'll be answering a few questions that come in from viewers and listeners like you.

If you have a question, send it our way either in writing or as a voice memo, and it may be featured in an upcoming episode. Please introduce yourself, tell us where you're from, and give us your question. You can send the questions to thefutureofeverything@stanford.edu. The future of everything, all one word, no spaces, no caps, no nothing, at stanford.edu. S-T-A-N-F-O-R-D dot E-D-U. Thanks very much.

Hi, everyone. It's your host, Russ Altman from The Future of Everything.

Well, the Halloween holiday is now behind us, but here at The Future of Everything, we're not quite done with spooky season. If you're pairing your trick-or-treat haul with some scary movies, we invite you to revisit with us a conversation I had with Lawrence Wein a couple of years ago about work he's doing in forensic genetic genealogy, trying to crack cold cases.

Professor Wein shares how he's using math to catch criminals through traces of their DNA. Tiny, tiny bits of DNA. It's both haunting and hopeful, and we hope you'll take another listen. Before we jump into this episode, I'd like to ask you to rate and review the podcast. It'll help others figure out if they're interested in The Future of Everything.

There's been a revolution in our ability to measure DNA. First, we can use it for health benefits. We can understand our risk for disease and which drugs may and may not work. Second, people are really interested in how DNA tells them about their ancestry. Where did my people come from? Why am I here? That has led to the proliferation of databases of DNA measurements made by people who want to find their long-lost relatives.

These databases can be used not just for ancestry, but sometimes by law enforcement to find perpetrators of crimes. It's not that the perpetrator will be in the database. That would be a direct match and easy. But sometimes their relatives might be in the database. And using relatively sophisticated algorithms, we can figure out, based on the pattern of DNA from relatives in these ancestry databases, who it might be that committed a crime.

Well, Professor Larry Wein is a professor of management science at Stanford University. He uses math modeling tools to understand a whole range of problems in manufacturing, healthcare, and homeland security. He has worked on a wide range of public health issues, including HIV, anthrax, influenza, food terrorism, and biometric identification.

He will tell us how he uses math modeling tools to take a long list of DNA partial matches and figure out who must be the perpetrator whose DNA was found at a crime scene.

Larry, thanks so much for being here. You've recently published a really interesting paper about something called forensic genetic genealogy. What's the problem that your team was trying to address? And then we can get into what the solution is.

Okay, sure. And thanks, Russ, for having me on. I really appreciate it.

So, forensic genetic genealogy or FGG came on the scene around four and a half years ago, hit the headlines with the Golden State Killer case.

So maybe I'll just walk us through that case.

That sounds great.

So the first step, which is typically the starting point, is that you have some bodily fluids or something from which you can get a DNA sample of the murderer. I'll be calling that person the target, the unknown target. And the first thing you do is compare it to the database of bad guys' DNA throughout the whole country, and you don't get a hit. And then you've exhausted all your other non-DNA leads. And then these cases just sit there for decades unsolved. So the new and interesting part of this...

And the reason you don't get a hit is because if this person has never been arrested or had a crime before, they won't be in the database and you're out of luck.

Yeah. And with awful irony, this guy happened to be an ex-cop.

Okay. The Golden State Killer.

So the next step, what's made this new, is that there are now these new third-party services, in particular GEDmatch and Family Tree DNA, or FTDNA. And what you do is you take some SNP data. You basically take the bodily fluids or whatever and send it to a laboratory. You get back essentially some DNA. You then send it to these companies, and a company like GEDmatch will send you back a list of like a thousand names of people who are related to this unknown murderer, along with the centimorgan value, which is the amount of DNA that's shared between each of these thousand people and the target. And not only that, it gives you the names and the email addresses.

Whoa. So wait a minute. How did they get in the database? So these are not criminals, generally speaking. Is that correct?

This is like 23andMe or Ancestry.com. It's people who are doing this. The difference is that 23andMe and Ancestry.com do not allow law enforcement to use...

And the reason that folks do this is simply that they're curious about their ancestry and they'd like to find long-lost relatives. They're not even lost relatives, they're relatives that they never even knew that they had. And that's fun because then you can say, hey, looks like you're my third cousin or whatever, something like that.

Exactly. Right. I read something recently that genealogy is the second most popular hobby in the world behind gardening.

Okay. Oh, and actually that's interesting because it's all about trees, but I digress. One more question about these databases: they must agree to have their name and email released as part of the deal for getting other people's names and emails.

So, interesting. At the time that they caught the Golden State Killer, that was not the case. And so there was an uproar after they caught him and it became public that these users of GEDmatch were not aware that law enforcement were using them. So as a result, GEDmatch became an opt-in system. So you had to opt in that law enforcement could use it. And as a result, it's somewhere around 30% or so, you know, slowly climbing each year, of the original users are opting in.

Okay, so good. So thank you for that distraction from me. But where we last left off is, they have some DNA from the perpetrator, or the target, as you said, and now they have this GEDmatch. Okay, and continue about what the law enforcement does.

Right. So then at that point, you take the output from GEDmatch, the names and addresses of a whole bunch of relatives, some information about how each of these relatives are related to the murderer and how they're related to each other. And you give this to a team of genealogists. And the genealogists then try to build this family tree to figure out who is this unknown murderer. In the case of the Golden State Killer, it took a number of months. They had a few distant relatives and they finally focused in on one person. And then the last step is to get a confirmatory sample. And they basically stalked his house. They got a sample from his car door. They went into his garbage and found a tissue. And both of them gave a perfect DNA match. And then, you know, they arrested him.

But it's also possible that your sibling, your first cousin, your second cousin... if it's a sibling or first cousin, my guess is it's going to be easy to figure out who this is. Right. But my guess is that you don't always get siblings or first cousins and it's distant relatives.

Right. So in our analysis, we looked at 17 cases from a company, DNA Doe Project, and there's great variation among the cases. Of the 17 cases, eight of them were solved at the time they gave us the data. And some of them were easy to solve. As you say, they investigated a couple dozen matches and found the person. In other cases, they investigated several hundred people and the cases remain unsolved.

Okay. And this correlates roughly with whether it's a third, fifth cousin versus whether it's a first, second cousin. Okay. So now where does your team come in? Sounds like a great system.

Right. So we basically are the first ones to... well, first I'll just say the genealogy part of the process is by far the bottleneck, certainly in terms of time. Think of how long it takes to solve a case. The great majority of it is on that. But maybe more importantly, also the chances that you solve a case can depend on how good a job or thorough a job you do on the genealogy part. And so our paper is really the first attempt to mathematically formulate the genealogy problem and then make some attempt to solve it.

And when you say genealogy, you are, I think, literally looking at birth certificates, family trees, immigration. What does it mean to do that?

The precise mathematical problem is: given a list of people, like let's just say a thousand people who are related to this murderer, given the centimorgan value, how much...

Which is a measure of how far apart they are.

Right. And then a thousand-by-thousand matrix of centimorgan values of how these 1,000 people are related to each other. Given that, maximize the probability that we can identify the killer subject to a constraint on the amount of work we put in. And we measure this workload by the number of people in the final family tree.
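
In symbols, the problem as stated here might be written as below. The notation is assumed for illustration and is not taken from the paper: $\pi$ is a search policy, $T_{\pi}$ is the family tree that policy ends up building, $B$ is the workload budget, $c$ is the vector of centimorgan values between each match and the target, and $C$ is the matrix of centimorgan values between matches.

```latex
\max_{\pi} \; \Pr\bigl(\text{target identified} \mid \pi,\, c,\, C\bigr)
\quad \text{subject to} \quad \lvert T_{\pi} \rvert \le B
```

Here $\lvert T_{\pi} \rvert$ counts the people in the final family tree, which is the workload measure described above.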

Okay. Because it's not that you're building a single 1,000-person family tree. You have to make decisions about which people will be in the tree and then build just like that subtree. I'm trying to understand this tree building. Yeah, you're building one huge tree, but we're making the decisions in our optimization. What's the efficient thing to do next in this tree? So I'll describe a little in broad terms. Yeah.

The general genealogy problem as posed broadly has two stages. The first stage I would call the ascending stage, where you're starting from matches that you're going to investigate, and then you try to find their ancestors, their parents, grandparents, great-grandparents, go back in time, up the tree. Yes.

The second stage is a descending stage where you finally take some of these ancestors and you're descending down looking for their descendants, their children, their grandchildren, great-grandchildren. And in the descending stage, ultimately, you're looking for a marriage relationship between the mother's and father's side of the murderer's family tree. And once you find that marriage, assuming there's no endogamy, inbreeding, then you know that the murderer will be some offspring of, maybe not the direct child, but some descendant of this marriage.

Yes, okay, this makes sense. And you've just referred to something that everybody knows, but that I had forgotten for a moment, which is that my mom's tree and my dad's tree are unlikely to be intermixed. So when you look at all my relatives in this database, I was going to say roughly, but it's not really even roughly. They can exactly be separated into likely relatives of my mother and likely relatives of my father. And they all come together in my mother and father for me. So you're trying to figure that all out. Right. And the miracle of mathematics allows you to do it.

Right, right. Exactly.

So how does it work? I mean, so what kind of results did you find?

Yeah. Well, maybe I'll tell you a little bit more about that.

Yeah, definitely.

First, I'll define something called most recent common ancestor. So if you and I were first cousins, we're not, but if we were, then our most recent common ancestors would be our common set of grandparents. Yes. That set of grandparents have parents and grandparents themselves. Those people would be common ancestors between us, for you and me, but they wouldn't be the most recent common ancestor.

Got it. Got it.

So in our paper, we compare two strategies. The first one we call the benchmark strategy, which crudely represents how things are done in practice. And the way it's done in practice is that you're looking for common ancestors between a pair of matches. So you don't actually involve the target at all. You're just looking at these lists of matches, looking for ancestors. Then once they find common ancestors between pairs of matches, they then descend from them looking for their offspring. What we do in contrast is we're explicitly considering the target. For each of the target's ancestors, so the murderer's mother and father, the four grandparents, the eight great-grandparents, we're going to keep a list of possible people who are most recent common ancestors between each match and the target. And then what we do is calculate, given a particular list, what's the probability the true most recent common ancestor is in that list?

Because, you know, I may be the murderer, you may be the match. We find your grandparents, we're first cousins. But it turns out they find the wrong... you know, we're related on our paternal side, but they find the maternal grandparents who are no relation to me.

Right. So it sounds like that first method, the traditional way, there's a lot of false leads, like there's a lot of work that they're doing that is perfectly reasonable, but it doesn't connect to the target, whereas your insight was, let's make this very target-focused and prioritize the analysis of the people who we believe to be ancestors of the target.

Right, right. And so we're more efficient than the benchmark strategy on both the ascending and the descending stage. I would say on the descending stage... well, on the ascending stage, if you do it their way, there's the danger of either undershooting or overshooting. If you undershoot, when you descend from the common ancestors, they're too closely related relative to the murderer and you don't catch the murderer; the murderer is not in the descendants. And you can overshoot and go back five generations instead of three generations. And then it's just very inefficient because, especially way back a couple hundred years ago, people were having six, seven children and you just...

So the tree kind of exploded. And a lot of it is just not very useful.

On a more interesting side, on the descending side, we find that our proposed strategy is very aggressive and it descends from these lists of potential most recent common ancestors, even when the probability it has the true most recent common ancestor is relatively small, like 30 or 40%.

Whereas the traditional approach waits till you see the common ancestor and that's, you know, 100%. So that's basically why our policy works well. And in terms of the actual results, again, we have 17 cases from the DNA Doe Project, which is a nonprofit. The paper's co-authored with the two co-founders of DNA Doe. We simulate each of the 17 cases 500 times, so we simulate 8,500 cases, and we find that our proposed strategy that uses the mathematics outperforms this benchmark strategy, conservatively 10 times faster. So, for example, if you restrict the final family tree to have 7,500 people in it, which is a pretty large tree...

That's a pretty big tree.

...the benchmark strategy has only solved 4% of the cases, whereas our proposed strategy has already solved 94% of the cases.

Now, one of the things you said as a specification early on is you wanted to minimize the work of the genealogists. So does this algorithm just run the math and get an answer? Or does it have to stop at certain points and ask for some human help? Like, I don't even know what it would be. But is there a time at which it stops and says, I need an extra bit of information in order to proceed?

No. At each point in time, it's all purely algorithmic. It's basically saying we keep track of this huge system state that's on the order of a million, and we keep track of all this information. For each possible action we can take, we essentially compute kind of a benefit-cost ratio, and then we make the decision whether it's to investigate a particular new match or descend from a particular list of potential most recent common ancestors without any...

And so what is the typical output that you would present to law enforcement? So you have they gave you the sample. You went over to the database. You got your seven thousand people. You did your mathematical magic ascending and descending appropriately. What is the output that you then hand to police force or to the interested party? And what do you tell them about that, about what you're about to hand them?

Right. So we haven't got there. I'm hoping I can come back in a year or two and give you an answer to that and say this is what... So right now, the paper simulates 8,500 cases and says we can do this much faster. Now what we're trying to do is create an interface between the output from GEDmatch and our algorithms to do this in iterative fashion. So they would give us the output from GEDmatch. Say, you know, we get the output, we'd say, okay, first investigate match number 37. Okay, they would investigate it, the actual genealogist, and they would tell us, you know, we couldn't figure out who they were, or we could, or, you know, we found...

I see. Okay. So that's when they start looking up marriage records and birth certificates.

Right, and they do that, okay, by themselves. Then, ideally, they would then input to our algorithm: this is what we found. We would change the system state as a result. Then we would tell them, okay, do this next. And then they would do that next.

Okay. So in a funny way, the system is learning. I'm not going to call it a mistake, but it's learning from the extra information that the genealogists learn to then come up with the next best guess and the next best guess.

Right. So it's like any, um, decision problem over time and under uncertainty, say a driverless car. Right. Keeping track of a complex system state. You make a decision in a microsecond. You get new information about the environment. You update your decision. So it's the same thing we're doing here.

Great. This is The Future of Everything with Russ Altman. More with Larry Wein next.

Welcome back to The Future of Everything. I'm Russ Altman, and I'm talking with Professor Larry Wein from Stanford University about forensic genealogy. In the last segment, Larry told us about this mathematical algorithm that markedly improves the efficiency of analyzing a long list of potential distant relatives to figure out who is the most likely culprit and DNA match. In this segment, Larry will tell us that genealogists don't necessarily worry about losing their jobs because of his algorithm. In fact, they use some very different sources of data to contribute to the process.

He'll also tell us that there are pretty thorny ethical and privacy issues that must be addressed and that his team is working on. I wanted to ask, are we putting people out of, is this algorithm putting people out of jobs? So you describe these genealogists who do a lot of work in order to trace these cases now. How do they respond to this new technology?

Well, I would say that genealogy is as much an art as a science. And a few of these genealogists have put in their 10,000 hours and they're expert at it. And they use some information that we don't use in our analysis, in our mathematical problems. So in particular, they use geography and occasionally they use ethnicity.

So, for example, one of my co-authors told me a story where, you know, all the relatives were in one state, like Arizona. And then, you know, they found one person from a family tree that was in Wisconsin. They just couldn't figure it out. Then they realized this person had been traveling for a few months in Arizona and had an extramarital affair with someone, and that, you know, that kind of broke the case. And another...

Interesting.

...when they had a family tree where everyone was the same ethnicity, and then they had one little spattering of people of Greek origin, and then they kind of found this one person that allowed them to see the needle through the haystack. So I view our algorithm, our proposed strategy, as something that can help them when they're stuck, because often, you know, if they're onto some hot lead and using geography, ethnicity, whatever... but otherwise they're kind of just taking the next one off the list and just going back as many generations as they want. And this would help them decide what to do next because these trees are huge. There's no way any person, even if they put in their 10,000 hours, can, you know, wrap their head around a tree this size.

So that's great. So they're thinking of this as a power tool that they can use among other tools to just do their job a little bit better.

Right, right.

Great. So the other thing I wanted to ask is all of this genetic stuff always brings up issues of ethics, privacy. Do those issues come up in this kind of work?

Yeah, certainly when the Golden State Killer case occurred, you know, these companies were not telling people that they allowed law enforcement. So that was a big issue. Then, you know, GEDmatch moved to an opt-in system. But there's another aspect here that hasn't been investigated. In fact, we're in the middle of researching right now, and that's something called target testing.

So what's sometimes done is, you know, you have this huge family tree, you're in the middle of this investigation and you're stuck. You can't find the person.

And then, but you see these people in the family tree you think might be related to the killer, but you're not sure. They're not people who use GEDmatch. These are just people in law enforcement. So the genealogist just found these out during the course of their work, looking at baptism and marriage and whatever certificates. They said, hey, this person exists, but they're not in any of my databases.

Right. And then law enforcement literally knocks on their door and says, "We're in the middle of this murder investigation. We have reason to believe you're a distant relative of the murderer. Will you give us DNA?" And after I gave a talk at their big conference, a scientific working group run by the head of FBI's forensic genetic genealogy invited me to give a talk at this working group, and there was a bioethicist there, and she said this is really an invasion of privacy and unethical, in her view. And there's been no study of understanding what's the benefit of this. I mean, some idea of the cost. So we've enlarged our model to allow this as an option, to do target testing, and to see how much it improves the efficiency of... And we don't have our results yet, but we are looking into this.

So just to be clear, the privacy that the ethicist was worried about is the... there's a lot of people involved here. There's the privacy of the relative who just had the door knocked on, and they're just going about their business. And now a representative from law enforcement wants to take their DNA and they need to know what are you going to do with it? Is it just for this crime? Are you going to keep it forever? All those issues come up. And what if I say no? Right. Right.

There's perhaps a less compelling issue about the privacy of the target or the potential criminal, and I think mostly we're not worried about that for the most part. But then there's, you know, whenever you test the DNA of a single person, all of their first degree, second degree, third degree relatives are now also partially in the database because they all share partial DNA.

So the other thing has to do with you might not be related to the perpetrator who we're looking at now, but there might be other relatives in your family that you've just made it easier to capture. And people might have different views about how much they wanted to do that. Right, right.

And they will feel pressured when, you know, a police person. Right. Right. And then the issues of how much this is a free choice and stuff will be. So are you looking when you say you're working on a paper, are you looking at the ethical aspects of it or are you looking at technical means to mitigate the ethical issues?

Or both. I view it as providing input for the ethicists and the policymakers to decide, do we need it? Because this is all the Wild West here. This is a brand new approach that's revolutionizing the solving of cold case murders. We hear about it all the time in the news. This seems to be happening increasingly. So I'm hoping just to shed light and to allow them to have information to make an informed decision.

So just to ask a somewhat obvious question, is it true that law enforcement is knocking on doors today or are they waiting to hear the results of your work?

No, no, they are knocking on doors. In fact, I just heard anecdotally from someone who would know better than anybody else in the field, they know of a case where they knocked on 20 doors for just one case, for just one murder case.

Okay, so the doors are being knocked on and so now we're trying to figure out what's the right way to approach this and manage it. Well, in the last couple of minutes, I wanted to ask you, you're a professor in the School of Business at Stanford. What got you interested in something so specific as forensic genetic genealogy?

Right. So I teach operations management, and I have a PhD from your School of Engineering in the old operations research department that's now in management science and engineering. So I view the world as an operations person. After about seven or eight years into my career, I kind of switched into public health and public policy issues. After the September 11th attacks, I really focused my efforts for a number of years on the catastrophic threats, nuclear, smallpox, anthrax, botulinum toxin. And I looked at the issue, the US-VISIT program of keeping terrorists out of the country using the fingerprint system at the airport. And I have a paper in Proceedings of the National Academy of Sciences and testified before a congressional committee, and that led to a switch from a two-finger to a ten-finger system.

So biometrics, I think, is a really interesting area, at least for someone like me. There's probability, there's statistics, there's optimization. Sometimes there's a game theory aspect. Sometimes there's an operations aspect that kind of plays to, you know, my general training. And they're certainly impactful in terms of societal challenges. Right, right.

So that led me, you know, years later to, you know, during the last five years or so to focus on crime. So I looked at ballistic imaging. So this, instead of fingerprints, this is now you have a database of photographs of cartridge cases that come out of the gun. So it can help solve gun crimes. And my work there, it was with the Stockton Police Department, but led to the, um, nationwide adoption of 100% processing of cartridge cases.

Then I looked at papers on sexual assault kits, which again, you're looking at DNA to try to, and there's hundreds of thousands of untested kits. And our paper was the first to do a cost-benefit analysis. We estimated that for every dollar spent on a sexual assault kit, you could save about $81 in the cost associated with a future sexual assault that's averted by testing the kits.

Yeah, and that the issue of the sexual assault seems to be very closely related to this forensic genealogy because, right, you have to, often there will be a specific accused person, but you still need to do the genetic, the DNA sequencing or the DNA genotyping. You still need to do the matching and then you have to figure out how compelling the evidence is. Is that, are we making progress in that area? Because we've heard about this in the news every now and then about, you know, as you said, thousands and thousands of unprocessed kits. Yeah.

It seems like a real justice issue here. Yeah, yeah. And there's a lot of money being thrown at it. Like anything else, more money could be thrown at it. The national backlog could be processed quicker. But there certainly has been steady progress.

Thanks to Larry Wein. That was the future of forensic genealogy.

Thanks for tuning into this episode. We have more than 250 episodes in our archive, so you have instant access to a broad range of discussions on an amazing variety of topics. If you're enjoying the show or if it's helped you in any way, please consider rating and reviewing it. That'll help us understand what you're experiencing with the show and it'll help get the word out. You can connect with me on X or Twitter at @RBAltman and you can connect with Stanford Engineering at @StanfordENG.

If you'd like to ask a question about this episode or a previous episode, please email us a written question or a voice memo question. We might feature it in a future episode. You can send it to thefutureofeverything@stanford.edu. All one word, the future of everything. No spaces, no underscores, no dashes. thefutureofeverything@stanford.edu. Thanks again for tuning in. We hope you're enjoying the podcast.
