
Rethinking Race in Prenatal Screening for Open Neural Tube Defects

2025/3/28

JAMA Medical News

People
Daniel Herman
Topics
Daniel Herman: I took part in a study reassessing the role of race in prenatal screening for open neural tube defects. We screen using alpha-fetoprotein (AFP) concentrations, and the conventional approach adjusts AFP values by race. Race, however, is a social construct, and its use in medical testing should be backed by strong evidence. Our study reviewed three years of data on 7,000 pregnant patients at Penn Medicine and found that Black pregnant patients had slightly higher AFP concentrations on average, leading to a slightly higher false positive rate. The difference is small, though: in absolute terms, removing the race adjustment would produce only about one additional false positive result in Black patients for every 170 patients tested. Given that we do not know how the race adjustment affects screening sensitivity, nor the mechanism behind the higher AFP concentrations in Black patients, we concluded there is not enough evidence to continue race-based adjustment. We have already changed our practice and now calculate AFP medians without regard to race; the next step is to remove race from the testing workflow entirely. Future directions include studying how the race adjustment affects sensitivity, whether underlying genetic or environmental factors explain the higher AFP concentrations in Black patients, and searching for more effective biomarkers.

Roy Perlis: As Editor-in-Chief of JAMA+ AI, I spoke with Dr. Daniel Herman about his recent study in JAMA Pediatrics reassessing the role of race in prenatal screening for open neural tube defects. The study has direct clinical significance, because it bears on whether race-based adjustment should continue in clinical practice. The results show that although Black pregnant patients have slightly higher AFP concentrations on average, removing the race adjustment has only a small effect on false positive rates. More importantly, the effect of the race adjustment on screening sensitivity, and the mechanism behind the higher AFP concentrations in Black patients, remain unknown. In the absence of further evidence, the researchers therefore recommend removing the race-based adjustment and call for further research into potential genetic or environmental factors and more effective biomarkers. This carries a lesson for developing and applying AI models: building complex AI models requires good outcome measures and an assessment of how a model affects clinical outcomes across different patient groups, to avoid introducing or reproducing bias.


Chapters
This chapter explains the clinical context of prenatal testing for open neural tube defects, focusing on the measurement of alpha-fetoprotein (AFP) concentrations and the traditional incorporation of race into the interpretation of these measurements. The historical context of these methods is also discussed, including the variability across studies and the limited sample sizes.
  • Open neural tube defects affect roughly 1 in 1400 pregnancies.
  • Prenatal screening involves measuring AFP concentrations.
  • Race has been traditionally incorporated into the interpretation of AFP concentrations due to observed average differences.
  • Early studies on this topic were conducted in the late 1970s and early 1980s.

Transcript


Welcome to JAMA AI Conversations. I'm Roy Perlis, Editor-in-Chief of JAMA+ AI, and I'm pleased to welcome today's guest, Dr. Daniel Herman from the Department of Pathology and Laboratory Medicine at the Perelman School of Medicine, University of Pennsylvania.

Today, we'll be discussing his recent study, Reassessing the Inclusion of Race in Prenatal Screening for Open Neural Tube Defects, published in JAMA Pediatrics. Dr. Herman, thanks for joining us today. It's a pleasure to be talking with you, Dr. Perlis. Thank you for the opportunity. So before we get into the specifics of your study, can you give us a little bit of clinical context? So when and how is this kind of prenatal testing used?

Sure. So, open neural tube defects affect roughly 1 in 1400 pregnancies, and one of the tools that we have for screening for these is measuring AFP concentrations, or alpha-fetoprotein concentrations. This is done early in the second trimester, prenatally, and the way it works is we collect a specimen from a pregnant individual, we measure the AFP concentration, and we compare it to the values that we expect.
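For readers who want the mechanics: the comparison described here is conventionally expressed as a multiple of the median (MoM), the measured AFP divided by the expected median for that gestational age. Below is a minimal sketch in Python; the median table and the 2.5 MoM cutoff are illustrative assumptions, not the study's actual parameters.

```python
# Minimal sketch of second-trimester AFP screening via multiples of the
# median (MoM). Median values and the cutoff are illustrative only.
GA_MEDIAN_AFP_NG_ML = {15: 30.0, 16: 33.5, 17: 37.5, 18: 42.0, 19: 47.0, 20: 52.5}

def afp_mom(afp_ng_ml: float, ga_weeks: int) -> float:
    """Express a measured AFP concentration relative to the expected
    median for the same gestational age."""
    return afp_ng_ml / GA_MEDIAN_AFP_NG_ML[ga_weeks]

def screen_positive(mom: float, cutoff: float = 2.5) -> bool:
    """Flag results above the interpretive cutoff (cutoffs around
    2.0-2.5 MoM are typical; 2.5 is assumed here)."""
    return mom >= cutoff

print(afp_mom(90.0, 17))                   # 2.4 MoM
print(screen_positive(afp_mom(90.0, 17)))  # False at a 2.5 cutoff
```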

Got it. So you have sort of a set of standards. Where does race traditionally fit into this kind of model?

Sure. The main reason we have these comparisons is that AFP concentrations rise rapidly during the second trimester, so we can't use a single interpretive threshold at, say, 15 weeks gestational age as compared to 20 weeks gestational age. That's why a more complex interpretive strategy was used than for most laboratory tests.

After that was done, it was realized that there were other associations with AFP on average, and one of those was race. And so conventionally, the adjustment for gestational age is done in a race-specific way. And where did that come from? So when did people sit down originally and build these models? Was this like 30 years ago or five years ago?

The initial studies were done in the late 70s, and the statistical methods that were used for this were developed in the late 70s and early 80s. And so we've been using this for second trimester screening for decades and decades. The associations that were identified across race were observed at the same time.

There have been many studies over the decades, and there's a lot of variability across them. Most tend to be small, with small numbers of samples, and they tend to be geographically localized. Many of them show associations on average across race; some of them don't.
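To make the race-specific adjustment concrete: a conventional pipeline keeps separate median tables by race, while a race-agnostic pipeline uses a single table. A minimal sketch, with hypothetical medians (the offset between the tables is purely illustrative, not an estimate from the study):

```python
# Hypothetical 17-week medians (ng/mL); the offset between the race-specific
# tables is illustrative only.
MEDIAN_RACE_SPECIFIC = {("Black", 17): 40.5, ("other", 17): 37.5}
MEDIAN_AGNOSTIC = {17: 38.0}

afp = 95.0  # measured AFP, ng/mL, at 17 weeks, for a Black patient

mom_adjusted = afp / MEDIAN_RACE_SPECIFIC[("Black", 17)]  # ~2.35 MoM
mom_agnostic = afp / MEDIAN_AGNOSTIC[17]                  # 2.50 MoM

# At a 2.5 MoM cutoff, the same measurement screens negative under the
# race-adjusted table but positive under the race-agnostic one -- this is
# the mechanism behind the false-positive difference discussed below.
```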

It's interesting that this goes back so many decades, I think, like a lot of the models that people have taken for granted. In this case, you looked at the inclusion of race in these screening models. What motivated you to ask the question in the first place? Our narrow question was, in our practice at Penn Medicine, do we want to continue making this race adjustment?

So as I said, there's a history of observing on average a difference in AFP concentrations in Black pregnant patients as compared to others. And so we knew in our clinical practice that we were following the guidelines, which were to make this adjustment.

And we also know that race is a social construct and it's inaccurate and it's imprecise. And we think that there should be a high bar of evidence for incorporating race in a clinical practice systematically like this. There had been a recent study at the University of Washington

that was trying to reevaluate this and saw that the association with race in their population appeared to disappear after adjusting for other patient factors.

And because of how much variability there is across historical studies, we weren't sure what made sense for our patient population. And so we wanted to look back and understand whether we saw the same association and think more deeply about if we put that in the larger context, did it make sense to continue this practice?

So if I understand you correctly, this was really, should we keep doing this? This was really a clinically focused quality improvement question: should we be doing this? Which is interesting, because it's different from a lot of modeling studies where the research question comes first. In this case, it sounds like it was a very pragmatic question.

That's fair. I mean, we knew that we were adjusting for race. And we knew that, looking back at the literature, it wasn't clear from the evidence that was out there that this made sense. And so we wanted to better understand in our patient population

what the impact of that race adjustment was and how to compare the impact of that to the harms of race-based medicine in general, and to better understand exactly, well, if we were to remove the race adjustment, how would it change outcomes in patients?

And I guess this is probably the point where I should ask you, what did you find? Sure. So we looked back at data from our clinical practice at Penn Medicine over three years, covering 7,000 pregnant patients. And what we found was consistent with some other studies: on average, Black patients had slightly elevated alpha-fetoprotein multiples of the median, about 8% higher on average.

And when we traced that through the process and applied the standard interpretive thresholds, we found that there was a slightly higher frequency of false positive interpretations in Black patients as compared to others when we were using a race-agnostic model as compared to a race-adjusted model.

We can look at those associations in different ways, but on an absolute scale, the difference in false positive rates if we were to move from a race-adjusted model to a race-agnostic model was 0.6%, meaning we'd expect to have one more false positive in Black patients compared to others for every 170 patients tested.
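That absolute framing checks out arithmetically: one extra false positive per 1/0.006 ≈ 167 patients, i.e. roughly 1 in 170.

```python
# Sanity check on the absolute-risk framing.
fp_rate_diff = 0.006            # 0.6% absolute difference in false positive rate
print(round(1 / fp_rate_diff))  # 167 -> roughly one extra false positive per 170 tested
```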

So I like the way you're framing that. It's helpful to think in terms of absolute numbers. Maybe not a fair question, but going into the analysis, did you have an idea of what the threshold would be where you would want to retain race in the models? Like, is there a number, you know, if you said it was a 20% false positive rate, would you be framing it differently?

That's a good question. And our thinking about this evolved as we analyzed the data, as we thought more deeply about the consequences of this. I mean, there are lots of different ways of asking a question about fairness.

The reason why we historically have this race adjustment is that on average, Black pregnant patients have slightly higher concentrations, it seems. And the epidemiological evidence doesn't suggest that fetuses of Black patients have a higher frequency of open neural tube defects. So if everything else were equal,

it makes a lot of sense to minimize the number of false positives and to equalize the number of false positives across patient groups. As we thought about it more deeply, though, the screen positive rate and the false positive rate are only part of the picture here. The purpose of this test is to screen for open neural tube defects. And one of the things that's unknown is how this race adjustment affects sensitivity for open neural tube defects.

Exactly how to balance an effect on sensitivity against false positive rates is an important question. But here, we don't actually know how the race adjustment affects sensitivity for open neural tube defects. So in the absence of that evidence, there's nothing to balance. We need more studies that can speak to that question, that can tell us whether the race adjustment affects sensitivity

and tell us about outcomes in these patients, which is what's most important. And I guess the important notion here is that probably that trade-off depends a lot on the context. So in this case, clinically, I think you make the point in your paper, there is a follow-up for a positive test. Is that right?

Exactly. So if a patient screens positive, the follow-up testing now is non-invasive ultrasound, which is diagnostic in the vast majority of cases. And so, I mean, I wouldn't wish a false positive on anyone, and false positives in this setting can lead to considerable anxiety and downstream testing.

But that harm is mitigated in part because diagnostic ultrasound is readily available and can be performed soon after getting a screen positive result. Got it. One of the things I should point out here is this is a podcast about AI. A lot of times we're talking about new AI applications.

But I'm wondering, this is a fairly straightforward question that you're trying to address about race and the role of race in models. But since this is an AI podcast, do you think there's something to take away from this for people who are building these really complicated AI models and have a choice of either incorporating race or not incorporating race in what they're doing?

Yeah, it's a great question. I mean, zooming out from here, our particular question here focused on prenatal screening. It's different from that general case in that the model is very, very small. It's transparent, and we can understand explicitly the difference between including race or not including race. One of the takeaways, I think, is that it's really important to have good outcome measures, and, thinking more broadly, to be able to assess, for a particular application of AI to a particular clinical question, how it affects different groups of patients with respect to the clinical outcomes that we think are most important.

And it's important to do that at the time that you're developing a model. It's also really important for us to develop tools that would allow us to do this practically, to monitor and to say, well, if I brought this tool into practice, did it change the frequency of this important clinical outcome across patients? Here, the model is much more explicit.

And we can think about how the incorporation or non-incorporation of race can affect the performance and the downstream clinical outcomes in patients grouped by race.

With AI models in general, we know there's a lot of bias that's gone into the training of these models, and it's recapitulated as these models are used. And so I think the first principle is to be able to assess that well at the development stage and the implementation stage. And then we as a community need to be spending the time to understand that

and to formulate what we think are appropriate fairness goals for individual questions, and how to apply those values, which come from a diverse set of stakeholders, as we are training AI models, applying them, and then monitoring them.

So in the case of this particular model or this particular test, did it lead you to change your practice? Is Penn changing how it applies these thresholds now? Yes. We're going through a two-step approach to this. So overall, we see that there is a small difference in false positive rates, but because of what we don't know about the effect on sensitivity, and because we don't know the mechanism whereby Black patients have a higher AFP on average,

we don't think there's enough evidence to continue adjusting for race in this practice. And so right now we have changed our practice, and we are calculating the medians agnostic of patient race. Our next step is we're actually building a new application that will allow us to completely remove race from the ordering and resulting process, and will allow us to incorporate our new methods.

It will also allow us to improve the way information is communicated in the ordering process and the resulting process. And so right now we are calculating risk and AFP multiples of the median in a way that's race agnostic, and our next step is going to be to remove race entirely from the ordering and resulting process.

So 50 years after these kinds of tests were first developed, you're going to have a new iteration that hopefully takes advantage of newer data and newer technologies. Yes. Not much has changed. The core of the test is the same. As part of the study, we actually updated the methods. And so we said, well, instead of doing serial adjustments for gestational age and then weight, let's use a multivariate model. Instead of grouping patients and having to look week by week, we used an explicit quantile regression approach to estimate the medians.

This is not the best that one can envision doing. It'd be better to have a method that doesn't have this association, this bias on average. We'd like to understand the mechanism and try to understand why we see this bias.
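As a rough illustration of that modeling change, here is a minimal quantile regression sketch in Python with statsmodels. The covariates (gestational age and maternal weight), the log-linear form, and the simulated data are placeholder assumptions, not the study's actual model.

```python
# Sketch: estimate the conditional median of log-AFP with quantile regression
# (q=0.5) instead of grouping patients week by week. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "ga_weeks": rng.uniform(15, 20, n),
    "weight_kg": rng.normal(75, 12, n),
})
# Placeholder generative model: log-AFP rises with gestational age.
df["log_afp"] = (2.5 + 0.11 * df["ga_weeks"] - 0.002 * df["weight_kg"]
                 + rng.normal(0, 0.25, n))

# Fit the conditional median without any race term.
median_fit = smf.quantreg("log_afp ~ ga_weeks + weight_kg", df).fit(q=0.5)

def expected_median_afp(ga_weeks: float, weight_kg: float) -> float:
    """Predicted median AFP for a given gestational age and maternal weight."""
    new = pd.DataFrame({"ga_weeks": [ga_weeks], "weight_kg": [weight_kg]})
    return float(np.exp(median_fit.predict(new).iloc[0]))

print(expected_median_afp(17.0, 75.0))  # expected median at 17 weeks, 75 kg
```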

It'd be better if we could incorporate new biomarkers. And there is a little bit of preliminary evidence that specific forms of AFP with differential glycosylation might be a better biomarker in general. We don't actually know how those different variants of AFP are associated across race. And so there are opportunities here that we as a community should be moving towards, things that could improve this approach on average

and improve the equity in using this test. But this is the step we can take right now: we can remove race from the calculations, and we can do additional studies to ask the question about sensitivity, and to ask: are there underlying genetic factors? Are there underlying environmental factors? Is there a better analyte that we could be measuring?

So, more work to be done. Dr. Herman, thanks again for talking to us about your study in JAMA Pediatrics. To our listeners, if you want to read more about this study, you can find a link to the article in the episode description. To follow this and other JAMA Network podcasts, visit us online at jamanetworkaudio.com or search for JAMA Network wherever you get your podcasts. This episode was produced by Daniel Musisi at the JAMA Network.

Thanks for joining us, and we'll see you next time.