
Prescreening for Clinical Trial Eligibility Using Large Language Models

2025/3/14

JAMA Medical News

People
Alexander J. Blood
Yulin Xuan
Topics
Yulin Xuan: I noted that your team's research shows that an AI-assisted screening tool can shorten the time needed to determine clinical trial eligibility.

Alexander J. Blood: Our study did show that an AI-assisted tool called Rectifier significantly shortens the time needed to determine whether patients qualify for a clinical trial. The tool uses large language models, which have substantially improved natural language processing, to parse data and answer clinical questions more effectively. Traditionally, trial screening is time-consuming and tedious: research staff must manually review charts to verify that patients meet the inclusion and exclusion criteria. Rectifier automates that process, freeing up large amounts of time so researchers can focus on patient recruitment and care.

The core of the tool is its ability to handle unstructured data; regardless of the data's format or source, it helps researchers answer clinical questions quickly and accurately. In our randomized controlled trial, the AI group screened more than twice as efficiently as the traditional manual group, and at 10 days the AI group's enrollment rate was also significantly higher than the manual group's. This shows that AI tools can substantially improve trial efficiency.

Importantly, our study used a "human in the loop" model: the AI tool performs the prescreening, and every patient still goes through a final review by research staff. The tool's role is to quickly and accurately extract the key information in the chart relevant to the inclusion and exclusion criteria, supporting staff in making the judgment and thereby improving efficiency.

Although this study was conducted at a single center and focused on heart failure patients, we believe the tool is broadly applicable to other clinical settings and diseases. Large language models are the engine of the tool, but the pre- and post-processing steps around them make it economically and computationally sustainable and efficient while remaining accurate. We look forward to validating this in future studies.

As for whether the AI tool might miss eligible patients or recruit too many ineligible ones, our results show that the tool's false-positive rate was similar to that of the manual review group, and in a prior study the AI tool was actually more accurate than human review in assessing patient eligibility.

We are now rolling the tool out to other research teams within our institution and hope to validate its effectiveness across more centers and disease areas, further accelerating clinical research for the benefit of more patients.

Alexander J. Blood: The success of a clinical trial depends largely on recruiting patients who meet specific inclusion criteria. Inclusion and exclusion criteria exist to ensure the validity and representativeness of the results, so that the findings better reflect how a drug or device will actually perform in the real world.

AI tools bring new opportunities to trial recruitment. By automating and accelerating the screening process, they can significantly improve trial efficiency, shorten study timelines, and ultimately benefit more patients.

Our results show that AI-assisted screening not only significantly improves screening efficiency but also raises enrollment rates, mainly because the tool quickly and accurately identifies patients who meet the inclusion criteria, saving researchers time and effort so they can focus on enrollment.

Although our study was conducted at a single center, we believe AI-assisted screening tools are broadly applicable across clinical settings and diseases. We are actively exploring extending the tool to other centers and disease areas to further validate its effectiveness and applicability.

In future work, we will continue to improve the AI tool so that it better meets the needs of clinical research and provides stronger support for conducting clinical trials.


Chapters
This chapter explores the use of AI in clinical trial patient recruitment, focusing on how the Rectifier tool improves efficiency by automating the manual chart review process and ensuring the right patient population is studied. The importance of inclusion and exclusion criteria in validating trial results is highlighted.
  • AI-assisted screening reduced time to determine trial eligibility
  • AI improves natural language processing to parse data and answer clinical questions
  • The Rectifier tool automates manual chart reviews
  • Inclusion/exclusion criteria ensure the right patient population is studied

Transcript


I'm Yulin Xuan, Associate Editor of JAMA and JAMA Plus AI, and you're listening to JAMA AI Conversations. My guest today is Dr. A.J. Blood, a cardiologist at Brigham and Women's Hospital, Associate Director of the Accelerator for Clinical Transformation Research Group, and an instructor of medicine at Harvard Medical School. His research focuses on the intersection of cardiometabolic disease, implementation science, and data science. Welcome, A.J. Thanks so much for having me.

Your study, Manual Versus Assisted Prescreening for Trial Eligibility Using Large Language Models: A Randomized Clinical Trial, was recently published in JAMA. It showed that AI-assisted screening using the Rectifier tool reduced the time to determine eligibility compared with manual methods.

So what specific features do you believe the AI had that were critical to this improvement? Yeah, it's a great question. As we look at a lot of the processes that go into screening for clinical trials, both for inclusion and exclusion criteria, we found that large language models have significantly improved upon natural language processing and its ability to parse data and ask and answer clinical questions. So the Rectifier tool was really designed to make that easy, straightforward, and scalable. What it does is take unstructured data, regardless of the format or the output from the electronic health record, and enable researchers to ask and answer questions of it.
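To make that concrete, here is a minimal sketch of what asking an eligibility question of unstructured text with a large language model can look like. This is illustrative only; the episode doesn't describe Rectifier's internals, so the placeholder call_llm function, the prompt wording, and the example criterion are all assumptions.

```python
# Minimal sketch of LLM-based prescreening over unstructured chart text.
# Illustrative only: call_llm stands in for any LLM API, and the prompt and
# example criterion are assumptions, not the Rectifier tool's internals.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned response here."""
    return "MET - 'Echo 02/2024: LVEF estimated at 30-35%.'"

def prescreen_note(note_text: str, criterion: str) -> str:
    prompt = (
        "You are helping prescreen patients for a clinical trial.\n"
        f"Criterion: {criterion}\n"
        f"Clinical note:\n{note_text}\n"
        "Answer MET, NOT MET, or UNCERTAIN, then quote the sentence "
        "from the note that supports your answer."
    )
    return call_llm(prompt)

# Hypothetical heart failure inclusion criterion, for illustration:
answer = prescreen_note(
    "Echo 02/2024: LVEF estimated at 30-35%. Patient reports dyspnea on exertion.",
    "Documented left ventricular ejection fraction below 40%",
)
```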

So traditionally, when you're screening for a clinical trial, it's a very manual process. You run a structured data query to find a pool of patients who could be eligible, and then research assistants or research staff do manual chart reviews to verify eligibility before inviting any individual patient to participate in the trial.
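A rough sketch of that two-stage flow might look like this; the field names and filters below are hypothetical, purely for illustration.

```python
# Stage 1: a structured query over coded EHR data yields a candidate pool.
# Stage 2: every candidate still needs a manual chart review.
# Field names and filters are hypothetical.

def structured_prefilter(patients: list[dict]) -> list[dict]:
    """Coded-data query: produces a pool of possibly eligible patients."""
    return [p for p in patients
            if "heart failure" in p["coded_diagnoses"] and 18 <= p["age"] <= 85]

def manual_review_queue(candidates: list[dict]) -> list[dict]:
    """Each candidate is queued for a human chart review before any outreach."""
    return [{"patient_id": p["id"], "status": "pending manual chart review"}
            for p in candidates]

pool = structured_prefilter([
    {"id": "pt-001", "age": 64, "coded_diagnoses": ["heart failure"]},
])
queue = manual_review_queue(pool)
```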

And so what this tool was really designed to do was take that very time-consuming and often tedious process of checking the medical chart against each and every inclusion and exclusion criterion to assess eligibility, automate it, and really give a leg up to a lot of our research staff so that they can focus their time on enrolling patients and taking care of patients as part of our trials.

Can you tell our audience why it's important to recruit patients for a trial with certain inclusion criteria? Yeah, absolutely. Whenever a new drug or device is being studied, or questions are being asked and answered in medicine and health, the best way, from a data and validity standpoint, to ask and answer those questions is in the context of a clinical trial.

And the way that we do that is through protocols. Those protocols specify the intervention as well as the patient population the question is being asked about and answered through. So in creating inclusion and exclusion criteria, we're really looking to make sure that the patients we're studying in any given trial reflect and represent the patients in the real world, out in routine clinical practice and regular care, about whom that clinical question is being asked.

So by ensuring that, when we're screening patients, they meet the eligibility criteria for a trial, by which I mean the inclusion and exclusion criteria, we really want to make sure the study is asking and answering questions of the right patient population, one that will be representative of what we expect the drug, device, or new innovation to do in the real world.

It's a much more difficult and challenging task than people think, and getting it right is very important for being able to validate your results. What was seen in the trial data was a dramatic difference in the number of patients remaining to be screened: 37 patients in the AI group versus 887 in the manual group.

So how do you interpret this disparity? What does it suggest about the potential of AI to streamline screening, and why did the AI group leave so many fewer patients waiting to be screened than the manual group?

It's a really great question. And again, the way that I like to make sure we couch this is that we consider it a pre-screen, because this was a human-in-the-loop model. Every patient still received a screen from research study staff before they were deemed either eligible or not eligible for the trial. And what the tool really allowed for was a much more rapid and comprehensive pre-review of the chart.

The data pertinent to each individual inclusion and exclusion criterion was brought straight to the research staff so that they could ask and answer those questions much more quickly. Importantly, this was a blinded, randomized trial, and it really happened in two stages. The traditional care group, once they had a structured data query, went into a manual review process.

They read the chart and answered the inclusion/exclusion criteria questions as they normally would, looking at the appropriate sections of the chart and notes to confirm whether each criterion was met or not. For the AI-enabled group, the AI tool reviewed the chart, asked and answered those same inclusion/exclusion criteria questions, pulled out the relevant aspects of the clinical notes or data that answered them, and put that right in front of the research coordinator. That allowed them to much more rapidly assess a patient's eligibility or ineligibility for the trial that was the use case we published on here.
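The episode doesn't specify the format in which Rectifier presents its findings, but a plausible sketch of such a prescreen summary, each criterion answered and paired with its supporting excerpt, could look like this; the fields and example data are assumptions.

```python
from dataclasses import dataclass

@dataclass
class CriterionFinding:
    criterion: str   # the inclusion/exclusion criterion being checked
    answer: str      # "met", "not met", or "uncertain"
    evidence: str    # chart excerpt supporting the answer

# What a coordinator might see for one patient: every criterion answered,
# each paired with the excerpt that justifies it (hypothetical example data).
prescreen_summary = [
    CriterionFinding(
        criterion="LVEF below 40% (hypothetical inclusion criterion)",
        answer="met",
        evidence="Echo 02/2024: LVEF estimated at 30-35%.",
    ),
]
```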

And really importantly, not only did we significantly accelerate the eligibility assessment, how quickly we were able to find patients who were eligible for the trial, which was our primary outcome and which overall was more than two times faster; but if you look at 10 days from randomization, the manual team was at about 2.5%, whereas the AI-enabled team was at about 20%. So there was a significant increase in the rate of finding patients eligible for the trial.

Our pre-specified secondary endpoint, a hierarchical win ratio, actually showed that enrollments increased as a result of the use of the AI tool in the trial. That was something we were hopeful for but not certain of, because again, this really is a pre-screening tool. What it did was make the teams so much more efficient that, because the time each group was allocated for screening and enrolling patients was balanced, we found an almost 2x increase in enrollments in the trial, in addition to that eligibility determination being significantly accelerated.

Can you explain for our listeners what human in the loop means? Yeah. So the AI tool is able to perform autonomously: it has a set of tasks it can complete with minimal human input, really just uploading the appropriate data sets that capture the inclusion and exclusion criteria and the right patients to ask those questions of. Human in the loop ensures that, as opposed to a completely autonomous system, in which an AI model, an AI agent, or a computer system of any sort completes processes start to finish entirely on its own, there is a gatekeeper.

A human participant can step in and act as a confirmation check, making sure that at a certain stage of the process things are not completely automated; a human touch is required to allow progress to the next step. For the purposes of this study and this use case, that meant ensuring, before a patient was contacted and invited to participate in the clinical trial, that the inclusion and exclusion criteria were appropriate and met and that this person was the right one to reach out to.
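In code, the gatekeeper pattern he describes might be sketched like this. This is a generic illustration of human in the loop, not the study's actual software; the function and field names are assumptions.

```python
# The AI prescreen runs autonomously, but no outreach happens until a human
# confirms the finding. Names here are illustrative.

def contact_patient(patient_id: str) -> None:
    print(f"Outreach initiated for {patient_id}")

def human_in_the_loop_gate(ai_prescreen: dict, coordinator_confirms) -> None:
    if ai_prescreen["ai_assessment"] != "likely eligible":
        return  # nothing proceeds automatically for other flags either
    # Gatekeeper step: a human reviews the AI's findings before any action.
    if coordinator_confirms(ai_prescreen):
        contact_patient(ai_prescreen["patient_id"])

human_in_the_loop_gate(
    {"patient_id": "pt-001", "ai_assessment": "likely eligible"},
    coordinator_confirms=lambda finding: True,  # stand-in for the human review
)
```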

So your study obviously found really great results with AI-assisted screening, but it was, if I'm correct, conducted within a single center and focused on heart failure. Do you think this AI-assisted screening could work across multiple clinical settings and diseases?

The short answer is yes. Mass General Brigham is a single site, but it includes the 14 or so hospitals that were part of the trial we conducted; it's a really large mega-site. And to answer the question about heart failure: while that was the use case and the paradigm through which we studied this tool, there was actually no fine-tuning or special tuning of the model that made it a heart failure-specific tool. Large language models are the engine that powers the tool, but it's the system we've built around them, a lot of pre- and post-processing, that actually makes it economically sustainable, computationally sustainable, and efficient while remaining accurate.

We feel those steps are externally valid for many other disease areas and are likely to be so for many other health systems and research environments, and we look forward to proving that out with additional study.
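He doesn't detail those pre- and post-processing steps, but the general idea, trimming the input so the model only sees relevant text and then normalizing its output, can be sketched as follows. The keyword filter and label mapping are assumptions, not the actual pipeline.

```python
def preprocess(note_text: str, keywords: list[str]) -> str:
    """Keep only paragraphs mentioning a relevant keyword, cutting token cost."""
    paragraphs = note_text.split("\n\n")
    kept = [p for p in paragraphs
            if any(k.lower() in p.lower() for k in keywords)]
    return "\n\n".join(kept)

def postprocess(raw_answer: str) -> str:
    """Normalize free-text model output to a small, auditable label set."""
    text = raw_answer.strip().lower()
    if text.startswith("met") or text.startswith("yes"):
        return "met"
    if text.startswith("not met") or text.startswith("no"):
        return "not met"
    return "uncertain"

print(postprocess("MET - 'Echo 02/2024: LVEF 30-35%.'"))  # -> "met"
```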

So my other question is: do you think the AI missed anything, or is it maybe recruiting too many false positives by being too broad on the inclusion criteria? We didn't see that in our experience; again, in this single study and single use case, it was not bringing in higher rates of false-positive eligible patients than our manual review. So it was really encouraging that those rates were overall very similar.

In a prior retrospective study we published in NEJM AI using the same tool, we actually showed it was more specific and more accurate, with a Matthews correlation coefficient that outperformed our human study staff in assessing patient eligibility. So I think we're encouraged, and this prospective, blinded trial really builds on the experience we published with that retrospective data; we're encouraged that the metrics seem to bear out prospectively.
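For listeners unfamiliar with it, the Matthews correlation coefficient is a single summary statistic for binary classification that uses all four cells of the confusion matrix and ranges from -1 to +1. Its standard definition (a general reference, not specific to this study) is:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$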

But I do think it warrants further study, both for external validation in other disease areas and at many other sites where we hope to recapitulate these results. And do you think this tool could be used immediately to start recruiting for clinical trials? Or is there an evolution to it that you see?

I think there are always ways you can make a system or a tool better, and we're in the process of doing that every day here at Mass General Brigham. Within our system, we're starting to beta test this with additional researchers and research groups across our institution, with the hope and intent that it will scale broadly across our enterprise. But we're also very much excited to talk to friends, colleagues, partners, and stakeholders to

externally validate this research and demonstrate in additional centers and other disease areas that we continue to see really promising results that can actually accelerate research, for clinicians, for patients, and for the healthcare community more broadly. Thank you very much for this conversation; we really appreciate you being here. My pleasure. Thanks so much for your time. Have a great day.

I'm Yulin Xuan, Associate Editor at JAMA and JAMA Plus AI, and I've been speaking with Dr. A.J. Blood about the role of AI-assisted prescreening for clinical trial eligibility. You can find a link to the article in this episode's description. And for more content like this, please visit our new JAMA Plus AI channel at jamaai.org.

To follow this and other JAMA Network podcasts, please visit us online at jamanetworkaudio.com or search for JAMA Network wherever you get your podcasts. This episode was produced by Shelley Steffens at the JAMA Network. Thanks for listening. This content is protected by copyright by the American Medical Association with all rights reserved, including those for text and data mining, AI training, and similar technologies.