I'm Yulin Hswen, Associate Editor of JAMA and JAMA+ AI, and you're listening to JAMA+ AI Conversations. My guest today is Dr. A.J. Blood, a cardiologist at Brigham and Women's Hospital, Associate Director of the Accelerator for Clinical Transformation research group, and an Instructor of Medicine at Harvard Medical School. His research focuses on the intersection of cardiometabolic disease, implementation science, and data science. Welcome, A.J. Thanks so much for having me.
Your study, Manual Versus Assisted Prescreening for Trial Eligibility Using Large Language Models: A Randomized Clinical Trial, was recently published in JAMA. It showed that AI-assisted screening using the RECTIFIER tool reduced the time to determine eligibility compared with manual methods.
So what specific features do you believe the AI had that were critical to this improvement? Yeah, it's a great question. As we look at a lot of the processes that go into screening for clinical trials, both for inclusion and exclusion criteria, we found that large language models have significantly improved upon traditional natural language processing in their ability to parse data and to ask and answer clinical questions. The RECTIFIER tool was really designed to make that easy, straightforward, and scalable. What it does is take unstructured data, regardless of the format or output from the electronic health record, and enable researchers to ask and answer questions of it.
So traditionally, when you're screening for a clinical trial, it's a very manual process. You get a structured data query that allows you to find a pool of patients who could be eligible. And then research assistants or research staff are required to do manual chart reviews to verify eligibility prior to inviting any individual patient to participate in a clinical trial.
And so what this tool was really designed to do was take that time-consuming and often tedious process of checking the medical chart against each and every inclusion and exclusion criterion to assess eligibility, and automate it, really leveling up our research staff so that they can focus their time on enrolling patients and taking care of patients as part of our trials.
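To make that workflow concrete for readers, here is a minimal sketch of what an LLM-based criterion check over unstructured notes might look like. This is an illustration only, not the published RECTIFIER implementation: the check_criterion helper, model choice, and prompt are all assumptions, and the real system adds substantial pre- and post-processing around the model.

```python
# Minimal sketch: asking an LLM whether one eligibility criterion is met.
# Hypothetical helper, not the RECTIFIER pipeline; assumes the OpenAI SDK
# and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def check_criterion(note_text: str, criterion: str) -> str:
    """Return the model's judgment (MET / NOT MET / UNCERTAIN) plus evidence."""
    prompt = (
        "You are prescreening a patient for a clinical trial.\n"
        f"Criterion: {criterion}\n"
        f"Clinical note:\n{note_text}\n"
        "Answer MET, NOT MET, or UNCERTAIN, then quote the supporting text."
    )
    response = client.chat.completions.create(
        model="gpt-4o",    # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,     # deterministic output for screening
    )
    return response.choices[0].message.content

# Example: one criterion checked against one note.
print(check_criterion(
    "Echo 3/2024: LVEF 35%. NYHA class II symptoms.",
    "Left ventricular ejection fraction <= 40%",
))
```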
Can you tell our audience why it's important to recruit patients for a trial using specific inclusion criteria? Yeah, absolutely. So whenever a new drug, device, or clinical question is being evaluated in medicine and health, the best way, from a data and validity standpoint, to ask and answer those questions is in the context of a clinical trial.
And the way that we do that is through protocols. Those protocols specify the intervention as well as the patient population in which that question is being asked and answered. So in creating inclusion and exclusion criteria, we're really looking to make sure that the patients we're studying in any given trial reflect and represent the patients in the real world, in routine clinical practice and regular care, to whom that clinical question applies. By ensuring that the patients we screen meet the eligibility criteria for a trial, and by that I mean the inclusion and exclusion criteria, we make sure the study is asking and answering questions in the right patient population, one that is representative of what we expect the drug, device, or new innovation to do in the real world.
It's a far more difficult and challenging task than people think, and getting it right is critically important for validating your results. The trial data showed a dramatic difference in the number of patients remaining to be screened: 37 patients in the AI group versus 887 in the manual group. How do you interpret this disparity? What does it suggest about AI's potential to streamline screening, and why did the manual group leave so many more patients unscreened? It's a really great question. And again, the way I like to couch this is that we consider it a prescreen, because this was a human-in-the-loop model. Every patient still received a screen from research study staff before they were deemed either eligible or ineligible for the trial. What the tool really allowed for was a much more rapid and comprehensive pre-review of the chart, with the data pertinent to each individual inclusion and exclusion criterion brought straight to the research staff so that they could ask and answer those questions much more quickly. Importantly, this was a blinded, randomized trial, and the screening happened in two stages. The traditional-care group, once they had a structured data query, went into a manual review process,
reading the chart and answering the inclusion and exclusion questions as they normally would, by looking to the appropriate sections of the chart and notes to confirm that each criterion was either met or not met. For the AI-enabled arm, the tool reviews the chart, asks and answers those same inclusion and exclusion questions, pulls out the relevant passages of clinical notes or data that answer them, and puts that right in front of the research coordinator. That allowed them to assess a patient's eligibility or ineligibility much more rapidly for the trial that was the use case we published on here.
And really importantly, we significantly accelerated the eligibility assessment, that is, how quickly we were able to find patients who were eligible for the trial. That was our primary outcome, and overall the AI-enabled approach was more than two times faster. If you look at 10 days from randomization, the manual team had reached about 2.5% of patients, whereas the AI-enabled team was at about 20%, so a significant increase in the efficiency of finding patients eligible for the trial. Our prespecified secondary endpoint, a hierarchical win ratio, actually showed that enrollments increased as a result of the use of the AI tool in the trial. That was something we were hopeful for but not certain of, because again, this really is a prescreening tool. What it did was make the teams so much more efficient that, because each group was allocated the same amount of time to both screen and enroll patients, we found an almost twofold increase in enrollments, in addition to that significantly accelerated eligibility determination.
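As a brief aside for listeners, a win ratio compares every patient in one arm with every patient in the other across a hierarchy of endpoints and tallies the pairwise wins and losses. The definition below is the standard one from the methods literature, not a formula reported in this particular paper:

\[ \text{win ratio} = \frac{\text{number of pairwise wins}}{\text{number of pairwise losses}} \]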
Can you explain for our listeners what human in the loop means? Yeah. So the AI tool is able to perform a set of tasks with minimal human input; really, the humans just upload the appropriate data sets, the inclusion and exclusion criteria, and the right patients to ask those questions of. As opposed to a completely autonomous system, in which an AI model, an AI agent, or a computer system of any sort completes a process start to finish entirely on its own, human in the loop puts a gatekeeper on that process. It allows a human participant, a manager, or a reviewer to step in and act as a confirmation check, making sure that at a certain stage the process is not completely automated; a human touch is what allows progress to the next step. For the purposes of this study and this use case, that meant ensuring, before any patient was contacted or invited to participate in the clinical trial, that the inclusion and exclusion criteria were truly met and that the person was appropriate to reach out to.
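As a rough illustration of that gatekeeper idea, here is a minimal sketch of a review gate in which no outreach happens without explicit human approval. The PrescreenResult type, field names, and console workflow are all invented for this example; they are not the study's actual software.

```python
# Minimal human-in-the-loop gate: the AI's answer is only a suggestion,
# and a coordinator must approve every patient before any outreach.
from dataclasses import dataclass

@dataclass
class PrescreenResult:
    patient_id: str
    criterion: str
    model_answer: str  # e.g., "MET", "NOT MET", "UNCERTAIN"
    evidence: str      # chart excerpt the model cited

def human_review(result: PrescreenResult) -> bool:
    """Show the AI's judgment and evidence; return True only on approval."""
    print(f"Patient {result.patient_id} | {result.criterion}")
    print(f"  Model says: {result.model_answer}")
    print(f"  Evidence:   {result.evidence}")
    return input("  Approve for outreach? [y/N] ").strip().lower() == "y"

queue = [
    PrescreenResult("001", "LVEF <= 40%", "MET", "Echo 3/2024: EF 35%"),
    PrescreenResult("002", "No dialysis", "UNCERTAIN", "CKD stage unclear"),
]
# No patient is contacted unless a human explicitly says yes.
approved = [r.patient_id for r in queue if human_review(r)]
print("Cleared for outreach:", approved)
```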
So your study obviously found really great results with AI-assisted screening, but, if I'm correct, it was conducted within a single center and focused on heart failure. Do you think this AI-assisted screening could work across multiple clinical settings and diseases?
The short answer is yes. Mass General Brigham is technically a single site, but about 14 hospitals within it were part of the trial we conducted, so it's a really large mega-site. And to answer the question about heart failure: while that was the use case and the paradigm through which we studied this tool, there was actually no specification or fine-tuning of the model that made it a heart failure-specific tool. Large language models are the engine that powers the tool, but there's a lot of pre- and post-processing surrounding them that actually makes it economically sustainable, computationally sustainable, and efficient while remaining accurate. Those steps, we feel, are externally valid for many other disease areas and are likely to be so for many other health systems and research environments, and we look forward to proving that out with additional study. So my other question is: do you think the AI missed anything, or maybe
recruited too many false positives by being too broad with the inclusion criteria? We didn't see that in our experience, again, in this single study and single use case; the tool was not bringing in higher rates of false-positive eligible patients than our manual review, and it was really encouraging that those rates were overall very similar. In a prior retrospective study using the same tool, published in NEJM AI, the tool was actually more specific and more accurate, with a Matthews correlation coefficient that outperformed our human study staff in assessing patient eligibility. So I think we're encouraged. This prospective, blinded trial was really meant to add to that retrospective experience, and we're really encouraged that the metrics seem to bear out prospectively. But I do think it warrants further study, both through external validation in other disease areas and at many other sites, where we hope to recapitulate many of these results.
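For listeners unfamiliar with the metric, the Matthews correlation coefficient summarizes a binary classifier's entire confusion matrix in a single value between -1 and 1, which makes it a stringent measure for eligibility classification. The formula below is the standard textbook definition, not a result specific to either study:

\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \]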
And do you think this tool could be used immediately to start recruiting for clinical trials, or is there an evolution to it that you see?
I think there are always ways you can make a system or a tool better, and we're in the process of doing that every day here at Mass General Brigham. Within our system, we're starting to beta test this with additional researchers and research groups across our institution, with the hope and intent that it will scale broadly across our enterprise. But we're very much excited to talk with friends, colleagues, partners, and stakeholders to also externally validate this research, to demonstrate in additional centers and other disease areas that we continue to see really promising results that can actually accelerate research for clinicians, for patients, and for the healthcare community more broadly. Thank you very much for this conversation. We really appreciate you being here. My pleasure. Thanks so much for your time. Have a great day.
I'm Yulin Hswen, Associate Editor at JAMA and JAMA+ AI, and I've been speaking with Dr. A.J. Blood about the role of AI-assisted prescreening for clinical trial eligibility. You can find a link to the article in this episode's description. And for more content like this, please visit our new JAMA+ AI channel at jamaai.org.
To follow this and other JAMA Network podcasts, please visit us online at jamanetworkaudio.com or search for JAMA Network wherever you get your podcasts. This episode was produced by Shelley Steffens at the JAMA Network. Thanks for listening. This content is protected by copyright by the American Medical Association with all rights reserved, including those for text and data mining, AI training, and similar technologies.