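Raj Manrai: Our study compared leading open-source and proprietary large language models on complex diagnoses. Open-source models from Meta and other companies have recently improved markedly on tasks outside medicine. Using case records of the Massachusetts General Hospital, we evaluated Meta's Llama 3.1 model against GPT-4. Proprietary models such as OpenAI's ChatGPT require users to go through the vendor's interface, and data is sent to another platform for processing. Open-source models have publicly available weights and can be run locally without sending data to another platform, which protects data privacy. They can also be fine-tuned and customized locally, which matters a great deal to hospitals. The study demonstrates that open-source models are capable on complex diagnostic tasks, which opens up possibilities for research and clinical applications using local data. Because open-source models now perform on par with leading proprietary models, real-time inference on a hospital's local data and diagnostic assistance become feasible. Further research is needed on which models, and which model sizes, suit which medical tasks. Current AI models still have limitations, and human judgment is needed to make sure their output is not misused. AI models can serve as assistive tools that help physicians better understand a patient's situation and values. Whether it is appropriate for a physician to use an AI tool for diagnostic assistance depends on the specific situation and what the tool is being used for.
Thomas Buckley: The study demonstrates that open-source models can handle complex diagnostic tasks. With techniques such as quantizing the weights or distilling the model, large open-source models can run on hardware a hospital can deploy. As models become more efficient and smaller, it may become possible to run large open-source models on a personal computer. Whether a hospital should deploy a local model depends on its goals: local deployment is the better fit if it needs to use local data right away, while an API is an option if it needs the best performance. Physicians can work more efficiently using such tools for diagnostic assistance, but should not rely on them entirely to make a diagnosis.
Roy Perlis: (Mainly guiding questions; no core arguments put forward.)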
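Buckley's point about quantization can be checked with back-of-envelope arithmetic. The sketch below is only an illustration, not anything from the study: it estimates the memory needed to hold the weights of a 405-billion-parameter model at different precisions and compares that with eight GPUs. The per-GPU memory sizes (40 GB and 80 GB, the two A100 variants) are an assumption, since the episode does not say which is installed, and the estimate ignores the KV cache, activations, and framework overhead, so real requirements are higher. The function names `weight_gb` and `fits` are hypothetical helpers.

```python
PARAMS = 405e9  # parameter count quoted in the episode for Llama 3.1 405B


def weight_gb(bits_per_param: int, params: float = PARAMS) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def fits(bits_per_param: int, gpus: int, gb_per_gpu: int) -> bool:
    """True if the weights alone fit in the combined GPU memory.

    Ignores KV cache, activations, and runtime overhead (assumption).
    """
    return weight_gb(bits_per_param) <= gpus * gb_per_gpu


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(
            f"{bits:>2}-bit weights: {weight_gb(bits):6.1f} GB | "
            f"8 x 80 GB: {fits(bits, 8, 80)} | 8 x 40 GB: {fits(bits, 8, 40)}"
        )
```

Under these assumptions, 16-bit weights (810 GB) exceed even eight 80 GB cards (640 GB), while 8-bit or 4-bit quantized weights fit, which is consistent with the remark that such tricks are needed to run a model of this size on eight A100s.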
supporting_evidences
Raj Manrai: 'This has changed, I think, relatively recently, particularly with models that are produced by Meta and now a few others, the Llama series of models, for example, that have really gotten much better on tasks outside of medicine.'
Raj Manrai: 'And so we sought in this study to take these challenging cases further, from the case records of the Massachusetts General Hospital, also known as the CPCs, or the Clinical Pathological Conferences, published by the New England Journal of Medicine, and evaluate just how good one of the larger, newer, open-source models from Meta, the Llama 3.1 405 billion parameter model, performed on these cases compared to GPT-4.'
Raj Manrai: 'Yeah, so these proprietary models, ones that you'll be very familiar with and most listeners will be familiar with, like ChatGPT from OpenAI, you have to use their interface...'
Raj Manrai: 'Very, very different, and I think very critical difference for healthcare applications are these open source models where the weights are available...'
Raj Manrai: 'It can also be fine-tuned, it can be tailored, it can be changed locally...'
Thomas Buckley: 'To me, this study is kind of like an existence proof that an open source model can do such a challenging task that we really didn't even consider what's possible.'
Thomas Buckley: 'So for example, at the BI, we have eight A100s, which is actually sufficient to run a model like this if you do tricks where you quantize the weights or you distill a model down.'
Thomas Buckley: 'And then at the same time, because we know an open source model can do such a challenging task, I think I'm really hopeful that smaller and smaller models will be able to do the same thing...'
Raj Manrai: 'Yes, we made the extremely surprising finding that the open source Llama model performs on par with the proprietary GPT-4 model.'
Raj Manrai: 'I mean, I was shocked by this. Like, GPT-4 has been just the be-all, end-all LLM for, you know, multiple years...'
Raj Manrai: 'This is also a conversation that we've had from the very beginning of starting NEJM AI...'
Raj Manrai: 'In this instance, it's not really about the particular models themselves...'
Raj Manrai: 'that on this hard task, on these hard cases, this open source model is able to perform on par with the until recently dominant GPT-4 model...'
Raj Manrai: 'And therefore, there's a lot of interesting work that we can do now with EHR records...'
Thomas Buckley: 'It kind of depends on the goals of the hospital...'
Raj Manrai: 'It's a great question. I think we are still at the very beginnings of sort of rigorously mapping out what models can be used for what tasks...'
Raj Manrai: 'I think we are only starting to sort of explore these questions systematically...'
Thomas Buckley: 'I think I personally would feel better. I feel like a chatbot in the hands of a trained physician...'
Raj Manrai: 'But if they're looking up something very basic that I would expect my doctor to know, that would also alarm me, right?'
Raj Manrai: 'And there are hallucinations. They make stuff up confidently. They have errors...'
Raj Manrai: 'But as a way of coming up with something that you might be missing, as a second opinion...'
A recent study published in JAMA Health Forum suggests that institutions may be able to deploy custom open-source large language models (LLMs) that run locally without sacrificing data privacy or flexibility. Coauthors Thomas A. Buckley, BS, and Arjun K. Manrai, PhD, from the Department of Biomedical Informatics at Harvard Medical School join JAMA+ AI Editor in Chief Roy H. Perlis, MD, MSc, to discuss. Related Content:
Can Open-Source AI Models Diagnose Complex Cases as Well as GPT-4?