
Machine Learning for Earlier Diagnosis of Schizophrenia

2025/3/7

JAMA Medical News

People
Søren Dinesen Østergaard
Topics
Søren Dinesen Østergaard: Early diagnosis of severe mental disorders is challenging because patients may already be symptomatic well before the diagnosis is made, and diagnostic delay worsens prognosis. This study used machine learning models to predict, from patients' prior treatment records, their risk of progressing to schizophrenia or bipolar disorder. A positive prediction should prompt clinicians to pay closer attention to schizophrenia or bipolar disorder and to ask more targeted questions, thereby shortening the diagnostic delay and improving outcomes. The models found schizophrenia easier to predict than bipolar disorder, possibly because the early presentation of schizophrenia is more homogeneous. The positive predictive value is low, so the model is not yet suitable for direct clinical use and should be seen as a proof of concept to be improved on in future work. It is a dynamic prediction model that accounts for how the patient's condition changes over time and issues a prediction before each outpatient visit, which better matches clinical practice. Text from the clinical notes was especially important for prediction, mainly reflecting illness severity, positive symptoms, and negative symptoms. Model performance differed across hospitals, possibly because of hospital size, level of specialization, and patient populations. Raising the model's clinical value will require improving the model itself, adding more useful data, and choosing research questions carefully.
Roy H. Perlis: (As host, did not advance core arguments of his own; mainly asked questions to guide the discussion.)

Transcript

Welcome to JAMA+AI Conversations. I'm Dr. Roy Perlis, Editor-in-Chief of JAMA+AI and today's host. I want to welcome our guest, Dr. Søren Dinesen Østergaard, Professor at Aarhus University and Aarhus University Hospital's Department of Affective Disorders. Dr. Østergaard, thank you for joining me today to discuss your recent paper in JAMA Psychiatry titled Machine Learning Models for Predicting Diagnostic Progression to Schizophrenia or Bipolar Disorder.

Delighted to be here, and thanks a lot for having me. Absolutely. And I first want to give you props for calling it machine learning models and not artificial intelligence. You're bucking the trend. I like it. It's got sort of a retro credibility to it. So this study focused on predicting diagnostic transitions to serious mental illness.

For the non-psychiatrists in the audience, why is early detection of schizophrenia and bipolar disorder so challenging in clinical practice? Why does this matter?

Excellent question, because there is a substantial literature documenting that sort of in hindsight, once people with severe mental disorders are actually diagnosed, when you look at their disease trajectory, it is often the case that you can sort of see that perhaps we could have arrived at this diagnosis somewhat earlier.

And then there's also quite a lot of evidence showing that the diagnostic delay, the fact that these diagnoses are not always given timely and treatment started timely, the longer that delay, the worse the prognosis of these patients. So if we could shorten that delay, that would probably benefit patients quite a lot.

So just to clarify, who are you making these predictions on? So these are folks who are already getting some kind of treatment. They're already engaged with the mental health system, but they don't have an SMI diagnosis yet? Exactly. So the data for this study stem from a fairly large psychiatric service system in Denmark. So this is a public psychiatric service where patients come for many different problems.

So the population we are seeing here are patients that are already in some form of treatment, but for, let's say, less severe mental disorders. That could be a depressive disorder. It could be an anxiety disorder. It could be sort of illness of that level of severity.

And we know, of course, that some of these patients will eventually end up with a diagnosis of either schizophrenia or bipolar disorder. So what we are trying to do here is to sort of identify the prodromal phase of these illnesses in order to single out patients that are at very high risk of eventually ending up with these disorders or perhaps already meeting the diagnostic criteria.

And so if you knew, you know, let's say that this test was perfect, which of course it's not, what would you do differently? What's the change in practice on the basis of this kind of a diagnostic? We see it as kind of a paraclinical test of some sort that would guide attention towards a particular illness trajectory. So you could argue exactly if this model was perfect in terms of prediction, then

then you could, of course, just assign a diagnosis. But that is, of course, not the case. So a positive prediction in the model we have developed here should probably lead to some diagnostic emphasis, attention towards bipolar disorder or schizophrenia in your examination of these patients. So maybe the next time you're seeing the patient for an outpatient visit, you would

ask questions that would potentially inform whether bipolar disorder or schizophrenia is developing or potentially already the illness that the patient is suffering from. So this is sort of an ask more questions signal.

That's the way we tend to see it. So I think this should probably guide attention towards a particular diagnostic trajectory and for the clinician, maybe lead to him or her asking questions that would inform whether these diagnoses or mental disorders are developing. Got it. One of the things I found interesting was that it was so much easier to predict schizophrenia than bipolar disorder in your models, which...

naively, I wouldn't have assumed. What do you make of that? Why was one diagnosis so much easier, relatively speaking, than the other?

Yes, that also came as a bit of a surprise to us. And originally, we basically merged these two outcomes together in order to have more power when training the model. But as you say, it turned out that when looking at the outcomes separately and training the model specifically to identify either developing schizophrenia or bipolar disorder, it was substantially easier to predict schizophrenia.

The way we think about it is that it is probably due to the fact that the manifestation of the onset of schizophrenia is probably more homogenous than it is for bipolar disorder. Bipolar disorder, when that diagnosis is made, the patients can be in quite different phases of the illness. It could be tremendously severe mania or it could be very, very deep depression.

It could also be like sort of a retrospective diagnosis based on anamnesis that in an outpatient course of treatment, you have a patient that you're seeing for depression and you're then questioning about prior hypomanic episodes and you actually get a description of a hypomania. And then the diagnosis of bipolar disorder is assigned there.

So these manifestations of bipolar disorder are very, very different. Whereas with schizophrenia, the most common manifestation when the diagnosis is assigned would be a psychotic breakthrough, probably with paranoia and voice hearing. So somewhat more homogenous than bipolar disorder. I think that's the most likely explanation.

That makes a lot of sense. I can imagine that clinically. And, you know, I suppose a model that predicts schizophrenia with some reasonable discrimination, you don't need a perfect model. You don't need something that's sort of one size fits all. So I can appreciate the utility of focusing on schizophrenia. I wondered, though, about the PPV, the positive predictive value.

Despite the models performing fairly well, the positive predictive values are still quite low, as I recall. So whatever your intentions, do you worry that clinicians will start, when a test like this is rolled out, over-interpreting the test result? So do you worry that people will see this as, oh, this person has schizophrenia, even though the PPVs are low?

That's an excellent question and definitely something to consider if you were to implement a model like this. As you rightly point out, the positive predictive value here is not ideal at all and maybe, I would argue, a bit too low for clinical implementation. So rather than seeing this as something that should definitely be implemented, I see it more as a sort of a starting point or a proof of concept

showing that we are able to sort of partly solve this very, very complex clinical problem, but we are not quite there yet. But with the development of this technology, we may eventually get there, and this is one of the early steps in getting there.
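To make the point about positive predictive value concrete: PPV depends not only on how well a model discriminates but also on how rare the outcome is. A minimal sketch, with illustrative numbers that are not taken from the paper:

```python
# Illustrative only: how a low outcome prevalence depresses the positive
# predictive value even when sensitivity and specificity look respectable.
# The numbers below are invented, not results from the paper.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.20, 0.05, 0.01):
    print(f"prevalence={prev:.2f}  PPV={ppv(0.80, 0.90, prev):.2f}")
# prevalence=0.20  PPV=0.67
# prevalence=0.05  PPV=0.30
# prevalence=0.01  PPV=0.07
```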

Also, in terms of the positive predictive value, it is also important to consider that our main unit of analysis here is not patients, but outpatient contacts with our psychiatric services. So even though a patient is not sort of, let's say, correctly classified at one visit level,

he or she might be correctly identified at subsequent visits. So it is a dynamic prediction model that takes into account that the patient's course of illness is changing over time, which is, I think, one of the good things about this study.

because quite a few of the earlier studies have done one prediction at the beginning and then just looked years into the future. We do it dynamically with predictions issued at each outpatient visit or actually the day before each outpatient visit.

And that should give a more dynamic and realistic model that fits with clinical practice. Because that's, of course, also how clinical practice works, that for each visit where you see the patient, your information will be updated and you will be better at making predictions the more you know.
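The per-visit set-up described here can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual feature pipeline, and all column names are invented:

```python
# Hypothetical sketch of issuing one prediction per outpatient contact, using
# only information recorded before the day of that contact. Column names and
# features are invented for illustration.
import pandas as pd

def build_prediction_rows(visits: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """visits: one row per outpatient contact (patient_id, visit_date).
    events: longitudinal records (patient_id, event_date, was_admission, ...)."""
    rows = []
    for _, visit in visits.iterrows():
        # Predictions are issued the day before the visit, so only use
        # information available up to then.
        cutoff = visit["visit_date"] - pd.Timedelta(days=1)
        history = events[
            (events["patient_id"] == visit["patient_id"])
            & (events["event_date"] <= cutoff)
        ]
        rows.append({
            "patient_id": visit["patient_id"],
            "visit_date": visit["visit_date"],
            "n_prior_contacts": len(history),
            "n_prior_admissions": int(history["was_admission"].sum()),
        })
    return pd.DataFrame(rows)
```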

I'm glad you reinforced that point because I think an important distinction is that you're not testing someone once. You're running this in the background. And I liked that in the paper, I think you say, the median time from the first positive prediction to diagnosis was a little over a year. It seems like a useful way of characterizing how much advance warning you're actually getting. And it seems like you could do a lot with a year.
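The warning-time figure mentioned here is straightforward to compute once per-visit predictions exist; a minimal sketch, again with hypothetical column names:

```python
# Minimal sketch of the lead-time metric discussed above: the median time from
# a patient's first positive prediction to the date the diagnosis was assigned.
# Column names are hypothetical.
import pandas as pd

def median_warning_time(positives: pd.DataFrame, diagnoses: pd.DataFrame) -> pd.Timedelta:
    """positives: (patient_id, prediction_date) for positive predictions only.
    diagnoses: (patient_id, diagnosis_date), one row per diagnosed patient."""
    first_positive = positives.groupby("patient_id")["prediction_date"].min()
    merged = diagnoses.set_index("patient_id").join(first_positive, how="inner")
    return (merged["diagnosis_date"] - merged["prediction_date"]).median()
```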

One of the questions I had for you is you found that clinical notes were especially informative for prediction. Can you say a little bit about what you actually found in the notes that helped with prediction? Yes. So this was also a finding that we liked because that was part of our hypothesis and part of the reason why we included the clinical notes in the training data analysis.

I should also say that in terms of the data, we only use data that are routinely collected or are available as part of clinical practice. All of this stems from the electronic patient record. There's no dedicated data collection for this study. So another way of looking at this is that we are basically trying to leverage information that is already there. There are no costs, so to say, in terms of gathering additional

information on the patients in order to feed this model. It is based entirely on the data at hand. In terms of the text, it is completely right that that turned out to be most important for the prediction. And when we interpret the most important text predictors, they all more or less seem to fall into three different categories. One is a proxy for severity,

That is text that clearly documents activities taking place during inpatient stays, probably also inpatient stays of a certain length, which will reflect severe mental illness in need of inpatient treatment. The second one would be positive symptoms, voice hearing. And the third one would be negative symptoms, sort of social withdrawal symptoms.

So when we saw this, we were quite pleased because, from an explanatory perspective, it fits with your clinical gut feeling. All of these text predictors make sense from a clinical perspective, which was nice. It is nice that the face validity is there. It makes sense. And I've always had the sense that clinicians often know things and document them obliquely before the diagnosis changes. So I think this is consistent with that impression. Yeah.
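The conversation does not spell out how the note text was represented for the model, so the sketch below shows only one common way such free text could be turned into predictors (TF-IDF features feeding a standard classifier), not the paper's actual pipeline:

```python
# Assumption: a common baseline for using clinical note text as predictors is
# bag-of-words / TF-IDF features fed to a standard classifier. This is an
# illustration, not the approach used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

text_model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
# text_model.fit(train_notes, train_labels)  # lists of note strings and 0/1 labels
# Inspecting the largest coefficients is one way to see which terms drive
# predictions (e.g., terms reflecting severity, positive or negative symptoms).
```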

One of the things you did in the paper was you looked at your models across different hospital sites. And I know you found some decrease in the performance of the models when you looked at other hospitals. What's your take of that? Why does it do less well outside of the initial data set?

Yes, that is completely correct. So when we did the split between hospital sites, we were aware that the training split and the test split were probably not entirely comparable; there were some structural differences in terms of the hospital sites and the level of specialization. Specifically, one of the sites in the training data is our university hospital site,

where to some extent you may see more patients with more complicated illness, more complex cases, and the level of expertise may also differ from some of the other hospitals. So some of that is probably just

by design, the training sites and the test sites are not entirely similar. So I think that's the main explanation. Could also be in the patient population, some being more rural, some being more urban. So there are several things that can feed into this. So

I guess this underscores that you really need to test the prediction level in the actual places where you want to potentially deploy the model. That is really, really critical. And I still think it's an open question whether you should do like a geographical split as we did here or more like a random split within the entire population. Whether the former or the latter makes more sense, I still think that's an open question. Depends on which question you would like to answer.
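The two evaluation strategies being weighed here, holding out whole hospital sites versus splitting at random and then stratifying performance by site, can be sketched like this (variable names are hypothetical; the paper's exact procedure may differ):

```python
# Sketch of the two split strategies discussed above. Variable names are
# hypothetical; the paper's exact procedure may differ.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

def geographic_split(X, y, sites, test_size=0.3, seed=0):
    """Hold out entire hospital sites for testing (a 'geographical' split)."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(X, y, groups=sites))
    return train_idx, test_idx

def random_split(y, test_size=0.3, seed=0):
    """Random split across all sites; site-wise performance is examined afterwards."""
    idx = np.arange(len(y))
    return train_test_split(idx, test_size=test_size, random_state=seed, stratify=y)
```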

Personally, I think that perhaps the random split would have been a better way of doing it here. And then looking at performance stratified by hospital sites afterwards. I think if we were to do something similar going forward, which we are with other outcomes, we would probably do it that way, I think. Yeah, I sometimes wonder if individual health systems or hospital systems are going to, in the end, end up having to build their own models.

just because we see this pattern over and over and over again, where you move from one system to another or one set of hospitals to another. So that brings me to a question that may be sort of broader than the paper, which is we see consistently in health records, at least in psychiatry, but I think more generally,

that someone builds a model, the model performs reasonably well, but the conclusion is we're not there yet. This is promising, but we, you know, we need to do more.

And I guess I would put to you the question, what actually do we need to be doing? Is it better models? You know, will better machine learning solve this problem? Is it other things in the health record that we're not incorporating? Or are we just not measuring the things that we need to measure to really build clinical models from health records or registry data that will be performant enough to be deployed?

The good news is nobody knows the answer to this, or at least nobody knows it yet. What tack are you guys taking? What do you think? Probably a bit of both. Having better methods and more sophisticated hardware will probably help. And including more data may also help. Not necessarily, though: the data has to be informative. Otherwise, it will probably have the opposite effect

in terms of prediction. And maybe not for this particular model, because the prediction level is pretty far from perfect, but with some of the other models we have developed, I actually think they are sufficiently accurate to be deployed in clinical practice. Based on completely analogous methods to those in this particular paper, we recently published a paper on prediction of involuntary admission.

So those were predictions issued at the final day of a voluntary inpatient stay. And we could then predict, with quite high sensitivity and specificity and a very high positive predictive value, involuntary admissions at the individual patient level within the next six months. And we are now actually contemplating whether this is something we should move forward with and actually implement in clinical practice. So we may already be there in some cases.

With regard to this particular study on schizophrenia and bipolar disorder, I think we have identified a very difficult clinical problem where prediction is simply very difficult irrespective of which methods you're using and how good data you have. So this may simply be a problem that is too hard to crack, but there are other things where I think we have results that would actually support that we should move on towards clinical implementation.

I think that's probably a good note to end on. We need to pick our problems carefully if the goal is clinical deployment.

I'm Dr. Roy Perlis. I've been speaking with Dr. Søren Dinesen Østergaard from Aarhus University about using machine learning to predict progression to serious mental illness. You can find a link to the article in this episode's description. To follow this and other JAMA Network podcasts, please visit us online at jamanetworkaudio.com or search for JAMA Network wherever you get your podcasts. This episode was produced by Daniel Morrow at the JAMA Network. Thanks for joining us.