Pucher Gernot, Rostalski Till, Nensa Felix, Kleesiek Jens, Reinhardt Hans Christian, Sauer Christopher Martin
Department of Haematology & Stem Cell Transplantation, West German Cancer Center, University Hospital Essen, Essen, Germany; Laboratory for Clinical Research and Real-World Evidence, Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany.
Laboratory for Clinical Research and Real-World Evidence, Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany.
EBioMedicine. 2025 Jan;111:105526. doi: 10.1016/j.ebiom.2024.105526. Epub 2024 Dec 24.
Artificial intelligence (AI) and machine learning (ML) algorithms have shown great promise in clinical medicine. Despite the increasing number of published algorithms, most remain unvalidated in real-world clinical settings. This study aims to simulate the practical implementation of a recently developed ML algorithm, AI-PAL, designed for the diagnosis of acute leukaemia, to identify the challenges involved, and to report on its performance.
We conducted a detailed simulation of the AI-PAL algorithm's implementation at the University Hospital Essen. Cohort building was performed using our Fast Healthcare Interoperability Resources (FHIR) database, identifying all patients with an initial diagnosis of acute leukaemia or of selected differential diagnoses. The algorithm's performance was assessed by reproducing the original study's results.
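As an illustration of the cohort-building step, the following minimal sketch selects the earliest matching diagnosis per patient from a list of FHIR Condition resources held in memory as plain dictionaries. It is not the study's pipeline: the ICD-10 code prefixes (C91.0 for acute lymphoblastic leukaemia, C92.0 for acute myeloid leukaemia), the code system URI, the use of `recordedDate` as the diagnosis date, and the function name `first_diagnoses` are all assumptions made for the example.

```python
from datetime import date

# Assumed coding system URI and code prefixes, chosen for illustration only;
# the study's actual inclusion criteria are not given in the abstract.
ICD10_SYSTEM = "http://fhir.de/CodeSystem/bfarm/icd-10-gm"
TARGET_PREFIXES = {"C91.0": "ALL", "C92.0": "AML"}


def first_diagnoses(conditions):
    """Map each patient reference to (label, date) of the earliest matching Condition.

    `conditions` is an iterable of FHIR Condition resources as dicts.
    Resources without a subject or a recordedDate are skipped.
    """
    cohort = {}
    for cond in conditions:
        patient = cond.get("subject", {}).get("reference")
        recorded = cond.get("recordedDate")
        if not patient or not recorded:
            continue
        # Keep only the calendar date part of a FHIR date or dateTime.
        when = date.fromisoformat(recorded[:10])
        for coding in cond.get("code", {}).get("coding", []):
            if coding.get("system") != ICD10_SYSTEM:
                continue
            code = coding.get("code", "")
            for prefix, label in TARGET_PREFIXES.items():
                if code.startswith(prefix):
                    # Retain the earliest qualifying record per patient.
                    if patient not in cohort or when < cohort[patient][1]:
                        cohort[patient] = (label, when)
    return cohort
```

Keeping only the earliest record per patient is one simple way to approximate "initial diagnosis"; a real implementation would likely also need to handle referrals, relapse codes, and diagnoses recorded at other institutions.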
The AI-PAL algorithm demonstrated significantly lower performance in our simulated clinical implementation than in previously published results. The area under the receiver operating characteristic curve for acute lymphoblastic leukaemia dropped to 0.67 (95% CI: 0.61-0.73) and for acute myeloid leukaemia to 0.71 (95% CI: 0.65-0.76). Recalibrating the probability cutoffs that determine confident diagnoses increased the number of confident positive diagnoses for acute leukaemia from 98 to 160, highlighting the necessity of local validation and adjustments.
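For readers who want to run a comparable local check, the sketch below computes an area under the receiver operating characteristic curve (AUROC) together with a 95% confidence interval. The abstract does not state how the intervals were derived; the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The pairwise comparison is quadratic in cohort size, which is acceptable for a few hundred patients; for larger cohorts a rank-based implementation would be the more practical choice.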
The findings underscore the challenges of implementing ML algorithms in clinical practice. Despite robust development and validation in research settings, ML models like AI-PAL may require significant adjustments and recalibration to maintain performance in different clinical settings. Our results suggest that clinical decision support algorithms should undergo local performance validation before integration into routine care to ensure reliability and safety.
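One way to picture the recalibration of probability cutoffs is as a search over local data for the lowest cutoff that still meets a required positive predictive value (PPV). The sketch below does exactly that; the PPV criterion, the default target of 0.95, and the function name `recalibrate_cutoff` are assumptions for illustration, since the abstract does not specify the rule that was actually used.

```python
def recalibrate_cutoff(labels, probs, target_ppv=0.95):
    """Return the lowest probability cutoff at which predictions at or above
    the cutoff reach `target_ppv` on the local data, or None if no cutoff does.

    `labels` are 0/1 ground-truth diagnoses; `probs` are model probabilities.
    """
    best = None
    # Scan candidate cutoffs from strictest to most lenient.
    for cutoff in sorted(set(probs), reverse=True):
        flagged = [y for y, p in zip(labels, probs) if p >= cutoff]
        ppv = sum(flagged) / len(flagged)
        if ppv >= target_ppv:
            best = cutoff  # a lower cutoff that still meets the target
    return best
```

Lowering the cutoff while holding the PPV target fixed is what allows more patients to receive a confident positive call; the same search could be mirrored with negative predictive value to set the cutoff for confident negative calls.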
This study was supported by the DFG-co-funded UMEA Clinician Scientist Program and the Ministry of Culture and Science of the State of North Rhine-Westphalia.