Flyckt Ricco Noel Hansen, Sjodsholm Louise, Henriksen Margrethe Høstgaard Bang, Brasen Claus Lohman, Ebrahimi Ali, Hilberg Ole, Hansen Torben Frøstrup, Wiil Uffe Kock, Jensen Lars Henrik, Peimankar Abdolrahman
SDU Health Informatics and Technology, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, 5230, Odense, Denmark.
Department of Oncology, Vejle Hospital, University Hospital of Southern Denmark, 7100, Vejle, Denmark.
Sci Rep. 2024 Dec 24;14(1):30630. doi: 10.1038/s41598-024-82093-4.
Lung cancer (LC) remains the leading cause of cancer-related mortality, largely due to late-stage diagnoses. Effective strategies for early detection are therefore of paramount importance. In recent years, machine learning (ML) has shown considerable potential in healthcare by facilitating the detection of various diseases. In this retrospective development and validation study, we developed an ML model based on dynamic ensemble selection (DES) for LC detection. The model uses standard blood test results and smoking history from a large at-risk population in Denmark. The study includes all patients examined on suspicion of LC in the Region of Southern Denmark from 2009 to 2018. We validated the DES model's predictions and compared them with diagnoses made by five pulmonologists. Among the 38,944 patients, 9,940 had complete data, of whom 2,505 (25%) had LC. The DES model achieved an area under the receiver operating characteristic (ROC) curve of 0.77±0.01, sensitivity of 76.2%±2.04%, specificity of 63.8%±2.3%, positive predictive value (PPV) of 41.6%±1.2%, and F-score of 53.8%±1.0%. The DES model outperformed all five pulmonologists, achieving a sensitivity 6.5% higher than their average. The model identified smoking status, lactate dehydrogenase, age, total calcium levels, low values of sodium, leucocytes, neutrophil count, and C-reactive protein as the most important factors for LC detection. The results highlight the successful application of the ML approach to LC detection, surpassing the pulmonologists' performance. Incorporating clinical and laboratory data into future risk assessment models can improve decision-making and facilitate timely referrals.
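The performance figures reported in the abstract are linked by simple identities, which makes them easy to cross-check. The minimal Python sketch below (standard library only) defines sensitivity, specificity, PPV, F-score and ROC AUC from a binary confusion matrix or from classifier scores. The names `Confusion`, `f_score_from`, `ppv_from` and `roc_auc` are hypothetical helpers introduced for illustration; they do not reproduce the authors' DES model or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary classifier; 'positive' means lung cancer (LC)."""
    tp: int  # LC patients flagged by the model
    fn: int  # LC patients missed by the model
    fp: int  # non-LC patients flagged by the model
    tn: int  # non-LC patients correctly cleared

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def f_score(self) -> float:
        # F1: harmonic mean of PPV (precision) and sensitivity (recall)
        return f_score_from(self.ppv, self.sensitivity)


def f_score_from(ppv: float, sensitivity: float) -> float:
    """F1 score computed directly from PPV and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def ppv_from(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: PPV implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def roc_auc(scores_pos, scores_neg) -> float:
    """AUC as the probability that a random LC case scores higher than a
    random non-LC case (ties count 1/2); O(n*m), fine for illustration."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Point estimates reported in the abstract
    sens, spec, ppv = 0.762, 0.638, 0.416
    prevalence = 2505 / 9940  # LC cases among patients with complete data
    print(f"F-score from PPV and sensitivity: {f_score_from(ppv, sens):.3f}")
    print(f"PPV implied by Bayes' rule:        {ppv_from(sens, spec, prevalence):.3f}")
```

With the reported point estimates, the F-score computed from PPV and sensitivity is about 0.538, matching the reported 53.8%. The Bayes-implied PPV at the study's roughly 25% prevalence is about 0.415, close to the reported 41.6%. Because the reported figures carry ± spreads, exact agreement is not expected.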