Sebastian Anu Maria, Peter David, Rajagopal T P, Sebastian Rinu Ann
Department of Computer Science and Engineering, Indian Institute of Information Technology, Kottayam, Kerala, India.
Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala, India.
Technol Cancer Res Treat. 2025 Jan-Dec;24:15330338251370239. doi: 10.1177/15330338251370239. Epub 2025 Aug 14.
Introduction: Lung cancer has the highest mortality rate of all cancer types globally, largely due to delayed or ineffective diagnosis and treatment. Radiomics is commonly used to diagnose lung cancer, especially in later stages or during routine screenings. However, frequent radiological imaging poses health risks, and while advanced diagnostic alternatives exist, they are often costly and accessible only to a limited, privileged population. Leveraging routine clinical data with machine learning (ML) and artificial intelligence (AI) enables a safer, more inclusive, and more affordable solution. However, AI-based models for cancer diagnosis see limited clinical adoption because they lack interpretability.

Methods: This study introduces a safe, inclusive, and cost-effective lung cancer diagnostic method using an explainable AI (XAI) model built on routine clinical data. It employs a stacking ensemble of an Artificial Neural Network (ANN) and a Deep Neural Network (DNN) to match the diagnostic performance of DNN models trained on clean data. By incorporating rare medical cases through Adaptive Synthetic Sampling (ADASYN), the model reduces the risk of missing challenging rare-case diagnoses.

Results: The proposed XAI model demonstrates strong performance, with an accuracy of 0.8558, AUC of 0.8600, precision of 0.8092, recall of 0.9282, and F1-score of 0.8646, notably improving rare-case detection by over 50%. SHapley Additive exPlanations (SHAP)-based interpretability highlights erythrocyte sedimentation rate (ESR), intoxication-related factors, hemoglobin levels, and neutrophil counts as key features. The model also reveals associations, such as a link between heavy tobacco use and elevated ESR.
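The ADASYN oversampling and ANN/DNN stacking described in the Methods can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the data is a synthetic imbalanced stand-in for the clinical features, the ADASYN routine is a simplified hand-rolled version of the algorithm (the `imbalanced-learn` package provides a production implementation), and the two `MLPClassifier` architectures are assumptions standing in for the paper's ANN and DNN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def adasyn_oversample(X, y, k=5):
    """Simplified ADASYN: synthesize minority-class samples in proportion to
    how many majority-class points surround each minority point, so harder
    (rarer) regions of the minority class receive more synthetic samples."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_gen_total = counts.max() - counts.min()
    # Fraction of majority neighbors among the k nearest neighbors of each minority point.
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn_all.kneighbors(X_min)
    r = np.array([(y[i[1:]] != minority).mean() for i in idx])
    if r.sum() == 0:
        return X, y
    g = np.rint(r / r.sum() * n_gen_total).astype(int)  # samples to generate per point
    # Interpolate each minority point toward a random minority-class neighbor.
    nn_min = NearestNeighbors(n_neighbors=min(k, len(X_min) - 1) + 1).fit(X_min)
    _, idx_min = nn_min.kneighbors(X_min)
    synth = []
    for i, n_new in enumerate(g):
        for _ in range(n_new):
            j = rng.choice(idx_min[i][1:])
            lam = rng.random()
            synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not synth:
        return X, y
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(len(synth), minority)])
    return X_new, y_new

# Imbalanced synthetic stand-in for the clinical tabular data.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = adasyn_oversample(X_tr, y_tr)

# Stacking ensemble: a shallow "ANN" and a deeper "DNN" combined by a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
        ("dnn", MLPClassifier(hidden_layer_sizes=(32, 16, 8), max_iter=500, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_bal, y_bal)
print(f"test accuracy: {stack.score(X_te, y_te):.3f}")
```

Oversampling only the training split (never the test split) keeps the evaluation honest; the stacking meta-learner then weighs the two base networks' predictions rather than averaging them blindly.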
Counterfactual explanations help identify features contributing to misdiagnoses by exposing sources of confusion in the model's decisions.

Conclusion: Given the limited dataset size and geographic constraints, this research should be viewed as a prototype; in its current form, the model is best suited as a pre-screening tool to support early detection. With training on larger and more diverse datasets, it has strong potential to evolve into a robust and scalable diagnostic solution.
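The counterfactual idea mentioned above can be illustrated with a minimal sketch. This is not the paper's method: it uses a linear surrogate classifier on synthetic data, where the smallest feature change that flips the prediction has a closed form; for neural models, dedicated counterfactual libraries search for such changes iteratively.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Fit a simple surrogate classifier as a stand-in for the diagnostic model.
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
clf = LogisticRegression().fit(X, y)

def counterfactual(x, clf, margin=0.1):
    """Smallest L2 move of x across a linear model's decision boundary.
    If the decision value is d = w.x + b, shifting x by
    -(d + sign(d) * margin) * w / ||w||^2 lands just on the other side."""
    w = clf.coef_[0]
    b = clf.intercept_[0]
    d = w @ x + b
    shift = -(d + np.sign(d) * margin) * w / (w @ w)
    return x + shift

x = X[0]
x_cf = counterfactual(x, clf)
print("original:", clf.predict([x])[0], "counterfactual:", clf.predict([x_cf])[0])
# The per-feature change x_cf - x shows which inputs drive the flipped decision.
```

Inspecting `x_cf - x` for a misclassified patient reveals which features the model leaned on, which is how counterfactuals expose sources of confusion.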