Construction and validation of a machine learning model to predict the risk of nasopharyngeal carcinoma using multimodal clinical data: a single-center, retrospective study.

Authors

Li Xiao, Wang Zuheng, Chen Wenting, Wei Chunmeng, Lu Wenhao, Zhou Rongbin, Wang Fubo, Liang Leifeng

Affiliations

School of Life Sciences, Guangxi Medical University, Nanning, 530021, Guangxi, China.

Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, University Engineering Research Center of Digital Medicine and Healthcare, Guangxi Medical University, No. 22, Shuangyong Road, Qingxiu District, Nanning, 530021, Guangxi Zhuang Autonomous Region, China.

Publication information

Clin Transl Oncol. 2025 Jul 15. doi: 10.1007/s12094-025-03992-0.

Abstract

OBJECTIVE

Early detection and treatment of nasopharyngeal carcinoma (NPC) are critical for improving patient prognosis. This study aimed to develop and compare multiple machine learning (ML) models built on multimodal clinical data in order to identify a model that predicts NPC risk, improves diagnostic accuracy, and helps guide personalized treatment strategies.

METHODS

Clinical data were retrospectively collected from 1337 patients suspected of having NPC at the First People's Hospital of Yulin. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) regression. Patients were divided into training and test sets (80:20 ratio), and seven ML models were developed based on the training set. Model performance was assessed using metrics such as the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The best-performing model was further evaluated through decision curve analysis (DCA), calibration, and learning curves. SHapley Additive exPlanations (SHAP) were used to interpret key clinical features.
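
The evaluation metrics named above (AUC, sensitivity, specificity) can be made concrete with a minimal sketch. The Python below is illustrative only: the labels and scores are made-up toy values, not study data, and the function names `auc_score` and `sensitivity_specificity`, as well as the 0.5 decision threshold, are assumptions of this example rather than anything reported by the authors.

```python
from typing import Sequence, Tuple


def auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case is scored
    higher than a random negative case (Mann-Whitney formulation).
    A tie between a positive and a negative counts as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A case is predicted positive when its score is >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical values, NOT data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

print(auc_score(y_true, y_score))                # 0.9375
print(sensitivity_specificity(y_true, y_score))  # (0.75, 0.75)
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity depend on the single threshold chosen, which is why studies of this kind usually report them together.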

RESULTS

Seven models were developed using 17 clinical features selected from 53 parameters. The gradient boosting decision tree (GBDT) model demonstrated superior performance (AUC of 0.95 in the training cohort and 0.82 in the validation cohort). Calibration curves and DCA confirmed the model's strong accuracy and clinical benefit. SHAP analysis revealed that age, lymphocyte percentage, serum albumin, sex, and EBV IgM were the five most significant predictors of NPC risk.
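
To make the decision curve analysis mentioned above more tangible, here is a minimal sketch of the standard net-benefit calculation that DCA plots against the threshold probability. The data are toy values invented for illustration, not results from the study, and the helper names `net_benefit` and `net_benefit_treat_all` are hypothetical. The abstract does not describe how the authors implemented DCA, so this should be read as the textbook formula only.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Net benefit of acting on the model at threshold probability pt:
    TP/N - FP/N * pt / (1 - pt), where a case is flagged when y_prob >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy that flags every patient as high risk."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


# Toy example (hypothetical values, NOT data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

for pt in (0.25, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # The "treat none" strategy always has a net benefit of 0.
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all value and zero (treat none); a decision curve repeats this comparison over a range of thresholds.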

CONCLUSION

The GBDT-based ML model, using multimodal clinical data, accurately identifies patients at high risk for NPC, providing a valuable tool for early screening and personalized treatment strategies.
