Improving appendix cancer prediction with SHAP-based feature engineering for machine learning models: a prediction study.

Author information

Kim Ji Yoon

Affiliation

Ewha Womans University College of Medicine, Seoul, Korea.

Publication information

Ewha Med J. 2025 Apr;48(2):e31. doi: 10.12771/emj.2025.00297. Epub 2025 Apr 15.

DOI:10.12771/emj.2025.00297
PMID:40703369
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12277501/
Abstract

PURPOSE

This study aimed to leverage Shapley additive explanation (SHAP)-based feature engineering to predict appendix cancer. Traditional models often lack transparency, hindering clinical adoption. We propose a framework that integrates SHAP for feature selection, construction, and weighting to enhance accuracy and clinical relevance.

METHODS

Data from the Kaggle Appendix Cancer Prediction dataset (260,000 samples, 21 features) were used in this prediction study conducted from January through March 2025, in accordance with TRIPOD-AI guidelines. Preprocessing involved label encoding, SMOTE (synthetic minority over-sampling technique) to address class imbalance, and an 80:20 train-test split. Baseline models (random forest, XGBoost, LightGBM) were compared; LightGBM was selected for its superior performance (accuracy=0.8794). SHAP analysis identified key features and guided 3 engineering steps: selection of the top 15 features, construction of interaction-based features (e.g., chronic severity), and feature weighting based on SHAP values. Performance was evaluated using accuracy, precision, recall, and F1-score.
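The three SHAP-guided steps named above (selection, construction, weighting) can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' pipeline: the study used LightGBM on the Kaggle dataset, whereas here `toy_model`, the sample `rows`, the zero `baseline`, the choice `k = 2`, the product-based interaction feature, and the max-normalised weights are all hypothetical stand-ins. Shapley values are computed exactly by brute force over feature coalitions, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, relative to a baseline sample.

    Features outside a coalition take their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_importance(predict, rows, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for i, phi in enumerate(shapley_values(predict, row, baseline)):
            totals[i] += abs(phi)
    return [t / len(rows) for t in totals]


# Hypothetical toy model standing in for a trained classifier's raw score.
def toy_model(v):
    return 3.0 * v[0] + 1.0 * v[1] + 0.0 * v[2]


rows = [[1.0, 2.0, 5.0], [2.0, -1.0, 3.0], [0.0, 1.0, 1.0]]
baseline = [0.0, 0.0, 0.0]
importance = mean_abs_importance(toy_model, rows, baseline)

# Step 1, selection: keep the k features with the largest mean |SHAP|.
k = 2  # the study kept the top 15 of 21 features; 2 of 3 here
top_k = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)[:k]

# Step 2, construction: a hypothetical interaction feature, taken here as
# the product of the two top-ranked columns.
a, b = top_k[0], top_k[1]
interaction = [row[a] * row[b] for row in rows]

# Step 3, weighting: scale each selected column by its importance,
# normalised so the top feature has weight 1 (one possible scheme).
max_imp = max(importance)
weights = {i: importance[i] / max_imp for i in top_k}
weighted_rows = [
    [row[i] * weights[i] for i in top_k] + [inter]
    for row, inter in zip(rows, interaction)
]
```

On real data the brute-force routine would be replaced by a tree-specific explainer, and the ranking would be computed on training data only so that the test split does not influence feature choice.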

RESULTS

Four LightGBM model configurations were evaluated: baseline (accuracy=0.8794, F1-score=0.8691), feature selection (accuracy=0.8968, F1-score=0.8860), feature construction (accuracy=0.8980, F1-score=0.8872), and feature weighting (accuracy=0.8986, F1-score=0.8877). SHAP-based engineering yielded performance improvements, with feature weighting achieving the highest precision (0.9940). Key features (e.g., red blood cell count and chronic severity) contributed to predictions while maintaining interpretability.
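To put the reported figures side by side, the short sketch below computes each configuration's gain over the baseline and back-solves the recall implied by a precision/F1 pair. The accuracy and F1 values are copied from the abstract; the helper names are illustrative. The recall figure is a derived estimate, assuming the reported precision (0.9940) and F1-score (0.8877) describe the same configuration and the same class, since the abstract does not state recall directly.

```python
# Accuracy and F1-score as reported in the abstract for the four
# LightGBM configurations.
reported = {
    "baseline": {"accuracy": 0.8794, "f1": 0.8691},
    "feature selection": {"accuracy": 0.8968, "f1": 0.8860},
    "feature construction": {"accuracy": 0.8980, "f1": 0.8872},
    "feature weighting": {"accuracy": 0.8986, "f1": 0.8877},
}


def gains_over_baseline(results, metric):
    """Absolute difference from the baseline configuration for one metric."""
    base = results["baseline"][metric]
    return {
        name: round(values[metric] - base, 4)
        for name, values in results.items()
        if name != "baseline"
    }


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    return f1 * precision / (2 * precision - f1)


accuracy_gain = gains_over_baseline(reported, "accuracy")
f1_gain = gains_over_baseline(reported, "f1")

# Assumption: precision 0.9940 and F1 0.8877 both belong to the
# feature-weighting configuration and refer to the same class.
recall_weighting = implied_recall(0.9940, 0.8877)

for name in accuracy_gain:
    print(f"{name}: accuracy +{accuracy_gain[name]:.4f}, F1 +{f1_gain[name]:.4f}")
print(f"implied recall (feature weighting): {recall_weighting:.4f}")
```

Under these assumptions, the largest step (about 1.7 percentage points of accuracy) comes from feature selection, with construction and weighting each adding roughly 0.1 point or less, and the implied recall is close to 0.80, well below the reported precision.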

CONCLUSION

The SHAP-based framework substantially improved the accuracy and transparency of appendix cancer predictions using LightGBM (F1-score=0.8877). This approach bridges the gap between predictive power and clinical interpretability, offering a scalable model for rare disease prediction. Future validation with real-world data is recommended to ensure generalizability.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/5d17f67b3428/emj-2025-00297f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/7f41ae94c98b/emj-2025-00297f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/6785f62e9fdf/emj-2025-00297f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/f9d07ad3fbaf/emj-2025-00297f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/e16919581f68/emj-2025-00297f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1a6/12277501/051fdb512692/emj-2025-00297f6.jpg


Cited by

1
Concurrent high-grade appendiceal mucinous neoplasm and adenocarcinoma: a unique case report and literature review.
J Surg Case Rep. 2025 Aug 29;2025(8):rjaf666. doi: 10.1093/jscr/rjaf666. eCollection 2025 Aug.
