Robust Machine Learning for Colorectal Cancer Risk Prediction and Stratification

Authors

Nartowt Bradley J, Hart Gregory R, Muhammad Wazir, Liang Ying, Stark Gigi F, Deng Jun

Affiliations

Department of Therapeutic Radiology, Yale University, New Haven, CT, United States.

Department of Radiation Oncology, Medical College of Wisconsin, Milwaukee, WI, United States.

Publication information

Front Big Data. 2020 Mar 10;3:6. doi: 10.3389/fdata.2020.00006. eCollection 2020.

DOI:10.3389/fdata.2020.00006

PMID:33693381

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7931964/

Abstract

While colorectal cancer (CRC) is third in prevalence and mortality among cancers in the United States, there is no effective method to screen the general public for CRC risk. In this study, to identify an effective mass screening method for CRC risk, we evaluated seven supervised machine learning algorithms: linear discriminant analysis, support vector machine, naive Bayes, decision tree, random forest, logistic regression, and artificial neural network. Models were trained and cross-tested with the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, Ovarian Cancer Screening (PLCO) datasets. Six imputation methods were used to handle missing data: mean, Gaussian, Lorentzian, one-hot encoding, Gaussian expectation-maximization, and listwise deletion. Among all combinations of model configuration and imputation method, the artificial neural network with expectation-maximization imputation emerged as the best, with a concordance of 0.70 ± 0.02, a sensitivity of 0.63 ± 0.06, and a specificity of 0.82 ± 0.04. In stratifying CRC risk in the NHIS and PLCO datasets, only 2% of negative cases were misclassified as high risk and 6% of positive cases were misclassified as low risk. In modeling the CRC-free probability with Kaplan-Meier estimators, the low-, medium-, and high-risk groups showed statistically significant separation. Our results indicate that the trained artificial neural network can be used as an effective screening tool for early intervention and prevention of CRC in large populations.
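For readers unfamiliar with the reported metrics, the following is a minimal sketch of how concordance, sensitivity, specificity, and a three-level risk stratification can be computed from a model's risk scores. It is not the authors' code: the function names, the toy scores and labels, the decision threshold of 0.5, and the stratification cut-offs of 0.3 and 0.65 are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def concordance(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs in which the positive case
    receives the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as a positive call and return
    (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def stratify(score: float, low_cut: float, high_cut: float) -> str:
    """Map a risk score to 'low', 'medium', or 'high' using two cut-offs
    (hypothetical values; the abstract does not state the ones used)."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Toy data, invented for illustration only (1 = CRC, 0 = no CRC).
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.7, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(concordance(toy_scores, toy_labels))
print(sensitivity_specificity(toy_scores, toy_labels, threshold=0.5))
print([stratify(s, low_cut=0.3, high_cut=0.65) for s in toy_scores])
```

Under this kind of scheme, the misclassification figures quoted in the abstract correspond to negative cases that land in the "high" group and positive cases that land in the "low" group.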




