Colorectal Cancer Detected by Machine Learning Models Using Conventional Laboratory Test Data.

Affiliations

Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.

Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Guangdong Institute of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.

Publication information

Technol Cancer Res Treat. 2021 Jan-Dec;20:15330338211058352. doi: 10.1177/15330338211058352.

Abstract

Current diagnostic methods for colorectal cancer (CRC) are colonoscopy and sigmoidoscopy, which are invasive and complex procedures with possible complications. This study aimed to determine models for CRC identification that involve minimally invasive, affordable, portable, and accurate screening variables. This was a retrospective study that used data from electronic medical records of patients with CRC and healthy individuals between July 2017 and June 2018. Laboratory data, including liver enzymes, lipid profiles, complete blood counts, and tumor biomarkers, were extracted from the electronic medical records. Five machine learning models (logistic regression, random forest, k-nearest neighbors, support vector machine [SVM], and naïve Bayes) were used to identify CRC. The performances were evaluated using the areas under the curve (AUCs), sensitivity, specificity, positive predictive values (PPV), and negative predictive values (NPV). A total of 1164 electronic medical records (CRC patients: 582; healthy controls: 582) were included. The logistic regression model achieved the highest performance in identifying CRC (AUC: 0.865, sensitivity: 89.5%, specificity: 83.5%, PPV: 84.4%, NPV: 88.9%). The first four weighted features in the model were carcinoembryonic antigen (CEA), hemoglobin (HGB), lipoprotein (a) (Lp(a)), and high-density lipoprotein (HDL). A diagnostic model for CRC was established based on the four indicators, with an AUC of 0.849 (0.840-0.860) for identifying all CRC patients, and it performed best in discriminating patients with late colon cancer from healthy individuals with an AUC of 0.905 (0.889-0.929). The logistic regression model based on CEA, HGB, Lp(a), and HDL might be a powerful, noninvasive, and cost-effective method to identify CRC.
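
The evaluation metrics named in the abstract (AUC, sensitivity, specificity, PPV, NPV) can be illustrated with a minimal Python sketch. This is not the authors' code: the names `ConfusionCounts`, `screening_metrics`, and `auc`, and all the numbers in the usage note, are hypothetical and chosen only for illustration. AUC is computed here by direct pairwise comparison of scores, which is one standard way to obtain it and may differ from the procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical container, not from the paper)."""
    tp: int  # CRC patients correctly flagged
    fn: int  # CRC patients missed
    tn: int  # healthy controls correctly cleared
    fp: int  # healthy controls incorrectly flagged


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases detected
        "specificity": c.tn / (c.tn + c.fp),  # share of controls cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are correct
    }


def auc(pos_scores: list, neg_scores: list) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. This pairwise form is O(n*m) and is meant
    for illustration only.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With made-up counts, `screening_metrics(ConfusionCounts(tp=90, fn=10, tn=80, fp=20))` gives a sensitivity of 0.9 and a specificity of 0.8. PPV and NPV depend on the ratio of cases to controls, so values obtained in a balanced case-control cohort such as the one described here would not carry over unchanged to a screening population where CRC is much rarer.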

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47cd/8606732/2f67d8c528c8/10.1177_15330338211058352-fig1.jpg
