Li Yan-Rong, Meng Ke, Yang Guang, Liu Bao-Hai, Li Chu-Qiao, Zhang Jia-Yuan, Zhang Xiao-Mei
Department of Gastroenterology, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China.
Department of Gastroenterology and Hepatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China.
J Gastrointest Oncol. 2022 Jun;13(3):1188-1203. doi: 10.21037/jgo-22-536.
Genetic factors account for approximately 35% of colorectal cancer risk. The specificity and sensitivity of existing diagnostic biomarkers for colorectal cancer do not meet the needs of clinical application. The expanding scale and inherent complexity of biological data have encouraged the growing use of machine learning to build informative and predictive models of the underlying biological processes. This study aimed to identify diagnostic genes for colorectal cancer using machine learning methods.
The GSE41328 and GSE106582 data sets were downloaded from the Gene Expression Omnibus (GEO) database. The gene expression differences between colon cancer and normal tissues were analyzed. The key colorectal cancer genes were screened and validated by Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) regression. Immune cell infiltration and the correlation with the key genes in patients with colon cancer were further analyzed by CIBERSORT.
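The LASSO screening step can be illustrated with a minimal sketch. The abstract does not state the software, penalty value, or loss function used, so everything here is an assumption: the sketch fits a least-squares LASSO on labels coded +1 (tumor) and -1 (normal) by cyclic coordinate descent, and the gene names (`GENE_A`, `GENE_B`, `GENE_C`), expression values, and `alpha` are hypothetical placeholders rather than data from the study.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    X is a list of rows (samples) and y a list of targets. Both are assumed
    to be centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                # Partial residual: prediction with feature j left out.
                pred_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, alpha) / (z / n)
    return coef


# Hypothetical, centred expression values: rows are samples, columns are genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [2.0, 0.1, 0.5],
    [1.8, -0.1, -0.5],
    [2.2, 0.0, 0.0],
    [-2.0, 0.1, -0.5],
    [-2.1, -0.1, 0.5],
    [-1.9, 0.0, 0.0],
]
y = [1, 1, 1, -1, -1, -1]  # +1 = tumor, -1 = normal (assumed coding)

coef = lasso_coordinate_descent(X, y, alpha=0.1)
# Genes whose coefficient survives the L1 penalty are kept as candidates.
selected = [gene for gene, c in zip(genes, coef) if c != 0.0]
print(selected)
```

The L1 penalty sets the coefficients of uninformative genes exactly to zero, which is why LASSO doubles as a feature selector. In practice the penalty would be chosen by cross-validation, and a logistic loss is the more usual choice for a two-class outcome.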
Eleven key genes were identified as biomarkers for colon cancer, namely and . The mean area under the receiver operating characteristic (ROC) curve (AUC) of the 11 genes for colon cancer diagnosis was 0.94 (range, 0.91-0.97). In the validation set, the expression of the 11 key genes differed significantly between colon cancer and normal subjects (P<0.05), and the mean AUC was 0.82 (range, 0.70-0.88). Immune cell infiltration analyses showed that the relative quantities of plasma cells, T cells, B cells, NK cells, M0 macrophages, M1 macrophages, resting dendritic cells, resting mast cells, activated mast cells, and neutrophils in the tumor group differed significantly from those in the normal group.
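The per-gene AUC summary reported above (a mean and a range across genes) can be reproduced in outline as follows. This is a sketch under stated assumptions: the abstract does not say how the AUCs were computed, so the code uses the rank-based (Mann-Whitney) formulation, in which the AUC is the fraction of tumor-normal sample pairs where the tumor sample has the higher expression, with ties counted as half. The gene names and expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    labels use 1 for tumor and 0 for normal; ties contribute 0.5.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values: four tumor samples, then four normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "GENE_A": [5.1, 4.8, 6.0, 3.9, 3.5, 4.0, 2.9, 3.1],
    "GENE_B": [2.0, 3.0, 2.5, 4.0, 2.2, 3.1, 2.4, 3.9],
}

aucs = {gene: roc_auc(values, labels) for gene, values in expression.items()}
mean_auc = sum(aucs.values()) / len(aucs)
auc_range = (min(aucs.values()), max(aucs.values()))
print(aucs, mean_auc, auc_range)
```

A gene that is downregulated in tumors yields an AUC below 0.5 under this convention. Its discriminative power is then 1 minus that value, so the direction of change has to be fixed before AUCs are averaged across genes.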
, and were identified as the key genes for colon cancer diagnosis. These genes are expected to become novel diagnostic markers and targets of new pharmacotherapies for colorectal cancer.