Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma.

Affiliations

School of Information Engineering of Henan University of Science and Technology, 263 Kaiyuan Road, Luolong Qu, Luoyang, 471023, P. R. China.

Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment; Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital, College of Clinical Medicine, Medical College of Henan University of Science and Technology, 24 Jinghua Road, Jianxi Qu, Luoyang, 471003, P. R. China.

Publication information

BMC Cancer. 2021 Aug 9;21(1):906. doi: 10.1186/s12885-021-08647-1.

Abstract

BACKGROUND

Many prognostic biomarkers for esophageal squamous cell carcinoma (ESCC) have been reported to date, but their reproducibility is low because of the high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal prognostic biomarkers for ESCC using machine learning algorithms.

METHODS

Biomarkers related to clinical survival, recurrence or therapeutic response in patients with ESCC were identified by searching literature databases. Forty-eight biomarkers linked to recurrence or prognosis of ESCC were used to construct a molecular interaction network with NetBox, from which functional modules were then identified. Publicly available ESCC mRNA transcriptome data were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), namely the GSE53625 and TCGA-ESCC datasets. Five machine learning algorithms, logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF) and XGBoost, were used to build prognostic classifiers for feature selection. The area under the ROC curve (AUC) was used to evaluate the performance of the prognostic classifiers. The importance of each identified molecule was ranked by its occurrence frequency in the prognostic classifiers. Kaplan-Meier survival analysis and the log-rank test were used to assess the statistical significance of differences in overall survival.
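The evaluation steps named in the methods (AUC for classifier performance, Kaplan-Meier curves and the log-rank test for overall survival) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the AUC uses the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve, and the log-rank test uses the standard two-group hypergeometric variance with a chi-square (1 df) p-value. All function names and toy inputs are hypothetical.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores above a random
    negative case (ties count 1/2); labels are 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as (time, survival) pairs at each distinct event time.
    events[i] is 1 for death and 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group all subjects sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n = sum(1 for tt, _, _ in data if tt >= t)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        # Observed minus expected deaths in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the groups cannot be compared")
    stat = observed_minus_expected ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks 3 of the 4 positive/negative pairs correctly and returns 0.75. In practice a tested library (for example scikit-learn or lifelines) would be preferable to this hand-rolled version.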

RESULTS

A total of 48 clinically proven molecules associated with ESCC progression were used to construct a molecular interaction network with 3 functional modules comprising 17 component molecules. For each machine learning algorithm, 131,071 prognostic classifiers were built from these 17 molecules. The 17 molecules were ranked by their occurrence frequencies in the classifiers whose AUC exceeded the mean of all 131,071 AUCs. By this ranking, stratifin, encoded by SFN, was identified as the optimal prognostic biomarker for ESCC, and its performance was further validated in 2 additional independent cohorts.
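The figure of 131,071 equals 2**17 - 1, which is consistent with one classifier per non-empty subset of the 17 molecules; this reading is an inference from the numbers, not something the abstract states. Under that assumption, the ranking step can be sketched as follows: enumerate every subset, score it, keep the subsets whose AUC exceeds the mean AUC, and count how often each molecule appears in them. The `score_subset` callback, `toy_auc`, and the molecule names "A", "B", "C" are hypothetical placeholders; in the study each subset would instead be scored by training and evaluating one of the five classifiers. How the five per-algorithm rankings were combined is not described in the abstract and is not modeled here.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def nonempty_subsets(features):
    """Yield every non-empty subset of `features` as a tuple (2**k - 1 in total)."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def rank_by_occurrence(features, score_subset):
    """Rank features by how often they occur in subsets scoring above the mean.

    score_subset is a hypothetical callback returning an AUC for a feature subset.
    """
    scores = {subset: score_subset(subset) for subset in nonempty_subsets(features)}
    threshold = mean(scores.values())
    counts = Counter()
    for subset, auc in scores.items():
        if auc > threshold:
            counts.update(subset)
    # Stable sort: features with equal counts keep their input order.
    ranking = sorted(features, key=lambda f: counts[f], reverse=True)
    return ranking, counts


def toy_auc(subset):
    """Hypothetical scoring in which molecule "A" is made informative on purpose."""
    return 0.5 + 0.3 * ("A" in subset) + 0.05 * ("B" in subset)


ranking, counts = rank_by_occurrence(["A", "B", "C"], toy_auc)
print(ranking, dict(counts))
```

With 17 molecules the enumeration yields exactly 131,071 subsets, so the loop stays cheap; the cost lies in training one classifier per subset, which this sketch replaces with a toy function.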

CONCLUSION

The occurrence frequency of a molecule across the various feature selection approaches reflects its degree of clinical importance, and stratifin is an optimal prognostic biomarker for ESCC.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cbfe/8351329/23b5a448ddef/12885_2021_8647_Fig1_HTML.jpg