Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China.
Department of Biostatistics, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing 210008, China.
Comput Math Methods Med. 2022 Oct 27;2022:5844846. doi: 10.1155/2022/5844846. eCollection 2022.
Patients (363 in total) with stomach adenocarcinoma from The Cancer Genome Atlas (TCGA) cohort were included. An autoencoder was constructed to integrate the RNA sequencing, miRNA sequencing, and methylation data. The bottleneck-layer features were clustered with the k-means algorithm to obtain subgroups for evaluating the prognosis-related risk of stomach adenocarcinoma. The model's robustness was verified using 10-fold cross-validation (CV). Survival was analyzed by the Kaplan-Meier method. Univariate and multivariate Cox regression were used to estimate hazard ratios. The model was validated in three independent cohorts with different endpoints.
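The clustering step can be illustrated with a minimal sketch. It assumes the autoencoder has already produced one bottleneck feature vector per patient, and it uses two clusters to mirror the low-risk and high-risk groups. The function name `kmeans_two_groups`, the deterministic initialization, and the toy feature values are hypothetical choices for illustration, not the authors' implementation or data.

```python
from math import dist

def kmeans_two_groups(features, n_iter=100):
    """Lloyd's k-means with k=2 on a list of equal-length feature vectors."""
    # Deterministic initialization (an assumption made for reproducibility):
    # the first sample, and the sample farthest from it.
    c0 = list(features[0])
    c1 = list(max(features, key=lambda x: dist(x, c0)))
    centroids = [c0, c1]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            0 if dist(x, centroids[0]) <= dist(x, centroids[1]) else 1
            for x in features
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [x for x, lab in zip(features, new_labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-dimensional bottleneck features for six patients.
bottleneck = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
    [2.0, 2.1], [2.2, 1.9], [1.9, 2.0],
]
labels, centroids = kmeans_two_groups(bottleneck)
print(labels)
```

The cluster indices themselves carry no meaning. Which cluster counts as "high-risk" would have to be decided afterwards by comparing survival between the two groups.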
The patients were divided into low-risk and high-risk groups by the k-means clustering algorithm. The high-risk group had a significantly higher risk of poor survival (log-rank P value = 2.80e-06; adjusted hazard ratio = 2.386, 95% confidence interval: 1.607-3.543); the model achieved a concordance index (C-index) of 0.714 and a Brier score of 0.184. The model performed well both in the 10-fold CV procedure and in three independent cohorts from the Gene Expression Omnibus (GEO) repository.
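For readers unfamiliar with the C-index, the sketch below computes Harrell's concordance index: the fraction of comparable patient pairs in which the patient with the shorter observed survival also has the higher predicted risk. The function name `concordance_index` and the toy follow-up data are hypothetical and unrelated to the study's cohorts. Ties in risk score are counted as half-concordant, which is one common convention and not necessarily the one used by the authors.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: five patients, risk score 1 = high-risk group.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [1, 1, 0, 0, 0]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a binary risk grouping with many tied scores tends to sit well below 1.0 even when the groups separate clearly.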
A robust and generalizable autoencoder-based model was proposed to integrate multiomics data and predict the prognosis of patients with stomach adenocarcinoma. The model outperformed two alternative approaches in prognosis prediction. The results might provide grounds for further exploring potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.