Shi Xinyuan, Zhu Fangfang, Min Wenwen
School of Information Science and Engineering, Yunnan University, Kunming, China.
School of Health and Nursing, Yunnan Open University, Kunming, China.
J Comput Biol. 2025 Sep;32(9):850-864. doi: 10.1089/cmb.2024.0884. Epub 2025 Apr 28.
Predicting survival outcomes and assessing patient risk play a pivotal role in understanding the microbial composition across the stages of cancer. Advances in deep learning have shown that it can be used to analyze patient survival risk from microbial data. However, individual cancer datasets commonly have a limited sample size and a high-dimensional feature space. This often leads deep learning models to overfit, which hinders their ability to extract deep data representations and results in suboptimal performance. To overcome these challenges, we advocate pretraining and fine-tuning strategies, which have proven effective in addressing the small sample sizes of individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder with a variational autoencoder (VAE) and employs both pretraining and fine-tuning to predict the survival risk of cancer patients from microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. We assess our method on three distinct cancer datasets from The Cancer Genome Atlas Program, and the results demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) compared with positional encoding, VAE encoding is more effective in enriching the data representation; and (4) using the idea of a saliency map, it is possible to observe which microbes contribute strongly to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk.
Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580.
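To make the saliency-map idea in finding (4) concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy logistic scorer in place of the trained VTrans classifier, and it approximates the gradient of the model output with respect to each input feature by central finite differences, whereas a real network would typically obtain this gradient through automatic differentiation. The function names, microbe names, weights, and abundance values are all hypothetical and chosen only for illustration.

```python
import math


def risk_score(abundances, weights, bias=0.0):
    """Toy stand-in for a trained classifier: logistic score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, abundances))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(model, abundances, eps=1e-4):
    """Absolute central-difference gradient of the model output
    with respect to each input feature."""
    scores = []
    for i in range(len(abundances)):
        up = list(abundances)
        down = list(abundances)
        up[i] += eps
        down[i] -= eps
        scores.append(abs(model(up) - model(down)) / (2 * eps))
    return scores


def rank_features(names, scores):
    """Sort features from highest to lowest saliency."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical microbes, weights, and one patient's abundances (illustration only).
names = ["microbe_A", "microbe_B", "microbe_C"]
weights = [0.2, -1.5, 0.7]
sample = [0.4, 0.1, 0.5]


def model(x):
    return risk_score(x, weights)


sal = saliency(model, sample)
ranking = rank_features(names, sal)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

For this toy scorer the saliency of a feature is proportional to the absolute value of its weight, so the ranking simply recovers the features with the largest weights. With a nonlinear model such as a Transformer, the ranking would depend on the individual sample as well.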