Wang Chao, O'Connell Michael J
Ben May Department for Cancer Research, University of Chicago, 929 E. 57th St., Chicago, IL, 60637, USA.
Department of Statistics, Miami University, 105 Tallawanda Rd., Oxford, OH, 45056, USA.
BMC Bioinformatics. 2025 Aug 19;26(1):214. doi: 10.1186/s12859-025-06245-7.
In cancer research, different levels of high-dimensional data are often collected for the same subjects. Effective integration of these data by considering the shared and specific information from each data source can help us better understand different types of cancer.
In this study, we propose a novel autoencoder (AE) structure with an explicitly defined orthogonal loss between the shared and specific embeddings to integrate different data sources. We compare our model with previously proposed AE structures on simulated data and on real cancer data from The Cancer Genome Atlas. Using simulations with different proportions of differentially expressed genes, we compare the performance of the AE methods on subsequent classification tasks. We also compare model performance with a commonly used dimension reduction method, Joint and Individual Variation Explained (JIVE). Our proposed AE models with orthogonal constraints achieve slightly better reconstruction loss. Embeddings from all AE models yield higher classification accuracy than the original features, demonstrating the usefulness of the embeddings extracted by the models.
We show that the proposed models have consistently high classification accuracy on both training and testing sets. In comparison, the recently proposed MOCSS model, which imposes an orthogonality penalty in the post-processing step, has lower classification accuracy, on par with JIVE.
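As a rough illustration of what an explicit orthogonal loss between shared and specific embeddings can look like, the sketch below penalizes the squared Frobenius norm of the cross-product between the two embedding matrices and adds it to a mean-squared reconstruction error. The abstract does not state the exact form of the loss, so the function names (`orthogonality_loss`, `reconstruction_loss`, `total_loss`), the normalization, and the `weight` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def orthogonality_loss(shared, specific):
    """Hypothetical penalty: mean squared entry of shared^T @ specific.

    Both inputs are (n_samples, n_dims) embedding matrices. The value is
    zero exactly when every shared column is orthogonal to every
    specific column.
    """
    cross = shared.T @ specific
    return float(np.sum(cross ** 2)) / cross.size


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


def total_loss(x, x_hat, shared, specific, weight=1.0):
    """Reconstruction error plus a weighted orthogonality penalty.

    `weight` is an assumed tuning parameter, not a value from the paper.
    """
    return reconstruction_loss(x, x_hat) + weight * orthogonality_loss(shared, specific)


# Toy embeddings with orthonormal columns, built via a QR decomposition.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(50, 6)))
shared, specific = q[:, :3], q[:, 3:]

penalty_orthogonal = orthogonality_loss(shared, specific)  # numerically zero
penalty_overlapping = orthogonality_loss(shared, shared)   # strictly positive
```

In an actual autoencoder this penalty would be computed on the encoder outputs within each mini-batch and minimized jointly with the reconstruction term, in contrast to approaches that enforce orthogonality only in a post-processing step.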