Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Dr, Philadelphia, 19104, USA.
BMC Bioinformatics. 2020 Apr 15;21(1):141. doi: 10.1186/s12859-020-3455-4.
Multiple co-inertia analysis (mCIA) is a multivariate analysis method that can assess relationships and trends across multiple datasets. It has recently been used for integrative analysis of multiple high-dimensional -omics datasets. However, its estimated loading vectors are non-sparse, which makes it difficult to identify important features and interpret the analysis results. We propose two new mCIA methods: 1) a sparse mCIA method that produces sparse loading estimates and 2) a structured sparse mCIA method that additionally incorporates structural information among variables, such as that available from functional genomics.
Our extensive simulation studies show that the sparse mCIA and structured sparse mCIA methods outperform the existing mCIA in both feature selection and estimation accuracy. Applied to the integrative analysis of transcriptomics and proteomics data from a cancer study, the methods identified biomarkers that the literature has linked to cancer.
The proposed sparse mCIA performs model estimation and feature selection simultaneously and yields results that are more interpretable than those of the existing mCIA. Furthermore, the proposed structured sparse mCIA can effectively incorporate prior network information among genes, resulting in improved feature selection and enhanced interpretability.
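To make the idea of sparse loading estimates concrete, the following is a minimal sketch in Python with NumPy. It is not the algorithm from the paper. It assumes a common heuristic: alternate between soft-thresholding each block's loading vector and updating a shared reference score. The function name `sparse_mcia_sketch`, the parameter `lam`, the relative thresholding rule, and the SVD-based initialization are all hypothetical choices made for illustration, and only the first component is estimated. The structured (network-aware) penalty is not covered.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t; entries with |z| <= t become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_mcia_sketch(blocks, lam=0.5, n_iter=50):
    """Illustrative first-component sparse loadings for several data blocks.

    blocks : list of (n_samples, p_k) arrays measured on the same samples.
    lam    : assumed sparsity level in [0, 1); the threshold is lam times the
             largest absolute entry of X_k^T v, so the largest entry survives.
    Returns (loadings, v): one unit-norm loading vector per block and the
    unit-norm shared reference score.
    """
    # Column-center every block so cross-products behave like covariances.
    blocks = [X - X.mean(axis=0) for X in blocks]

    # Assumed initialization: leading left singular vector of the stacked blocks.
    u, _, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    v = u[:, 0]

    loadings = [np.zeros(X.shape[1]) for X in blocks]
    for _ in range(n_iter):
        scores = []
        for k, X in enumerate(blocks):
            z = X.T @ v  # association of each feature with the reference score
            a = soft_threshold(z, lam * np.max(np.abs(z)))
            norm = np.linalg.norm(a)
            if norm > 0:
                a = a / norm
            loadings[k] = a
            scores.append(X @ a)  # block score under the sparse loading

        # Update the reference score as a weighted sum of the block scores.
        v_new = sum((v @ s) * s for s in scores)
        norm = np.linalg.norm(v_new)
        if norm == 0:
            break
        v = v_new / norm
    return loadings, v
```

Calling `sparse_mcia_sketch([X1, X2], lam=0.5)` on two sample-matched matrices returns one loading vector per block. Features whose loading is exactly zero are dropped, which is the sense in which estimation and feature selection happen together.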