Department of Computer Science, Aalto University, Espoo, Finland.
PLoS One. 2024 Sep 9;19(9):e0309921. doi: 10.1371/journal.pone.0309921. eCollection 2024.
Multi-omics analysis offers a promising avenue toward a better understanding of complex biological phenomena. In particular, untangling the pathophysiology of multifactorial health conditions such as inflammatory bowel disease (IBD) could benefit from considering several omics levels simultaneously. However, taking full advantage of multi-omics data requires suitable new tools. Multi-view learning, a family of machine learning techniques that natively integrates heterogeneous data, is a natural source of such methods. Here we present a new approach to variable selection in unsupervised multi-view learning that applies stability selection to canonical correlation analysis (CCA). We apply our method, StabilityCCA, to simulated and real multi-omics data, and demonstrate its ability to find relevant variables and improve the stability of variable selection. In a case study on an IBD microbiome data set, we link metagenomics and metabolomics, revealing a connection between their joint structure and the disease and identifying potential biomarkers. Our results showcase the usefulness of multi-view learning in multi-omics analysis and demonstrate StabilityCCA as a powerful tool for biomarker discovery.
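To make the idea of applying stability selection to CCA concrete, the following is a minimal Python sketch and not the authors' StabilityCCA implementation. It assumes a simple rank-1 sparse CCA obtained by alternating soft-thresholded power iterations on the cross-covariance matrix, used here as a stand-in for whichever sparse CCA variant the paper uses. The function names, the `sparsity` parameter, the subsample size of n/2, and the 0.6 selection threshold are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink entries of `a` toward zero by `lam`; small entries become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_cca_weights(X, Y, sparsity=0.5, n_iter=50):
    """Rank-1 sparse CCA weights (hypothetical stand-in, not the paper's solver).

    Alternates soft-thresholded power iterations on the cross-covariance
    matrix. The threshold is a fraction `sparsity` of the largest absolute
    entry, so at least one variable per view always survives.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y  # cross-covariance between the two views, up to scaling
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        a = C @ v
        u = soft_threshold(a, sparsity * np.abs(a).max())
        u /= np.linalg.norm(u) + 1e-12
        b = C.T @ u
        v = soft_threshold(b, sparsity * np.abs(b).max())
        v /= np.linalg.norm(v) + 1e-12
    return u, v


def stability_selection_cca(X, Y, sparsity=0.5, n_subsamples=100,
                            threshold=0.6, seed=0):
    """Refit sparse CCA on random half-samples and count how often each
    variable receives a non-zero weight (its selection frequency)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts_x = np.zeros(X.shape[1])
    counts_y = np.zeros(Y.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample without replacement
        u, v = sparse_cca_weights(X[idx], Y[idx], sparsity)
        counts_x += (u != 0)
        counts_y += (v != 0)
    freq_x = counts_x / n_subsamples
    freq_y = counts_y / n_subsamples
    # Keep only variables selected in at least `threshold` of the subsamples.
    selected_x = np.flatnonzero(freq_x >= threshold)
    selected_y = np.flatnonzero(freq_y >= threshold)
    return freq_x, freq_y, selected_x, selected_y


# Toy data (simulated for illustration): the first three variables of each
# view share one latent factor z; all remaining variables are pure noise.
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
X = rng.normal(size=(n, 20))
Y = rng.normal(size=(n, 15))
X[:, :3] += 2 * z[:, None]
Y[:, :3] += 2 * z[:, None]

freq_x, freq_y, selected_x, selected_y = stability_selection_cca(X, Y)
print("selected in view X:", selected_x)
print("selected in view Y:", selected_y)
```

The design choice illustrated here is that a variable is reported only if it is selected consistently across many subsamples, rather than in a single fit, which is what makes the selection less sensitive to sampling noise.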