Hawinkel Stijn, Bijnens Luc, Cao Kim-Anh Lê, Thas Olivier
Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium.
Quantitative Sciences, Janssen Pharmaceutical Companies of Johnson and Johnson, 2340 Beerse, Belgium.
NAR Genom Bioinform. 2020 Jul 21;2(3):lqaa050. doi: 10.1093/nargab/lqaa050. eCollection 2020 Sep.
The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data, such as sequence count data, are inherently compositional. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available as an R package.
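To illustrate what "compositional" means for sequence count data, the following minimal sketch applies a centered log-ratio (clr) transform, a standard log-ratio tool for compositional data. It is not the COMBI model itself, which according to the abstract uses log-ratio link functions inside latent variable models. The function name `clr`, the `pseudocount` parameter and the example counts are all hypothetical and chosen only for illustration.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount (an assumption of this sketch) avoids log(0)
    when a feature has zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [v - mean_log for v in logs]


# Two hypothetical samples with the same composition (10% / 30% / 60%)
# but a tenfold difference in sequencing depth.
shallow = [10, 30, 60]
deep = [100, 300, 600]

# No zeros are present, so the pseudocount can be switched off here.
clr_shallow = clr(shallow, pseudocount=0.0)
clr_deep = clr(deep, pseudocount=0.0)

print(clr_shallow)
print(clr_deep)
```

Without a pseudocount, both samples map to the same clr vector, because only the ratios between features carry information and the total count (sequencing depth) does not. This is the property that makes log-ratio approaches a natural fit for compositional data.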