van der Kloet Frans M, Sebastián-León Patricia, Conesa Ana, Smilde Age K, Westerhuis Johan A
Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands.
Computational Genomics Program, Centro de Investigaciones Príncipe Felipe, Valencia, Spain.
BMC Bioinformatics. 2016 Jun 6;17 Suppl 5(Suppl 5):195. doi: 10.1186/s12859-016-1037-2.
Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO), and O2-PLS, a two-block (X-Y) latent variable regression method with an integral orthogonal signal correction (OSC) filter, can all be used for the integrated analysis of multiple data sets. Each decomposes the data into three terms: a low(er)-rank approximation capturing the variation common across data sets, low(er)-rank approximations of the structured variation distinctive to each data set, and residual noise. In this paper the three methods are compared with respect to their mathematical properties and the way each defines common and distinctive variation.
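The three-term structure described above can be made concrete with a small simulation. The sketch below only builds two data blocks that have that structure; it does not implement JIVE, DISCO, or O2-PLS. The function name `simulate_two_blocks`, the block sizes, the rank-1 choice for every structured term, and the convention that rows are the shared samples are all assumptions made for illustration.

```python
import numpy as np

def simulate_two_blocks(seed=0, n=50, p1=30, p2=20, noise=0.01):
    """Build two blocks X1, X2 as common + distinctive + residual terms.

    Hypothetical toy set-up: n shared samples (rows), p1 and p2 variables,
    one common component and one distinctive component per block.
    """
    rng = np.random.default_rng(seed)

    # Orthonormal score vectors: column 0 is shared by both blocks,
    # columns 1 and 2 are specific to block 1 and block 2 respectively.
    # Orthogonality between common and distinctive scores is imposed here
    # by construction; real data need not satisfy it.
    scores, _ = np.linalg.qr(rng.standard_normal((n, 3)))
    t_common, t_d1, t_d2 = scores[:, [0]], scores[:, [1]], scores[:, [2]]

    # Rank-1 common parts (same scores, block-specific loadings).
    C1 = t_common @ rng.standard_normal((1, p1))
    C2 = t_common @ rng.standard_normal((1, p2))
    # Rank-1 distinctive parts.
    D1 = t_d1 @ rng.standard_normal((1, p1))
    D2 = t_d2 @ rng.standard_normal((1, p2))
    # Unstructured residual noise.
    E1 = noise * rng.standard_normal((n, p1))
    E2 = noise * rng.standard_normal((n, p2))

    return {
        "X1": C1 + D1 + E1, "X2": C2 + D2 + E2,
        "C1": C1, "C2": C2, "D1": D1, "D2": D2, "E1": E1, "E2": E2,
    }
```

A decomposition method applied to `X1` and `X2` would aim to recover the `C`, `D`, and `E` terms; in this toy case the ground truth is known, which is what makes simulated data useful for comparing methods.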
All three methods are applied to simulated data and to mRNA and miRNA data sets from glioblastoma multiforme (GBM) brain tumors to examine where their results overlap and where they differ. When common variation is abundant, all methods find the correct solution. With real data, however, the three methods treat complexities in the data differently.
Each of the three methods has its own approach to estimating common and distinctive variation, with specific strengths and weaknesses. Because their orthogonality properties and algorithms differ, they give slightly different views of the data. Assuming orthogonality between common and distinctive variation may lead to misinterpretation of true natural or biological phenomena that need not be orthogonal at all.
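The caution about orthogonality can be illustrated with plain linear algebra. The sketch below is a toy example and not part of any of the three algorithms: it takes a unit-length "distinctive" score vector at a chosen angle to a "common" score vector, forces it to be orthogonal to the common one, and reports how much of its variation survives. The function name `orthogonalised_share` and the two-dimensional set-up are assumptions made for illustration.

```python
import numpy as np

def orthogonalised_share(angle_deg):
    """Fraction of a unit distinctive score vector's sum of squares that
    remains after removing its component along the common score vector."""
    theta = np.deg2rad(angle_deg)
    t_common = np.array([1.0, 0.0])
    t_distinct = np.array([np.cos(theta), np.sin(theta)])  # unit length
    # Project out the direction of t_common from t_distinct.
    residual = t_distinct - (t_distinct @ t_common) * t_common
    return float(residual @ residual)
```

The returned value equals the squared sine of the angle, so it is 1 when the two vectors are already orthogonal and shrinks as they become more correlated. The variation that is removed ends up attributed to the common part, which is one way a non-orthogonal phenomenon could be misread.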