Vegas Esteban, Oller Josep M, Reverter Ferran
Department of Statistics, University of Barcelona, Diagonal, 643, Barcelona, 08028, Spain.
Center of Genomic Regulation, Parc de Recerca Biomedica de Barcelona, Dr. Aiguader, 88, Barcelona, 08003, Spain.
BMC Bioinformatics. 2016 Jun 6;17 Suppl 5(Suppl 5):205. doi: 10.1186/s12859-016-1046-1.
Pathway expression is multivariate in nature. Thus, from a statistical perspective, detecting differentially expressed pathways between two conditions requires methods for inferring differences between mean vectors. Maximum mean discrepancy (MMD) provides a statistical test of whether two samples are drawn from the same distribution, and the kernel method greatly simplifies its implementation.
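To make the idea of a kernel MMD test concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel, the biased MMD² estimator, and a permutation scheme for the p-value; the function names, the bandwidth choice and the simulated data are all hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2_from_gram(K, n_x):
    """Biased MMD^2 estimate from the Gram matrix of the pooled sample.

    The first n_x rows/columns of K are assumed to belong to condition 1,
    the remaining ones to condition 2.
    """
    K_xx = K[:n_x, :n_x]
    K_yy = K[n_x:, n_x:]
    K_xy = K[:n_x, n_x:]
    return K_xx.mean() + K_yy.mean() - 2.0 * K_xy.mean()

def mmd_permutation_test(K, n_x, n_perm=1000, seed=0):
    """Permutation p-value for the MMD^2 statistic (assumed testing scheme)."""
    rng = np.random.default_rng(seed)
    observed = mmd2_from_gram(K, n_x)
    n = K.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel samples by permuting rows and columns of K together.
        if mmd2_from_gram(K[np.ix_(perm, perm)], n_x) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: 10 samples per condition, 50 genes (variables > samples).
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(10, 50))
Y = rng.normal(1.0, 1.0, size=(10, 50))
pooled = np.vstack([X, Y])
K = gaussian_kernel(pooled, pooled, sigma=np.sqrt(pooled.shape[1]))
stat, p_value = mmd_permutation_test(K, n_x=X.shape[0], n_perm=500)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Because the statistic depends on the data only through the Gram matrix `K`, the same two functions apply to any valid kernel, which is the property that the remainder of the abstract relies on.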
An MMD-based test successfully detected the differential expression between two conditions, specifically the expression of a set of genes involved in certain fatty acid metabolic pathways. Furthermore, we exploited the ability of the kernel method to integrate data and successfully added hepatic fatty acid levels to the test procedure.
MMD is a non-parametric test that gains several advantages when combined with the kernelization of data: 1) the number of variables can be greater than the sample size; 2) omics data can be integrated; 3) it can be applied not only to vectors, but also to strings, sequences and other common structured data types arising in molecular biology.
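The three advantages listed above can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: data sources are integrated by a non-negative weighted average of their Gram matrices (which remains a valid kernel), and sequences are compared with a simple k-mer spectrum kernel. The helper names `rbf_gram`, `combine_grams` and `spectrum_kernel`, and the simulated data, are hypothetical.

```python
from collections import Counter

import numpy as np

def rbf_gram(Z, sigma):
    """Gaussian Gram matrix of the rows of Z; works even when columns > rows."""
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def combine_grams(grams, weights=None):
    """Non-negative weighted average of Gram matrices on the same samples.

    A non-negative combination of positive semi-definite matrices is again
    positive semi-definite, so the result can be used as a kernel.
    """
    if weights is None:
        weights = np.full(len(grams), 1.0 / len(grams))
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return sum(w * G for w, G in zip(weights, grams))

def spectrum_kernel(s, t, k=3):
    """k-mer spectrum kernel: inner product of k-mer count vectors."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * counts_t[kmer] for kmer, c in counts_s.items())

# Hypothetical data: 6 samples measured on two sources with different widths.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 40))   # 40 genes, more variables than samples
fatty_acids = rng.normal(size=(6, 12))  # 12 fatty acid levels
K_integrated = combine_grams([
    rbf_gram(expression, sigma=np.sqrt(expression.shape[1])),
    rbf_gram(fatty_acids, sigma=np.sqrt(fatty_acids.shape[1])),
])
print(K_integrated.shape)
```

The integrated matrix has one row and column per sample regardless of how many variables each source contributes, so it can be passed unchanged to a Gram-matrix-based MMD statistic.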