School of Statistics, Capital University of Economics and Business, Fengtai, Beijing, China.
Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA.
Bioinformatics. 2017 Jul 15;33(14):2140-2147. doi: 10.1093/bioinformatics/btx138.
Although coexpression analysis based on pairwise expression correlation is widely used to elucidate gene-gene interactions at the whole-genome scale, many complex multi-gene regulatory relationships require more advanced detection methods. Liquid association (LA) is a powerful tool for detecting how the correlation between two genes changes with the expression level of a third gene (the LA scouting gene). LA detection from a single transcriptomic study, however, is often unstable and not generalizable because of cohort bias, biological variation and limited sample size. With the rapid development of microarray and NGS technology, LA analysis that combines multiple gene expression studies can provide more accurate and stable results.
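As an illustration of the quantity involved, the following is a minimal single-study sketch of the commonly used product-moment LA estimator: standardize the two genes X and Y, apply a normal-score transform to the scouting gene Z, and average the product XYZ. The function names (`normal_scores`, `standardize`, `liquid_association`) and the simulated data are assumptions made for this example; they are not the API of the MetaLA R package and do not reproduce the meta-analytic statistics of this article.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(z):
    """Map values to standard normal quantiles by rank (ties are not averaged)."""
    z = np.asarray(z, dtype=float)
    ranks = np.argsort(np.argsort(z)) + 1          # ranks 1..n
    probs = ranks / (len(z) + 1)                   # strictly inside (0, 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in probs])


def standardize(v):
    """Center to mean 0 and scale to unit standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def liquid_association(x, y, z):
    """Product-moment LA estimate: mean of X*Y*Z after standardization."""
    xs, ys = standardize(x), standardize(y)
    zs = standardize(normal_scores(z))
    return float(np.mean(xs * ys * zs))


# Simulated example (hypothetical data): the X-Y correlation rises with Z.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
x = rng.standard_normal(n)
y_dynamic = z * x + rng.standard_normal(n)   # correlation with x depends on z
y_static = rng.standard_normal(n)            # unrelated to x and z

la_dynamic = liquid_association(x, y_dynamic, z)
la_static = liquid_association(x, y_static, z)
print(f"LA with Z-dependent correlation: {la_dynamic:.3f}")
print(f"LA with no dependence:           {la_static:.3f}")
```

In this simulation the first estimate should be clearly positive and the second close to zero. A genome-wide analysis evaluates such a statistic for every gene triplet, which is why the computational cost grows so quickly with the number of genes.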
In this article, we propose two meta-analytic approaches for LA analysis (MetaLA and MetaMLA) that combine multiple transcriptomic studies. To offset the heavy computational cost, we also propose a two-step fast screening algorithm for more efficient genome-wide screening: bootstrap filtering and sign filtering. We applied the methods to five Saccharomyces cerevisiae datasets related to environmental changes. The fast screening algorithm reduced running time by 98%. Compared with single-study analysis, MetaLA and MetaMLA provided stronger detection signals and more consistent and stable results. The top triplets are highly enriched in fundamental biological processes related to environmental changes. Our method can help biologists understand the underlying regulatory mechanisms under different environmental exposures or disease states.
A MetaLA R package, data and code for this article are available at http://tsenglab.biostat.pitt.edu/software.htm.
Supplementary data are available at Bioinformatics online.