Huo Zhiguang, Song Chi, Tseng George
Department of Biostatistics, University of Florida, Gainesville, FL 32611
Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210
Ann Appl Stat. 2019 Mar;13(1):340-366. doi: 10.1214/18-AOAS1188. Epub 2019 Apr 10.
With the rapid development of high-throughput experimental techniques and their rapidly falling costs, many transcriptomic datasets have been generated and accumulated in the public domain. Meta-analysis combining multiple transcriptomic studies can increase the statistical power to detect disease-related biomarkers. In this paper, we introduce a Bayesian latent hierarchical model to perform transcriptomic meta-analysis. This method is capable of detecting genes that are differentially expressed (DE) in only a subset of the combined studies, and its latent variables help quantify homogeneous and heterogeneous differential expression signals across studies. A tight clustering algorithm is applied to the detected biomarkers to capture differential meta-patterns that are informative for guiding further biological investigation. Simulations and three examples, including a microarray dataset from metabolism-related knockout mice, an RNA-seq dataset from HIV transgenic rats, and cross-platform datasets from human breast cancer, are used to demonstrate the performance of the proposed method.
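To make the idea of study-specific latent DE indicators more concrete, the following is a deliberately simplified Python sketch; it is not the authors' model. It assumes that each gene's per-study z-score follows N(0, 1) when the gene is not DE and N(+mu, 1) or N(-mu, 1) when it is up- or down-regulated, with fixed, hypothetical values for `mu` and the prior state probabilities, and it treats studies as independent. The function names `state_posteriors` and `summarize_gene` are illustrative only. The posterior-mode state in each study gives a DE pattern (for example, up in two studies and not DE in a third). One minus the product of the per-study null probabilities gives the probability that the gene is DE in at least one study under these assumptions.

```python
import math
from typing import Dict, List, Tuple

# Latent DE states for one gene in one study: -1 (down), 0 (not DE), +1 (up).
STATES = (-1, 0, 1)


def normal_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def state_posteriors(
    z: float,
    mu: float = 2.5,  # hypothetical effect size for DE genes
    prior: Tuple[float, float, float] = (0.05, 0.90, 0.05),  # (down, null, up), hypothetical
) -> Dict[int, float]:
    """Posterior P(Y = k | z) for k in {-1, 0, +1}, assuming z | Y = k ~ N(k * mu, 1)."""
    weights = {k: p * normal_pdf(z, k * mu) for k, p in zip(STATES, prior)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def summarize_gene(z_scores: List[float]) -> Tuple[float, List[int]]:
    """Return (P(DE in at least one study), posterior-mode state per study).

    Studies are treated as independent given the fixed parameters, which is a
    simplification; a hierarchical model would instead learn mu and the priors
    by pooling information across genes and studies.
    """
    posteriors = [state_posteriors(z) for z in z_scores]
    p_null_everywhere = math.prod(p[0] for p in posteriors)
    modes = [max(p, key=p.get) for p in posteriors]
    return 1.0 - p_null_everywhere, modes


# Example: strong up-regulation in studies 1 and 2, no signal in study 3.
p_de, pattern = summarize_gene([4.1, 3.2, 0.3])
print(pattern)  # → [1, 1, 0]
```

A full Bayesian treatment would place priors on these parameters and estimate them jointly (for example, by MCMC), rather than relying on hand-picked values as this sketch does.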