Department of Statistics, Keimyung University, Daegu 42601, South Korea.
Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
Bioinformatics. 2018 Apr 15;34(8):1321-1328. doi: 10.1093/bioinformatics/btx765.
With the widespread use of microarrays and massively parallel sequencing, numerous high-throughput omics datasets have become available in the public domain. Integrating the abundant information across these datasets is critical to elucidating biological mechanisms. Because the data are high-dimensional, methods such as principal component analysis (PCA) have been widely applied for effective dimension reduction and exploratory visualization.
In this article, we combine multiple omics datasets that address identical or similar biological hypotheses and introduce two variations of a meta-analytic framework for PCA, namely MetaPCA. Regularization is further incorporated to facilitate sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, to three transcriptomic meta-analysis studies (yeast cell cycle, prostate cancer and mouse metabolism) and to a TCGA pan-cancer methylation study. The results show improved accuracy, robustness and exploratory visualization with the proposed framework.
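The general idea of a meta-analytic PCA can be illustrated with a minimal sketch. The abstract does not specify the two MetaPCA variations, so the code below is only one plausible reading, under stated assumptions: all studies share the same features, each study's covariance matrix is scaled by its total variance before pooling, and the common loadings are the leading eigenvectors of the pooled matrix. Sparsity is imitated by plain soft-thresholding of the loadings, which is a crude stand-in and not necessarily the regularization used in the paper. The function names `pooled_pca` and `sparsify` are hypothetical and are not part of the MetaPCA R package.

```python
import numpy as np


def pooled_pca(studies, n_components=2):
    """Common loadings from several studies (illustrative sketch only).

    studies: list of arrays, each (n_samples_k, n_features), same features.
    Returns (loadings, scores), where scores is a list with one array per study.
    """
    n_features = studies[0].shape[1]
    pooled = np.zeros((n_features, n_features))
    for X in studies:
        Xc = X - X.mean(axis=0)              # center each study separately
        cov = Xc.T @ Xc / (X.shape[0] - 1)   # sample covariance matrix
        # Assumption: scale by total variance so no single study dominates.
        pooled += cov / np.trace(cov)
    eigvals, eigvecs = np.linalg.eigh(pooled)       # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]                    # columns are orthonormal
    scores = [(X - X.mean(axis=0)) @ loadings for X in studies]
    return loadings, scores


def sparsify(loading, lam):
    """Soft-threshold one loading vector and rescale it to unit length."""
    shrunk = np.sign(loading) * np.maximum(np.abs(loading) - lam, 0.0)
    norm = np.linalg.norm(shrunk)
    return shrunk / norm if norm > 0 else shrunk


# Usage on synthetic data: two studies measuring the same five features.
rng = np.random.default_rng(0)
study_a = rng.normal(size=(50, 5))
study_b = rng.normal(size=(40, 5))
loadings, scores = pooled_pca([study_a, study_b], n_components=2)
sparse_first = sparsify(loadings[:, 0], lam=0.1)
```

Projecting every study onto the same loadings is what makes the resulting scatter plots comparable across studies; running PCA on each study separately would give axes that need not align.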
An R package, MetaPCA, is available online (http://tsenglab.biostat.pitt.edu/software.htm).
Supplementary data are available at Bioinformatics online.