Xie Xin-Ping, Xie Yu-Feng, Wang Hong-Qiang
School of Mathematics and Physics, Anhui Jianzhu University, Hefei, Anhui, 230022, China.
Cancer Hospital, CAS, Hefei, Anhui, 230031, China.
BMC Bioinformatics. 2017 Aug 23;18(1):375. doi: 10.1186/s12859-017-1794-6.
The large-scale accumulation of omics data makes integrative analysis of multiple data sets a pressing challenge in bioinformatics. An open question in such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Achieving this requires careful handling of study heterogeneity.
This paper proposes jGRP, a meta-analysis method based on a regulation probability model, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space rather than in a gene expression space, which makes it easier to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a unified gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their probabilities of occurrence in each sample. A novel differential expression statistic is then established on the gene regulation profiles, enabling accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer data sets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis.
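The idea of pooling studies in a regulation space can be illustrated with a minimal Python sketch. The abstract does not give the jGRP formulas, so everything below is an assumption made for illustration only: the two events are taken to be up- and down-regulation, their probabilities for a case sample are estimated as the fraction of control samples in the same study that the sample exceeds or falls below, and the statistic is the mean difference of the two probabilities over all pooled samples. The function names `regulation_profile` and `pooled_regulation_score` are hypothetical and are not part of the published method.

```python
import numpy as np


def regulation_profile(case, control):
    """Estimate per-sample regulation probabilities (illustrative assumption).

    case:    array of shape (n_genes, n_case), expression under condition A
    control: array of shape (n_genes, n_control), expression under condition B
    Returns (p_up, p_down), each of shape (n_genes, n_case).
    """
    # Gene-wise difference between every case sample and every control sample.
    diff = case[:, :, None] - control[:, None, :]
    p_up = (diff > 0).mean(axis=2)    # fraction of controls the sample exceeds
    p_down = (diff < 0).mean(axis=2)  # fraction of controls the sample is below
    return p_up, p_down


def pooled_regulation_score(studies):
    """Pool studies in regulation space and score each gene in [-1, 1].

    studies: list of (case, control) pairs that share the same gene order.
    """
    per_sample = []
    for case, control in studies:
        p_up, p_down = regulation_profile(case, control)
        per_sample.append(p_up - p_down)
    # Probabilities are scale-free, so samples from all studies can be stacked.
    pooled = np.concatenate(per_sample, axis=1)
    return pooled.mean(axis=1)


# Two toy studies measured on very different scales (made-up numbers).
study1 = (np.array([[5.0, 6.0], [1.0, 1.0]]),
          np.array([[1.0, 2.0], [1.0, 1.0]]))
study2 = (np.array([[500.0, 600.0, 700.0], [10.0, 30.0, 20.0]]),
          np.array([[100.0, 200.0], [20.0, 20.0]]))

scores = pooled_regulation_score([study1, study2])
print(scores)  # gene 0 is consistently up-regulated, gene 1 shows no net change
```

Because each study is converted to probabilities against its own controls before pooling, the difference in measurement scale between the two toy studies does not affect the score. This is the property the abstract attributes to working in regulation space, although the actual jGRP estimator and statistic may differ from this sketch.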
Data heterogeneity strongly influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were found to differ widely in their sensitivity to study heterogeneity. The proposed method, jGRP, can serve as a standalone tool owing to its unified framework and its controllable way of dealing with study heterogeneity.