Liang Xiao, Young William Chad, Hung Ling-Hong, Raftery Adrian E, Yeung Ka Yee
Department of Computer Science, Virginia Tech, Blacksburg, Virginia.
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington.
J Comput Biol. 2019 Oct;26(10):1113-1129. doi: 10.1089/cmb.2019.0036. Epub 2019 Apr 22.
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks and extends some previous Bayesian frameworks in both theory and application. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance can vary from cell line to cell line, and that different external data sources may need to be chosen as prior knowledge to obtain better accuracy for different cell lines.
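The abstract describes using a supervised learning framework to turn external evidence into prior probabilities of regulatory relationships. As a rough illustration of that idea only, the sketch below fits a plain logistic regression to a handful of labelled regulator-target pairs and reads the fitted probability as a prior. The choice of logistic regression, the four feature columns, the toy values, and the function names `fit_logistic` and `prior_probability` are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def prior_probability(x, weights, bias):
    """Prior probability that a candidate regulator-target pair is a true edge."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: one row per (regulator, target) pair.
# Assumed columns: expression correlation, binding evidence,
# shared gene ontology term, shared known pathway.
train_x = [
    [0.9, 1.0, 1.0, 1.0],
    [0.8, 1.0, 0.0, 1.0],
    [0.7, 0.0, 1.0, 1.0],
    [0.2, 0.0, 0.0, 0.0],
    [0.1, 0.0, 1.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = known regulatory relationship (made up)

weights, bias = fit_logistic(train_x, train_y)
strong = prior_probability([0.85, 1.0, 1.0, 1.0], weights, bias)
weak = prior_probability([0.15, 0.0, 0.0, 0.0], weights, bias)
print(f"well-supported pair prior: {strong:.3f}")
print(f"poorly supported pair prior: {weak:.3f}")
```

In a Bayesian network-inference setting, priors of this kind would be combined with the likelihood from the knockdown data for each candidate regulator. Dropping or swapping a feature column is one simple way to mimic choosing different external data sources for different cell lines.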