Schmich Fabian, Kuipers Jack, Merdes Gunter, Beerenwinkel Niko
Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.
SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland.
Stat Appl Genet Mol Biol. 2019 Mar 6;18(3):sagmb-2018-0033. doi: 10.1515/sagmb-2018-0033.
In the post-genomic era of big data in biology, computational approaches for integrating multiple heterogeneous data sets are becoming increasingly important. Despite the availability of large amounts of omics data, prioritising genes relevant to a specific functional pathway on the basis of genetic screening experiments remains a challenging task. Here, we introduce netprioR, a probabilistic generative model for semi-supervised integrative prioritisation of hit genes. The model integrates multiple network data sets representing gene-gene similarities and prior knowledge about gene functions from the literature with gene-based covariates, such as phenotypes measured in genetic perturbation screens, for example by RNA interference or CRISPR/Cas9. We evaluate netprioR on simulated data and show that the model outperforms current state-of-the-art methods in many scenarios and is on par with them otherwise. In an application to real biological data, we integrate 22 network data sets, 1784 prior knowledge class labels and 3840 RNA interference phenotypes in order to prioritise novel regulators of Notch signalling in Drosophila melanogaster. The biological relevance of our predictions is evaluated using in silico and in vivo experiments. An efficient implementation of netprioR is available as an R package at http://bioconductor.org/packages/netprioR.
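To make the idea of semi-supervised, network-based prioritisation concrete, the following is a minimal sketch in plain Python. It is not the netprioR model and does not use the netprioR API: it swaps in simple iterative label propagation over a weighted sum of networks as a stand-in for the probabilistic generative model described above. The function names, the fixed network weights, the mixing parameter `alpha` and the four-gene toy data are all assumptions made for illustration only.

```python
def combine_networks(networks, weights):
    """Weighted sum of symmetric adjacency matrices given as lists of lists.

    Assumption: the weights are fixed by hand here; this sketch does not
    attempt to learn them from data.
    """
    n = len(networks[0])
    return [
        [sum(w * net[i][j] for net, w in zip(networks, weights)) for j in range(n)]
        for i in range(n)
    ]


def propagate(adj, labels, phenotype, alpha=0.8, iters=200):
    """Score genes by iterative label propagation.

    labels    -- dict mapping gene index to +1.0 / -1.0 for genes with prior knowledge
    phenotype -- per-gene covariate (e.g. a screening readout), used as a base score
    alpha     -- hypothetical mixing weight between network neighbours and phenotype
    """
    n = len(adj)
    scores = [float(labels.get(i, phenotype[i])) for i in range(n)]
    for _ in range(iters):
        updated = []
        for i in range(n):
            if i in labels:
                # Labelled genes stay clamped to their known class.
                updated.append(float(labels[i]))
                continue
            degree = sum(adj[i])
            if degree > 0:
                neighbour_mean = sum(adj[i][j] * scores[j] for j in range(n)) / degree
            else:
                neighbour_mean = 0.0
            updated.append(alpha * neighbour_mean + (1 - alpha) * phenotype[i])
        scores = updated
    return scores


# Toy data: four hypothetical genes, two hypothetical similarity networks.
net_a = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
net_b = [[0, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]

adj = combine_networks([net_a, net_b], weights=[1.0, 0.5])
labels = {0: 1.0, 3: -1.0}          # gene 0: known positive, gene 3: known negative
phenotype = [0.0, 0.0, 0.0, 0.0]    # uninformative covariates in this toy case

scores = propagate(adj, labels, phenotype)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy setting the unlabelled gene that is most strongly connected to the known positive gene is ranked above the one connected to the known negative gene. A generative model such as the one described in the abstract would additionally quantify uncertainty and weight the networks in a principled way, which this sketch does not do.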