IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1827-1839. doi: 10.1109/TCBB.2021.3057241. Epub 2022 Jun 3.
Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that use a single data set, few algorithms can combine the vast but dispersed sources of data and extract the information they hold. To address this shortage, we developed GENEREF, an algorithm that accumulates information from multiple types of data sets iteratively, with each iteration improving prediction performance.
The algorithm is examined extensively on data extracted from the five DREAM4 networks and from the DREAM5 Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. It is compared against many single-dataset and multi-dataset algorithms. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive with existing multi-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. We also argue that a scoring method based solely on the AUPR criterion would be more trustworthy than the traditional score.
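For readers unfamiliar with the AUPR criterion, the following is a minimal sketch of how the area under the precision-recall curve can be estimated for a predicted network, using average precision as the estimator. It is an illustration only and is not taken from the GENEREF code: the function name `aupr`, the matrix layout (entry `[i, j]` is the confidence that gene `i` regulates gene `j`), the exclusion of self-loops, and the lack of special handling for tied scores are all assumptions.

```python
import numpy as np


def aupr(scores, truth):
    """Estimate the area under the precision-recall curve (average precision).

    scores: square matrix of edge confidences (hypothetical layout: row = regulator).
    truth:  square 0/1 matrix of gold-standard edges in the same layout.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)

    # Assumption: self-regulation is not scored, so drop the diagonal.
    off_diagonal = ~np.eye(truth.shape[0], dtype=bool)
    s = scores[off_diagonal]
    t = truth[off_diagonal]

    if not t.any():
        raise ValueError("gold standard contains no edges")

    # Rank candidate edges from most to least confident.
    order = np.argsort(-s, kind="stable")
    hits = t[order]

    # Precision after each ranked prediction.
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)

    # Average the precision values at the ranks where a true edge is recovered.
    return float(precision[hits].sum() / hits.sum())


if __name__ == "__main__":
    truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    scores = [[0.0, 0.9, 0.8], [0.2, 0.0, 0.7], [0.3, 0.05, 0.0]]
    print(aupr(scores, truth))
```

In this toy example the two true edges are ranked first and third, so the precision values at the hits are 1 and 2/3 and their mean is the reported estimate.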
The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF.