School of Computer Science, McGill University, Montreal H3A 0G4, Canada.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i299-i306. doi: 10.1093/bioinformatics/btac259.
The computational prediction of the regulatory function associated with a genomic sequence is of great importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples include predicting transcription factor binding in DNA regulatory regions and predicting RNA-protein interactions in the context of post-transcriptional gene expression. However, existing computational methods suffer from high false-positive rates and have seldom used evolutionary information, despite the vast amount of orthologous data available across many extant and ancestral genomes, which presents a ready opportunity to improve their accuracy.
In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained transcription factor binding site (TFBS) or RNA-RBP (RNA-binding protein) binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA-RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results.
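To make the idea of aggregating predictions across orthologous regions concrete, the following is a minimal sketch and not the PhyloPGM model itself. It assumes a base predictor has already produced a binding probability for the human sequence and for each ortholog, and it combines them with a weighted mean in logit space. The function name `aggregate_ortholog_scores`, the species labels, and the weighting scheme are all hypothetical choices made for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_ortholog_scores(human_score, ortholog_scores, weights=None):
    """Combine base-predictor probabilities from orthologous regions.

    Illustrative assumption: a weighted mean of logits, with the human
    sequence given weight 1. This is a stand-in for the probabilistic
    aggregation described in the text, not the published method.
    """
    if weights is None:
        weights = {}
    total = logit(human_score)
    norm = 1.0
    for species, score in ortholog_scores.items():
        w = weights.get(species, 1.0)  # hypothetical per-species weight
        total += w * logit(score)
        norm += w
    return sigmoid(total / norm)


if __name__ == "__main__":
    # Hypothetical scores from a previously trained predictor.
    orthologs = {"chimp": 0.9, "mouse": 0.8}
    print(aggregate_ortholog_scores(0.6, orthologs))
```

In this sketch, orthologs that agree with the human prediction push the combined score further in the same direction, while disagreeing orthologs pull it back, which is the intuition behind using evolutionary evidence to reduce false positives.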
The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM.
Supplementary data are available at Bioinformatics online.