Department of Computer Science, Bar-Ilan University, Ramat Gan, 5290002, Israel.
The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad252.
Nucleic-acid G-quadruplexes (G4s) play vital roles in many cellular processes. Due to their importance, researchers have developed experimental assays to measure nucleic-acid G4s in high throughput. The generated high-throughput datasets gave rise to unique opportunities to develop machine-learning-based methods, and in particular deep neural networks, to predict G4s in any given nucleic-acid sequence and any species. In this paper, we review the success stories of deep-neural-network applications for G4 prediction. We first cover the experimental technologies that generated the most comprehensive nucleic-acid G4 high-throughput datasets in recent years. We then review classic rule-based methods for G4 prediction. We proceed by reviewing the major machine-learning and deep-neural-network applications to nucleic-acid G4 datasets and report a novel comparison between them. Next, we present the interpretability techniques used on the trained neural networks to learn key molecular principles underlying nucleic-acid G4 folding. As a new result, we calculate the overlap between measured DNA and RNA G4s and compare the performance of DNA- and RNA-G4 predictors on RNA- and DNA-G4 datasets, respectively, to demonstrate the potential of transfer learning from DNA G4s to RNA G4s. Last, we conclude with open questions in the field of nucleic-acid G4 prediction and computational modeling.
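The classic rule-based methods mentioned above can be illustrated with the widely used G3+N1-7 consensus motif: four or more runs of at least three guanines separated by loops of 1-7 nucleotides. The following is a minimal sketch of a regex scan under that assumption. The names `PLUS_STRAND`, `MINUS_STRAND` and `find_putative_g4s` are hypothetical, and published rule-based tools differ in loop-length limits, handling of overlapping hits, and scoring.

```python
import re
from typing import List, Tuple

# Assumed consensus (G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+): at least four G-runs of
# length >= 3 separated by 1-7 nt loops. The {3,} on the group lets a single hit
# span more than four G-runs.
PLUS_STRAND = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# C-runs on the given strand imply a putative G4 on the complementary strand.
MINUS_STRAND = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_putative_g4s(seq: str) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, strand) for non-overlapping regex hits.

    Coordinates are 0-based, half-open, and refer to the input strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", PLUS_STRAND), ("-", MINUS_STRAND)):
        # finditer reports non-overlapping matches; greedy quantifiers merge
        # closely spaced G-runs into one longer hit.
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)
```

For example, `find_putative_g4s("ttgggttagggttagggttagggtt")` reports one hit, `(2, 23, "+")`, covering the four GGG runs. A sequence with only three G-runs returns an empty list. Such fixed rules are what the machine-learning predictors reviewed here aim to improve on.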