IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 61266 Brno, Czech Republic.
Department of Information Technology, Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic.
Bioinformatics. 2017 Nov 1;33(21):3373-3379. doi: 10.1093/bioinformatics/btx413.
G-quadruplexes (G4s) are non-B DNA structures that are easily observed in vitro and assumed to form in vivo. The latest experiments with G4-specific antibodies and G4-unwinding helicase mutants confirm this conjecture. These four-stranded structures have also been shown to influence a range of molecular processes in cells. Because G4s are studied intensively, it is often desirable to screen DNA sequences and pinpoint the precise locations where they might form.
We describe and test a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package is easy to use, flexible and customizable. It supports sequence searches that accommodate possible divergences from the optimal G4 base composition. A novel aspect of our work is the creation and training (parametrization) of an advanced scoring model, which increases precision compared with similar tools. We show that the algorithm behind the searches achieves 96% accuracy on 392 currently known, experimentally observed G4 structures. We also searched recent G4-seq data to assess how well the package identifies the structures detected by that technology. The correlation between pqsfinder predictions and the G4-seq data was 0.622, higher than the 0.491 obtained with G4Hunter, the second-best tool.
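For orientation, the strict search that the package is designed to improve on can be sketched with the classic "canonical" G4 motif: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This is a minimal illustration under that assumption only; `find_canonical_pqs` and `CANONICAL_G4` are hypothetical names, and the regular expression is not pqsfinder's scoring model, which tolerates divergences from the optimal composition and scores each hit.

```python
import re
from typing import List, Tuple

# Hypothetical illustration: the classic canonical G4 motif, i.e. four runs of
# >= 3 guanines separated by loops of 1-7 nt. This is NOT pqsfinder's scoring
# model; it accepts only perfect G-runs and assigns no score.
CANONICAL_G4 = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")


def find_canonical_pqs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for non-overlapping canonical PQS.

    Coordinates are 0-based and half-open, as in Python slicing.
    Only the input strand is scanned; a G4 on the opposite strand appears
    here as C-runs and would require scanning the reverse complement.
    """
    seq = seq.upper()  # accept lowercase (e.g. soft-masked) input
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_G4.finditer(seq)]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    telomeric = "TTAGGGTTAGGGTTAGGGTTAGGG"
    print(find_canonical_pqs(telomeric))
    # → [(3, 24, 'GGGTTAGGGTTAGGGTTAGGG')]
```

A pattern this strict misses quadruplexes with shorter G-runs. For example, it finds nothing in the two-tetrad thrombin-binding aptamer GGTTGGTGTGGTTGG. This is one reason a more tolerant, scored search is useful.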
The package is available at http://bioconductor.org/packages/pqsfinder/. This paper is based on pqsfinder-1.4.1.
Supplementary data are available at Bioinformatics online.