Department of Biomedical Engineering, Izmir University of Economics, Izmir, Turkey.
J Theor Biol. 2019 Feb 21;463:92-98. doi: 10.1016/j.jtbi.2018.12.007. Epub 2018 Dec 6.
In vivo discovery of G-quadruplex-forming sequences would identify the most relevant G-quadruplexes along a genomic DNA or RNA molecule; however, it is difficult to perform because of the small size of G-quadruplexes, the existence of different topologies, and the additional influence of environmental factors and ligands present during experimentation. In vitro discovery, on the other hand, not only fails to simulate in vivo conditions but is also impractical for large sequences because of limited resources. The immediate solution continues to be computational prediction, although its results do not always agree with experimental findings. This disagreement is often due to features that are not conventionally accepted for G-quadruplexes, such as disrupted G-tracts or extremely long loops.
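To make the limitation concrete, the conventional folding rule is commonly written as four runs of three or more guanines separated by loops of one to seven nucleotides. The Python sketch below assumes that rule. The pattern, the function name `has_canonical_g4`, and the example sequences are illustrative assumptions and are not taken from G4Catchall.

```python
import re

# Conventional motif (assumed for illustration): G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
# This is NOT the expression used by G4Catchall.
CANONICAL = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}", re.IGNORECASE)


def has_canonical_g4(seq: str) -> bool:
    """Return True if seq contains a match to the conventional motif."""
    return CANONICAL.search(seq) is not None


# Hypothetical example sequences
typical = "GGGTTAGGGTTAGGGTTAGGG"               # four intact G-tracts, short loops
bulged = "GGGTTAGGGTTAGAGGTTAGGG"               # third G-tract disrupted by an A
long_loop = "GGGTGGGTGGG" + "T" * 12 + "GGG"    # one loop of 12 nucleotides

for name, seq in [("typical", typical), ("bulged", bulged), ("long_loop", long_loop)]:
    print(name, has_canonical_g4(seq))
```

Under this assumed rule only the first sequence is reported. The sequences with a disrupted G-tract or an extremely long loop are missed, which is the kind of disagreement described above.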
Here, we propose a novel tool for discovering putative G-quadruplexes with better accuracy by considering the features of previously missed G-quadruplex-forming sequences. In comparison against a set of experimentally confirmed sequences, a sensitivity as high as 99% and a Youden's J statistic as high as 0.91 are achieved, an improvement over other computational approaches. More importantly, we showed that allowing a single atypical G-tract, which includes a mismatched or bulging non-guanine nucleotide, and a single loop of extreme size benefits the overall prediction.
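For readers unfamiliar with the metric, Youden's J statistic is defined as sensitivity + specificity - 1. The Python sketch below computes it from confusion-matrix counts. The function name `youden_j` and the counts are hypothetical and do not reproduce the data behind the figures reported above.

```python
def youden_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)   # fraction of true G-quadruplexes detected
    specificity = tn / (tn + fp)   # fraction of non-forming sequences rejected
    return sensitivity + specificity - 1.0


# Hypothetical counts, chosen only to illustrate the arithmetic
j = youden_j(tp=45, fn=5, tn=40, fp=10)
print(f"J = {j:.2f}")
```

J ranges from -1 to 1. A value of 0 corresponds to a predictor no better than chance, and a value of 1 corresponds to perfect separation of forming and non-forming sequences.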
The Python code is available at http://github.com/odoluca/G4Catchall and the web application at http://homes.ieu.edu.tr/odoluca/G4Catchall.