School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore.
College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China.
BMC Bioinformatics. 2020 Nov 12;21(1):522. doi: 10.1186/s12859-020-03864-0.
Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from base-pairing rules. These methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, yet the small number of experimentally verified triplex-forming lncRNAs suggests that not all of the predicted ones form triplexes in practice, and (2) their predictions consider only the theoretical relationship and lack features derived from experimentally verified data.
In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In fivefold cross-validation, the average areas under the ROC curve and the precision-recall curve (PRC) on the redundancy-removed triplex-forming lncRNA dataset (threshold 0.8) are 0.9649 and 0.9996, respectively; for triplex DNA site prediction, the corresponding values are 0.8705 and 0.9671. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs.
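As an illustration of how the two reported metrics can be computed on held-out folds, the following is a minimal sketch in plain Python. It is not the TriplexFPP implementation: the function names `auroc`, `average_precision`, and `k_fold_indices` are hypothetical, the fold assignment is a simple stride split, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
from typing import Iterator, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at the rank of each
    positive example (assumption: used as an estimate of the area under
    the precision-recall curve; tied scores are not treated specially)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k folds using a stride split
    (hypothetical scheme; real pipelines usually shuffle and stratify)."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    # Toy labels and classifier scores, invented purely for illustration.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print("AUROC:", auroc(labels, scores))
    print("Average precision:", average_precision(labels, scores))
```

In a fivefold setting, a model would be fitted on each `train` split, scored on the matching `test` split, and the two metrics averaged over the five folds.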
TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to form a triplex. It may provide insights into the exploration of lncRNA functions.