School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.
Nucleic Acids Res. 2019 May 7;47(8):e43. doi: 10.1093/nar/gkz087.
Rapid and accurate methods for distinguishing coding RNAs from non-coding RNAs (ncRNAs) are critical for analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing technology. The previously developed methods CPAT, CPC2 and PLEK distinguish coding RNAs from ncRNAs very well overall, but perform poorly at distinguishing small coding RNAs from small ncRNAs. Herein, we report CPPred (coding potential prediction), an approach based on a support vector machine (SVM) classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to these novel RNA features, CPPred outperforms the state-of-the-art methods at distinguishing not only coding RNAs from ncRNAs, but also small coding RNAs from small ncRNAs. A recent study proposed 1335 novel human coding RNAs from a large number of RNA-seq datasets. However, CPPred predicts only 119 of these transcripts as coding RNAs; that is, almost all of the proposed novel coding RNAs (91.1%) are predicted to be ncRNAs, which is consistent with previous reports. Remarkably, we also reveal that the features encoded by the global description (T2, C0 and GC) play an important role in the prediction of coding potential.
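To make the idea of sequence-derived features more concrete, the following is a minimal Python sketch of how a transcript might be turned into a numeric feature vector. It is not the CPPred implementation. The abstract does not define T2, C0 or GC, so the descriptors below (nucleotide composition, frequencies of adjacent unlike nucleotides, GC fraction, and the longest complete ORF) are generic illustrations, and all function names are hypothetical.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def composition(seq):
    """Fraction of each nucleotide in the sequence (illustrative descriptor)."""
    seq = normalize(seq)
    n = len(seq)
    return {base: (seq.count(base) / n if n else 0.0) for base in "ACGT"}


def transitions(seq):
    """Fraction of adjacent positions occupied by each unordered pair of
    different nucleotides, e.g. 'AC' counts both AC and CA (illustrative)."""
    seq = normalize(seq)
    pairs = len(seq) - 1
    counts = {a + b: 0 for a, b in combinations("ACGT", 2)}
    for x, y in zip(seq, seq[1:]):
        key = "".join(sorted((x, y)))
        if key in counts:  # skips identical neighbours and ambiguous bases
            counts[key] += 1
    return {k: (v / pairs if pairs > 0 else 0.0) for k, v in counts.items()}


def longest_orf(seq):
    """Length in nucleotides of the longest complete ORF (ATG through an
    in-frame stop codon, inclusive) on the forward strand; 0 if none."""
    seq = normalize(seq)
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def features(seq):
    """Assemble a flat feature dictionary for one transcript."""
    comp = composition(seq)
    orf_len = longest_orf(seq)
    feats = {
        "gc": comp["G"] + comp["C"],
        "orf_length": orf_len,
        "orf_coverage": orf_len / len(seq) if seq else 0.0,
    }
    feats.update({"comp_" + k: v for k, v in comp.items()})
    feats.update({"trans_" + k: v for k, v in transitions(seq).items()})
    return feats


if __name__ == "__main__":
    for name, value in features("CCATGAAATAGGC").items():
        print(name, round(value, 3))
```

Feature vectors of this kind could then be passed to an SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; that choice of library is an assumption here and is not stated in the abstract.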