Zou Quan, Guo Jiasheng, Ju Ying, Wu Meihong, Zeng Xiangxiang, Hong Zhiling
School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
Mol Inform. 2015 Nov;34(11-12):761-70. doi: 10.1002/minf.201500031. Epub 2015 Sep 14.
tRNAscan-SE is a tRNA detection program that is widely used for tRNA annotation; however, its false positive rate is unacceptably high for large sequences. Here, we designed a new predictor, tRNA-Predict, that uses machine learning to improve the tRNAscan-SE results. Using tRNAscan-SE, we obtained real and pseudo-tRNA sequences as training data sets, and we constructed three different tRNA feature sets. We then built an ensemble classifier, LibMutil, to predict tRNAs from the training data. The positive data set of 623 tRNA sequences was obtained from tRNAdb 2009, and the negative data set consisted of the false positive tRNAs predicted by tRNAscan-SE. In silico experiments using 10-fold cross-validation showed a prediction accuracy of 95.1% for tRNA-Predict. tRNA-Predict was developed to distinguish functional tRNAs from pseudo-tRNAs rather than to predict tRNAs from a genome-wide scan. However, tRNA-Predict can be applied to the output of tRNAscan-SE, a genome-wide scanning method, to improve the tRNAscan-SE annotation results. The tRNA-Predict web server is accessible at http://datamining.xmu.edu.cn/~gjs/tRNA-Predict.
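To make the evaluation protocol concrete, the following is a minimal Python sketch of an ensemble classifier built from three feature sets and scored with 10-fold cross-validation. It is not tRNA-Predict or LibMutil: the abstract does not specify the actual features or base classifiers, so the three feature sets (1-, 2- and 3-mer composition), the nearest-centroid base learners, the majority vote and all function names are hypothetical stand-ins chosen for illustration. Labels are assumed to be 1 for a functional tRNA and 0 for a pseudo-tRNA.

```python
import random
from collections import Counter
from itertools import product

def kmer_composition(seq, k):
    """Normalized k-mer frequencies over ACGU (4**k values); T is read as U."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[m] / total for m in kmers]

# Hypothetical stand-ins for the paper's three (unspecified) tRNA feature sets.
FEATURE_SETS = [lambda s: kmer_composition(s, 1),
                lambda s: kmer_composition(s, 2),
                lambda s: kmer_composition(s, 3)]

def fit_centroids(X, y):
    """Nearest-centroid base learner: mean feature vector per class label."""
    sums, n = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}

def nearest(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def ensemble_predict(models, feats):
    """Majority vote of the per-feature-set classifiers (binary labels 1/0)."""
    votes = [nearest(m, f) for m, f in zip(models, feats)]
    return 1 if sum(votes) * 2 > len(votes) else 0

def cross_val_accuracy(seqs, labels, k=10, seed=0):
    """k-fold CV: every sequence is scored once by an ensemble trained without it."""
    feats = [[fs(s) for fs in FEATURE_SETS] for s in seqs]
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all sequences
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # One base learner per feature set, each trained only on the other k-1 folds.
        models = [fit_centroids([feats[i][j] for i in train], [labels[i] for i in train])
                  for j in range(len(FEATURE_SETS))]
        correct += sum(ensemble_predict(models, feats[i]) == labels[i] for i in test)
    return correct / len(seqs)
```

The accuracy is pooled over all held-out predictions, so each sequence contributes exactly once; this is the sense in which a figure such as 95.1% is a cross-validated rather than a training-set accuracy.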