tRNA-DL: A Deep Learning Approach to Improve tRNAscan-SE Prediction Results.

Authors

Gao Xin, Wei Zhi, Hakonarson Hakon

Affiliations

Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA.

Publication information

Hum Hered. 2018;83(3):163-172. doi: 10.1159/000493215. Epub 2019 Jan 25.

DOI:10.1159/000493215

PMID:30685762

Abstract

BACKGROUND

tRNAscan-SE is the leading tool for transfer RNA (tRNA) annotation and has been widely used in the field. However, tRNAscan-SE can return a significant number of false positives when applied to large sequences. Recently, conventional machine learning methods have been proposed to address this issue, but their efficiency can still be limited by their dependence on handcrafted features. With the growing availability of large-scale genomic datasets, deep learning methods, especially convolutional neural networks, have demonstrated excellent power in characterizing patterns in genomic sequences. Thus, we hypothesize that deep learning may bring further improvement to tRNA prediction.

METHODS

We proposed a new computational approach based on deep neural networks to predict tRNA gene sequences. We designed and investigated various deep neural network architectures. We used the tRNA sequences as positive samples, and the false-positive tRNA sequences predicted by tRNAscan-SE in coding sequences as negative samples, to train and evaluate the proposed models by comparison with the conventional machine learning methods and popular tRNA prediction tools.

RESULTS

Using the one-hot encoding method, our proposed models can extract features without involving extensive manual feature engineering. Our proposed best model outperformed the existing methods under different performance metrics.
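The abstract does not give the encoding details, so the following is only a minimal sketch of how one-hot encoding of a nucleotide sequence is commonly done before feeding it to a convolutional network. The function name `one_hot_encode`, the A/C/G/T column order, the mapping of U to T, the zero-padding to a fixed length, and the all-zero row for ambiguous bases are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: names, column order, and padding scheme are assumptions.
ALPHABET = "ACGT"  # assumed column order of the one-hot matrix


def one_hot_encode(seq, length):
    """Encode a DNA/RNA sequence as a `length` x 4 matrix (list of lists).

    U is treated as T. Sequences longer than `length` are truncated;
    shorter ones are padded with all-zero rows. Any symbol outside
    A/C/G/T (for example N) also becomes an all-zero row.
    """
    seq = seq.upper().replace("U", "T")
    rows = [
        [1 if base == letter else 0 for letter in ALPHABET]
        for base in seq[:length]
    ]
    # Pad with zero rows so every sample has the same shape.
    rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("GCAUN", 6):
        print(row)
```

A fixed-size matrix like this lets a network learn sequence motifs directly from the raw bases, which is the sense in which no handcrafted features are needed.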

CONCLUSION

The proposed deep learning methods can substantially reduce the false positives output by the state-of-the-art tool tRNAscan-SE. Coupled with tRNAscan-SE, they can serve as a useful complementary tool for tRNA annotation. The application to tRNA prediction demonstrates the superiority of deep learning in automatic feature generation for characterizing sequence patterns.
