Mostavi Milad, Salekin Sirajul, Huang Yufei
Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:2394-2397. doi: 10.1109/EMBC.2018.8512780.
2'-O-methylation (2'-O-me) of the ribose moiety is a significant and ubiquitous post-transcriptional RNA modification that is vital for RNA metabolism and function. Although the recent development of a new technology (Nm-seq) has enabled biologists to find the precise locations of 2'-O-me in RNA sequences, there is still a lack of computational tools that can provide high-resolution prediction of this modification. In this paper, we propose a deep-learning-based method that uses an embedding method to learn complex feature representations of pre-mRNA sequences and employs a Convolutional Neural Network (CNN) to fine-tune the features required for accurate prediction of this modification. Specifically, we adopted dna2vec, a biological sequence embedding method inspired by the word2vec model from text analysis, to produce embedded representations of sequences that may or may not contain 2'-O-me sites, and then fed those features into the CNN for classification. Our model was trained on data collected from an Nm-seq experiment. The proposed method achieved area under the ROC curve (AUC) and area under the precision-recall curve (auPRC) scores of 90%, outperforming existing state-of-the-art algorithms by a significant margin in both balanced and unbalanced class testing scenarios.
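The embedding step described in the abstract can be illustrated with a minimal sketch: a sequence is split into overlapping k-mers, and each k-mer is looked up in an embedding table to give the matrix a CNN would take as input. This is not the authors' code. The fixed k-mer length of 3, the embedding dimension of 4, the U-to-T mapping, and the randomly initialised table (a stand-in for pretrained dna2vec vectors) are all assumptions made for illustration, as are the function names.

```python
import random


def kmerize(seq, k=3):
    """Split a sequence into overlapping k-mers with stride 1."""
    # Assumption: RNA input is mapped to the DNA alphabet before lookup.
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding_table(k=3, dim=4, seed=0):
    """Hypothetical stand-in for pretrained vectors: one random vector per k-mer."""
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + base for prefix in kmers for base in "ACGT"]
    return {kmer: [rng.uniform(-1, 1) for _ in range(dim)] for kmer in kmers}


def embed(seq, table, k=3):
    """Return a (number of k-mers) x (embedding dim) matrix as nested lists."""
    return [table[kmer] for kmer in kmerize(seq, k)]


table = toy_embedding_table()
matrix = embed("AUGGCUAAGC", table)
print(len(matrix), len(matrix[0]))  # 8 4
```

A sequence of length 10 yields 8 overlapping 3-mers, so the resulting matrix has 8 rows of 4 values each. In a real pipeline the table would hold learned vectors, and bases outside A, C, G, T would need explicit handling, since this sketch raises a `KeyError` for them.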