Splice Junction Identification using Long Short-Term Memory Neural Networks.

Authors

Regan Kevin, Saghafi Abolfazl, Li Zhijun

Affiliations

Department of Chemistry and Biochemistry, University of the Sciences, Philadelphia, PA, USA.

Department of Mathematics, Physics and Statistics, University of the Sciences, Philadelphia, PA, USA.

Publication

Curr Genomics. 2021 Dec 30;22(5):384-390. doi: 10.2174/1389202922666211011143008.

Abstract

BACKGROUND

Splice junctions are key to the transition from pre-messenger RNA to mature messenger RNA, and alternative splicing makes them especially important in many multi-exon genes. Because a very high percentage of multi-exon genes undergo alternative splicing, identifying splice junctions is an attractive research topic with important implications.

OBJECTIVE

The aim of this paper is to develop a deep learning model capable of identifying splice junctions in RNA sequences using 13,666 unique sequences of primate RNA.

METHODS

A Long Short-Term Memory (LSTM) Neural Network model is developed that classifies a given sequence as EI (Exon-Intron splice), IE (Intron-Exon splice), or N (No splice). The model is trained on groups of trinucleotides, and its performance is evaluated on separate validation and test data to prevent bias.
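To make the input representation concrete, here is a minimal sketch of splitting a sequence into trinucleotides and mapping them to integer tokens that an LSTM embedding layer could consume. The abstract does not say whether the trinucleotide groups overlap, how ambiguous bases are handled, or how labels are coded, so the `stride` parameter, the `UNKNOWN` index, the U-to-T normalization, and the `LABELS` mapping are all assumptions made for illustration, not the authors' preprocessing.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Hypothetical vocabulary: the 64 trinucleotides over A, C, G, T, numbered 1..64.
# Index 0 is reserved (assumption) for groups containing ambiguous bases such as N.
TRIMER_INDEX = {
    "".join(p): i + 1 for i, p in enumerate(product(NUCLEOTIDES, repeat=3))
}
UNKNOWN = 0

# Hypothetical integer coding of the three classes named in the abstract.
LABELS = {"EI": 0, "IE": 1, "N": 2}


def to_trinucleotides(seq, stride=1):
    """Split a sequence into 3-letter groups.

    stride=1 gives overlapping groups, stride=3 gives non-overlapping ones.
    Which of these the paper uses is not stated in the abstract.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike (assumption)
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, stride)]


def encode(seq, stride=1):
    """Map each trinucleotide to its integer token, using UNKNOWN for anything else."""
    return [TRIMER_INDEX.get(t, UNKNOWN) for t in to_trinucleotides(seq, stride)]
```

For example, `to_trinucleotides("ACGTAC")` yields the overlapping groups `ACG`, `CGT`, `GTA`, `TAC`, while `stride=3` yields only `ACG` and `TAC`. The resulting integer list, paired with a label from `LABELS`, is the kind of input a sequence classifier would be trained on.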

RESULTS

Model performance was measured using accuracy and F-score on the test data. The finalized model achieved an average accuracy of 91.34% with an average F-score of 91.36% over 50 runs.
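As an illustration of the two metrics, the sketch below computes accuracy and a macro-averaged F-score for the three classes, then averages per-run scores. The abstract does not state which averaging scheme (macro, micro, or weighted) was used for the F-score, so macro averaging is an assumption here, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_score_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if the class never appears."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f_score(y_true, y_pred, labels=("EI", "IE", "N")):
    """Unweighted mean of the per-class F1 scores (macro averaging is an assumption)."""
    return sum(f_score_for_class(y_true, y_pred, l) for l in labels) / len(labels)


def mean_over_runs(scores):
    """Average a metric across repeated runs, as in the 50-run average reported."""
    return sum(scores) / len(scores)
```

With true labels `["EI", "IE", "N", "N"]` and predictions `["EI", "N", "N", "N"]`, accuracy is 0.75 and the per-class F1 values are 1.0, 0.0, and 0.8, giving a macro F-score of 0.6.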

CONCLUSION

Comparisons show that the model is highly competitive with recent Convolutional Neural Network structures. The proposed LSTM model achieves the highest accuracy and F-score among published alternative LSTM structures.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3fcb/8844938/7a3a37058cc7/CG-22-384_F1.jpg
