Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning.

Authors

Akiyama Manato, Sakakibara Yasubumi

Affiliation

Department of Biosciences and Informatics, Keio University, 223-8522, Japan.

Publication information

NAR Genom Bioinform. 2022 Feb 22;4(1):lqac012. doi: 10.1093/nargab/lqac012. eCollection 2022 Mar.

Abstract

Deep learning is being actively applied to biomolecular information to learn effective embeddings. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner, trained on a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this 'informative base embedding' and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, by combining this informative base embedding with a simple Needleman-Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n^2) instead of the O(n^6) time complexity of the naive implementation of a Sankoff-style algorithm for input RNA sequences of length n.
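
The combination of per-base embeddings with Needleman-Wunsch alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `align_embeddings`, the dot-product similarity, the linear gap penalty of -1, and the toy 3-dimensional vectors standing in for pre-trained embeddings are all assumptions made for illustration. The sketch only shows that, once each base carries a vector, global alignment reduces to filling an (n+1) x (m+1) dynamic-programming table.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product, used here as an assumed similarity between two base embeddings."""
    return sum(a * b for a, b in zip(u, v))


def align_embeddings(
    x: List[Vector], y: List[Vector], gap: float = -1.0
) -> Tuple[float, List[Tuple[int, int]]]:
    """Needleman-Wunsch global alignment over two sequences of embedding vectors.

    Returns the optimal score and the list of aligned index pairs (i, j).
    The linear gap penalty is an assumption of this sketch.
    """
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + dot(x[i - 1], y[j - 1]),  # match the two bases
                score[i - 1][j] + gap,                          # gap in y
                score[i][j - 1] + gap,                          # gap in x
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + dot(x[i - 1], y[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy embeddings (hypothetical values, not output of any pre-trained model).
a, b, c = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
best_score, aligned = align_embeddings([a, b, c], [a, c])
print(best_score, aligned)
```

In this toy run the middle vector of the first sequence is left unaligned, and the first and last vectors are paired with the two vectors of the second sequence. With real context-sensitive embeddings, the same table-filling step would score positions by how similar their learned representations are rather than by base identity alone.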

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14f2/8862729/5e094a4ce59f/lqac012fig1.jpg
