Accurate Classification of RNA Structures Using Topological Fingerprints.

Author information

Huang Jiajie, Li Kejie, Gribskov Michael

Affiliations

Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America.

Life Sciences Solutions Group, Thermo Fisher Scientific, South San Francisco, California, United States of America.

Publication information

PLoS One. 2016 Oct 18;11(10):e0164726. doi: 10.1371/journal.pone.0164726. eCollection 2016.

Abstract

Although RNAs are well known to possess complex structures, functionally similar RNAs often share little sequence similarity. The exact size and spacing of base-paired regions vary, but functionally similar RNAs are markedly similar in the arrangement, or topology, of their base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity) and are only partially correct or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction of the stems, up to 30%, is omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families with diverse sizes and functions, containing pseudoknots, and with little sequence similarity: an especially difficult test set. Despite this difficulty, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Because it includes pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint.
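
To make the idea of comparing structures by their sets of subgraphs more concrete, here is a minimal Python sketch. It is not the authors' XIOS implementation, and every detail in it is an assumption made for illustration: each stem is reduced to a single arc `(open_pos, close_pos)`, pairs of stems are labeled only as `serial`, `nested`, or `crossing` (a pseudoknot-like arrangement), the fingerprint is taken to be the set of distinct three-stem motifs, and fingerprints are compared with a Jaccard index. The function names `relation`, `fingerprint`, and `jaccard` are hypothetical.

```python
from itertools import combinations


def relation(a, b):
    """Classify two stems, each an (open_pos, close_pos) arc with distinct positions."""
    (a1, a2), (b1, b2) = sorted([a, b])  # after sorting, a opens first
    if a2 < b1:
        return "serial"    # a closes before b opens
    if b2 < a2:
        return "nested"    # b lies entirely inside a
    return "crossing"      # arcs interleave, as in a pseudoknot


def fingerprint(stems):
    """Return the set of three-stem motifs found in a structure.

    Each motif is the sorted tuple of its three pairwise relations, so it
    depends only on the arrangement of stems, not on their positions.
    """
    motifs = set()
    for triple in combinations(stems, 3):
        label = tuple(sorted(relation(a, b) for a, b in combinations(triple, 2)))
        motifs.add(label)
    return motifs


def jaccard(fp1, fp2):
    """Similarity of two fingerprints: shared motifs over all motifs."""
    if not fp1 and not fp2:
        return 1.0
    return len(fp1 & fp2) / len(fp1 | fp2)


# Two invented structures with different coordinates but the same topology.
rna_a = [(1, 40), (5, 15), (20, 35), (50, 70), (60, 80)]
rna_b = [(3, 90), (10, 30), (40, 80), (100, 150), (120, 170)]

# The same structure as rna_a with one stem omitted (an "incomplete" prediction).
rna_a_partial = [s for s in rna_a if s != (20, 35)]

same_topology = fingerprint(rna_a) == fingerprint(rna_b)
partial_score = jaccard(fingerprint(rna_a), fingerprint(rna_a_partial))
```

In this toy setting the two structures with different stem sizes and spacing yield identical fingerprints, and the structure with a missing stem still shares part of its motif set with the complete one. That mirrors, in a very reduced form, the tolerance to omitted stems that the abstract reports; the actual method enumerates richer subgraphs and should be consulted in the source code linked above.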

