The super-n-motifs model: a novel alignment-free approach for representing and comparing RNA secondary structures.

Authors

Glouzon Jean-Pierre Séhi, Perreault Jean-Pierre, Wang Shengrui

Affiliations

Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada.

RNA Group, Department of Biochemistry, Faculty of Medicine and Health Sciences, Applied Cancer Research Pavilion, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada.

Publication information

Bioinformatics. 2017 Apr 15;33(8):1169-1178. doi: 10.1093/bioinformatics/btw773.

Abstract

MOTIVATION

Comparing ribonucleic acid (RNA) secondary structures of arbitrary size uncovers structural patterns that can provide a better understanding of RNA functions. However, performing fast and accurate secondary structure comparisons is challenging when we take into account the RNA configuration (i.e. linear or circular), the presence of pseudoknot and G-quadruplex (G4) motifs and the increasing number of secondary structures generated by high-throughput probing techniques. To address this challenge, we propose the super-n-motifs model based on a latent analysis of enhanced motifs comprising not only basic motifs but also adjacency relations. The super-n-motifs model computes a vector representation of secondary structures as linear combinations of these motifs.
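To make the idea of an alignment-free, vector-based comparison concrete, here is a minimal sketch. It is not the super-n-motifs model itself: it omits the latent analysis and the adjacency relations, handles neither pseudoknots nor G4 motifs, and assumes well-formed dot-bracket input. The function names (`base_pairs`, `motif_counts`, `cosine`) and the choice of hairpin and stem counts as features are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def base_pairs(db):
    """Return the sorted (i, j) base pairs of a well-formed dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def motif_counts(db):
    """Count two illustrative motif types, keyed by (kind, size)."""
    pairs = set(base_pairs(db))
    counts = Counter()
    for i, j in pairs:
        # Hairpin loop: a pair that encloses only unpaired positions.
        if all(c == "." for c in db[i + 1:j]):
            counts[("hairpin", j - i - 1)] += 1
        # Stem: measure the stack starting from its outermost pair only.
        if (i - 1, j + 1) not in pairs:
            length = 1
            while (i + length, j - length) in pairs:
                length += 1
            counts[("stem", length)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Two structures of different lengths can then be compared without aligning them, for example `cosine(motif_counts("((...))"), motif_counts("((...))..((...))"))`. The score depends only on which motifs occur and how often, not on where they occur.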

RESULTS

We demonstrate the accuracy of our model for comparison of secondary structures from linear and circular RNA while also considering pseudoknot and G4 motifs. We show that the super-n-motifs representation effectively captures the most important structural features of secondary structures, as compared to other representations such as ordered tree, arc-annotated and string representations. Finally, we demonstrate the time efficiency of our model, which is alignment-free and performs large-scale comparisons of 10 000 secondary structures up to 4 orders of magnitude faster than existing approaches.

AVAILABILITY AND IMPLEMENTATION

The super-n-motifs model was implemented in C++. Source code and Linux binary are freely available at http://jpsglouzon.github.io/supernmotifs/.

CONTACT

Shengrui.Wang@Usherbrooke.ca.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
