Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures.

Affiliation

Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S48. doi: 10.1186/1471-2105-12-S1-S48.

DOI: 10.1186/1471-2105-12-S1-S48

PMID: 21342580

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3044305/

Abstract

BACKGROUND

Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level.
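The hierarchical clustering step mentioned above can be pictured with a minimal sketch. It shows generic average-linkage (UPGMA-style) agglomerative clustering over a precomputed pairwise similarity matrix. The similarity values could come from structural-alignment scores or any other measure.

Several details are assumptions for illustration only and are not taken from the paper:
- the function name `average_linkage`;
- the average-linkage criterion;
- the stopping rule of a fixed number of clusters.

```python
from typing import List, Sequence


def average_linkage(sim: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering with average linkage (UPGMA-style).

    Hypothetical helper for illustration. sim[i][j] is a precomputed,
    symmetric similarity between items i and j (higher = more similar).
    Clusters are merged greedily, always joining the pair with the highest
    mean pairwise similarity, until n_clusters clusters remain.
    """
    n = len(sim)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters: List[List[int]] = [[i] for i in range(n)]

    def mean_similarity(a: List[int], b: List[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster item pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Scan every pair of current clusters for the most similar one.
        best_pair = (0, 1)
        best_score = float("-inf")
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = mean_similarity(clusters[x], clusters[y])
                if s > best_score:
                    best_score, best_pair = s, (x, y)
        x, y = best_pair
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)
```

Consider four sequences where items 0 and 1 are mutually similar (0.9) and items 2 and 3 are mutually similar (0.8), with low cross similarities. Then `average_linkage(sim, 2)` returns `[[0, 1], [2, 3]]`.

The exhaustive pair scan makes this roughly cubic in the number of sequences. Practical pipelines would instead use an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` on distances derived from the similarities.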

RESULTS

We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%.
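The contrast drawn above, between using one optimal solution and using all possible solutions of a dynamic program, can be illustrated by swapping the `max` operator for a log-sum-exp in two textbook recursions:
- Needleman-Wunsch global alignment, as an ensemble of sequence alignments;
- Nussinov base-pair maximization, as an ensemble of secondary structures.

This is a minimal sketch, not the authors' similarity measure, which the abstract does not specify. It rests on these assumptions:
- the toy scores (match=1, mismatch=-1, gap=-2, pair_score=1, min_loop=3);
- the treatment of scores as log-weights at unit temperature;
- the hypothetical function names `align_dp` and `fold_dp`.

The two ensembles are computed separately here rather than combined. Realistic structure ensembles use energy models such as McCaskill's partition function algorithm.

```python
import math
from typing import Callable, Sequence

Combine = Callable[[Sequence[float]], float]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))): the 'sum over all solutions' operator."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def align_dp(a: str, b: str, combine: Combine,
             match: float = 1.0, mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Global alignment DP with linear gap penalties (hypothetical helper).

    combine=max       -> score of the single optimal alignment
    combine=logsumexp -> log of the sum of exp(score) over ALL alignments
    """
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # only one path along each boundary
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = combine([F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                               F[i - 1][j] + gap,    # a[i-1] against a gap
                               F[i][j - 1] + gap])   # b[j-1] against a gap
    return F[n][m]


PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_dp(seq: str, combine: Combine,
            pair_score: float = 1.0, min_loop: int = 3) -> float:
    """Nussinov-style folding DP over half-open intervals seq[i:j] (hypothetical helper).

    combine=max       -> best single structure (most base pairs if pair_score > 0)
    combine=logsumexp -> log-sum over ALL secondary structures
    """
    n = len(seq)
    L = [[0.0] * (n + 1) for _ in range(n + 1)]  # empty interval: log(1) = 0
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            terms = [L[i + 1][j]]  # seq[i] left unpaired
            for k in range(i + min_loop + 1, j):  # seq[i] pairs with seq[k]
                if (seq[i], seq[k]) in PAIRS:
                    terms.append(pair_score + L[i + 1][k] + L[k + 1][j])
            L[i][j] = combine(terms)
    return L[0][n]
```

For example, `align_dp("ACGU", "ACGU", max)` returns `4.0`, the score of the single best alignment.

With every score set to zero, the ensemble version simply counts solutions. `round(math.exp(align_dp("AC", "GU", logsumexp, 0.0, 0.0, 0.0)))` gives `13`, the number of global alignments of two length-2 sequences.

The same max-to-sum substitution is what lets ensemble methods weigh suboptimal alignments and structures instead of discarding them.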

CONCLUSIONS

Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.

Similar articles

Robust and accurate prediction of noncoding RNAs from aligned sequences.

BMC Bioinformatics. 2010 Oct 15;11 Suppl 7(Suppl 7):S3. doi: 10.1186/1471-2105-11-S7-S3.

A local multiple alignment method for detection of non-coding RNA sequences.

Bioinformatics. 2009 Jun 15;25(12):1498-505. doi: 10.1093/bioinformatics/btp261. Epub 2009 Apr 17.

RNASAlign: RNA structural alignment system.

Bioinformatics. 2011 Aug 1;27(15):2151-2. doi: 10.1093/bioinformatics/btr338. Epub 2011 Jun 8.

Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering.

PLoS Comput Biol. 2007 Apr 13;3(4):e65. doi: 10.1371/journal.pcbi.0030065. Epub 2007 Feb 22.

Convolutional neural networks for classification of alignments of non-coding RNA sequences.

Bioinformatics. 2018 Jul 1;34(13):i237-i244. doi: 10.1093/bioinformatics/bty228.

Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change.

BMC Bioinformatics. 2006 Mar 27;7:173. doi: 10.1186/1471-2105-7-173.

Specific alignment of structured RNA: stochastic grammars and sequence annealing.

Bioinformatics. 2008 Dec 1;24(23):2677-83. doi: 10.1093/bioinformatics/btn495. Epub 2008 Sep 16.

Structural alignment of RNA with triple helix structure.

J Comput Biol. 2012 Apr;19(4):365-78. doi: 10.1089/cmb.2010.0052.

GraphClust: alignment-free structural clustering of local RNA secondary structures.

Bioinformatics. 2012 Jun 15;28(12):i224-32. doi: 10.1093/bioinformatics/bts224.

Cited by

Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad186.

Reference-based read clustering improves the genome assembly of microbial strains.

Comput Struct Biotechnol J. 2022 Dec 21;21:444-451. doi: 10.1016/j.csbj.2022.12.032. eCollection 2023.

Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning.

NAR Genom Bioinform. 2022 Feb 22;4(1):lqac012. doi: 10.1093/nargab/lqac012. eCollection 2022 Mar.

Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs.

Int J Mol Sci. 2021 Aug 13;22(16):8719. doi: 10.3390/ijms22168719.

Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa354.

The rare lncRNA GOLLD is widespread and structurally conserved among tRNA arrays.

RNA Biol. 2020 Jul;17(7):1001-1008. doi: 10.1080/15476286.2020.1748922. Epub 2020 Apr 22.

Convolutional neural networks for classification of alignments of non-coding RNA sequences.

Bioinformatics. 2018 Jul 1;34(13):i237-i244. doi: 10.1093/bioinformatics/bty228.

Accurate Classification of RNA Structures Using Topological Fingerprints.

PLoS One. 2016 Oct 18;11(10):e0164726. doi: 10.1371/journal.pone.0164726. eCollection 2016.

SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Bioinformatics. 2015 Aug 1;31(15):2489-96. doi: 10.1093/bioinformatics/btv185. Epub 2015 Apr 2.

BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles.

Bioinformatics. 2014 Jun 15;30(12):i274-82. doi: 10.1093/bioinformatics/btu270.
