Sequence-based heuristics for faster annotation of non-coding RNA families.

Author information

Weinberg Zasha, Ruzzo Walter L

Affiliation

Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA.

Publication information

Bioinformatics. 2006 Jan 1;22(1):35-9. doi: 10.1093/bioinformatics/bti743. Epub 2005 Nov 2.

DOI:10.1093/bioinformatics/bti743

PMID:16267089

Abstract

MOTIVATION

Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, searches with these rigorous filters are still slower than heuristic filters could make them.

RESULTS

In this paper we introduce heuristic filters based on profile hidden Markov models (HMMs). We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
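The general filter-then-verify idea behind such filters can be illustrated with a minimal sketch. This is not the authors' implementation: an ungapped position-specific scoring matrix (`pssm_score`) stands in for a profile HMM, and a toy hairpin base-pair count (`hairpin_score`) stands in for a covariance model. All function names, thresholds, and the example sequence are hypothetical. Only windows that pass the cheap score reach the expensive scorer; as with any heuristic filter, a true hit that scores below `cheap_threshold` is silently lost, which is the accuracy trade-off that rigorous filters avoid.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a profile HMM: an ungapped position-specific
# scoring matrix (no insert/delete states), summed over one window.
def pssm_score(window: str, pssm: Sequence[dict]) -> float:
    """Sum per-position scores; residues absent from a column score -2.0."""
    return sum(col.get(base, -2.0) for col, base in zip(pssm, window))

# Hypothetical stand-in for a covariance model: count canonical and G-U
# base pairs between position i and its mirror position n-1-i (one hairpin stem).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def hairpin_score(window: str) -> float:
    n = len(window)
    return float(sum((window[i], window[n - 1 - i]) in PAIRS for i in range(n // 2)))

def filter_then_verify(
    genome: str,
    width: int,
    cheap: Callable[[str], float],
    cheap_threshold: float,
    expensive: Callable[[str], float],
    expensive_threshold: float,
) -> Tuple[List[int], int]:
    """Slide a fixed-width window; run `expensive` only where `cheap` passes.

    Returns (start positions of hits, number of expensive-scorer calls).
    """
    hits: List[int] = []
    expensive_calls = 0
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if cheap(window) < cheap_threshold:
            continue  # rejected by the heuristic filter; never scored by the slow model
        expensive_calls += 1
        if expensive(window) >= expensive_threshold:
            hits.append(start)
    return hits, expensive_calls

if __name__ == "__main__":
    motif_pssm = [{base: 2.0} for base in "GCAUGC"]  # +2 for a match, -2 otherwise
    hits, calls = filter_then_verify(
        "AAAAGCAUGCAAAA", 6,
        cheap=lambda w: pssm_score(w, motif_pssm), cheap_threshold=8.0,
        expensive=hairpin_score, expensive_threshold=3.0,
    )
    print(hits, calls)  # [4] 1
```

In this toy example the cheap filter admits one of the nine windows, so the expensive scorer runs once instead of nine times; in a real search the savings depend on how selective the filter threshold is.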

AVAILABILITY

The source code is available under the GNU General Public License at the supplementary web site.

Similar articles

Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i334-41. doi: 10.1093/bioinformatics/bth925.

A local multiple alignment method for detection of non-coding RNA sequences.

Bioinformatics. 2009 Jun 15;25(12):1498-505. doi: 10.1093/bioinformatics/btp261. Epub 2009 Apr 17.

An Ariadne's thread to the identification and annotation of noncoding RNAs in eukaryotes.

Brief Bioinform. 2009 Sep;10(5):475-89. doi: 10.1093/bib/bbp022. Epub 2009 Apr 21.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%.

Bioinformatics. 2005 May 1;21(9):1815-24. doi: 10.1093/bioinformatics/bti279. Epub 2005 Jan 18.

CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score.

Bioinformatics. 2009 Dec 15;25(24):3236-43. doi: 10.1093/bioinformatics/btp580. Epub 2009 Oct 6.

Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome.

Nat Biotechnol. 2005 Nov;23(11):1383-90. doi: 10.1038/nbt1144.

Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

J Mol Biol. 2001 Nov 9;313(5):1003-11. doi: 10.1006/jmbi.2001.5102.

Predicting RNA secondary structure based on the class information and Hopfield network.

Comput Biol Med. 2009 Mar;39(3):206-14. doi: 10.1016/j.compbiomed.2008.12.010. Epub 2009 Feb 11.

Cited by

Comparative RNA Genomics.

Methods Mol Biol. 2024;2802:347-393. doi: 10.1007/978-1-0716-3838-5_12.

Architectures and complex functions of tandem riboswitches.

RNA Biol. 2022 Jan;19(1):1059-1076. doi: 10.1080/15476286.2022.2119017.

Silencing of MEG3 attenuated the role of lipopolysaccharides by modulating the miR-93-5p/PTEN pathway in Leydig cells.

Reprod Biol Endocrinol. 2021 Feb 27;19(1):33. doi: 10.1186/s12958-021-00712-5.

A Machine Learning Approach for Accurate Annotation of Noncoding RNAs.

IEEE/ACM Trans Comput Biol Bioinform. 2015 May-Jun;12(3):551-9. doi: 10.1109/TCBB.2014.2366758.

Ambivalent covariance models.

BMC Bioinformatics. 2015 May 28;16:178. doi: 10.1186/s12859-015-0569-1.

A review of three different studies on hidden markov models for epigenetic problems: a computational perspective.

Genomics Inform. 2014 Dec;12(4):145-50. doi: 10.5808/GI.2014.12.4.145. Epub 2014 Dec 31.

Annotating RNA motifs in sequences and alignments.

Nucleic Acids Res. 2015 Jan;43(2):691-8. doi: 10.1093/nar/gku1327. Epub 2014 Dec 17.

Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm.

Nucleic Acids Res. 2014 Jun;42(11):e93. doi: 10.1093/nar/gku325. Epub 2014 Apr 25.

Computational analysis of riboswitch-based regulation.

Biochim Biophys Acta. 2014 Oct;1839(10):900-907. doi: 10.1016/j.bbagrm.2014.02.011. Epub 2014 Feb 28.

Infernal 1.1: 100-fold faster RNA homology searches.

Bioinformatics. 2013 Nov 15;29(22):2933-5. doi: 10.1093/bioinformatics/btt509. Epub 2013 Sep 4.
