Query-dependent banding (QDB) for faster RNA similarity searches.

Author information

Eric P. Nawrocki, Sean R. Eddy

Affiliation

Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, United States of America.

Publication information

PLoS Comput Biol. 2007 Mar 30;3(3):e56. doi: 10.1371/journal.pcbi.0030056. Epub 2007 Feb 7.

DOI: 10.1371/journal.pcbi.0030056

PMID: 17397253

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1847999/

Abstract

When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.
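As we read the abstract, the core of QDB is that the query CM alone determines which subsequence lengths each part of the model can plausibly emit, so the dynamic programming lattice can be restricted to those lengths before any target is scanned. The Python sketch below illustrates only that band-selection step, on toy length distributions. It assumes we already have, for one model state, a distribution gamma[n] over emitted lengths n; that lengths of independently emitted segments combine by convolution; and that the tolerance β is split evenly between the two excluded tails. The helper names (`convolve_lengths`, `geometric_lengths`, `length_band`), the toy distributions, and β = 1e-7 are illustrative assumptions, not Infernal's code or API.

```python
"""Toy sketch of query-dependent band selection (NOT Infernal's implementation).

Assumption: for one model state we know gamma[n], the probability that the
part of the model rooted at that state emits a subsequence of length n
(n = 0..max_len). The band [dmin, dmax] keeps every length except two tails,
each holding at most beta/2 of the probability mass (an assumed even split).
"""


def convolve_lengths(p, q, max_len):
    """Length distribution of two independent segments emitted back to back,
    truncated at max_len (mass beyond max_len is dropped)."""
    out = [0.0] * (max_len + 1)
    for i, pi in enumerate(p):
        if pi == 0.0:
            continue
        for j, qj in enumerate(q):
            if i + j > max_len:
                break
            out[i + j] += pi * qj
    return out


def geometric_lengths(extend_prob, max_len):
    """Hypothetical insert-like segment: P(n) = (1 - e) * e**n, truncated."""
    return [(1.0 - extend_prob) * extend_prob ** n for n in range(max_len + 1)]


def length_band(gamma, beta):
    """Return (dmin, dmax) so that each excluded tail holds at most beta/2 mass."""
    half = beta / 2.0
    # Lower edge: skip leading lengths while their cumulative mass stays <= beta/2.
    dmin, cum = 0, 0.0
    for n, p in enumerate(gamma):
        if cum + p > half:
            dmin = n
            break
        cum += p
    # Upper edge: same test, scanning down from the longest length.
    dmax, tail = len(gamma) - 1, 0.0
    for n in range(len(gamma) - 1, -1, -1):
        if tail + gamma[n] > half:
            dmax = n
            break
        tail += gamma[n]
    return dmin, dmax


if __name__ == "__main__":
    MAX_LEN = 60
    fixed = [0.0, 0.0, 0.0, 1.0]  # a segment that always emits exactly 3 residues
    insert = geometric_lengths(0.5, MAX_LEN)
    combined = convolve_lengths(fixed, insert, MAX_LEN)
    print(length_band(combined, 1e-7))  # → (3, 27)
```

In this toy run, a segment that always emits 3 residues followed by an insert-like segment with extension probability 0.5 gets the band [3, 27]; lengths outside that window carry at most β of the probability mass in total, so a banded aligner could skip those cells. In the method described here such distributions would be derived for every state of the query CM from its own parameters, which this sketch does not attempt.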



Similar articles

Query-dependent banding (QDB) for faster RNA similarity searches.

PLoS Comput Biol. 2007 Mar 30;3(3):e56. doi: 10.1371/journal.pcbi.0030056. Epub 2007 Feb 7.

Shape based indexing for faster search of RNA family databases.

BMC Bioinformatics. 2008 Feb 29;9:131. doi: 10.1186/1471-2105-9-131.

Infernal 1.1: 100-fold faster RNA homology searches.

Bioinformatics. 2013 Nov 15;29(22):2933-5. doi: 10.1093/bioinformatics/btt509. Epub 2013 Sep 4.

FastR: fast database search tool for non-coding RNA.

Proc IEEE Comput Syst Bioinform Conf. 2004:52-61. doi: 10.1109/csb.2004.1332417.

Locomotif: from graphical motif description to RNA motif search.

Bioinformatics. 2007 Jul 1;23(13):i392-400. doi: 10.1093/bioinformatics/btm179.

Infernal 1.0: inference of RNA alignments.

Bioinformatics. 2009 May 15;25(10):1335-7. doi: 10.1093/bioinformatics/btp157. Epub 2009 Mar 23.

Computational analysis of RNAs.

Cold Spring Harb Symp Quant Biol. 2006;71:117-28. doi: 10.1101/sqb.2006.71.003.

Mining frequent stem patterns from unaligned RNA sequences.

Bioinformatics. 2006 Oct 15;22(20):2480-7. doi: 10.1093/bioinformatics/btl431. Epub 2006 Aug 14.

Database indexing for production MegaBLAST searches.

Bioinformatics. 2008 Aug 15;24(16):1757-64. doi: 10.1093/bioinformatics/btn322. Epub 2008 Jun 21.

Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases.

Bioinformatics. 2007 Jun 1;23(11):1386-93. doi: 10.1093/bioinformatics/btl647. Epub 2007 Jan 18.

Cited by

Exploring sp. M21F004 for Biocontrol of Bacterial and Fungal Phytopathogens.

Mar Drugs. 2024 Nov 28;22(12):534. doi: 10.3390/md22120534.

Sensitive and error-tolerant annotation of protein-coding DNA with BATH.

Bioinform Adv. 2024 Jun 14;4(1):vbae088. doi: 10.1093/bioadv/vbae088. eCollection 2024.

Evolutionary Structure Conservation and Covariance Scores.

Methods Mol Biol. 2024;2726:255-284. doi: 10.1007/978-1-0716-3519-3_11.

A Noble Extract of sp. M20A4R8 Efficiently Controlling the Influenza Virus-Induced Cell Death.

Microorganisms. 2024 Mar 28;12(4):677. doi: 10.3390/microorganisms12040677.

Identification and characterization of a marine bacterium extract from Mameliella sp. M20D2D8 with antiviral effects against influenza A and B viruses.

Arch Virol. 2024 Feb 7;169(3):41. doi: 10.1007/s00705-024-05979-8.

Sensitive and error-tolerant annotation of protein-coding DNA with BATH.

bioRxiv. 2024 Jan 1:2023.12.31.573773. doi: 10.1101/2023.12.31.573773.

Description and Genomic Characteristics of sp. nov., Isolated from Kimchi.

J Microbiol Biotechnol. 2023 Nov 28;33(11):1448-1456. doi: 10.4014/jmb.2306.06010. Epub 2023 Jul 25.

Evolution and Phylogeny of MicroRNAs - Protocols, Pitfalls, and Problems.

Methods Mol Biol. 2022;2257:211-233. doi: 10.1007/978-1-0716-1170-8_11.

Lysobacter arenosi sp. nov. and Lysobacter solisilvae sp. nov. isolated from soil.

J Microbiol. 2021 Aug;59(8):709-717. doi: 10.1007/s12275-021-1156-y. Epub 2021 Jun 1.

Remote homology search with hidden Potts models.

PLoS Comput Biol. 2020 Nov 30;16(11):e1008085. doi: 10.1371/journal.pcbi.1008085. eCollection 2020 Nov.

