Effective ambiguity checking in biosequence analysis.

Authors

Reeder Janina, Steffen Peter, Giegerich Robert

Affiliation

International NRW Graduate School of Bioinformatics and Genome Research, Center of Biotechnology (CeBiTec), Bielefeld University, Postfach 10 01 31, 33501 Bielefeld, Germany.

Publication information

BMC Bioinformatics. 2005 Jun 20;6:153. doi: 10.1186/1471-2105-6-153.

DOI:10.1186/1471-2105-6-153

PMID:15967024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1215473/

Abstract

BACKGROUND

Ambiguity is a problem in biosequence analysis that arises in various analysis tasks solved via dynamic programming and, in particular, in the modeling of families of RNA secondary structures with stochastic context-free grammars. Several types of analysis are invalidated by the presence of ambiguity. Because this problem inherits undecidability (as we show here) from the namesake problem for context-free languages, there is no complete algorithmic solution to the problem of ambiguity checking.

RESULTS

We explain frequently observed sources of ambiguity and show how to avoid them. We suggest four testing procedures that may help to detect ambiguity when present, including a just-in-time test that makes it possible to work safely with a potentially ambiguous grammar. For the special case of stochastic context-free grammars and RNA structure modeling, we introduce an automated partial procedure for proving non-ambiguity. It is used to demonstrate non-ambiguity for several relevant grammars.

CONCLUSION

Our mechanical proof procedure and our testing methods provide a powerful arsenal of methods to ensure non-ambiguity.
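To make the notion of grammar ambiguity concrete, the following is a minimal sketch of a bounded brute-force check: it counts the parse trees of every dot-bracket string up to a given length and reports strings with more than one parse. It is an illustration only and is not claimed to be one of the four testing procedures mentioned in the abstract. The two toy grammars, the function names `count_parses` and `ambiguous_words`, and the restriction to grammars without empty productions or unit cycles are all assumptions made for this example. A test of this kind can only find ambiguity when it shows up within the length bound; it can never prove non-ambiguity.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy grammars over dot-bracket strings (not taken from the paper).
# Both generate the same language; the first is ambiguous because of S -> S S.
AMBIGUOUS = {"S": [(".",), ("(", "S", ")"), ("S", "S")]}
UNAMBIGUOUS = {"S": [(".",), ("(", "S", ")"), (".", "S"), ("(", "S", ")", "S")]}


def count_parses(grammar, start, word):
    """Count the parse trees of `word` from `start`.

    Assumes no empty productions and no unit cycles, so every symbol
    derives at least one character and the recursion terminates.
    """

    @lru_cache(maxsize=None)
    def derive(symbols, i, j):
        # Number of ways the symbol sequence `symbols` derives word[i:j].
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        if head not in grammar:  # terminal symbol
            if i < j and word[i] == head:
                return derive(rest, i + 1, j)
            return 0
        total = 0
        # Leave at least one character for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            ways_head = sum(derive(rhs, i, k) for rhs in grammar[head])
            if ways_head:
                total += ways_head * derive(rest, k, j)
        return total

    return sum(derive(rhs, 0, len(word)) for rhs in grammar[start])


def ambiguous_words(grammar, start, alphabet, max_len):
    """Return all words up to `max_len` that have more than one parse tree."""
    found = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if count_parses(grammar, start, word) > 1:
                found.append(word)
    return found


if __name__ == "__main__":
    print(ambiguous_words(AMBIGUOUS, "S", ".()", 4))
    print(ambiguous_words(UNAMBIGUOUS, "S", ".()", 6))
```

In the first grammar the string `...` can be split by `S -> S S` in two ways, so it has two parse trees; the second grammar derives the same strings but fixes the decomposition from left to right, so the bounded search finds no witness.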

Similar articles

Semantics and ambiguity of stochastic RNA family models.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):499-516. doi: 10.1109/TCBB.2010.12.

Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures.

Proc IEEE Comput Syst Bioinform Conf. 2004:290-9.

Evolving stochastic context-free grammars for RNA secondary structure prediction.

BMC Bioinformatics. 2012 May 4;13:78. doi: 10.1186/1471-2105-13-78.

Stochastic modeling of RNA pseudoknotted structures: a grammatical approach.

Bioinformatics. 2003;19 Suppl 1:i66-73. doi: 10.1093/bioinformatics/btg1007.

A stochastic context free grammar based framework for analysis of protein sequences.

BMC Bioinformatics. 2009 Oct 8;10:323. doi: 10.1186/1471-2105-10-323.

Multithreaded parsing for predicting RNA secondary structures.

Int J Bioinform Res Appl. 2010;6(6):609-21. doi: 10.1504/IJBRA.2010.038741.

Introduction to stochastic context free grammars.

Methods Mol Biol. 2014;1097:85-106. doi: 10.1007/978-1-62703-709-9_5.

Syntactic complexity and ambiguity resolution in a free word order language: behavioral and electrophysiological evidences from Basque.

Brain Lang. 2009 Apr;109(1):1-17. doi: 10.1016/j.bandl.2008.12.003. Epub 2009 Feb 14.

Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures.

Bioinformatics. 2005 Jun 1;21(11):2611-7. doi: 10.1093/bioinformatics/bti385. Epub 2005 Mar 22.

Cited by

Evolving stochastic context-free grammars for RNA secondary structure prediction.

BMC Bioinformatics. 2012 May 4;13:78. doi: 10.1186/1471-2105-13-78.

A folding algorithm for extended RNA secondary structures.

Bioinformatics. 2011 Jul 1;27(13):i129-36. doi: 10.1093/bioinformatics/btr220.

Structural analysis of aligned RNAs.

Nucleic Acids Res. 2006;34(19):5471-81. doi: 10.1093/nar/gkl692. Epub 2006 Oct 4.

Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints.

BMC Bioinformatics. 2006 Sep 4;7:400. doi: 10.1186/1471-2105-7-400.

Complete probabilistic analysis of RNA shapes.

BMC Biol. 2006 Feb 15;4:5. doi: 10.1186/1741-7007-4-5.

Versatile and declarative dynamic programming using pair algebras.

BMC Bioinformatics. 2005 Sep 12;6:224. doi: 10.1186/1471-2105-6-224.

References

Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction.

BMC Bioinformatics. 2004 Jun 4;5:71. doi: 10.1186/1471-2105-5-71.

Complete suboptimal folding of RNA and the stability of secondary structures.

Biopolymers. 1999 Feb;49(2):145-65. doi: 10.1002/(SICI)1097-0282(199902)49:2<145::AID-BIP4>3.0.CO;2-G.

Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information.

Nucleic Acids Res. 1981 Jan 10;9(1):133-48. doi: 10.1093/nar/9.1.133.
