Reeder Janina, Steffen Peter, Giegerich Robert
InternationaI NRW Graduate School of Bioinformatics and Genome Research, Center of Biotechnology (CeBiTec), Bielefeld University, Postfach 10 01 31, 33501 Bielefeld, Germany.
BMC Bioinformatics. 2005 Jun 20;6:153. doi: 10.1186/1471-2105-6-153.
Ambiguity is a problem in biosequence analysis that arises in various analysis tasks solved via dynamic programming, and in particular, in the modeling of families of RNA secondary structures with stochastic context free grammars. Several types of analysis are invalidated by the presence of ambiguity. As this problem inherits undecidability (as we show here) from the namely problem for context free languages, there is no complete algorithmic solution to the problem of ambiguity checking.
We explain frequently observed sources of ambiguity, and show how to avoid them. We suggest four testing procedures that may help to detect ambiguity when present, including a just-in-time test that permits to work safely with a potentially ambiguous grammar. We introduce, for the special case of stochastic context free grammars and RNA structure modeling, an automated partial procedure for proving non-ambiguity. It is used to demonstrate non-ambiguity for several relevant grammars.
Our mechanical proof procedure and our testing methods provide a powerful arsenal of methods to ensure non-ambiguity.
模糊性是生物序列分析中的一个问题,它出现在通过动态规划解决的各种分析任务中,尤其是在用随机上下文无关文法对RNA二级结构家族进行建模时。模糊性的存在会使几种类型的分析无效。由于这个问题从上下文无关语言的判定问题继承了不可判定性(正如我们在此所示),所以不存在用于模糊性检查问题的完整算法解决方案。
我们解释了经常观察到的模糊性来源,并展示了如何避免它们。我们提出了四种测试程序,这些程序可能有助于在存在模糊性时检测到它,包括一种即时测试,该测试允许安全地使用潜在模糊的文法。对于随机上下文无关文法和RNA结构建模的特殊情况,我们引入了一种用于证明非模糊性的自动部分程序。它被用于证明几种相关文法的非模糊性。
我们的机械证明程序和测试方法提供了一套强大的方法来确保非模糊性。