Sakakibara Yasubumi
Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan.
IEEE Trans Pattern Anal Mach Intell. 2005 Jul;27(7):1051-62. doi: 10.1109/TPAMI.2005.140.
Bioinformatics is an active research area that aims to develop intelligent systems for analysis in molecular biology. Many methods based on formal language theory, statistical theory, and learning theory have been developed for modeling and analyzing biological sequences such as DNA, RNA, and proteins. In particular, grammatical inference methods are expected to uncover grammatical structures hidden in biological sequences. In this article, we give an overview of our series of grammatical approaches to biological sequence analysis and of related research, focusing on learning stochastic grammars from biological sequences and predicting the functions of those sequences from the learned stochastic grammars.