Verdú-Mas Jose Luis, Carrasco Rafael C, Calera-Rubio Jorge
Departament de Llenguatges i Sistemes Informàtics, Universidad de Alicante, E-03071 Alicante, Spain.
IEEE Trans Pattern Anal Mach Intell. 2005 Jul;27(7):1040-50. doi: 10.1109/TPAMI.2005.144.
Probabilistic k-testable models (usually known as k-gram models in the case of strings) can be easily identified from samples and allow for smoothing techniques to deal with unseen events during pattern classification. In this paper, we introduce the family of stochastic k-testable tree languages and describe how these models can approximate any stochastic rational tree language. We apply the approach to the task of learning a probabilistic k-testable model from a sample of parsed sentences. In particular, we present a parser for a natural language grammar that incorporates smoothing.
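To make the string case mentioned in the abstract concrete, the following is a minimal sketch of a k-gram model estimated from samples, with additive smoothing so that unseen events receive nonzero probability. It is an illustration only and does not reproduce the tree-language models or the smoothing scheme of the paper. The function names `train_kgram` and `prob`, the padding symbols `<s>` and `</s>`, and the smoothing constant `alpha` are all assumptions introduced here.

```python
from collections import Counter


def train_kgram(sentences, k=2):
    """Count k-grams and their (k-1)-symbol contexts in tokenized sentences.

    Hypothetical helper: returns (kgram_counts, context_counts, vocab).
    """
    kgram_counts = Counter()
    context_counts = Counter()
    vocab = {"</s>"}  # the end marker is a predictable symbol
    for tokens in sentences:
        vocab.update(tokens)
        # Pad with k-1 start markers so every symbol has a full context.
        padded = ["<s>"] * (k - 1) + list(tokens) + ["</s>"]
        for i in range(k - 1, len(padded)):
            context = tuple(padded[i - k + 1:i])
            kgram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return kgram_counts, context_counts, vocab


def prob(symbol, context, model, alpha=1.0):
    """Additively smoothed estimate of P(symbol | context).

    With alpha > 0, a k-gram never seen in training still gets
    probability alpha / (count(context) + alpha * |vocab|).
    """
    kgram_counts, context_counts, vocab = model
    context = tuple(context)
    numerator = kgram_counts[context + (symbol,)] + alpha
    denominator = context_counts[context] + alpha * len(vocab)
    return numerator / denominator


if __name__ == "__main__":
    model = train_kgram([["a", "b"], ["a", "c"]], k=2)
    print(prob("b", ["a"], model))  # seen bigram
    print(prob("a", ["a"], model))  # unseen bigram, still nonzero
```

Additive smoothing is chosen here only because it is the simplest scheme to state. The tree case replaces contiguous substrings with bounded-depth tree fragments, which this sketch does not attempt.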