Corazza Anna, Satta Giorgio
Department of Physics, University of Naples Federico II, via Cinthia, Napoli, Italy.
IEEE Trans Pattern Anal Mach Intell. 2007 Aug;29(8):1379-93. doi: 10.1109/TPAMI.2007.1065.
In this paper, we consider probabilistic context-free grammars, a class of generative devices that has been successfully exploited in several applications of syntactic pattern matching, especially in statistical natural language parsing. We investigate the problem of training probabilistic context-free grammars on the basis of distributions defined over an infinite set of trees or an infinite set of sentences, by minimizing the cross-entropy. This problem arises when a context-free approximation of a distribution generated by some more expressive statistical model is required. We show several interesting theoretical properties of probabilistic context-free grammars estimated in this way, including the previously unknown equality between the cross-entropy of the grammar relative to the input distribution and the so-called derivational entropy of the grammar itself. We discuss important consequences of these results for the standard application of the maximum-likelihood estimator to finite tree and sentence samples, as well as for finite-state models such as hidden Markov models and probabilistic finite automata.
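Stated compactly, and in notation assumed for this summary rather than taken from the paper ($p_D$ for the input distribution over trees, $p_G$ for the tree distribution induced by a grammar $G$, $\hat{G}$ for a cross-entropy-minimizing grammar), the training criterion and the equality result read:

```latex
% Training criterion: choose G minimizing the cross-entropy of the
% grammar distribution p_G against the input tree distribution p_D.
\[
  H(p_D, p_G) \;=\; -\sum_{t} p_D(t)\,\log p_G(t)
\]
% The equality result: at a minimizing grammar \hat{G}, this
% cross-entropy coincides with the derivational entropy of \hat{G}.
\[
  H(p_D, p_{\hat{G}}) \;=\; H_d(\hat{G})
  \;=\; -\sum_{t} p_{\hat{G}}(t)\,\log p_{\hat{G}}(t)
\]
```

The sentence-distribution case is analogous, with the sum ranging over sentences rather than trees.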
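The abstract's closing remark on finite samples can be made concrete: when the input distribution is the empirical distribution of a finite treebank, the cross-entropy minimizer reduces to the familiar relative-frequency (maximum-likelihood) estimator. Below is a minimal, self-contained sketch of that estimator; the tree encoding and all names (estimate_pcfg, toy_treebank) are hypothetical choices for illustration, not artifacts of the paper.

```python
from collections import Counter

def estimate_pcfg(trees):
    """Relative-frequency (maximum-likelihood) estimation of PCFG rule
    probabilities from a finite sample of parse trees.

    Each tree is a nested tuple (nonterminal, child, child, ...),
    where a child is either another tuple or a terminal string.
    """
    rule_counts = Counter()   # counts of rule occurrences A -> alpha
    lhs_counts = Counter()    # counts of left-hand-side nonterminals A

    def collect(node):
        if isinstance(node, tuple):  # internal node = one rule application
            lhs, *children = node
            rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
            for c in children:
                collect(c)

    for t in trees:
        collect(t)

    # Normalizing counts per nonterminal yields the maximizer of the
    # sample likelihood, i.e. the cross-entropy minimizer for the
    # empirical tree distribution.
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

if __name__ == "__main__":
    toy_treebank = [
        ("S", ("NP", "she"), ("VP", ("V", "runs"))),
        ("S", ("NP", "she"), ("VP", ("V", "runs"), ("NP", "home"))),
    ]
    for rule, p in sorted(estimate_pcfg(toy_treebank).items()):
        print(rule, round(p, 3))
```

On this toy sample the estimator assigns, for instance, probability 2/3 to NP -> she and 1/3 to NP -> home, since counts are normalized per left-hand-side nonterminal.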