Ambati Bharat Ram, Deoskar Tejaswini, Steedman Mark
1 ILCC, School of Informatics, University of Edinburgh, Edinburgh, UK.
2 Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands.
Lang Resour Eval. 2018;52(1):67-100. doi: 10.1007/s10579-017-9379-6. Epub 2017 Jan 25.
In this paper, we present an approach for automatically creating a combinatory categorial grammar (CCG) treebank from a dependency treebank for Hindi, a subject-object-verb language. Rather than converting dependency trees directly to CCG trees, we propose a two-stage approach: a language-independent generic algorithm first extracts a CCG lexicon from the dependency treebank, and an exhaustive CCG parser then creates a treebank of CCG derivations. We also discuss special cases of this generic algorithm that handle linguistic phenomena specific to Hindi. In doing so, we extract different constructions with long-range dependencies, such as coordinate constructions, and non-projective dependencies resulting from constructions such as relative clauses, noun elaboration and verbal modifiers.
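To make the first stage more concrete, the following is a minimal sketch of how a CCG lexicon might be read off a dependency tree. It is not the authors' algorithm, whose details the abstract does not give. It assumes that each dependent is already marked as an argument or an adjunct, that every word projects an atomic category (`NP` or `S`), that arguments are added to the head's category from farthest to nearest, and that adjuncts receive a category of the form `X/X` or `X\X` depending on which side of the head they appear. The `Token` class, the `extract_lexicon` function, and the simplified transliterated sentence are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int       # 1-based position in the sentence
    form: str      # surface form
    head: int      # position of the head token, 0 for the root
    is_arg: bool   # True if an argument of its head, False if an adjunct
    base: str      # atomic category the word projects (assumed given)


def wrap(cat: str) -> str:
    """Parenthesise a complex category before it is embedded in another."""
    return f"({cat})" if "/" in cat or "\\" in cat else cat


def extract_lexicon(tokens: list[Token]) -> dict[str, str]:
    """Assign one CCG category per token from a single dependency tree."""
    children: dict[int, list[Token]] = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t)

    lexicon: dict[str, str] = {}
    for t in tokens:
        if t.head != 0 and not t.is_arg:
            # Adjunct: maps the head's category to itself.
            h = wrap(tokens[t.head - 1].base)
            lexicon[t.form] = f"{h}/{h}" if t.idx < t.head else f"{h}\\{h}"
            continue

        cat = t.base
        args = [c for c in children.get(t.idx, []) if c.is_arg]
        # Left arguments, farthest first, so the nearest one ends up outermost.
        for a in sorted((c for c in args if c.idx < t.idx), key=lambda c: c.idx):
            cat = f"{wrap(cat)}\\{wrap(a.base)}"
        # Right arguments, farthest first (an assumption of this sketch).
        for a in sorted((c for c in args if c.idx > t.idx), key=lambda c: -c.idx):
            cat = f"{wrap(cat)}/{wrap(a.base)}"
        lexicon[t.form] = cat
    return lexicon


# Toy subject-object-verb sentence (simplified, transliterated).
sentence = [
    Token(1, "mohan", 4, True, "NP"),    # subject, argument of the verb
    Token(2, "laal", 3, False, "NP"),    # adjective, adjunct of the noun
    Token(3, "seb", 4, True, "NP"),      # object, argument of the verb
    Token(4, "khaataa", 0, True, "S"),   # verb, root of the tree
]
lexicon = extract_lexicon(sentence)
for form, cat in lexicon.items():
    print(form, cat)
```

Under these assumptions the verb-final order shows up directly in the verb's category: both arguments are sought to the left, so the verb receives `(S\NP)\NP`, while the pre-nominal adjunct receives `NP/NP`. A second stage, as described in the abstract, would then parse each sentence with the extracted lexicon to build the derivations.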