Computer Languages and Systems Department, University of the Basque Country, Donostia-San Sebastian, Spain.
Computer Architecture and Technology Department, University of the Basque Country, Donostia-San Sebastian, Spain.
PLoS One. 2019 Sep 12;14(9):e0221801. doi: 10.1371/journal.pone.0221801. eCollection 2019.
This paper describes the adaptation of the Stanford coreference resolution module to Basque, taking the characteristics of the language into account. The module has been integrated into a linguistic analysis pipeline, yielding an end-to-end coreference resolution system for Basque. The adaptation process described here can help other languages with similar characteristics implement their own coreference resolution systems. In our experiments, we show that language-specific features have a noteworthy effect on coreference resolution, yielding a gain of 7.07 points in CoNLL score over the baseline system. We also analyse the effect of preprocessing on coreference resolution by comparing the results obtained with automatic mentions against those obtained with gold mentions: with gold mentions, the CoNLL score is 11.5 points higher than with automatic mentions. We analyse the contribution of each sieve and conclude that morphology is essential for agglutinative languages to achieve good performance in coreference resolution. Finally, we present an error analysis of the coreference resolution system, which reveals the system's weak points and helps to determine how it can be improved. As a result of the error analysis, we have enriched the Basque coreference resolution system with two new sieves, obtaining an improvement of 0.24 points in CoNLL F1 when automatic mentions are used and of 0.39 points when gold mentions are provided.
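For readers unfamiliar with the metric, the CoNLL score reported above is conventionally the unweighted mean of the MUC, B³ and CEAFe F1 scores. The sketch below shows only that arithmetic. The function name `conll_score` and the input values are hypothetical placeholders chosen for illustration; they are not results from this paper.

```python
def conll_score(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the CoNLL score: the unweighted mean of three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Placeholder F1 values (NOT taken from the paper), used only to show the computation.
baseline = conll_score(muc_f1=60.0, b_cubed_f1=70.0, ceaf_e_f1=50.0)
improved = conll_score(muc_f1=63.0, b_cubed_f1=73.0, ceaf_e_f1=53.0)

# A "gain in CoNLL score" is the difference between two such averages.
gain = improved - baseline
print(f"baseline={baseline:.2f} improved={improved:.2f} gain={gain:.2f}")
```

Because the score is an average, a reported gain in CoNLL points may come from any combination of improvements in the three underlying metrics.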