Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.

Authors

Pyysalo Sampo, Ginter Filip, Pahikkala Tapio, Boberg Jorma, Järvinen Jouni, Salakoski Tapio

Affiliation

Turku Centre for Computer Science (TUCS), Department of Computer Science, University of Turku, Lemminkäisenkatu 14A, 20520 Turku, Finland.

Publication information

Int J Med Inform. 2006 Jun;75(6):430-42. doi: 10.1016/j.ijmedinf.2005.06.009. Epub 2005 Aug 11.

Abstract

We present an evaluation of Link Grammar and Connexor Machinese Syntax, two major broad-coverage dependency parsers, on a custom hand-annotated corpus consisting of sentences regarding protein-protein interactions. In the evaluation, we apply the notion of an interaction subgraph, which is the subgraph of a dependency graph expressing a protein-protein interaction. We measure the performance of the parsers for recovery of individual dependencies, fully correct parses, and interaction subgraphs. For Link Grammar, an open system that can be inspected in detail, we further perform a comprehensive failure analysis, report specific causes of error, and suggest potential modifications to the grammar. We find that both parsers perform worse on biomedical English than previously reported on general English. While Connexor Machinese Syntax significantly outperforms Link Grammar, the failure analysis suggests specific ways in which the latter could be modified for better performance in the domain.
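
The three measurements named in the abstract (individual dependencies, fully correct parses, and interaction subgraphs) can be illustrated with a minimal sketch. The abstract does not state the exact scoring criteria, so everything below is an assumption made for illustration: dependencies are encoded as hypothetical (head index, dependent index) pairs, a parse counts as fully correct when every gold dependency is recovered, and an interaction subgraph counts as recovered when all of its dependencies appear in the parser output. The example sentence and its annotation are invented, not taken from the corpus.

```python
from typing import Set, Tuple

# Hypothetical encoding: (head token index, dependent token index).
Dependency = Tuple[int, int]


def dependency_recall(gold: Set[Dependency], predicted: Set[Dependency]) -> float:
    """Fraction of gold dependencies that also appear in the parser output."""
    if not gold:
        return 1.0
    return len(gold & predicted) / len(gold)


def is_fully_correct(gold: Set[Dependency], predicted: Set[Dependency]) -> bool:
    """Assumed criterion: every gold dependency was recovered."""
    return gold <= predicted


def subgraph_recovered(subgraph: Set[Dependency], predicted: Set[Dependency]) -> bool:
    """Assumed criterion: every dependency of the interaction subgraph was recovered."""
    return subgraph <= predicted


# Invented example: "ProteinA binds ProteinB in yeast" (tokens 0..4).
gold = {(1, 0), (1, 2), (1, 3), (3, 4)}
# Only the dependencies linking the two proteins through the verb.
interaction_subgraph = {(1, 0), (1, 2)}
# A parser output that misattaches the preposition "in" to "ProteinB".
predicted = {(1, 0), (1, 2), (2, 3), (3, 4)}

recall = dependency_recall(gold, predicted)
fully_correct = is_fully_correct(gold, predicted)
interaction_ok = subgraph_recovered(interaction_subgraph, predicted)
```

In this invented case the parse is not fully correct, yet the interaction subgraph is intact, which shows why a subgraph-level measure can differ from a whole-sentence measure.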