Rinaldi Fabio, Schneider Gerold, Kaljurand Kaarel, Hess Michael, Romacker Martin
Institute of Computational Linguistics, IFI, University of Zurich, Switzerland.
BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-7-S3-S3.
The biomedical domain is witnessing rapid growth in the amount of published scientific results, which makes it increasingly difficult to filter out the core information. There is a real need for support tools that 'digest' the published results and extract the most important information.
We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly-annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns, expressing a large set of syntactic alternations, plus semantic ontology information.
The experiments show that the approach described is capable of delivering high-precision results while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.
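To make the idea of patterns over deep-syntactic parses concrete, the following is a minimal sketch and not the system evaluated here. It assumes a sentence has already been parsed into dependency triples, and it shows how one abstract rule can cover both an active and a passive alternation while an ontology lookup filters the arguments. The dependency labels (`subj`, `obj`, `pass_subj`, `by_agent`), the `Dep` type, the toy ontology, and the names `ProteinA` and `ProteinB` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """One dependency triple: label(head, dependent). Hypothetical format."""
    head: str
    label: str
    dependent: str


# Toy ontology mapping terms to semantic types (illustrative assumption).
ONTOLOGY = {"ProteinA": "protein", "ProteinB": "protein", "heat": "condition"}

# Lemmas treated as interaction triggers (illustrative assumption).
INTERACTION_VERBS = {"activate", "bind", "phosphorylate"}


def extract_interactions(deps, ontology=ONTOLOGY, verbs=INTERACTION_VERBS):
    """Return (agent, verb, target) triples where both arguments are proteins.

    A single rule covers two syntactic alternations:
      active:  subj(verb, agent),       obj(verb, target)
      passive: pass_subj(verb, target), by_agent(verb, agent)
    """
    # Index each dependent by its (head, label) pair for direct lookup.
    args = {(d.head, d.label): d.dependent for d in deps}

    results = []
    for verb in sorted({d.head for d in deps}):
        if verb not in verbs:
            continue
        agent = args.get((verb, "subj")) or args.get((verb, "by_agent"))
        target = args.get((verb, "obj")) or args.get((verb, "pass_subj"))
        # The semantic constraint from the ontology filters the candidates.
        if (agent and target
                and ontology.get(agent) == "protein"
                and ontology.get(target) == "protein"):
            results.append((agent, verb, target))
    return results


# "ProteinA activates ProteinB."
active = [Dep("activate", "subj", "ProteinA"),
          Dep("activate", "obj", "ProteinB")]

# "ProteinB is activated by ProteinA."
passive = [Dep("activate", "pass_subj", "ProteinB"),
           Dep("activate", "by_agent", "ProteinA")]

print(extract_interactions(active))
print(extract_interactions(passive))
```

Both sentences yield the same triple because the rule is stated over syntactic roles rather than surface word order. A surface-level pattern would typically need a separate entry for each alternation.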