Large-scale directional relationship extraction and resolution.

Author information

Giles Cory B, Wren Jonathan D

Affiliation

Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, Oklahoma 73104-5005, USA.

Publication information

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S11. doi: 10.1186/1471-2105-9-S9-S11.

Abstract

BACKGROUND

Relationships between entities such as genes, chemicals, metabolites, phenotypes and diseases in MEDLINE are often directional. That is, one may affect the other in a positive or negative manner. Detection of causality and direction is key in piecing pathways together and in examining possible implications of experimental results. Because of the size and growth of biomedical literature, it is increasingly important to be able to automate this process as much as possible.
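To make "direction" concrete, a directional relation can be stored as a signed (source, target) pair. The sketch below is a minimal illustration under stated assumptions, not the authors' data model: `Direction`, `DirectionalRelation`, `summarize` and the `doc_id` field are hypothetical names, and the extractions are placeholders rather than real data. Tallying directions per pair across many abstracts is one simple way to separate low-ambiguity relations from apparently contradictory ones.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Sign of the effect the source entity has on the target entity."""
    POSITIVE = 1   # e.g. "A increases / activates / induces B"
    NEGATIVE = -1  # e.g. "A decreases / inhibits / suppresses B"


@dataclass(frozen=True)
class DirectionalRelation:
    source: str
    target: str
    direction: Direction
    doc_id: str  # hypothetical provenance field (the abstract it came from)


def summarize(relations):
    """Group relations by ordered (source, target) pair and count each direction.

    A pair seen with only one direction is labelled "consistent"; a pair seen
    with both directions is labelled "contradictory" and is a candidate for
    context review.
    """
    counts = defaultdict(Counter)
    for rel in relations:
        counts[(rel.source, rel.target)][rel.direction] += 1
    return {
        pair: ("consistent" if len(c) == 1 else "contradictory", dict(c))
        for pair, c in counts.items()
    }


# Illustrative, made-up extractions; entity names are placeholders.
extracted = [
    DirectionalRelation("A", "B", Direction.POSITIVE, "doc1"),
    DirectionalRelation("A", "B", Direction.POSITIVE, "doc2"),
    DirectionalRelation("C", "D", Direction.POSITIVE, "doc3"),
    DirectionalRelation("C", "D", Direction.NEGATIVE, "doc4"),
]
for pair, (label, tally) in summarize(extracted).items():
    print(pair, label, {d.name: n for d, n in tally.items()})
```

A pair labelled contradictory is not necessarily an extraction error; as the results note, an apparent contradiction can reflect context such as short-term versus long-term effects.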

RESULTS

Here we present a method of relation extraction using dependency graph parsing with SVM classification. We first tested the SVM classifier on gold-standard corpora from GENIA and found that it achieved 82% precision and 94.8% recall (F-measure of 87.9) on these standardized test sets. We then applied the entire system to all available MEDLINE abstracts for two target interactions with known effects. We find that while some directional relations are extracted with low ambiguity, others are apparently contradictory, at least when considered in isolation. On closer examination, some of them depend on the surrounding context (e.g. whether the relationship refers to short-term or long-term effects, or whether the focus is extracellular or intracellular).
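The F-measure quoted above is the harmonic mean of precision and recall, F = 2PR / (P + R). A short check reproduces the reported value from the reported precision and recall; the `f_measure` helper is illustrative, with an optional `beta` weight for the general F-beta score (beta = 1 gives the balanced F1).

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 is the harmonic mean of precision and recall (F1)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# Precision and recall reported for the GENIA gold-standard test sets.
precision, recall = 0.82, 0.948
print(f"F1 = {100 * f_measure(precision, recall):.1f}")
```

With P = 0.82 and R = 0.948 this prints `F1 = 87.9`, matching the abstract. Because recall is higher than precision here, the F1 score sits between the two, closer to the lower value as the harmonic mean always does.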

CONCLUSION

Thesaurus-based directional relation extraction can be done with reasonable accuracy, but it is prone to false positives on larger corpora due to noun modifiers. Furthermore, methods for resolving or disambiguating relationship context and contingencies are important for large-scale corpora.
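The noun-modifier problem arises when a thesaurus term matches a word that only modifies another noun, as "glucose" does in "glucose transporter". The sketch below shows one way such matches could be flagged, assuming the sentence has already been dependency-parsed; the parse is hand-written rather than parser output, and `Token`, `THESAURUS` and `thesaurus_matches` are hypothetical names, not the authors' implementation. It uses the noun-noun modifier labels `compound` (Universal Dependencies) and `nn` (older Stanford dependencies).

```python
from dataclasses import dataclass

# Hypothetical two-term thesaurus; a real system would load a curated vocabulary.
THESAURUS = {"insulin", "glucose"}

# Noun-noun modifier labels: "compound" in Universal Dependencies,
# "nn" in the older Stanford dependency scheme.
MODIFIER_LABELS = {"compound", "nn"}


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    word: str
    head: int    # index of the governing token (0 = root)
    deprel: str  # dependency label linking this token to its head
    pos: str     # Penn Treebank part-of-speech tag


def thesaurus_matches(tokens):
    """Return (word, is_modifier) for every token found in the thesaurus.

    is_modifier is True when the match only modifies another noun, which
    makes it a likely false-positive entity for relation extraction.
    """
    by_index = {t.index: t for t in tokens}
    results = []
    for t in tokens:
        if t.word.lower() not in THESAURUS:
            continue
        head = by_index.get(t.head)
        is_modifier = (
            t.deprel in MODIFIER_LABELS
            and head is not None
            and head.pos.startswith("NN")
        )
        results.append((t.word, is_modifier))
    return results


# Hand-written parse of the illustrative sentence
# "Insulin activated the glucose transporter".
sentence = [
    Token(1, "Insulin", 2, "nsubj", "NN"),
    Token(2, "activated", 0, "root", "VBD"),
    Token(3, "the", 5, "det", "DT"),
    Token(4, "glucose", 5, "compound", "NN"),
    Token(5, "transporter", 2, "obj", "NN"),
]
print(thesaurus_matches(sentence))
```

A naive thesaurus lookup would see both "Insulin" and "glucose" and might report a relation between them; the modifier check marks "glucose" as part of the noun phrase headed by "transporter", so a pipeline could drop or down-weight that candidate.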
