Ranking relations between diseases, drugs and genes for a curation task.

Author information

Clematide Simon, Rinaldi Fabio

Affiliation

Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, 8050 Zurich, Switzerland.

Publication information

J Biomed Semantics. 2012 Oct 5;3 Suppl 3(Suppl 3):S5. doi: 10.1186/2041-1480-3-S3-S5.

Abstract

BACKGROUND

Interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.) are among the key pieces of information that biomedical text mining systems are expected to extract from the literature. Several large resources of curated relations between biomedical entities are currently available, such as the Pharmacogenomics Knowledge Base (PharmGKB) or the Comparative Toxicogenomics Database (CTD). Biomedical text mining systems, in particular those that extract relationships among entities, could make better use of this wealth of already curated material.

RESULTS

We propose a simple and effective method based on logistic regression (also known as maximum entropy modeling) for an optimized ranking of relation candidates utilizing curated abstracts. Furthermore, we examine the effects and difficulties of using widely available metadata (i.e. MeSH terms and chemical substance index terms) for relation extraction. In cross-validation experiments, the ranking quality in terms of AUCiP/R improves by 39% (PharmGKB) and 116% (CTD) over a frequency-based baseline of 0.39 (PharmGKB) and 0.21 (CTD). For the TAP-10 metric, we achieve an improvement of 53% (PharmGKB) and 134% (CTD) over the same baseline system (0.21 PharmGKB and 0.15 CTD).
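To make the ranking idea concrete, the following is a minimal sketch and not the authors' implementation. It fits a small logistic regression on previously curated candidates, scores new candidates, and evaluates a ranked list with one common step-interpolated precision/recall area. The two binary features, the toy training data, the candidate names, and the exact form of the metric are all assumptions made for illustration; the definition of AUCiP/R used in the paper may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(w, b, x):
    """Estimated probability that a candidate relation would be curated."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def interpolated_pr_auc(ranked_labels):
    """Area under a step-interpolated precision/recall curve (assumed definition)."""
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0
    precisions, hits = [], 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Interpolation: precision at a recall level is the best precision
    # reached at that recall level or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum(precisions) / total_pos

# Hypothetical features per candidate relation:
#   [both entities occur in the same sentence, an entity appears in the MeSH terms]
# Label 1 means the relation was curated for that abstract (toy data).
training = [
    ([1, 1], 1), ([1, 1], 1), ([1, 1], 1), ([1, 1], 0),
    ([1, 0], 1), ([1, 0], 0),
    ([0, 1], 1), ([0, 1], 0),
    ([0, 0], 0), ([0, 0], 0), ([0, 0], 0), ([0, 0], 1),
]
X = [features for features, _ in training]
y = [label for _, label in training]
w, b = train_logistic(X, y)

# Hypothetical relation candidates from a new abstract.
candidates = {"drugA-geneX": [1, 1], "drugB-geneY": [1, 0], "drugC-geneZ": [0, 0]}
ranking = sorted(candidates, key=lambda name: score(w, b, candidates[name]), reverse=True)
print(ranking)
print(interpolated_pr_auc([1, 0, 1]))
```

In this sketch a curator would review candidates from the top of `ranking`, so a higher area under the precision/recall curve means that correct relations tend to appear earlier in the list.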

CONCLUSIONS

Our experiments with the PharmGKB and CTD databases show that exploiting the vast amount of curated relations covered by currently available knowledge databases has a strong positive effect on the ranking of relation candidates. The tasks of concept identification and candidate relation generation profit from the adaptation to previously curated material. The result is an effective and practical method, suitable for conservative extension and re-validation of biomedical relations from texts, that has been successfully used for curation experiments with the PharmGKB and CTD databases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/517b/3465213/d5bd33c16034/2041-1480-3-S3-S5-1.jpg
