Extracting causal relations on HIV drug resistance from literature.

Affiliation

Computational Science, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2010 Feb 23;11:101. doi: 10.1186/1471-2105-11-101.

Abstract

BACKGROUND

In HIV treatment, it is critical to have up-to-date resistance data for the applicable drugs, since HIV has a very high mutation rate. These data are made available through scientific publications and must be extracted manually by experts before virologists and medical doctors can use them. Therefore, there is an urgent need for a tool that partially automates this process and can retrieve relations between drugs and virus mutations from the literature.

RESULTS

In this work, we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP): it produces grammatical relations and applies a set of rules to these relations. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated F-score of 84%. We then combined the extracted relations using logistic regression to generate resistance values for each <drug, mutation> pair. The results of this relation combination show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in 5 hospitals from the Virolab project (http://www.virolab.org) to preselect the most relevant novel resistance data from the literature and present them to virologists and medical doctors for further evaluation.
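As an illustration of the combination step, the following is a minimal sketch of how many extracted relations could be reduced to one resistance value per <drug, mutation> pair with logistic regression. It is not the authors' implementation. The abstract does not state which features or labels were used, so the features (counts of "resistance" and "susceptibility" mentions per pair), the reference labels, the placeholder names (`drugA`, `M1`, ...), and all helper functions are assumptions made up for this example.

```python
import math
from collections import defaultdict

# Hypothetical extracted relations: (drug, mutation, polarity).
# polarity +1: the sentence asserts resistance; -1: it asserts susceptibility.
# All names and values are made-up placeholders, not data from the paper.
relations = [
    ("drugA", "M1", +1), ("drugA", "M1", +1), ("drugA", "M1", -1),
    ("drugA", "M2", -1), ("drugA", "M2", -1),
    ("drugB", "M1", +1),
    ("drugB", "M3", +1), ("drugB", "M3", -1), ("drugB", "M3", -1),
]

# Hypothetical reference labels (1 = resistant, 0 = not resistant), standing in
# for a curated source; an assumption for this sketch only.
reference_labels = {
    ("drugA", "M1"): 1, ("drugA", "M2"): 0,
    ("drugB", "M1"): 1, ("drugB", "M3"): 0,
}


def pair_features(relations):
    """Count positive and negative mentions for each <drug, mutation> pair."""
    counts = defaultdict(lambda: [0, 0])
    for drug, mutation, polarity in relations:
        counts[(drug, mutation)][0 if polarity > 0 else 1] += 1
    return {pair: (float(pos), float(neg)) for pair, (pos, neg) in counts.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def resistance_score(features, weights, bias):
    """Return a value in (0, 1); higher means more evidence of resistance."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)


features = pair_features(relations)
pairs = sorted(reference_labels)
weights, bias = train_logistic(
    [features[p] for p in pairs], [reference_labels[p] for p in pairs]
)

# Combined resistance value for every pair seen in the extracted relations.
scores = {pair: resistance_score(f, weights, bias) for pair, f in features.items()}
for pair in sorted(scores):
    print(pair, round(scores[pair], 3))
```

The point of the sketch is that conflicting mentions of the same pair are combined into a single graded value rather than resolved by a hard majority vote. A real system would likely use richer features than raw mention counts.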

CONCLUSIONS

The proposed relation extraction and combination method performs well at extracting HIV drug resistance data. It can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other types of relations, such as gene-protein, gene-disease, and disease-mutation relations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4133/2841207/286d518bfff6/1471-2105-11-101-1.jpg
