An annotated corpus from biomedical articles to construct a drug-food interaction database.

Author information

Kim Siun, Choi Yoona, Won Jung-Hyun, Mi Oh Jung, Lee Howard

Affiliations

Department of Applied Biomedical Engineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; Center for Convergence Approaches in Drug Development, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.

Center for Convergence Approaches in Drug Development, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.

Publication information

J Biomed Inform. 2022 Feb;126:103985. doi: 10.1016/j.jbi.2022.103985. Epub 2022 Jan 7.

DOI:10.1016/j.jbi.2022.103985

PMID:35007753

Abstract

MOTIVATION

While drug-food interactions (DFIs) may undermine the efficacy and safety of drugs, DFI detection has been difficult because no well-organized DFI database existed. To construct a DFI database and build a natural language processing system that extracts DFIs from biomedical articles, we formulated the DFI extraction tasks and manually annotated texts that could contain DFI information. In this article, we introduce a new annotated corpus for extracting DFIs, the DFI corpus.

RESULTS

The DFI corpus contains 2270 abstracts of biomedical articles accessible through PubMed and 2498 sentences that contain DFI and/or drug-drug interaction (DDI) information, together with a substantial amount of information about drug/food entities, the evidence levels of abstracts, and relations between named entities. BERT models pre-trained on the biomedical domain achieved an F1 score of 55.0% in extracting DFI key-sentences. To the best of our knowledge, the DFI corpus is the largest public corpus for drug-food interaction.
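For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for key-sentence extraction could be computed. It assumes the task is treated as binary sentence classification (1 = DFI key-sentence, 0 = other) and that F1 is taken on the positive class; the abstract does not state the averaging scheme. The function name `f1_score` and the toy labels are hypothetical and are not taken from the DFI corpus.

```python
def f1_score(gold, pred):
    """Return (precision, recall, F1) for binary labels, where 1 = key-sentence."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only; not data from the DFI corpus.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = f1_score(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, a low value in either precision or recall pulls the score down, which is why it is commonly preferred over accuracy when key-sentences are a minority of all sentences.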

AVAILABILITY AND IMPLEMENTATION

Our corpus is available at https://github.com/ccadd-snu/corpus-for-DFI-extraction.
