Kim Siun, Choi Yoona, Won Jung-Hyun, Mi Oh Jung, Lee Howard
Department of Applied Biomedical Engineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; Center for Convergence Approaches in Drug Development, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.
Center for Convergence Approaches in Drug Development, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.
J Biomed Inform. 2022 Feb;126:103985. doi: 10.1016/j.jbi.2022.103985. Epub 2022 Jan 7.
Although drug-food interactions (DFIs) may undermine the efficacy and safety of drugs, detecting them has been difficult because no well-organized DFI database exists. To construct such a database and build a natural language processing system that extracts DFIs from biomedical articles, we formulated the DFI extraction tasks and manually annotated texts that could contain DFI information. In this article, we introduce the resulting annotated corpus for DFI extraction, the DFI corpus.
The DFI corpus contains 2270 abstracts of biomedical articles accessible through PubMed and 2498 sentences that contain DFI and/or drug-drug interaction (DDI) information, together with a substantial amount of information about drug/food entities, the evidence levels of abstracts, and relations between named entities. BERT models pre-trained on the biomedical domain achieved an F1 score of 55.0% in extracting DFI key-sentences. To the best of our knowledge, the DFI corpus is the largest public corpus for drug-food interaction.
Our corpus is available at https://github.com/ccadd-snu/corpus-for-DFI-extraction.