Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature.

Affiliation

Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA.

Publication information

J Biomed Semantics. 2023 May 30;14(1):5. doi: 10.1186/s13326-023-00287-7.

Abstract

BACKGROUND

Drug-drug interaction (DDI) information retrieval (IR) is an important natural language processing (NLP) task applied to the PubMed literature. This paper studies active learning (AL) in DDI IR analysis for the first time. DDI IR analysis of PubMed abstracts faces the challenge that positive DDI samples are relatively few among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.
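The abstract names the sampling schemes without defining them. The sketch below shows one plausible reading of how each scheme might choose which abstracts to label next, and it is not the authors' implementation. The function names, the 0.5 decision boundary, and the idea that randomly drawn abstracts are treated as presumed negatives are all assumptions made for illustration.

```python
import random

def uncertainty_sample(scores, k):
    """Indices of the k items whose predicted score is closest to 0.5.

    Assumes scores are probabilities of being a positive DDI abstract.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:k]

def positive_sample(scores, k):
    """Indices of the k items the current model rates most likely positive."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def random_negative_sample(pool_size, k, seed=0):
    """Draw k distinct random indices from a pool.

    Assumption: positives are so rare in the pool that a random draw is
    overwhelmingly negative, so these items serve as presumed negatives.
    """
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)

# Hypothetical model scores for six unlabeled abstracts.
scores = [0.02, 0.48, 0.91, 0.55, 0.10, 0.97]
print(uncertainty_sample(scores, 2))  # [1, 3]
print(positive_sample(scores, 2))     # [5, 2]
```

In an AL loop, the selected indices would be sent for labeling (or, for the random draw, added as presumed negatives), the classifier retrained, and the pool rescored before the next round.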

RESULTS

PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. DDI IR analysis precision is evaluated and compared at a prespecified recall rate of 0.95. In the screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool, and 0.90 vs. 0.81 in the unscreened pool.
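The evaluation metric, precision at a prespecified recall, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the toy labels, and the toy scores are hypothetical, and ties between scores are not treated specially. It ranks abstracts by classifier score, walks down the ranking until the target recall is reached, and reports the precision at that cutoff.

```python
def precision_at_recall(labels, scores, target_recall=0.95):
    """Precision at the shallowest ranking cutoff that reaches target_recall.

    labels: 1 for a positive (DDI-relevant) abstract, 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        if true_pos / total_pos >= target_recall:
            return true_pos / rank
    raise ValueError("target_recall must be at most 1.0")

# Toy data: four positives among eight abstracts.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(precision_at_recall(labels, scores))        # 4/7, about 0.571
print(precision_at_recall(labels, scores, 0.75))  # 0.75
```

With only four positives, a recall of 0.95 requires retrieving all of them, so the cutoff falls at rank 7, where the last positive appears.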

CONCLUSIONS

Integrating various sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis of the literature. Random negative sampling and positive sampling are highly effective for improving AL analysis when positive and negative samples are extremely imbalanced.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/096c/10228061/4f5da4897301/13326_2023_287_Fig1_HTML.jpg
