Modi Salisu, Kasmiran Khairul Azhar, Mohd Sharef Nurfadhlina, Sharum Mohd Yunus
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia; Department of Computer Science, Sokoto State University, Sokoto, Nigeria.
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
J Biomed Inform. 2024 Mar;151:104603. doi: 10.1016/j.jbi.2024.104603. Epub 2024 Feb 6.
An adverse drug event (ADE) is any unfavorable effect that occurs due to the use of a drug. Extracting ADEs from unstructured clinical notes is essential to biomedical text extraction research because it helps with pharmacovigilance and patient medication studies.
Natural language processing (NLP) researchers have developed methods for extracting ADEs and their related attributes from the considerable amount of available clinical narrative text. This work presents a systematic review of current methods.
We searched two biomedical databases, PubMed and Medline, from June 2022 until December 2023 for publications relevant to this review. We also searched the multidisciplinary databases IEEE Xplore, Scopus, ScienceDirect, and the ACL Anthology. In conducting and reporting the review, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement guidelines and recommendations. The searches initially returned 5,537 articles published between 2015 and 2023 across the databases. After applying predefined inclusion and exclusion criteria, 100 publications underwent full-text review, of which 82 were included in our analysis.
We determined the general pattern for extracting ADEs from clinical notes, with named entity recognition (NER) and relation extraction (RE) being the two tasks considered. Researchers who tackled both NER and RE have approached ADE extraction as a "pipeline extraction" problem (n = 22), as a "joint task extraction" problem (n = 7), or as a "multi-task learning" problem (n = 6), while others have tackled only NER (n = 27) or only RE (n = 20). We further grouped the reviewed studies by extraction approach, namely rule-based (n = 8), machine learning (n = 11), deep learning (n = 32), comparison of two or more approaches (n = 11), hybrid (n = 12), and large language models (n = 8). The most used datasets are MADE 1.0, TAC 2017, and n2c2 2018.
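To make the "pipeline extraction" formulation concrete, the following is a minimal sketch and not a method taken from any of the reviewed studies. It assumes a toy dictionary-based NER stage followed by a same-sentence co-occurrence RE stage; the lexicons `DRUGS` and `ADES` and the function names are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons (assumption, illustration only);
# real systems learn entity mentions from annotated corpora.
DRUGS = {"warfarin", "lisinopril"}
ADES = {"bleeding", "cough"}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # "DRUG" or "ADE"
    start: int
    end: int


def recognize_entities(sentence):
    """Stage 1 (NER): look up each word token in the toy lexicons."""
    entities = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group().lower()
        if word in DRUGS:
            entities.append(Entity(word, "DRUG", match.start(), match.end()))
        elif word in ADES:
            entities.append(Entity(word, "ADE", match.start(), match.end()))
    return entities


def extract_relations(entities):
    """Stage 2 (RE): link every DRUG to every ADE found in the same sentence."""
    drugs = [e for e in entities if e.label == "DRUG"]
    ades = [e for e in entities if e.label == "ADE"]
    return [(d.text, a.text) for d in drugs for a in ades]


def pipeline(note):
    """Run NER, then RE, sentence by sentence over a clinical note."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        pairs.extend(extract_relations(recognize_entities(sentence)))
    return pairs


if __name__ == "__main__":
    note = ("Patient developed bleeding after starting warfarin. "
            "Lisinopril was stopped due to cough.")
    print(pipeline(note))
```

Because the RE stage only sees what the NER stage outputs, any entity missed in the first stage cannot be recovered in the second; joint and multi-task formulations are generally motivated by reducing this kind of error propagation.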
Extracting ADEs is crucial, especially for pharmacovigilance and patient medication studies. This survey showcases advances in ADE extraction research, including approaches, datasets, and state-of-the-art performance on those datasets. Challenges and future research directions are highlighted. We hope this review will guide researchers in gaining background knowledge and developing more innovative ways to address the challenges.