Hossain Elias, Rana Rajib, Higgins Niall, Soar Jeffrey, Barua Prabal Datta, Pisani Anthony R, Turner Kathryn
School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh.
School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia.
Comput Biol Med. 2023 Mar;155:106649. doi: 10.1016/j.compbiomed.2023.106649. Epub 2023 Feb 10.
Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data and automated tools, along with other challenges, hinders the full utilisation of NLP for EHRs. This review studies and compares various Machine Learning (ML), Deep Learning (DL) and NLP techniques to build a comprehensive understanding of the limitations and opportunities in this space.
After screening 261 articles from 11 databases, we included 127 papers for full-text review, covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) DL and transfer learning architectures, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
EHRs were the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, most commonly for prediction or classification. The most common use cases were International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders.
We find that the adopted ML models were not adequately assessed. In addition, data imbalance is an important problem, yet techniques to address this underlying problem are still to be found. Future studies should address key limitations of existing work, primarily in identifying lupus nephritis, suicide attempts and perinatal self-harm, and in ICD-9 classification.