Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute, Room J1B-109, PO Box 22700, 1100 DE, Amsterdam, The Netherlands.
Castor EDC, Amsterdam, The Netherlands.
J Biomed Semantics. 2020 Nov 16;11(1):14. doi: 10.1186/s13326-020-00231-z.
Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations.
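To make "attaching ontology concepts" to free text concrete, the following is a minimal sketch of a dictionary-lookup mapper. It is not one of the reviewed algorithms: the lexicon, the `CONCEPT:` identifiers, and the function name `map_concepts` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon: surface phrase -> made-up concept identifier.
# A real system would draw on an actual terminology system instead.
LEXICON = {
    "myocardial infarction": "CONCEPT:0001",
    "heart attack": "CONCEPT:0001",
    "type 2 diabetes": "CONCEPT:0002",
}


def map_concepts(text):
    """Return (start, end, phrase, concept_id) for each lexicon phrase found in text."""
    matches = []
    lowered = text.lower()
    for phrase, concept_id in LEXICON.items():
        # Whole-phrase, case-insensitive match on word boundaries.
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            matches.append((m.start(), m.end(), phrase, concept_id))
    return sorted(matches)  # ordered by position in the text


note = "Patient with type 2 diabetes admitted after a heart attack."
annotations = map_concepts(note)
for start, end, phrase, concept_id in annotations:
    print(f"{start}-{end}\t{phrase}\t{concept_id}")
```

Exact string lookup of this kind ignores negation, abbreviations, and spelling variants, which is one reason evaluation against a reference standard matters.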
Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations.
A total of 2355 unique studies were identified, of which 256 reported on the development of NLP algorithms for mapping free text to ontology concepts. Of these, 77 described both development and evaluation. Among these 77 studies, 22 did not perform a validation on unseen data and 68 did not perform external validation. Of the 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. We developed a list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results.
We found many heterogeneous approaches to reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
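The proportions quoted for the included studies can be rechecked from the reported counts. The short sketch below does this arithmetic, under the assumption that the 77 studies describing development and evaluation are the denominator for the validation figures; the variable and function names are illustrative only.

```python
# Counts as reported in the abstract.
included = 77               # studies describing development and evaluation
no_validation = 22          # no validation on unseen data
no_external = 68            # no external validation
claimed_generalizable = 23  # studies claiming generalizability
externally_tested = 5       # of those, tested by external validation


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    # Assumption: denominator is the 77 included studies.
    "no validation on unseen data": share(no_validation, included),
    "no external validation": share(no_external, included),
    "generalizability claims tested externally": share(
        externally_tested, claimed_generalizable
    ),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Under that assumption, 22 of 77 is about 28.6% (over one-fourth) and 68 of 77 is about 88.3%, matching the figures stated above; 5 of 23 is about 21.7%.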