Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies.

Author information

Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE, Amsterdam, The Netherlands.

Castor EDC, Amsterdam, The Netherlands.

Publication information

J Biomed Semantics. 2020 Nov 16;11(1):14. doi: 10.1186/s13326-020-00231-z.

DOI:10.1186/s13326-020-00231-z

PMID:33198814

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7670625/

Abstract

BACKGROUND

Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations.
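To make "attaching ontology concepts" to free text concrete, here is a minimal sketch of one simple approach: dictionary lookup with longest-match-first. It is an illustration only, not a method taken from the reviewed studies. The lexicon, the `EX:` concept identifiers, and the function name `map_concepts` are hypothetical.

```python
import re

# Hypothetical toy lexicon: surface form -> illustrative concept ID.
# These IDs are placeholders, not codes from any real terminology system.
LEXICON = {
    "myocardial infarction": "EX:0001",
    "heart attack": "EX:0001",
    "hypertension": "EX:0002",
    "type 2 diabetes": "EX:0003",
}

def map_concepts(text, lexicon=LEXICON):
    """Return sorted (start, end, fragment, concept_id) tuples.

    Longer terms are tried first so that a short term cannot claim
    characters already covered by a longer match.
    """
    taken = [False] * len(text)  # marks characters already assigned to a match
    matches = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                matches.append((m.start(), m.end(), m.group(0), lexicon[term]))
    return sorted(matches)

if __name__ == "__main__":
    for match in map_concepts("History of heart attack and Hypertension."):
        print(match)
```

Real systems typically add steps such as negation detection, abbreviation expansion, and disambiguation, which this sketch leaves out.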

METHODS

Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations.
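The extracted items include performance measures and the reference standard. The abstract does not say which measures the studies reported, so the sketch below assumes three common ones (precision, recall, and F1) and scores predicted concept annotations against a reference standard. The function name and the `(note_id, concept_id)` pair representation are hypothetical.

```python
def precision_recall_f1(predicted, reference):
    """Score predicted annotations against a reference standard.

    Both arguments are iterables of hashable annotations, here assumed
    to be (note_id, concept_id) pairs. Exact matches count as true positives.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    # Illustrative annotations only; the IDs are placeholders.
    predicted = {("n1", "EX:0001"), ("n1", "EX:0002"), ("n2", "EX:0003")}
    reference = {("n1", "EX:0001"), ("n2", "EX:0003"), ("n2", "EX:0004")}
    print(precision_recall_f1(predicted, reference))
```

Scoring the same function on notes from a different institution than the one used for development is one way to approximate the external validation discussed in this review.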

RESULTS

Two thousand three hundred fifty-five unique studies were identified. Two hundred fifty-six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed.
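The counts above can be turned into proportions with a few lines of arithmetic. This sketch assumes that the 22 and 68 studies are subsets of the 77 studies that described both development and evaluation. The constant names and the helper `pct` are hypothetical.

```python
# Counts taken from the Results paragraph.
INCLUDED = 77                 # studies describing development and evaluation
NO_VALIDATION = 22            # no validation on unseen data
NO_EXTERNAL_VALIDATION = 68   # no external validation
CLAIMED_GENERALIZABLE = 23    # claimed the algorithm was generalizable
TESTED_EXTERNALLY = 5         # of those, tested the claim by external validation

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

if __name__ == "__main__":
    print("No validation on unseen data:", pct(NO_VALIDATION, INCLUDED), "%")
    print("No external validation:", pct(NO_EXTERNAL_VALIDATION, INCLUDED), "%")
    print("Generalizability claims tested:", pct(TESTED_EXTERNALLY, CLAIMED_GENERALIZABLE), "%")
```

Under that assumption, about 28.6% of the included studies performed no validation on unseen data and about 88.3% performed no external validation, while about 21.7% of the generalizability claims were tested externally.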

CONCLUSION

We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d1c/7670625/f84bba1c726f/13326_2020_231_Fig1_HTML.jpg
