Automated encoding of clinical documents based on natural language processing.

Author information

Friedman Carol, Shagina Lyudmila, Lussier Yves, Hripcsak George

Affiliation

Department of Biomedical Informatics, Columbia University, 622 West 168 Street, VC-5, New York, NY 10032, USA.

Publication information

J Am Med Inform Assoc. 2004 Sep-Oct;11(5):392-402. doi: 10.1197/jamia.M1552. Epub 2004 Jun 7.

DOI: 10.1197/jamia.M1552

PMID: 15187068

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC516246/

Abstract

OBJECTIVE

The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method.

METHODS

An existing NLP system, MedLEE, was adapted to automatically generate codes. The method matches the structured output generated by MedLEE, which consists of findings and their modifiers, to obtain the most specific code. Recall and precision of UMLS coding were evaluated in two separate studies. Recall was measured on a test set of 150 randomly selected sentences processed by MedLEE, and the results were compared with a reference standard determined manually by seven experts. Precision was measured on a second test set of 150 randomly selected sentences: UMLS codes were generated automatically by the method and then validated by experts.
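To make the "most specific code" idea concrete, here is a minimal sketch. It assumes a toy coding table keyed by a finding plus a set of (modifier, value) pairs. The table, the placeholder codes, the modifier names (`bodyloc`, `region`, `certainty`), and the function `most_specific_code` are all hypothetical. They are modeled loosely on MedLEE-style output and are not the paper's actual lexicon or algorithm.

The sketch tries larger subsets of modifiers first, so a code that absorbs more modifiers wins over a more general one. It also returns the modifiers the code did not absorb, so they can be kept alongside the code.

```python
from itertools import combinations

# Hypothetical coding table: (finding, frozenset of (modifier, value) pairs) -> code.
# The codes are placeholders, not real UMLS concept identifiers.
CODE_TABLE = {
    ("pain", frozenset()): "CODE-PAIN",
    ("pain", frozenset({("bodyloc", "chest")})): "CODE-CHEST-PAIN",
    ("pain", frozenset({("bodyloc", "chest"), ("region", "left")})): "CODE-LEFT-CHEST-PAIN",
}


def most_specific_code(finding, modifiers, table=CODE_TABLE):
    """Return (code, absorbed, leftover) for the most specific table entry.

    Larger subsets of the modifiers are tried first, so a code that absorbs
    more modifiers wins over a more general one. Modifiers the code does not
    absorb are returned so they can be stored alongside the code.
    """
    items = sorted(modifiers.items())  # sorted for deterministic tie-breaking
    for size in range(len(items), -1, -1):
        for subset in combinations(items, size):
            key = (finding, frozenset(subset))
            if key in table:
                absorbed = dict(subset)
                leftover = {k: v for k, v in items if k not in absorbed}
                return table[key], absorbed, leftover
    return None, {}, dict(items)  # no entry even for the bare finding


code, absorbed, leftover = most_specific_code(
    "pain", {"bodyloc": "chest", "region": "left", "certainty": "high"}
)
print(code, absorbed, leftover)
# → CODE-LEFT-CHEST-PAIN {'bodyloc': 'chest', 'region': 'left'} {'certainty': 'high'}
```

The search is exponential in the number of modifiers. That is acceptable here because a single finding rarely carries more than a handful of modifiers.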

RESULTS

Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and recall for coding terms that had corresponding UMLS codes was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts for extracting terms ranged from .69 to .91. The precision of the system was .89 (.87-.91), and the precision of the experts ranged from .61 to .91.
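The figures above are recall, precision, and 95% confidence intervals for proportions. Below is a short sketch of how such numbers can be computed from counts. The abstract does not say which interval method was used, so the Wilson score interval here is a swapped-in, assumed choice. The counts are hypothetical and are not the study's data.

```python
import math


def recall(true_pos, false_neg):
    """Fraction of reference-standard items the system found."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of system outputs judged correct."""
    return true_pos / (true_pos + false_pos)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts, not the study's data.
tp, fn, fp = 80, 20, 10
print(round(recall(tp, fn), 2), round(precision(tp, fp), 2))  # → 0.8 0.89
print(wilson_ci(tp, tp + fn))  # interval around the recall estimate
```

The abstract reports two recall figures for UMLS coding. Presumably they share a numerator but use different denominators: all terms, or only terms that have a UMLS code. That would explain why the second figure is higher.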

CONCLUSION

Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
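The conclusion argues that keeping modifiers with each code makes the output better for retrieval. The sketch below illustrates why with a negated finding. It assumes that negation is carried as a `certainty` modifier with value `"no"`; the data structure, codes, and function names are hypothetical. A search that looks only at codes returns the document that says "no pneumonia", while a search that also checks the modifier excludes it.

```python
from dataclasses import dataclass, field


@dataclass
class CodedFinding:
    code: str                                      # hypothetical concept code
    modifiers: dict = field(default_factory=dict)  # e.g. {"certainty": "no"}


def retrieve(documents, code, excluded_certainty=("no",)):
    """Return ids of documents containing `code` with an acceptable certainty."""
    hits = []
    for doc_id, findings in documents.items():
        if any(f.code == code and f.modifiers.get("certainty") not in excluded_certainty
               for f in findings):
            hits.append(doc_id)
    return hits


def retrieve_codes_only(documents, code):
    """Baseline that ignores modifiers, so negated mentions also match."""
    return [d for d, fs in documents.items() if any(f.code == code for f in fs)]


docs = {
    "doc1": [CodedFinding("CODE-PNEUMONIA", {"certainty": "no"})],  # "no pneumonia"
    "doc2": [CodedFinding("CODE-PNEUMONIA", {"certainty": "moderate"})],
    "doc3": [CodedFinding("CODE-PNEUMONIA")],
}
print(retrieve(docs, "CODE-PNEUMONIA"))             # → ['doc2', 'doc3']
print(retrieve_codes_only(docs, "CODE-PNEUMONIA"))  # → ['doc1', 'doc2', 'doc3']
```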
