


Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review.

Authors

Sheikhalishahi Seyedmostafa, Miotto Riccardo, Dudley Joel T, Lavelli Alberto, Rinaldi Fabio, Osmani Venet

Affiliations

eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.

Department of Information Engineering and Computer Science, University of Trento, Trento, Italy.

Publication

JMIR Med Inform. 2019 Apr 27;7(2):e12239. doi: 10.2196/12239.

DOI: 10.2196/12239
PMID: 31066697
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6528438/
Abstract

BACKGROUND

Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Machine learning methods for processing EHRs are improving the understanding of patient clinical trajectories and the prediction of chronic disease risk, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical history remains locked in free-text clinical narratives. Consequently, unlocking the full potential of EHR data depends on developing natural language processing (NLP) methods that automatically transform clinical text into structured clinical data, which can guide clinical decisions and potentially delay or prevent disease onset.
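To make "transform clinical text into structured clinical data" concrete, here is a minimal rule-based sketch. It is an illustration only, not a method from the review or from any included study: the lexicon `CONDITION_TERMS`, the `Finding` record, and `extract_findings` are hypothetical names, and the negation check is a simplified trigger-word rule loosely inspired by NegEx.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon mapping surface forms (incl. abbreviations) to a
# normalized condition label. A real system would use a curated terminology.
CONDITION_TERMS = {
    "congestive heart failure": "heart failure",
    "heart failure": "heart failure",
    "chf": "heart failure",
    "diabetes mellitus": "diabetes",
    "diabetes": "diabetes",
    "t2dm": "diabetes",
    "hypertension": "hypertension",
    "htn": "hypertension",
}

# Simplified negation triggers (loosely NegEx-inspired); whole-word matches only.
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b")


@dataclass
class Finding:
    condition: str  # normalized label from CONDITION_TERMS
    negated: bool   # True if a negation trigger precedes the mention
    sentence: str   # source sentence, kept for traceability


def extract_findings(note: str) -> list[Finding]:
    """Turn a free-text note into structured (condition, negated) records."""
    findings = []
    # Naive sentence split on '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", note.strip()):
        lowered = sentence.lower()
        hits = {}  # label -> (start offset, negated) of the first term matched
        # Longer terms first, so "congestive heart failure" beats "heart failure".
        for term in sorted(CONDITION_TERMS, key=len, reverse=True):
            match = re.search(rf"\b{re.escape(term)}\b", lowered)
            label = CONDITION_TERMS[term]
            if match is None or label in hits:
                continue
            # Negation scope: any trigger earlier in the same sentence.
            negated = NEGATION.search(lowered[: match.start()]) is not None
            hits[label] = (match.start(), negated)
        # Emit findings in the order they appear in the sentence.
        for label, (_, negated) in sorted(hits.items(), key=lambda kv: kv[1][0]):
            findings.append(Finding(label, negated, sentence))
    return findings


for f in extract_findings("Pt with HTN and T2DM. Denies chest pain. No evidence of CHF."):
    print(f.condition, f.negated)
```

Here the negation scope runs from the trigger to the end of the sentence, so the sketch will over-negate in longer sentences; full NegEx-style algorithms restrict a trigger's scope (for example, with termination terms).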

OBJECTIVE

The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.

METHODS

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles.

RESULTS

Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers identified 43 chronic diseases, which were further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The largest group of studies focused on diseases of the circulatory system (n=38), while endocrine and metabolic diseases were fewest (n=14). This was attributed to the structure of clinical records: those related to metabolic diseases typically contain much more structured data, whereas medical records for diseases of the circulatory system rely more on unstructured data and have consequently been a stronger focus of NLP. The review showed a significant increase in the use of machine learning methods compared with rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, most works focus on classification of disease phenotype, with only a handful of papers addressing extraction of comorbidities from free text or integration of clinical notes with structured data. Relatively simple methods, such as shallow classifiers (alone or combined with rule-based methods), are notably common because their predictions are interpretable; interpretability still represents a significant issue for more complex methods. Finally, the scarcity of publicly available data may also have contributed to the insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes.
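The abstract links the popularity of shallow classifiers to interpretability. As a minimal sketch of what that interpretability can look like (our own illustration, not a method reported by the review), the stdlib-only multinomial Naive Bayes phenotype classifier below exposes each token's log-likelihood ratio, so every prediction decomposes into inspectable per-word contributions. The class `NaiveBayesPhenotyper` and the four training notes are made up for illustration and are not data from any study.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokenization (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesPhenotyper:
    """Hypothetical binary multinomial Naive Bayes with add-one smoothing.

    Assumes both classes (True = phenotype present, False = absent) occur in training.
    """

    def fit(self, notes: list[str], labels: list[bool]) -> "NaiveBayesPhenotyper":
        self.counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for note, label in zip(notes, labels):
            self.counts[label].update(tokenize(note))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def token_log_ratio(self, token: str) -> float:
        """log P(token | present) - log P(token | absent); > 0 favours the phenotype."""
        v = len(self.vocab)
        p_pos = (self.counts[True][token] + 1) / (self.totals[True] + v)
        p_neg = (self.counts[False][token] + 1) / (self.totals[False] + v)
        return math.log(p_pos / p_neg)

    def score(self, note: str) -> float:
        """Posterior log-odds of the phenotype; tokens unseen in training are skipped."""
        prior = math.log(self.doc_counts[True] / self.doc_counts[False])
        return prior + sum(
            self.token_log_ratio(t) for t in tokenize(note) if t in self.vocab
        )

    def predict(self, note: str) -> bool:
        return self.score(note) > 0

    def top_features(self, k: int = 3) -> list[str]:
        """Tokens with the largest positive contribution: the model's 'explanation'."""
        return sorted(self.vocab, key=self.token_log_ratio, reverse=True)[:k]


# Made-up toy notes for illustration only; not data from the review.
notes = [
    "ejection fraction reduced, on furosemide for heart failure",
    "chf exacerbation, increased furosemide dose",
    "routine follow up, no cardiac complaints",
    "knee pain after fall, xray negative",
]
labels = [True, True, False, False]
model = NaiveBayesPhenotyper().fit(notes, labels)
print(model.predict("worsening heart failure, furosemide adjusted"))  # True
print(model.top_features(1))  # ['furosemide']
```

Naive Bayes is used here only because its score is an explicit sum of per-token log ratios; logistic regression over the same bag-of-words features would be an equally typical shallow, interpretable choice.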

CONCLUSIONS

Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.

Figures 1-5:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ca/6528438/c66c42d890a4/medinform_v7i2e12239_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ca/6528438/a3c7af5c093f/medinform_v7i2e12239_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ca/6528438/690e4990dfc6/medinform_v7i2e12239_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ca/6528438/6da4510ddd85/medinform_v7i2e12239_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ca/6528438/a5d30e969626/medinform_v7i2e12239_fig5.jpg

