Supervised Text Classification System Detects Fontan Patients in Electronic Records With Higher Accuracy Than Codes.

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA.

Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN.

Publication information

J Am Heart Assoc. 2023 Jul 4;12(13):e030046. doi: 10.1161/JAHA.123.030046. Epub 2023 Jun 22.

DOI:10.1161/JAHA.123.030046

PMID:37345821

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10356083/

Abstract

Background: The Fontan operation is associated with significant morbidity and premature mortality. Fontan cases cannot always be identified by International Classification of Diseases (ICD) codes, making it challenging to create large Fontan patient cohorts. We sought to develop natural language processing-based machine learning models to automatically detect Fontan cases from free text in electronic health records, and to compare their performance with ICD code-based classification.

Methods and Results: We included free-text notes of 10 935 manually validated patients, 778 (7.1%) Fontan and 10 157 (92.9%) non-Fontan, from 2 health care systems. Using 80% of the patient data, we trained and optimized multiple machine learning models, namely support vector machines and 2 versions of RoBERTa (a robustly optimized transformer-based model for language understanding), to automatically identify Fontan cases from notes. For RoBERTa, we implemented a novel sliding window strategy to overcome its input length limit. We evaluated the machine learning models and ICD code-based classification on the held-out 20% of the patient data using the F1 score. The ICD classification model, support vector machine, and RoBERTa achieved F1 scores of 0.81 (95% CI, 0.79-0.83), 0.95 (95% CI, 0.92-0.97), and 0.89 (95% CI, 0.88-0.85) for the positive (Fontan) class, respectively. Support vector machines obtained the best performance (P<0.05), and both natural language processing models outperformed ICD code-based classification (P<0.05). The sliding window strategy improved performance over the base model (P<0.05) but did not outperform support vector machines. ICD code-based classification produced more false positives.

Conclusions: Natural language processing models can automatically detect Fontan patients from clinical notes with higher accuracy than ICD codes, and the natural language processing models showed potential for further improvement.
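
The abstract does not describe how the sliding window strategy works, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a note has already been tokenized into integer ids, that the model accepts at most `window` tokens (512 is the limit of the standard released RoBERTa models), and that a patient is flagged as Fontan if any window scores above a threshold. The names `sliding_windows`, `classify_long_document`, and `score_window`, the stride value, and the max-over-windows aggregation rule are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence


def sliding_windows(token_ids: Sequence[int],
                    window: int = 512,
                    stride: int = 256) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most `window` tokens.

    Consecutive windows start `stride` tokens apart, so with stride < window
    neighbouring windows overlap and no token is dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the document
        start += stride
    return windows


def classify_long_document(token_ids: Sequence[int],
                           score_window: Callable[[List[int]], float],
                           window: int = 512,
                           stride: int = 256,
                           threshold: float = 0.5) -> bool:
    """Score every window and flag the document if any window is positive.

    `score_window` stands in for a fine-tuned classifier that returns the
    positive-class probability for one window. Taking the maximum over
    windows is an assumed aggregation rule, not one stated in the abstract.
    """
    scores = [score_window(w) for w in sliding_windows(token_ids, window, stride)]
    return max(scores) >= threshold


if __name__ == "__main__":
    # Toy scorer: a window is "positive" if it contains the token id 99.
    toy_scorer = lambda w: 1.0 if 99 in w else 0.0
    doc = list(range(10)) + [99]
    print(sliding_windows(doc, window=4, stride=2))
    print(classify_long_document(doc, toy_scorer, window=4, stride=2))
```

Other aggregation rules, such as averaging window scores or majority voting, would fit the same structure; which one works best for long clinical notes is an empirical question that the abstract does not address.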
