Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records.

Authors

Guo Yuting, Shi Haoming, Book Wendy M, Ivey Lindsey Carrie, Rodriguez Fred H, Sameni Reza, Raskind-Hood Cheryl, Robichaux Chad, Downing Karrie F, Sarker Abeed

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA.

Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

Publication

Birth Defects Res. 2025 Mar;117(3):e2451. doi: 10.1002/bdr2.2451.

DOI: 10.1002/bdr2.2451
PMID: 40035168
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11955907/
Abstract

BACKGROUND

International Classification of Diseases (ICD) codes can accurately identify patients with certain congenital heart defects (CHDs). In ICD-defined CHD data sets, the code for secundum atrial septal defect (ASD) is the most common, but it has a low positive predictive value for CHD, which can lead to erroneous conclusions being drawn from such data sets. Public health surveillance needs methods that reduce the false positive rate for CHD among individuals captured by the ASD ICD code.

METHODS

We propose a two-level classification system, which includes a CHD and an ASD classification model, to categorize cases with an ASD ICD code into three groups: ASD, other CHD, or no CHD (including patent foramen ovale). In the proposed approach, a machine learning model that leverages structured data is combined with a text classification system. We compare the performance of three classification strategies: support vector machines (SVMs) using text-based features, a robustly optimized Transformer-based model (RoBERTa), and a scalable tree boosting system using non-text-based features (XGBoost).
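
The two-level design described above can be sketched as a small cascade. This is a minimal illustration, not the authors' implementation: the abstract does not state how the two models are combined, so the ordering (CHD model first, ASD model second) is an assumption. The names `classify_case`, `toy_chd_model`, and `toy_asd_model` are hypothetical, and simple keyword rules stand in for the trained SVM, RoBERTa, or XGBoost models.

```python
from typing import Callable, Dict

Case = Dict[str, str]           # hypothetical record: {"note": "<clinical note text>"}
Model = Callable[[Case], bool]  # any binary classifier wrapped as a function


def classify_case(case: Case, chd_model: Model, asd_model: Model) -> str:
    """Assign one of the three groups named in the abstract.

    Assumed cascade: level 1 decides CHD vs. no CHD, and level 2 is
    consulted only for cases that level 1 flags as CHD.
    """
    if not chd_model(case):
        return "no CHD"  # includes patent foramen ovale, per the abstract
    return "ASD" if asd_model(case) else "other CHD"


# Toy keyword rules standing in for trained models (illustration only).
def toy_chd_model(case: Case) -> bool:
    note = case["note"].lower()
    return "septal defect" in note or "tetralogy" in note


def toy_asd_model(case: Case) -> bool:
    return "atrial septal defect" in case["note"].lower()


cases = [
    {"note": "Secundum atrial septal defect confirmed on echo."},
    {"note": "Tetralogy of Fallot, status post repair."},
    {"note": "Patent foramen ovale only; no structural defect."},
]
labels = [classify_case(c, toy_chd_model, toy_asd_model) for c in cases]
print(labels)
```

Because each level is just a callable, any of the three model families compared in the study could in principle be plugged into either slot, which mirrors the mixed configurations reported in the results.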

RESULTS

Using SVM for both the CHD and ASD models gave the best performance for the ASD and no CHD groups, with F scores of 0.53 (±0.05) and 0.78 (±0.02), respectively. Using XGBoost for CHD classification and SVM for ASD classification performed best for the other CHD group (F score: 0.39 [±0.03]).
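
For readers less familiar with the metric, the sketch below shows how a per-group F score can be computed from gold and predicted labels. It assumes the balanced F1 variant (the harmonic mean of precision, i.e. positive predictive value, and recall); the abstract does not say which variant was used or what the ± values represent. The labels are made up for illustration, and `f1_for_group` is a hypothetical helper.

```python
from typing import List, Tuple


def f1_for_group(y_true: List[str], y_pred: List[str], group: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one group, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == group and p == group)
    fp = sum(1 for t, p in pairs if t != group and p == group)
    fn = sum(1 for t, p in pairs if t == group and p != group)
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels, not data from the study.
y_true = ["ASD", "ASD", "no CHD", "other CHD", "no CHD", "ASD"]
y_pred = ["ASD", "no CHD", "no CHD", "ASD", "no CHD", "ASD"]

for group in ("ASD", "other CHD", "no CHD"):
    precision, recall, f1 = f1_for_group(y_true, y_pred, group)
    print(f"{group}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```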

CONCLUSIONS

This study demonstrates that it is feasible to use patients' clinical notes and machine learning to classify cases at a finer granularity than ICD codes allow, in particular with a higher positive predictive value (PPV) for CHD. The proposed approach can improve CHD surveillance.

Similar articles

1
Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records.
Birth Defects Res. 2025 Mar;117(3):e2451. doi: 10.1002/bdr2.2451.
2
Supervised Text Classification System Detects Fontan Patients in Electronic Records With Higher Accuracy Than Codes.
J Am Heart Assoc. 2023 Jul 4;12(13):e030046. doi: 10.1161/JAHA.123.030046. Epub 2023 Jun 22.
3
Positive Predictive Value of , and , Codes for Identification of Congenital Heart Defects.
J Am Heart Assoc. 2023 Aug 15;12(16):e030821. doi: 10.1161/JAHA.123.030821. Epub 2023 Aug 7.
4
A Generalized Machine Learning Model for Identifying Congenital Heart Defects (CHDs) Using ICD Codes.
Birth Defects Res. 2025 Feb;117(2):e2440. doi: 10.1002/bdr2.2440.
5
The 745.5 issue in code-based, adult congenital heart disease population studies: Relevance to current and future ICD-9-CM and ICD-10-CM studies.
Congenit Heart Dis. 2018 Jan;13(1):59-64. doi: 10.1111/chd.12563. Epub 2017 Dec 20.
6
A machine learning model for predicting congenital heart defects from administrative data.
Birth Defects Res. 2023 Nov 1;115(18):1693-1707. doi: 10.1002/bdr2.2245. Epub 2023 Sep 8.
7
Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.
JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.
8
How Well Do Codes Predict True Congenital Heart Defects? A Centers for Disease Control and Prevention-Based Multisite Validation Project.
J Am Heart Assoc. 2022 Aug 2;11(15):e024911. doi: 10.1161/JAHA.121.024911. Epub 2022 Jul 19.
9
Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records.
Rheumatology (Oxford). 2020 May 1;59(5):1059-1065. doi: 10.1093/rheumatology/kez375.
10
Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.
J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.
