"Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models".

Authors

Askar Mohsen, Småbrekke Lars, Holsbø Einar, Bongo Lars Ailo, Svendsen Kristian

Affiliations

Department of Pharmacy, Faculty of Health Sciences, UiT-The Arctic University of Norway, PO Box 6050, Stakkevollan, N-9037 Tromsø, Norway.

Department of Computer Science, Faculty of Science and Technology, UiT-The Arctic University of Norway, PO Box 6050, Stakkevollan, N-9037 Tromsø, Norway.

Publication information

Explor Res Clin Soc Pharm. 2024 Jun 11;14:100463. doi: 10.1016/j.rcsop.2024.100463. eCollection 2024 Jun.

DOI:10.1016/j.rcsop.2024.100463
PMID:38974056
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11227014/
Abstract

BACKGROUND

Machine learning (ML) prediction models in healthcare and pharmacy-related research face challenges with encoding high-dimensional Healthcare Coding Systems (HCSs) such as ICD, ATC, and DRG codes, given the trade-off between reducing model dimensionality and minimizing information loss.

OBJECTIVES

To investigate using Network Analysis modularity as a method to group HCSs to improve encoding in ML models.

METHODS

The MIMIC-III dataset was used to create a multimorbidity network in which ICD-9 codes are the nodes and each edge is weighted by the number of patients sharing that pair of ICD-9 codes. A modularity detection algorithm was applied at different resolution thresholds to generate 6 sets of modules. The impact of four grouping strategies on the performance of predicting 90-day Intensive Care Unit readmissions was assessed: 1) binary encoding of codes, 2) encoding codes grouped by network modules, 3) grouping codes to the highest level of the ICD-9 hierarchy, and 4) grouping using the single-level Clinical Classification Software (CCS). The same methodology was also applied to encode DRG codes, but the comparison was limited to a single modularity threshold versus binary encoding. Performance was assessed using Logistic Regression, Support Vector Machine with a non-linear kernel, and Gradient Boosting Machines algorithms. Accuracy, Precision, Recall, AUC, and F1-score with 95% confidence intervals were reported.
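The pipeline described above (co-occurrence network, modularity-based grouping, then module-level encoding) can be illustrated with a minimal sketch. The abstract does not name the modularity algorithm, so the code below substitutes a simple greedy local-moving step (the first phase of a Louvain-style search) as a stand-in. The function names, the `resolution` parameter, and the six-code toy data are all hypothetical and are not taken from the authors' Python package or from MIMIC-III.

```python
from collections import defaultdict
from itertools import combinations


def build_network(patient_codes):
    """Edge weight = number of patients whose record contains both codes."""
    weights = defaultdict(int)
    for codes in patient_codes.values():
        for a, b in combinations(sorted(set(codes)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def detect_modules(weights, resolution=1.0):
    """Greedy local moving (assumed stand-in for the paper's algorithm).

    Each code starts in its own module and is moved to the neighbouring
    module with the largest modularity gain until no move helps.
    """
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())
    module = {n: n for n in adj}   # temporary label: a member code
    total = dict(degree)           # summed weighted degree per module

    def gain(node, label, links):
        return links[label] - resolution * total[label] * degree[node] / two_m

    moved = True
    while moved:
        moved = False
        for node in sorted(adj):
            current = module[node]
            total[current] -= degree[node]   # take the node out of its module
            links = defaultdict(float)       # edge weight from node to each module
            for nbr, w in adj[node].items():
                links[module[nbr]] += w
            best, best_gain = current, gain(node, current, links)
            for label in sorted(links):
                g = gain(node, label, links)
                if g > best_gain + 1e-12:    # move only on strict improvement
                    best, best_gain = label, g
            total[best] += degree[node]      # reinsert, possibly elsewhere
            if best != current:
                module[node] = best
                moved = True

    # Relabel modules as M0, M1, ... ordered by their smallest member code.
    groups = defaultdict(list)
    for node, label in module.items():
        groups[label].append(node)
    ordered = sorted(sorted(members) for members in groups.values())
    return {node: f"M{i}" for i, members in enumerate(ordered) for node in members}


def encode_by_module(patient_codes, module):
    """One binary column per module: 1 if the patient has any code in it."""
    columns = sorted(set(module.values()))
    rows = {
        pid: [int(any(module.get(c) == col for c in codes)) for col in columns]
        for pid, codes in patient_codes.items()
    }
    return columns, rows


# Toy data (hypothetical): patient id -> ICD-9 codes on the record.
patients = {
    "p1": ["4280", "42731", "41401"],
    "p2": ["4280", "42731", "41401"],
    "p3": ["4280", "41401"],
    "p4": ["25000", "5849", "2724"],
    "p5": ["25000", "5849", "2724"],
    "p6": ["25000", "5849"],
    "p7": ["4280", "25000"],
}
module = detect_modules(build_network(patients), resolution=1.0)
columns, rows = encode_by_module(patients, module)
```

In this toy example, six binary code columns collapse into two module columns, which is the dimensionality reduction the approach aims for. In this formulation, raising `resolution` penalizes large modules more heavily and so tends to produce more, smaller groups; varying it is one plausible way to obtain several sets of modules, though the thresholds the authors used are not given in the abstract.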

RESULTS

Models using modularity encoding outperformed models using binary encoding of ungrouped codes. Accuracy improved across all algorithms, ranging from 0.736 to 0.78 for modularity encoding compared with 0.727 to 0.779 for binary encoding. AUC, recall, and precision also improved across almost all algorithms. Compared with the other grouping approaches, modularity encoding generally showed slightly higher AUC, ranging from 0.813 to 0.837, and precision, ranging from 0.752 to 0.782.

CONCLUSIONS

Modularity encoding enhances the performance of ML models in pharmacy research by effectively reducing dimensionality while retaining necessary information. Across the three algorithms used, models using modularity encoding showed superior or comparable performance to other encoding approaches. Modularity encoding offers further advantages: it can be used for both hierarchical and non-hierarchical HCSs, it is clinically relevant, and it can enhance the clinical interpretation of ML models. A Python package has been developed to facilitate the use of the approach in future research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c543/11227014/c75c8f164819/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c543/11227014/41e7b2fbd603/gr1.jpg