

Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records.

Affiliations

Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina; Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.

Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.

Publication information

Comput Methods Programs Biomed. 2017 Dec;152:53-70. doi: 10.1016/j.cmpb.2017.09.009. Epub 2017 Sep 14.

DOI: 10.1016/j.cmpb.2017.09.009
PMID: 29054261
Abstract

BACKGROUND AND OBJECTIVE

Recent progress toward precision medicine has encouraged the use of electronic health records (EHRs) as a source of the large amounts of data required to study the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms automatically classify patients according to their electronic phenotype, which facilitates the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (using only standardized problems, rule-based algorithms, statistical learning algorithms (six learners), and stacked generalization (five versions)) for categorizing patients by diabetic status (diabetic, non-diabetic, or inconclusive; diabetes of any type) using information extracted from EHRs.
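To make the idea of stacked generalization concrete, here is a minimal sketch in plain Python. It is not the authors' method: the two base learners (nearest centroid and 1-nearest-neighbour), the lookup-table meta-learner, the round-robin K-fold split, and the toy fasting-glucose/HbA1c features are all hypothetical choices for illustration. The paper's six learners and five stacking versions are not reproduced. What the sketch does show is the core mechanism. Base learners are trained on K-1 folds and predict the held-out fold, and those out-of-fold predictions become the inputs of a second-level (meta) learner.

```python
from collections import Counter, defaultdict


def nearest_centroid(X, y):
    # Fit: mean feature vector per class. Predict: closest centroid (squared Euclidean).
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def one_nn(X, y):
    # Fit: memorize the training data. Predict: label of the nearest training point.
    data = list(zip(X, y))

    def predict(x):
        return min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])))[1]
    return predict


def fit_stack(X, y, base_learners, k=5):
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]  # hypothetical round-robin split
    # Level-1 features: out-of-fold predictions of every base learner.
    level1 = [None] * n
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in base_learners]
        for i in held_out:
            level1[i] = tuple(m(X[i]) for m in models)
    # Meta-learner (assumption): most frequent true label per combination of base predictions.
    table = defaultdict(Counter)
    for z, label in zip(level1, y):
        table[z][label] += 1
    fallback = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all data for use at prediction time.
    final_models = [fit(X, y) for fit in base_learners]

    def predict(x):
        z = tuple(m(x) for m in final_models)
        return table[z].most_common(1)[0][0] if z in table else fallback
    return predict


# Toy, hypothetical records: (fasting glucose in mg/dL, HbA1c in %).
X = [(90, 5.2), (95, 5.4), (100, 5.5), (88, 5.0), (140, 7.1),
     (150, 7.5), (160, 8.0), (135, 6.9), (92, 5.3), (145, 7.2)]
y = ["non-diabetic"] * 4 + ["diabetic"] * 4 + ["non-diabetic", "diabetic"]

model = fit_stack(X, y, [nearest_centroid, one_nn], k=5)
print(model((155, 7.8)))  # diabetic
```

A lookup table is about the simplest meta-learner that still learns from level-1 predictions. In practice a regularized model such as logistic regression is a more common choice, because it generalizes to prediction combinations that were never seen during training.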

METHODS

Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 and <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best-performing algorithms within each strategy were tested on the validation set.
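The manual-review labels above can be expressed as a Boolean rule. The following is a rough sketch, not the paper's Boolean logic algorithm. It assumes a simplified reading of the ADA criteria that uses only fasting plasma glucose (≥126 mg/dL) and HbA1c (≥6.5%), the standard ADA diagnostic cut-offs. It also assumes that "fulfilling the ADA criteria" means two or more abnormal values. The input names and the order in which the rules are checked are hypothetical.

```python
from typing import Sequence

# Simplified ADA cut-offs (assumption: only fasting glucose and HbA1c are considered).
FPG_DIABETIC_MG_DL = 126.0
HBA1C_DIABETIC_PCT = 6.5


def classify_diabetic_status(diabetes_on_problem_list: bool,
                             fasting_glucose_mg_dl: Sequence[float],
                             hba1c_pct: Sequence[float]) -> str:
    # Count abnormal values across both tests.
    abnormal = (sum(g >= FPG_DIABETIC_MG_DL for g in fasting_glucose_mg_dl)
                + sum(a >= HBA1C_DIABETIC_PCT for a in hba1c_pct))
    # Diabetic: diabetes registered as a health problem, or ADA criteria met
    # (assumed here to mean two or more abnormal values).
    if diabetes_on_problem_list or abnormal >= 2:
        return "diabetic"
    # Inconclusive: only one abnormal value.
    if abnormal == 1:
        return "inconclusive"
    # Non-diabetic: ADA criteria not met and at least one fasting glucose < 126 mg/dL.
    if any(g < FPG_DIABETIC_MG_DL for g in fasting_glucose_mg_dl):
        return "non-diabetic"
    # No usable data regarding diabetic status.
    return "inconclusive"


print(classify_diabetic_status(False, [130, 140], []))  # diabetic
print(classify_diabetic_status(False, [130, 98], []))   # inconclusive
print(classify_diabetic_status(False, [98], [5.4]))     # non-diabetic
```

The rules are checked in order because the abstract's definitions can overlap. For example, a patient with one abnormal and one normal fasting glucose matches both "only one abnormal value" and "at least one fasting glycemia below 126 mg/dL". This sketch resolves that case as inconclusive, which is an assumption.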

RESULTS

The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). The feedforward neural network achieved a slightly higher value (0.90, 95% CI 0.85, 0.94). The best-performing learner was the stacked generalization meta-learner, which reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98).
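All of these results are reported as Cohen's Kappa, which measures agreement between an algorithm's labels and the manual-review labels beyond what chance alone would produce. The formula is κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement, and p_e is the agreement expected from each rater's label frequencies. The minimal sketch below computes the point estimate for the three-class setting. The example labels are made up. The abstract does not say how the 95% confidence intervals were obtained, so the sketch does not compute them.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(reference: Sequence[str], predicted: Sequence[str]) -> float:
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(reference)
    # Observed agreement p_o: share of records where both labels match.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement p_e: product of the two raters' marginal frequencies, summed over classes.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical manual-review labels versus algorithm output.
ref = ["diabetic", "diabetic", "non-diabetic", "non-diabetic", "non-diabetic", "inconclusive"]
pred = ["diabetic", "non-diabetic", "non-diabetic", "non-diabetic", "non-diabetic", "inconclusive"]
print(round(cohen_kappa(ref, pred), 3))  # 0.714
```

In this example, p_o = 5/6 and p_e = 15/36, which gives κ = 5/7. This illustrates how a classifier that matches most records can still score well below 1 once chance agreement is subtracted.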

CONCLUSIONS

The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. Implementing these algorithms makes it possible to use data from thousands of patients accurately.


Similar articles

1. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records.
   Comput Methods Programs Biomed. 2017 Dec;152:53-70. doi: 10.1016/j.cmpb.2017.09.009. Epub 2017 Sep 14.
2. Development and Validation of Various Phenotyping Algorithms for Diabetes Mellitus Using Data from Electronic Health Records.
   Stud Health Technol Inform. 2017;245:366-369.
3. Development and validation of algorithms to classify type 1 and 2 diabetes according to age at diagnosis using electronic health records.
   BMC Med Res Methodol. 2020 Feb 24;20(1):35. doi: 10.1186/s12874-020-00921-3.
4. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals.
   J Am Med Inform Assoc. 2017 Jan;24(1):162-171. doi: 10.1093/jamia/ocw071. Epub 2016 Aug 7.
5. A rule-based electronic phenotyping algorithm for detecting clinically relevant cardiovascular disease cases.
   BMC Res Notes. 2017 Jul 14;10(1):281. doi: 10.1186/s13104-017-2600-2.
6. Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation.
   J Med Internet Res. 2022 Jun 3;24(6):e37213. doi: 10.2196/37213.
7. Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance.
   J Am Med Inform Assoc. 2016 Apr;23(e1):e20-7. doi: 10.1093/jamia/ocv130. Epub 2015 Sep 2.
8. Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms.
   Semin Arthritis Rheum. 2019 Aug;49(1):84-90. doi: 10.1016/j.semarthrit.2019.01.002. Epub 2019 Jan 4.
9. Relational machine learning for electronic health record-driven phenotyping.
   J Biomed Inform. 2014 Dec;52:260-70. doi: 10.1016/j.jbi.2014.07.007. Epub 2014 Jul 15.
10. The Impact of "Possible Patients" on Phenotyping Algorithms: Electronic Phenotype Algorithms Can Only Be Reproduced by Sharing Detailed Annotation Criteria.
   Stud Health Technol Inform. 2017;245:432-436.

Cited by

1. A phenotyping algorithm for classification of single ventricle physiology using electronic health records.
   JAMIA Open. 2025 May 15;8(3):ooaf035. doi: 10.1093/jamiaopen/ooaf035. eCollection 2025 Jun.
2. Predictive modeling of multi-class diabetes mellitus using machine learning and filtering iraqi diabetes data dynamics.
   PLoS One. 2024 May 16;19(5):e0300785. doi: 10.1371/journal.pone.0300785. eCollection 2024.
3. Development of phenotyping algorithms for hypertensive disorders of pregnancy (HDP) and their application in more than 22,000 pregnant women.
   Sci Rep. 2024 Mar 15;14(1):6292. doi: 10.1038/s41598-024-55914-9.
4. Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records.
   JMIR Med Inform. 2021 May 24;9(5):e25237. doi: 10.2196/25237.
5. Performance evaluation of case definitions of type 1 diabetes for health insurance claims data in Japan.
   BMC Med Inform Decis Mak. 2021 Feb 11;21(1):52. doi: 10.1186/s12911-021-01422-z.
6. A multi-class classification model for supporting the diagnosis of type II diabetes mellitus.
   PeerJ. 2020 Sep 10;8:e9920. doi: 10.7717/peerj.9920. eCollection 2020.
7. Automated Phenotyping Tool for Identifying Developmental Language Disorder Cases in Health Systems Data (APT-DLD): A New Research Algorithm for Deployment in Large-Scale Electronic Health Record Systems.
   J Speech Lang Hear Res. 2020 Sep 15;63(9):3019-3035. doi: 10.1044/2020_JSLHR-19-00397. Epub 2020 Aug 11.
8. Phenotype Inference with Semi-Supervised Mixed Membership Models.
   Proc Mach Learn Res. 2019 Aug;106:304-324.
9. Diabetes and the direct secondary use of electronic health records: Using routinely collected and stored data to drive research and understanding.
   Digit Health. 2018 Oct 8;4:2055207618804650. doi: 10.1177/2055207618804650. eCollection 2018 Jan-Dec.