A machine learning-based framework to identify type 2 diabetes through electronic health records.

Authors

Zheng Tao, Xie Wei, Xu Liling, He Xiaoying, Zhang Ya, You Mingrong, Yang Gong, Chen You

Affiliations

Institute of Image Communication and Networking, Shanghai Jiao Tong University, Shanghai, China; Tongren Hospital Shanghai Jiao Tong University, Shanghai, China.

Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA.

Publication information

Int J Med Inform. 2017 Jan;97:120-127. doi: 10.1016/j.ijmedinf.2016.09.014. Epub 2016 Oct 1.


DOI:10.1016/j.ijmedinf.2016.09.014
PMID:27919371
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5144921/
Abstract

OBJECTIVE: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), for example from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop, as a pilot study, a semi-automated machine learning framework that liberalizes the filtering criteria to improve recall while keeping the false positive rate low.

MATERIALS AND METHODS: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was applied to 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 retrieved from a regional distributed EHR repository covering 2012 to 2014.

RESULTS: We apply top-performing machine learning algorithms to the engineered features. We benchmark the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identification of T2DM subjects. Our results indicate that the framework achieved high identification performance (average AUC of ∼0.98), much higher than that of the state-of-the-art algorithm (AUC of 0.71).

DISCUSSION: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high miss rates caused by conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen these selection criteria and achieve a high identification rate of cases and controls.

CONCLUSIONS: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.
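The evaluation measures named in the abstract (accuracy, precision, sensitivity, specificity and AUC) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the labels and scores are made-up toy values rather than data from the study, and AUC is computed with the standard pairwise-ranking definition (the probability that a randomly chosen case scores higher than a randomly chosen control). Sensitivity is the same quantity the abstract calls recall.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels: 1 = case (T2DM), 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC:", auc)
```

On this toy input one case falls below the threshold and one control rises above it, so the four threshold-based measures each come out to 0.75, while the threshold-free AUC is 0.9375. This shows why the abstract reports AUC alongside sensitivity and specificity: the former summarizes ranking quality across all thresholds, and the latter describe one operating point.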
