Robust clinical marker identification for diabetic kidney disease with ensemble feature selection.

Author affiliations

Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA.

Big Data Decision Institute, Jinan University, Guangzhou, PRC.

Publication information

J Am Med Inform Assoc. 2019 Mar 1;26(3):242-253. doi: 10.1093/jamia/ocy165.

Abstract

OBJECTIVE

Diabetic kidney disease (DKD) is one of the most frequent complications of diabetes and is associated with substantial morbidity and mortality. To accelerate DKD risk factor discovery, we present an ensemble feature selection approach that identifies a robust set of discriminant factors from electronic medical records (EMRs).

MATERIALS AND METHODS

We identified a retrospective cohort of 15 645 adult patients with type 2 diabetes, excluding those with pre-existing kidney disease, and used all available clinical data types in modeling. We compared 3 machine-learning-based embedded feature selection methods, each combined with 6 feature ensemble techniques, and evaluated the top-ranked features they selected for robustness to data perturbations and for predictive performance for DKD onset.
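One of the feature ensemble techniques named in this abstract is the weighted mean rank. The abstract does not define it, so the following is only a minimal sketch under stated assumptions: each perturbed copy of the data (for example a bootstrap resample) yields one feature ranking, each ranking carries a weight (assumed here to be that model's validation AUC), and a feature absent from a ranking receives the worst possible rank. The function names, feature names, and weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def weighted_mean_rank(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Combine several feature rankings into one weighted mean rank per feature.

    rankings[i] lists features from most to least important for resample i.
    A feature missing from a ranking gets rank len(ranking) + 1 (assumption).
    Lower aggregated values mean more important features.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Map feature name -> 1-based rank, once per ranking.
    positions = [{name: i + 1 for i, name in enumerate(r)} for r in rankings]
    features = sorted({name for r in rankings for name in r})
    return {
        f: sum(w * pos.get(f, len(pos) + 1) for pos, w in zip(positions, weights)) / total
        for f in features
    }


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the smallest aggregated rank (ties broken by name)."""
    return sorted(scores, key=lambda f: (scores[f], f))[:k]


# Hypothetical rankings from three resampled models and their assumed AUC weights.
example_rankings = [
    ["hba1c", "egfr", "sbp"],
    ["egfr", "hba1c", "sbp"],
    ["egfr", "sbp", "hba1c"],
]
example_weights = [0.80, 0.82, 0.78]
example_scores = weighted_mean_rank(example_rankings, example_weights)
example_top2 = top_k(example_scores, 2)
```

Weighting by model performance lets rankings from better-performing resamples count for more, while averaging over resamples damps features that rank highly only by chance in a single perturbed sample.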

RESULTS

The gradient boosting machine (GBM) with the weighted mean rank feature ensemble technique achieved the best performance, with an AUC of 0.82 [95% CI, 0.81-0.83] on internal validation and 0.71 [95% CI, 0.68-0.73] on external temporal validation. The ensemble model identified a set of 440 features from 84 872 unique clinical features that are both predictive of DKD onset and robust against data perturbations, including 191 labs, 51 visit details (mainly vital signs), 39 medications, 34 orders, 30 diagnoses, and 95 other clinical features.
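The results above report each AUC with a 95% confidence interval. The abstract does not say how those intervals were obtained; the sketch below assumes a percentile bootstrap over patients, which is one common choice and not necessarily the authors' method. The function names and the toy labels and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data: 1 = developed DKD, 0 = did not (hypothetical values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise AUC here is quadratic in cohort size and is meant for illustration; on a cohort of thousands of patients a rank-based implementation from an established library would be the practical choice.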

DISCUSSION

Many of the top-ranked features have not been included in state-of-the-art DKD prediction models, but their relationships with kidney function have been suggested in the existing literature.

CONCLUSION

Our ensemble feature selection framework provides an option for identifying a robust and parsimonious feature set from EMR data in an unbiased manner, which aids knowledge discovery for DKD risk factors.
