Prediction of Insulin Resistance in Nondiabetic Population Using LightGBM and Cohort Validation of Its Clinical Value: Cross-Sectional and Retrospective Cohort Study.

Author information

Peng Ting, Miao Rujia, Xiong Hao, Lin Yanhui, Fan Duzhen, Ren Jiayi, Wang Jiangang, Li Yuan, Chen Jianwen

Affiliations

Health Management Center, Third Xiangya Hospital, Changsha, China.

School of Mathematics and Statistics, Hunan University of Technology and Business, 569 Yuelu District, Changsha, 410205, China, 86 18692269664, 86 88618571.

Publication information

JMIR Med Inform. 2025 Jun 13;13:e72238. doi: 10.2196/72238.

Abstract

BACKGROUND

Insulin resistance (IR), a precursor to type 2 diabetes and a major risk factor for various chronic diseases, is becoming increasingly prevalent in China due to population aging and unhealthy lifestyles. Current methods, such as the gold-standard hyperinsulinemic-euglycemic clamp, have limitations in practical application. More convenient and efficient methods to predict and manage IR in nondiabetic populations would have value for prevention and control.

OBJECTIVE

This study aimed to develop and validate a machine learning prediction model for IR in a nondiabetic population, using low-cost diagnostic indicators and questionnaire surveys.

METHODS

A cross-sectional study was conducted for model development, and a retrospective cohort study was used for validation. Data from 17,287 adults with normal fasting blood glucose who underwent physical exams and completed surveys at the Health Management Center of Xiangya Third Hospital, Central South University, from January 2018 to August 2022, were analyzed. IR was assessed using the Homeostasis Model Assessment (HOMA-IR) method. The dataset was split into 80% (13,128/16,411) for training and 20% (3283/16,411) for testing. A total of 5 machine learning algorithms were used: random forest, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting, Gradient Boosting Machine, and CatBoost. Model optimization included resampling, feature selection, and hyperparameter tuning. Performance was evaluated using F1-score, accuracy, sensitivity, specificity, area under the curve (AUC), and Kappa value. Shapley Additive Explanations analysis was used to assess feature importance. To investigate clinical implications, a separate retrospective cohort of 20,369 nondiabetic participants (from the Xiangya Third Hospital database between January 2017 and January 2019) was used for time-to-event analysis with Kaplan-Meier survival curves.
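The HOMA-IR index and the threshold-based evaluation metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the abstract does not give the HOMA-IR units or the cutoff used to label IR, so the formula below is the commonly used form (fasting insulin in µU/mL times fasting glucose in mmol/L, divided by 22.5). The function names and example counts are hypothetical. AUC is omitted because it requires predicted scores rather than a confusion matrix.

```python
def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """Commonly used HOMA-IR formula (units are an assumption, not from the abstract)."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity, F1, and Cohen's kappa from confusion counts.

    Assumes every denominator is nonzero (both classes present and predicted).
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # recall for the positive (IR) class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical example values, chosen only to exercise the functions.
example_index = homa_ir(10.0, 4.5)
example_metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Reporting kappa alongside accuracy is useful when classes are imbalanced, because kappa discounts the agreement a classifier would reach by chance.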

RESULTS

Data from 16,411 nondiabetic individuals were analyzed. We randomly selected 13,128 participants for the training group and 3283 participants for the validation group. The final model included 34 lifestyle-related questionnaire features and 17 biochemical markers. In the validation group, the AUCs of all models were greater than 0.90. In the test group, all AUCs were also greater than 0.80. The LightGBM model showed the best IR prediction performance, with an accuracy of 0.7542, sensitivity of 0.6639, specificity of 0.7642, F1-score of 0.6748, Kappa value of 0.3741, and AUC of 0.8456. The top 10 features were BMI, fasting blood glucose, high-density lipoprotein cholesterol, triglycerides, creatinine, alanine aminotransferase, sex, total bilirubin, age, and albumin/globulin ratio. In the validation cohort, all participants were classified into a high-risk IR group or a low-risk IR group according to the LightGBM algorithm. Of 5101 high-risk IR participants, 235 (4.6%) developed diabetes, compared with 137 (0.9%) of 15,268 low-risk IR participants. This corresponded to a hazard ratio of 5.1, indicating a significantly higher risk for the high-risk IR group.
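As a plausibility check on the cohort figures, the crude incidence proportions and their ratio can be recomputed from the reported counts. This is only a sketch: the reported hazard ratio presumably comes from a time-to-event model that accounts for follow-up time and censoring, which a crude risk ratio ignores. The function name is hypothetical.

```python
def crude_risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Ratio of incidence proportions; ignores follow-up time and censoring."""
    return (events_a / n_a) / (events_b / n_b)


# Counts as reported in the abstract.
high_risk_incidence = 235 / 5101    # high-risk IR group
low_risk_incidence = 137 / 15268    # low-risk IR group
ratio = crude_risk_ratio(235, 5101, 137, 15268)

print(f"{high_risk_incidence:.1%} vs {low_risk_incidence:.1%}, "
      f"crude risk ratio {ratio:.2f}")
# prints: 4.6% vs 0.9%, crude risk ratio 5.13
```

The crude ratio of about 5.13 is close to the reported hazard ratio of 5.1, so the percentages and the hazard ratio are mutually consistent.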

CONCLUSIONS

By leveraging low-cost laboratory indicators and questionnaire data, the LightGBM model effectively predicts IR status in nondiabetic individuals. It can support large-scale IR screening and diabetes prevention, and it may become an efficient and practical tool for insulin sensitivity assessment in these settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8db/12180673/61e611a3fde0/medinform-v13-e72238-g001.jpg
