Tsai Shang-Feng, Yang Chao-Tung, Liu Wei-Ju, Lee Chia-Lin
Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan.
School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
EClinicalMedicine. 2023 Apr 4;58:101934. doi: 10.1016/j.eclinm.2023.101934. eCollection 2023 Apr.
Insulin resistance (IR) is associated with diabetes mellitus, cardiovascular disease (CV), and mortality. Few studies have used machine learning to predict IR in the non-diabetic population.
In this prospective cohort study, we trained a model to predict IR in the non-diabetic population using the US National Health and Nutrition Examination Survey (NHANES; January 1, 1999 to December 31, 2012) database and the Taiwan MAJOR (MJ; January 1, 2008 to December 31, 2017) database. Participants from NHANES and MJ were excluded if they were younger than 18 years, had incomplete laboratory data, or had diabetes mellitus (DM). To investigate the clinical implications of the trained model for cardiovascular (CV) and all-cause mortality, we tested it on the Taiwan Biobank (TWB) database, covering December 10, 2008 to November 30, 2018. We then used SHapley Additive exPlanation (SHAP) values to explain differences across the machine learning models.
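The cohort-building step described above (three exclusion criteria, then a random split into training and validation groups) can be sketched as follows. This is a minimal illustration only: the field names, the treatment of missing age or DM status, the fixed seed, and the toy records are all assumptions, since the abstract does not publish a data schema or the authors' code.

```python
import random

# Hypothetical field names: the abstract does not publish a data schema.
LAB_FIELDS = ("bmi", "fpg", "hba1c", "triglyceride",
              "total_cholesterol", "hdl_cholesterol")


def is_eligible(record):
    """Return True if a record passes the three stated exclusion criteria."""
    age = record.get("age")
    if age is None or age < 18:      # aged <18 years (missing age: assumed excluded)
        return False
    if record.get("has_dm", True):   # has DM (missing status: assumed excluded)
        return False
    # Incomplete laboratory data: any missing lab value excludes the record.
    return all(record.get(field) is not None for field in LAB_FIELDS)


def split_cohort(records, n_train, seed=0):
    """Keep eligible records, shuffle reproducibly, and split in two."""
    eligible = [r for r in records if is_eligible(r)]
    random.Random(seed).shuffle(eligible)
    return eligible[:n_train], eligible[n_train:]


def make_record(pid, age, has_dm, fpg=5.0):
    """Build a toy record; all values are illustrative, not study data."""
    return {"id": pid, "age": age, "has_dm": has_dm, "bmi": 24.0, "fpg": fpg,
            "hba1c": 5.4, "triglyceride": 1.2,
            "total_cholesterol": 4.8, "hdl_cholesterol": 1.3}


toy = [
    make_record(1, 45, False),
    make_record(2, 17, False),            # excluded: under 18
    make_record(3, 60, True),             # excluded: has DM
    make_record(4, 33, False, fpg=None),  # excluded: missing lab value
    make_record(5, 52, False),
    make_record(6, 29, False),
]
train, valid = split_cohort(toy, n_train=2)
print(len(train), len(valid))  # 2 1
```

Passing the training-set size explicitly, rather than a fraction, mirrors how the abstract reports the split as absolute counts.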
Of all participants (NHANES and MJ databases combined), we randomly assigned 14,705 to the training group and 4,018 to the validation group. In the validation group, the area under the curve (AUC) exceeded 0.80 for every model (highest: XGBoost, 0.87). In the test group, every AUC also exceeded 0.80 (highest: XGBoost, 0.88). Among the 9 features (age, gender, race, body mass index (BMI), fasting plasma glucose (FPG), glycohemoglobin, triglyceride, total cholesterol, and high-density lipoprotein cholesterol), BMI had the highest feature importance for IR (0.43 for XGBoost and 0.47 for random forest (RF)). All participants from the TWB database were classified as IR or non-IR by the XGBoost model. Kaplan-Meier survival curves differed significantly between the IR and non-IR groups (p < 0.0001 for CV mortality and p = 0.0006 for all-cause mortality). The XGBoost model therefore has clear clinical implications not only for predicting IR but also for CV and all-cause mortality.
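For readers unfamiliar with the AUC figures reported above, the sketch below shows one standard way to compute the area under the ROC curve from binary labels and model scores, using the rank-sum (Mann-Whitney) formulation. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case, with ties counted
    as one half. Labels must be 0 (non-IR) or 1 (IR).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(label for _, label in pairs[i:j])
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # Subtract the minimum possible rank sum, then normalise by pair count.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example (not study data): 1 = IR, 0 = non-IR.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.87 and 0.88 should be read.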
Our machine learning model predicts IR in non-diabetic individuals with high accuracy using only 9 easily obtained features. Individuals whom the model classifies as IR also have significantly higher CV and all-cause mortality. The model can be applied to both Asian and Caucasian populations in clinical practice.
Taichung Veterans General Hospital, Taiwan and Japan Society for the Promotion of Science KAKENHI Grant Number JP21KK0293.