Zhang Qian, Wan Nai-Jun
Department of Pediatrics, Beijing Jishuitan Hospital, Beijing, People's Republic of China.
Diabetes Metab Syndr Obes. 2022 Sep 27;15:2963-2975. doi: 10.2147/DMSO.S380772. eCollection 2022.
Because insulin resistance (IR) is becoming more common in childhood, rates of diabetes and cardiovascular disease may rise in the future and seriously threaten children's healthy development. An easy way to predict IR in children would help pediatricians identify affected children in time and intervene appropriately, which is particularly important for primary health care practitioners.
Seventeen features were collected from 503 children aged 6-12 years. IR was defined as a HOMA-IR greater than 3.0, and children were classified as having or not having IR accordingly. Data were preprocessed by multivariate imputation and oversampling to handle missing values and class imbalance. Recursive feature elimination was then applied to select features of interest, and 5 machine learning methods were used for model training: logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and gradient boosting with categorical features support (CatBoost). The trained models were tested on an external test set of 133 children; performance metrics were computed on this set and the optimal model was selected.
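As an illustration of the labelling step, the following minimal sketch computes HOMA-IR and applies the 3.0 cutoff stated above. The abstract does not give the formula or units the authors used; the sketch assumes the commonly used definition, fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5. The function names and example values are hypothetical.

```python
HOMA_IR_CUTOFF = 3.0  # threshold stated in the abstract


def homa_ir(glucose_mmol_l: float, insulin_uu_ml: float) -> float:
    """HOMA-IR under the assumed standard formula (not stated in the abstract)."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def has_ir(glucose_mmol_l: float, insulin_uu_ml: float,
           cutoff: float = HOMA_IR_CUTOFF) -> bool:
    """Label a child as insulin resistant when HOMA-IR is strictly above the cutoff."""
    return homa_ir(glucose_mmol_l, insulin_uu_ml) > cutoff


# Made-up example values, for illustration only.
example_label = has_ir(glucose_mmol_l=5.0, insulin_uu_ml=18.0)
```

Because the definition is "greater than 3.0", a value exactly at the cutoff is labelled as not having IR in this sketch.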
After feature selection, the LR, SVM, RF, XGBoost, and CatBoost models retained 6, 9, 10, 14, and 6 features, respectively. Glucose, waist circumference, and age were chosen as predictors by most of the models. All 5 models achieved good performance on the external test set. XGBoost and CatBoost shared the highest AUC of all models (0.85). Their accuracy, sensitivity, precision, and F1 scores were also close, but the specificity of XGBoost reached 0.79, which was significantly higher than that of CatBoost, so XGBoost was chosen as the optimal model.
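The metrics compared above can be derived from a model's predictions on a test set. The sketch below is one plain-Python way to compute them, assuming binary labels where 1 means IR and 0 means no IR. It is not the authors' code; all names and the toy data are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, 1 = IR, 0 = no IR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # recall on children with IR
        "specificity": _ratio(tn, tn + fp),  # recall on children without IR
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true, scores):
    """Fraction of (IR, non-IR) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data for illustration only; these are not the study's results.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Two models with the same AUC can still differ at a given decision threshold, which is why a threshold-dependent metric such as specificity can serve as a tie-breaker.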
The model developed herein has good predictive ability for IR in children aged 6-12 years and can be applied clinically to help pediatricians identify children with IR in a simple and inexpensive way.