Graduate School, Beijing University of Chinese Medicine, Beijing, China.
Department of Pediatrics, China-Japan Friendship Hospital, Beijing, China.
Endocrine. 2022 Jun;77(1):63-72. doi: 10.1007/s12020-022-03072-1. Epub 2022 May 18.
We used machine-learning algorithms and a deep-learning sequential model to identify and narrow down the most important factors associated with overweight and obesity in Chinese preschool-aged children.
This cross-sectional survey was conducted in 2020 in Beijing and Tangshan. Children aged 3-6 years were enrolled using a stratified cluster random sampling strategy. Data were analyzed using Python in PyCharm.
A total of 9478 children were eligible for inclusion, including 1250 children with overweight or obesity. All children were randomly divided into a training group and a testing group at a 6:4 ratio. In the comparison of algorithms, the support vector machine (SVM) had the highest accuracy (0.9457), followed by the gradient boosting machine (GBM) (accuracy: 0.9454). Among the other 4 performance indexes, GBM had the highest F1 score (0.7748), followed by SVM (F1 score: 0.7731). After importance ranking, the top 5 factors appeared sufficient to obtain decent performance under the GBM algorithm: age, eating speed, number of relatives with obesity, sweet drink consumption, and paternal education. The performance of the top 5 factors was further supported by the deep-learning sequential model.
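The counts reported above imply a prevalence of roughly 13%, so a classifier that labels every child as not overweight would already reach an accuracy near 0.87 while having an F1 score of 0. This is one reason to read the F1 score alongside accuracy. The sketch below works through that arithmetic; the helper names are hypothetical, and the split sizes are simply what a 6:4 ratio gives for 9478 children, not figures reported by the study.

```python
def split_sizes(n_total, train_fraction):
    """Return (train, test) sizes for a given training fraction."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no positives to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


N_TOTAL = 9478      # eligible children (from the abstract)
N_POSITIVE = 1250   # children with overweight or obesity (from the abstract)

prevalence = N_POSITIVE / N_TOTAL
n_train, n_test = split_sizes(N_TOTAL, 0.6)  # assumed sizes for a 6:4 split

# Hypothetical "always predict not overweight" baseline on the full sample:
# every positive case is missed, every negative case is correct.
baseline_accuracy = accuracy(tp=0, fp=0, fn=N_POSITIVE, tn=N_TOTAL - N_POSITIVE)
baseline_f1 = f1_score(tp=0, fp=0, fn=N_POSITIVE)

print(f"prevalence: {prevalence:.4f}")
print(f"train/test sizes: {n_train}/{n_test}")
print(f"baseline accuracy: {baseline_accuracy:.4f}, baseline F1: {baseline_f1:.4f}")
```

Against this baseline, the reported accuracies of about 0.945 and F1 scores of about 0.77 indicate that both SVM and GBM recover a substantial share of the positive cases rather than merely exploiting the class imbalance.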
We identified 5 important factors that can be fed to the GBM algorithm to better differentiate children with overweight or obesity from other children in the general population, with decent prediction performance.
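The workflow summarized above (a 6:4 split, a GBM fit, importance ranking, and a refit on the top 5 factors) might look roughly like the following with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic (generated with `make_classification`), hyperparameters are library defaults, the split is stratified by outcome (an assumption; the abstract says only that the split was random), and the feature indices stand in for survey variables that are not available here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data: 20 candidate factors,
# about 13% positive class to mimic the reported prevalence.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    weights=[0.87, 0.13],
    random_state=0,
)

# 6:4 training/testing split (stratification is an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)


def fit_and_score(train_features, test_features):
    """Fit a GBM on the given columns and return (model, accuracy, F1) on the test set."""
    model = GradientBoostingClassifier(random_state=0)
    model.fit(train_features, y_train)
    predictions = model.predict(test_features)
    return model, accuracy_score(y_test, predictions), f1_score(y_test, predictions)


# Fit on all candidate factors, then rank them by impurity-based importance.
full_model, full_accuracy, full_f1 = fit_and_score(X_train, X_test)
top5 = np.argsort(full_model.feature_importances_)[::-1][:5]

# Refit using only the 5 highest-ranked factors and compare.
_, top5_accuracy, top5_f1 = fit_and_score(X_train[:, top5], X_test[:, top5])

print("top 5 feature indices:", top5.tolist())
print(f"all factors: accuracy={full_accuracy:.4f}, F1={full_f1:.4f}")
print(f"top 5 only:  accuracy={top5_accuracy:.4f}, F1={top5_f1:.4f}")
```

If the reduced model scores close to the full model, the top-ranked factors carry most of the predictive signal, which is the pattern the abstract reports for its 5 factors.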