An ensemble-based feature selection framework to select risk factors of childhood obesity for policy decision making.

Author information

Shi Xi, Nikolic Gorana, Epelde Gorka, Arrúe Mónica, Bidaurrazaga Van-Dierdonck Joseba, Bilbao Roberto, De Moor Bart

Affiliations

Department of Electrical Engineering (ESAT), Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10 - box 2446, 3001, Leuven, Belgium.

Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain.

Publication information

BMC Med Inform Decis Mak. 2021 Jul 21;21(1):222. doi: 10.1186/s12911-021-01580-0.

Abstract

BACKGROUND

The increasing prevalence of childhood obesity makes it essential to study its risk factors using a population-representative sample that covers a wider range of health topics, in order to inform better preventive policies and interventions. We aimed to develop an ensemble feature selection framework for large-scale data that identifies risk factors of childhood obesity with good interpretability and clinical relevance.

METHODS

We analyzed data collected from 426,813 children under 18 years of age between 2000 and 2019. Overweight was defined as a BMI above the 90th percentile for children of the same age and gender. We proposed an ensemble feature selection framework, the Bagging-based Feature Selection framework integrating MapReduce (BFSMR), to identify risk factors. The framework comprises five models drawn from the filter, wrapper, and embedded families of feature selection methods: a mutual-information filter, SVM-RFE, Lasso, Ridge, and Random Forest. Each model selected 10 variables based on variable importance. Based on accuracy, F-score, and model characteristics, the models were grouped into three levels with different weights: Lasso/Ridge, Filter/SVM-RFE, and Random Forest. A voting strategy then aggregated the selected features, taking both feature weights and model weights into account. We compared our voting strategy with two alternative strategies for selecting top-ranked features along six dimensions of interpretability.
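The weighted voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not report the numeric model weights, which level receives the highest weight, or how feature weights are derived. The values in `MODEL_WEIGHTS`, the rank-based feature weight, the function name `aggregate_votes`, and the toy rankings are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical model weights for the three levels named in the abstract.
# The actual values and their ordering are NOT given in the source.
MODEL_WEIGHTS = {
    "lasso": 3.0, "ridge": 3.0,          # level: Lasso/Ridge
    "filter_mi": 2.0, "svm_rfe": 2.0,    # level: Filter/SVM-RFE
    "random_forest": 1.0,                # level: Random Forest
}


def aggregate_votes(rankings, model_weights, top_k=10):
    """Combine per-model feature rankings into one weighted ranking.

    rankings: dict mapping model name -> list of selected features,
              ordered from most to least important.
    Returns a list of (feature, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for model, features in rankings.items():
        n = len(features)
        for rank, feature in enumerate(features):
            # Assumed feature weight: linear in rank, 1.0 for the top
            # feature down to 1/n for the last selected feature.
            feature_weight = (n - rank) / n
            scores[feature] += model_weights[model] * feature_weight
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:top_k]


# Toy rankings (made up): each model contributes its top 3 features.
rankings = {
    "lasso": ["age", "sex", "exercise"],
    "ridge": ["age", "exercise", "sex"],
    "filter_mi": ["sex", "age", "birth_year"],
    "svm_rfe": ["age", "birth_year", "sex"],
    "random_forest": ["birth_year", "age", "exercise"],
}

top = aggregate_votes(rankings, MODEL_WEIGHTS, top_k=3)
for feature, score in top:
    print(f"{feature}: {score:.2f}")
```

In this sketch, a feature picked by several highly weighted models accumulates a larger score than one ranked first by a single low-weight model, which is the general idea behind combining feature weights with model weights in a vote.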

RESULTS

Our method performed best at selecting features with good interpretability and clinical relevance. The top 10 features selected by BFSMR are age, sex, birth year, breastfeeding type, smoking habit and diet-related knowledge of both children and mothers, exercise, and mother's systolic blood pressure.

CONCLUSION

Our framework provides a way to identify a diverse and interpretable feature set from large-scale data without model bias. It can help identify risk factors of childhood obesity, and potentially of some other diseases, to inform future interventions or policies.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fb5/8293582/e8420f1b521e/12911_2021_1580_Fig1_HTML.jpg
