Silva Sepulveda Rosario, Boman Magnus
Karolinska Institutet, Department of Medicine Solna, Division of Clinical Epidemiology, Stockholm, Sweden.
MedTechLabs, BioClinicum, Karolinska University Hospital, Stockholm, Sweden.
Front Public Health. 2025 Jan 7;12:1369041. doi: 10.3389/fpubh.2024.1369041. eCollection 2024.
Mexico has one of the highest incidences of paediatric overweight and obesity in the world. Public health interventions have shown only moderate success, possibly because they rely on knowledge extracted with a limited range of statistical data analysis methods.
To explore whether multimodal machine learning can improve the identification of predictive features from obesogenic environments and the investigation of complex disease or social patterns, using the Mexican National Health and Nutrition Survey.
We grouped features into five data modalities corresponding to exogenous factors of the paediatric population and analysed them in two multimodal machine learning pipelines, compared against a unimodal early fusion baseline. The supervised pipeline employed four methods: a linear classifier with Elastic Net regularisation, k-Nearest Neighbour, Decision Tree, and Random Forest. The unsupervised pipeline used the traditional k-Means and hierarchical clustering methods, with the optimal number of clusters calculated to be 2.
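The idea of an early fusion baseline can be illustrated with a minimal sketch: features from every modality are concatenated into one vector before a single classifier sees them. The example below uses a plain k-Nearest Neighbour vote, one of the four supervised methods named above. The modality names (`household`, `media`), the toy records, and the per-modality voting variant shown for contrast are all assumptions made for illustration; the abstract does not state how the study's pipelines combine modalities, and none of the values come from the survey.

```python
from collections import Counter
from math import dist

# Hypothetical toy records (not survey data): each modality maps to a
# small numeric feature vector, paired with a class label.
TRAIN = [
    ({"household": [31.0, 1.0], "media": [1.0]}, "overweight"),
    ({"household": [29.5, 1.0], "media": [1.0]}, "overweight"),
    ({"household": [22.0, 0.0], "media": [0.0]}, "healthy"),
    ({"household": [23.5, 0.0], "media": [1.0]}, "healthy"),
]
MODALITIES = ["household", "media"]  # assumed names, for illustration only


def early_fusion(record):
    """Concatenate all modality vectors into one flat feature vector."""
    return [value for name in MODALITIES for value in record[name]]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Plain k-nearest-neighbour majority vote using Euclidean distance."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def predict_early(record, k=3):
    """Early fusion: one classifier over the concatenated vector."""
    vectors = [early_fusion(r) for r, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return knn_predict(vectors, labels, early_fusion(record), k)


def predict_per_modality(record, k=3):
    """Contrast case: one classifier per modality, then a majority vote."""
    labels = [label for _, label in TRAIN]
    per_modality = [
        knn_predict([r[name] for r, _ in TRAIN], labels, record[name], k)
        for name in MODALITIES
    ]
    return Counter(per_modality).most_common(1)[0][0]


if __name__ == "__main__":
    query = {"household": [30.0, 1.0], "media": [0.0]}
    print(predict_early(query), predict_per_modality(query))
```

In practice the features would be scaled before computing distances, since an unscaled BMI-like column would otherwise dominate the Euclidean metric; the sketch omits this for brevity.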
The decision tree classifier in the supervised early fusion approach produced the best quantitative results. The five most important features for classifying child or adolescent health were all measures of a randomly selected adult in the household: BMI, obesity diagnosis, being single, seeking care from private healthcare providers, and having paid TV in the home. The unsupervised learning approaches differed in their optimal number of clusters but agreed on the importance of home environment features when inter-cluster patterns were analysed. The main findings of this study differed from those of previous studies that applied only traditional statistical methods to the same database. Notably, the BMI of a randomly selected adult in the household emerged as the most important feature, rather than maternal BMI as reported in previous literature, where unwanted cultural bias went undetected.
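One common way a decision tree ranks features is by how much a split on each feature reduces Gini impurity. The sketch below computes that reduction for single binary features on invented data. It assumes an impurity-based notion of importance, which the abstract does not confirm, and the feature names `adult_obese` and `paid_tv` and all values are hypothetical stand-ins rather than survey variables.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def impurity_decrease(rows, labels, feature):
    """Weighted Gini decrease obtained by splitting on one binary feature."""
    left = [y for row, y in zip(rows, labels) if row[feature] == 0]
    right = [y for row, y in zip(rows, labels) if row[feature] == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def rank_features(rows, labels):
    """Sort features by impurity decrease, largest first."""
    scores = {f: impurity_decrease(rows, labels, f) for f in rows[0]}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (not survey data): binary household indicators.
ROWS = [
    {"adult_obese": 1, "paid_tv": 1},
    {"adult_obese": 1, "paid_tv": 0},
    {"adult_obese": 0, "paid_tv": 1},
    {"adult_obese": 0, "paid_tv": 0},
]
LABELS = ["overweight", "overweight", "healthy", "healthy"]

if __name__ == "__main__":
    for name, score in rank_features(ROWS, LABELS):
        print(f"{name}: {score:.2f}")
```

In this toy set the split on `adult_obese` separates the classes perfectly while `paid_tv` carries no information, so the former ranks first. A full tree accumulates such decreases over every node where a feature is used.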
Our general conclusion is that multimodal machine learning is a promising approach for comprehensively analysing obesogenic environments. Grouping features into modalities made it possible to critically analyse data signal strength and reveal sources of unwanted bias. In particular, the approach may aid in developing more effective public health policies to address the ongoing paediatric obesity epidemic in Mexico.