Wang Hongman, Song Yifan, Bi Hua
School of Humanities, Southeast University, Nanjing, China.
Faculty of Humanities and Social Sciences, Macao Polytechnic University, Macau, Macao SAR, China.
Front Big Data. 2025 Jul 10;8:1574683. doi: 10.3389/fdata.2025.1574683. eCollection 2025.
Community health outcomes significantly affect the wellbeing and quality of life of older populations. Traditional analytical methods often struggle to predict health risks accurately at the community level because they cannot capture complex, non-linear relationships among health determinants. This study employs a Random Forest Algorithm (RFA) to address this limitation and improve the predictive modeling of community health outcomes. Using ensemble learning and multi-factor analysis, it aims to identify and quantify the relative contributions of key health indicators to risk assessment. Data were collected from diverse health sources and then systematically preprocessed: missing values were resolved, variables were normalized, and categorical features were encoded. Using bootstrap sampling, multiple decision trees were trained on random subsets of the health data, which ensures variability among the trees. Each tree was grown to full depth, and the trees' predictions were aggregated to improve accuracy. Out-of-bag (OOB) error estimation was applied to refine the model and provide an unbiased performance assessment, supporting generalization to unseen data. The proposed model ranks feature importance to determine the most influential predictors of health risk. Results indicate that RFA achieves an accuracy of 92% and outperforms conventional prediction methods in precision and recall. These findings underscore the efficacy of Random Forest in identifying critical health risk factors and support targeted, data-driven public health management strategies and interventions tailored to older adults.
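The bootstrap, full-depth growth, vote aggregation, OOB estimation, and feature-ranking steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy data (three made-up features standing in for age, blood pressure, and a noise variable), the labeling rule, the function names, and the use of Gini impurity and permutation importance are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows})[:-1]:  # skip max so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, n_try):
    """Grow an unpruned tree: stop only at pure nodes or when no sampled feature helps."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], rng, n_try),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], rng, n_try))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def predict_forest(forest, row):
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # features sampled per split
    forest, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        tree = grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try)
        forest.append(tree)
        for i in set(range(n)) - set(idx):  # rows this tree never saw
            oob_votes[i][predict_tree(tree, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], y) for v, y in zip(oob_votes, labels) if v]
    oob_error = sum(p != y for p, y in scored) / len(scored) if scored else float("nan")
    return forest, oob_error

def permutation_importance(forest, rows, labels, seed=0):
    """Accuracy drop when one feature column is shuffled (larger = more influential)."""
    rng = random.Random(seed)
    def accuracy(data):
        return sum(predict_forest(forest, r) == y for r, y in zip(data, labels)) / len(labels)
    base, drops = accuracy(rows), []
    for f in range(len(rows[0])):
        col = [r[f] for r in rows]
        rng.shuffle(col)
        drops.append(base - accuracy([r[:f] + [v] + r[f + 1:] for r, v in zip(rows, col)]))
    return drops

def make_toy_data(n=80, seed=1):
    """Synthetic data only: [age, systolic pressure, noise]; the risk rule is invented."""
    rng = random.Random(seed)
    rows = [[rng.uniform(60, 90), rng.uniform(100, 180), rng.uniform(0, 10)] for _ in range(n)]
    return rows, [int(age > 75 and bp > 140) for age, bp, _ in rows]

rows, labels = make_toy_data()
forest, oob_error = fit_forest(rows, labels)
importances = permutation_importance(forest, rows, labels)
print(f"OOB error: {oob_error:.3f}")
print("Permutation importances:", [round(d, 3) for d in importances])
```

Each row is scored only by the trees whose bootstrap sample excluded it, which is why the OOB error can serve as a held-out estimate without a separate validation split. In practice a library implementation such as scikit-learn's `RandomForestClassifier` with `oob_score=True` would normally be preferred over hand-written trees.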