Gideon Vos, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi
College of Science and Engineering, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia.
College of Health Care Sciences, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
INTRODUCTION: Machine Learning (ML) is transforming medical research by enhancing diagnostic accuracy, predicting disease progression, and personalizing treatments. While general models trained on large datasets identify broad patterns across populations, the diversity of human biology, shaped by genetics, environment, and lifestyle, often limits their effectiveness. This has driven a shift towards subject-specific models that incorporate individual biological and clinical data for more precise predictions and personalized care. However, developing these models presents significant practical and financial challenges. Additionally, ML models initialized through stochastic processes with random seeds can suffer from reproducibility issues when those seeds are changed, leading to variations in predictive performance and feature importance. To address this issue, this study introduces a novel validation approach that enhances model interpretability and stabilizes predictive performance and feature importance at both the group and subject-specific levels.

METHODS: We conducted initial experiments using a single Random Forest (RF) model, initialized with a random seed for key stochastic processes, on nine datasets that varied in problem domain, sample size, and demographics. Different validation techniques were applied to assess model accuracy and reproducibility while evaluating the consistency of feature importance. Next, the experiment was repeated for each dataset for up to 400 trials per subject, randomly re-seeding the machine learning algorithm between trials. This introduced variability in the initialization of model parameters, providing a more comprehensive evaluation of the consistency of the model's features and performance. The repeated trials generated up to 400 feature sets per subject. By aggregating feature importance rankings across trials, our method identified the most consistently important features, reducing the impact of noise and random variation in feature selection. The top subject-specific feature importance set across all trials was then identified. Finally, using all subject-specific feature sets, the top group-specific feature importance set was created. This process resulted in stable, reproducible feature rankings, enhancing model explainability at both the subject and group levels.

RESULTS: We found that machine learning models with stochastic initialization were particularly susceptible to variations in reproducibility, predictive accuracy, and feature importance arising from random seed selection and the choice of validation technique during training. Changes in random seeds altered weight initialization, optimization paths, and feature rankings, leading to fluctuations in test accuracy and interpretability. These findings align with prior research on the sensitivity of stochastic models to initialization randomness. This study builds on that understanding by introducing a novel repeated-trials validation approach with random seed variation, significantly reducing variability in feature rankings and improving the consistency of model performance metrics. The method enabled robust identification of key features for each subject using a single, generic machine learning model, making predictions more interpretable and stable across experiments.

CONCLUSION: Subject-specific models improve generalization by addressing variability in human biology but are often costly and impractical for clinical trials. In this study, we introduce a novel validation technique for determining both group- and subject-specific feature importance within a general machine learning model, achieving greater stability in feature selection, higher predictive accuracy, and improved model interpretability. Our proposed approach ensures reproducible accuracy metrics and reliable feature rankings when using models that incorporate stochastic processes, making machine learning models more robust and clinically applicable.
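To make the repeated-trials idea described in METHODS concrete, the following is a minimal sketch assuming scikit-learn's RandomForestClassifier: each trial re-seeds both the train/test split and the forest, feature-importance ranks are accumulated across trials, and the features with the lowest aggregate rank form the consensus set. The function names (run_trials, group_consensus) and parameter choices are illustrative only, not the authors' implementation, and the sketch runs once at dataset level rather than per subject as in the study.

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def run_trials(X, y, n_trials=400, top_k=10):
    """Fit one RF per random seed and aggregate feature-importance ranks."""
    n_features = X.shape[1]
    rank_sums = np.zeros(n_features)
    accuracies = []
    for seed in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed, stratify=y)
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_tr, y_tr)
        accuracies.append(rf.score(X_te, y_te))
        # Rank features by importance (rank 0 = most important) and accumulate.
        order = np.argsort(-rf.feature_importances_)
        ranks = np.empty(n_features, dtype=int)
        ranks[order] = np.arange(n_features)
        rank_sums += ranks
    # Features with the lowest summed rank are the most consistently important.
    top_features = np.argsort(rank_sums)[:top_k].tolist()
    return top_features, float(np.mean(accuracies)), float(np.std(accuracies))

def group_consensus(subject_feature_sets, top_k=10):
    """Pool subject-level top-feature sets into a group-level ranking by
    counting how often each feature appears across subjects."""
    counts = Counter(f for feats in subject_feature_sets for f in feats)
    return [f for f, _ in counts.most_common(top_k)]

if __name__ == "__main__":
    # Synthetic stand-in for one of the nine datasets.
    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=0)
    top, acc_mean, acc_std = run_trials(X, y, n_trials=50, top_k=5)
    print("Consistently important features:", top)
    print("Accuracy over trials: %.3f +/- %.3f" % (acc_mean, acc_std))

In this sketch only the random seed varies between trials while hyperparameters stay fixed, so differences in accuracy and feature ranking reflect initialization randomness alone; group_consensus mirrors the final step of pooling subject-specific sets into a group-specific ranking.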