Not just "big" data: Importance of sample size, measurement error, and uninformative predictors for developing prognostic models for digital interventions.

Author information

McNamara Mary E, Zisser Mackenzie, Beevers Christopher G, Shumake Jason

Affiliations

Department of Psychology and Institute for Mental Health Research, University of Texas at Austin, USA.

Publication information

Behav Res Ther. 2022 Jun;153:104086. doi: 10.1016/j.brat.2022.104086. Epub 2022 Apr 14.

DOI:10.1016/j.brat.2022.104086

PMID:35462242

Abstract

There is strong interest in developing a more efficient mental health care system. Digital interventions and predictive models of treatment prognosis will likely play an important role in this endeavor. This article reviews the application of popular machine learning models to the prediction of treatment prognosis, with a particular focus on digital interventions. Assuming that the prediction of treatment prognosis will involve modeling a complex combination of interacting features with measurement error in both the predictors and outcomes, our simulations suggest that to optimize complex prediction models, sample sizes in the thousands will be required. Machine learning methods capable of discovering complex interactions and nonlinear effects (e.g., decision tree ensembles such as gradient boosted machines) perform particularly well in large samples when the predictors and outcomes have virtually no measurement error. However, in the presence of moderate measurement error, these methods provide little or no benefit over regularized linear regression, even with very large sample sizes (N = 100,000) and a non-linear ground truth. Given these sample size requirements, we argue that the scalability of digital interventions, especially when used in combination with optimal measurement practices, provides one of the most effective ways to study treatment prediction models. We conclude with suggestions about how to implement these algorithms into clinical practice.
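The point about measurement error can be illustrated with a minimal sketch. This is not the authors' simulation: it covers only the simplest linear, single-predictor case, and every function name and parameter value below (simulate, add_noise, the reliability of 0.7) is an assumption chosen for illustration. It shows the classical attenuation result, in which the observed correlation shrinks to roughly the true correlation multiplied by the square root of the product of the two reliabilities, regardless of how large the sample is.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def add_noise(values, reliability, rng):
    """Add Gaussian measurement error to unit-variance true scores.

    Reliability is taken as var(true) / (var(true) + var(error)),
    so with var(true) = 1 the error variance is (1 - rel) / rel.
    """
    error_sd = math.sqrt((1.0 - reliability) / reliability)
    return [v + rng.gauss(0.0, error_sd) for v in values]


def simulate(n, true_r, rel_x, rel_y, seed=0):
    """Return (correlation of true scores, correlation of noisy scores)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Build y with unit variance and correlation true_r with x.
    resid_sd = math.sqrt(1.0 - true_r ** 2)
    y_true = [true_r * x + resid_sd * rng.gauss(0.0, 1.0) for x in x_true]
    x_obs = add_noise(x_true, rel_x, rng)
    y_obs = add_noise(y_true, rel_y, rng)
    return pearson(x_true, y_true), pearson(x_obs, y_obs)


if __name__ == "__main__":
    # Hypothetical values: true r = 0.6, reliability 0.7 for both measures.
    clean, noisy = simulate(n=20000, true_r=0.6, rel_x=0.7, rel_y=0.7, seed=1)
    expected = 0.6 * math.sqrt(0.7 * 0.7)
    print(f"true-score r: {clean:.3f}")
    print(f"observed r:   {noisy:.3f} (attenuation formula: {expected:.3f})")
```

Increasing n in this sketch narrows the sampling noise around the attenuated value but does not recover the true correlation, which is in keeping with the abstract's emphasis on measurement practices alongside sample size. It does not address the abstract's comparison of nonlinear methods with regularized linear regression.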

