
Practical approaches in evaluating validation and biases of machine learning applied to mobile health studies.

Author Information

Allgaier Johannes, Pryss Rüdiger

Affiliation Information

Institute of Clinical Epidemiology and Biometry, Julius-Maximilians-University Würzburg, Josef-Schneider-Straße 2, Würzburg, Germany.

Publication Information

Commun Med (Lond). 2024 Apr 22;4(1):76. doi: 10.1038/s43856-024-00468-0.

Abstract

BACKGROUND

Machine learning (ML) models are evaluated on a test set to estimate model performance after deployment. The design of the test set is therefore important: if the data distribution after deployment differs too much from that of the test set, model performance decreases. At the same time, the data often contain undetected groups. For example, multiple assessments from one user may constitute a group, which is usually the case in mHealth scenarios.
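To make the grouping issue concrete, the sketch below contrasts a naive assessment-level split with a group-aware (user-level) split in scikit-learn. The toy data, variable names, and split parameters are illustrative assumptions, not taken from the study.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Toy EMA-like data: 100 assessments from 20 users (5 per user).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # features per assessment
y = rng.integers(0, 2, size=100)       # binary target per assessment
groups = np.repeat(np.arange(20), 5)   # user id per assessment

# Naive split: assessments from the same user can end up in both
# train and test, so the test score overestimates performance on
# users unseen at deployment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Group-aware split: each user's assessments stay entirely in either
# the train or the test set, mimicking deployment on new users.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))
```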

METHODS

In this work, we evaluate a model's performance using several cross-validation train-test-split approaches, in some cases deliberately ignoring the groups. By sorting the groups (in our case: users) by time, we additionally simulate a concept-drift scenario for better external validity. For this evaluation, we use 7 longitudinal mHealth datasets, all containing Ecological Momentary Assessments (EMA). Further, we compare the model performance with baseline heuristics, questioning the essential utility of a complex ML model.
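A minimal sketch of such a time-sorted, group-aware split is shown below, assuming a pandas DataFrame with hypothetical user_id and timestamp columns; the file name and the 20% hold-out fraction are illustrative, not from the paper.

```python
import pandas as pd

# Hypothetical EMA table: one row per completed assessment; the file
# name and column names (user_id, timestamp) are assumptions.
df = pd.read_csv("ema_assessments.csv", parse_dates=["timestamp"])

# Sort users by the time of their first assessment and hold out the
# most recent 20% of users. Training users precede test users in
# time, which simulates a concept-drift scenario at deployment.
first_seen = df.groupby("user_id")["timestamp"].min().sort_values()
n_test = max(1, int(0.2 * len(first_seen)))
test_users = set(first_seen.index[-n_test:])

train_df = df[~df["user_id"].isin(test_users)]
test_df = df[df["user_id"].isin(test_users)]
```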

RESULTS

Hidden groups in the dataset lead to an overestimation of ML performance after deployment. For prediction, a user's last completed questionnaire is a reasonable heuristic for the next response and potentially outperforms a complex ML model. Because this pattern holds across all 7 included studies, low variance appears to be a fundamental phenomenon of mHealth datasets rather than a property of a single study.
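The last-response heuristic can be expressed in a few lines. The sketch below assumes the same hypothetical DataFrame layout with a numeric target column, and uses mean absolute error as one possible metric; none of these specifics come from the study itself.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Same hypothetical EMA table as above; "target" is the numeric
# questionnaire answer to be predicted.
df = pd.read_csv("ema_assessments.csv", parse_dates=["timestamp"])
df = df.sort_values(["user_id", "timestamp"])

# Heuristic: predict each user's next response with their previous
# one (last observation carried forward). A user's first assessment
# has no predecessor, so those rows are dropped from the evaluation.
df["prediction"] = df.groupby("user_id")["target"].shift(1)
evaluable = df.dropna(subset=["prediction"])

mae = mean_absolute_error(evaluable["target"], evaluable["prediction"])
print(f"Last-response baseline MAE: {mae:.3f}")
```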

CONCLUSIONS

The way mHealth-based data are generated by EMA raises questions about the user and assessment levels and about the appropriate validation of ML models. Our analysis shows that further research is needed to obtain robust ML models. In addition, simple heuristics can be considered as an alternative to ML. Domain experts should be consulted to find potentially hidden groups in the data.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4f9/11035658/e8715c8a8d28/43856_2024_468_Fig1_HTML.jpg
