Allgaier Johannes, Pryss Rüdiger
Institute of Clinical Epidemiology and Biometry, Julius-Maximilians-University Würzburg, Josef-Schneider-Straße 2, Würzburg, Germany.
Commun Med (Lond). 2024 Apr 22;4(1):76. doi: 10.1038/s43856-024-00468-0.
Machine learning (ML) models are evaluated on a test set to estimate their performance after deployment. The design of the test set is therefore important: if the data distribution after deployment differs too much from the test distribution, model performance falls below the estimate. At the same time, the data often contain undetected groups. For example, multiple assessments from one user may constitute a group, which is usually the case in mHealth scenarios.
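To illustrate the grouping problem, the following minimal sketch (not the authors' code; the data and group structure are made up) shows how a naive split scatters one user's assessments across train and test, whereas a group-aware split keeps each user entirely on one side:

import numpy as np
from sklearn.model_selection import KFold, GroupKFold

rng = np.random.default_rng(0)
users = np.repeat(np.arange(20), 10)   # 20 users x 10 assessments; group = user ID
X = rng.normal(size=(users.size, 5))   # toy EMA feature vectors

# Naive split: groups are ignored, so one user's assessments can land
# in both train and test, leaking user-specific signal.
train_idx, test_idx = next(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
print(len(set(users[train_idx]) & set(users[test_idx])))   # > 0: users overlap

# Group-aware split: every user is entirely in train or entirely in test.
train_idx, test_idx = next(GroupKFold(n_splits=5).split(X, groups=users))
print(len(set(users[train_idx]) & set(users[test_idx])))   # 0: no overlap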
In this work, we evaluate model performance using several cross-validation train-test split approaches, in some cases deliberately ignoring the groups. By additionally sorting the groups (in our case: users) by time, we simulate a concept drift scenario for better external validity. For this evaluation, we use 7 longitudinal mHealth datasets, all containing Ecological Momentary Assessments (EMA). Furthermore, we compare model performance with baseline heuristics, questioning whether a complex ML model is needed at all.
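A time-sorted, user-level split of this kind could be sketched as follows; the column names ("user_id", "timestamp") are assumptions for illustration, not the study's actual pipeline:

import pandas as pd

def time_sorted_user_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """User-level split ordered by each user's first assessment time:
    train on the earliest users, test on those who joined the study later."""
    first_seen = df.groupby("user_id")["timestamp"].min().sort_values()
    n_test = max(1, int(len(first_seen) * test_fraction))
    test_users = set(first_seen.index[-n_test:])   # the most recent users
    is_test = df["user_id"].isin(test_users)
    return df[~is_test], df[is_test]

Testing on strictly "newer" users mimics deployment, where the model encounters users whose data were generated after training.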
Hidden groups in the dataset lead to an overestimation of ML performance after deployment. For prediction, a user's last completed questionnaire is a reasonable heuristic for their next response and potentially outperforms a complex ML model. Because we included 7 studies, low variance appears to be a fundamental phenomenon of mHealth datasets rather than an artifact of any single study.
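A sketch of such a "last completed questionnaire" baseline, again under assumed column names ("user_id", "timestamp", "target"):

import pandas as pd

def last_response_baseline(df: pd.DataFrame) -> pd.Series:
    """Predict each assessment's target as the same user's previous
    target value; a user's first assessment gets NaN."""
    ordered = df.sort_values(["user_id", "timestamp"])
    return ordered.groupby("user_id")["target"].shift(1)

# Example comparison against a model, e.g. via mean absolute error:
# mae_baseline = (df["target"] - last_response_baseline(df)).abs().mean()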
The way EMA generates mHealth data raises questions about the user versus the assessment level and about the appropriate validation of ML models. Our analysis shows that further research is needed to obtain robust ML models. In addition, simple heuristics can be considered as an alternative to ML. Domain experts should be consulted to identify potentially hidden groups in the data.