Characterizing individual and methodological risk factors for survey non-completion using machine learning: findings from the U.S. Millennium Cohort Study.

Author information

Carnes Nate C, Kolaja Claire A, Lewis Crystal L, Castañeda Sheila F, Rull Rudolph P

Affiliations

Deployment Health Research Department, Naval Health Research Center, San Diego, CA, USA.

Leidos, Inc, San Diego, CA, USA.

Publication information

BMC Med Res Methodol. 2025 Jul 14;25(1):174. doi: 10.1186/s12874-025-02620-3.

DOI:10.1186/s12874-025-02620-3

PMID:40660111

Abstract

BACKGROUND

Missing survey data can threaten the validity and generalizability of findings from longitudinal cohort studies. Respondent characteristics and survey attributes may contribute to patterns of survey non-completion, a form of missing data in which respondents begin but do not finish a survey, that can lead to biased conclusions. The objectives of the present research are to demonstrate how machine learning can identify survey non-completion and to characterize individual and methodological factors that are associated with this form of data missingness.

METHODS

The present study developed a novel machine learning algorithm to characterize survey non-completion in the Millennium Cohort Study during the 2019-2021 data collection cycle that included a 30- to 45-min paper or web-based follow-up survey for previously enrolled panels (Panels 1-4, n = 80,986) and a 30- to 45-min web-based baseline survey for new enrollees (Panel 5, n = 58,609). We then examined the effect of individual characteristics and survey attributes on survey non-completion.
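The abstract does not describe the algorithm itself, so the following is only a minimal rule-based sketch of the underlying idea, not the authors' machine learning method. It assumes each survey is an ordered list of item responses with `None` marking a skipped item, and it flags a respondent as a non-completer when they answered at least one item and then left a trailing run of items blank. The function names and the `min_trailing_missing` threshold are hypothetical.

```python
from typing import Optional, Sequence


def last_answered_index(responses: Sequence[Optional[str]]) -> int:
    """Return the index of the last answered item, or -1 if none were answered."""
    for i in range(len(responses) - 1, -1, -1):
        if responses[i] is not None:
            return i
    return -1


def is_non_completer(
    responses: Sequence[Optional[str]], min_trailing_missing: int = 3
) -> bool:
    """Flag a survey that was started but ends in a run of unanswered items.

    min_trailing_missing is an illustrative threshold (an assumption), used to
    separate breakoff from a respondent who merely skipped the last item or two.
    """
    last = last_answered_index(responses)
    if last < 0:
        # Never started: this is non-response, not non-completion.
        return False
    trailing_missing = len(responses) - 1 - last
    return trailing_missing >= min_trailing_missing


def non_completion_rate(
    surveys: Sequence[Sequence[Optional[str]]], min_trailing_missing: int = 3
) -> float:
    """Share of started surveys that are flagged as non-completed."""
    started = [s for s in surveys if last_answered_index(s) >= 0]
    if not started:
        return 0.0
    flagged = sum(is_non_completer(s, min_trailing_missing) for s in started)
    return flagged / len(started)


# Toy data: a completer, a breakoff after item 2, a non-starter,
# and a respondent with scattered item non-response.
example_surveys = [
    ["a", "b", "c", "d", "e", "f"],
    ["a", "b", None, None, None, None],
    [None, None, None, None, None, None],
    ["a", None, "c", "d", "e", None],
]
example_rate = non_completion_rate(example_surveys)
```

A fixed rule like this cannot distinguish deliberate skipping of a final section from true breakoff, which is one reason a learned classifier may be preferable in practice.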

RESULTS

This algorithm achieved 99% accuracy and showed that 0.29% of follow-up respondents and 15.43% of new enrollees were survey non-completers. Our findings suggest that certain military and sociodemographic characteristics (e.g., enlisted pay grades) were associated with increased survey non-completion in the 2019-2021 cycle. Survey attributes explained a large proportion of the variability in survey non-completion, with our analyses indicating a higher likelihood of survey non-completion in sections (1) located toward the beginning of the survey, (2) with sensitive questions, and (3) with fewer questions.

CONCLUSION

This research highlights the importance of accounting for potential respondent bias due to survey non-completion and identifies factors associated with this type of missing data.
