Missing data approaches in eHealth research: simulation study and a tutorial for nonmathematically inclined researchers.

Authors

Blankers Matthijs, Koeter Maarten W J, Schippers Gerard M

Affiliation

Arkin Academy, Amsterdam, The Netherlands.

Publication

J Med Internet Res. 2010 Dec 19;12(5):e54. doi: 10.2196/jmir.1448.

DOI: 10.2196/jmir.1448
PMID: 21169167
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3057309/

Abstract

BACKGROUND

Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.

OBJECTIVE

In this paper several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and strengths and weaknesses are discussed.
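The three basic approaches named above can be illustrated with a minimal Python sketch. The series `drinks` and the function names are hypothetical and chosen only for illustration; missing entries are represented as `None`. This is not the authors' code, only one plain reading of each technique.

```python
from statistics import mean

def complete_case(values):
    """Complete case analysis: keep only the observed (non-missing) entries."""
    return [v for v in values if v is not None]

def mean_impute(values):
    """Mean imputation: replace each missing entry with the observed mean."""
    observed_mean = mean(complete_case(values))
    return [observed_mean if v is None else v for v in values]

def locf(values):
    """Last observation carried forward; gaps before the first observation stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical daily drink counts with three missing days.
drinks = [4, 6, None, 2, None, None, 3]

print(complete_case(drinks))  # [4, 6, 2, 3]
print(mean_impute(drinks))    # [4, 6, 3.75, 2, 3.75, 3.75, 3]
print(locf(drinks))           # [4, 6, 6, 2, 2, 2, 3]
```

Mean imputation and LOCF both return a full-length series, but every filled value is a copy of something already observed, so the apparent variability of the data shrinks.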

METHODS

The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.
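As a rough illustration of what inducing MAR means, the sketch below deletes values of a target variable `y` with a probability that depends only on an always-observed covariate `x`. The data, the deletion rule (probability of missingness equal to `x`), and all names are assumptions made for this example; the abstract does not specify the mechanism the authors used, and the study's real endpoints were nonnormal counts rather than the simple linear data simulated here.

```python
import random
from statistics import mean

def induce_mar(rows, rng):
    """Blank out y with probability equal to x; x itself always stays observed."""
    return [(x, None if rng.random() < x else y) for x, y in rows]

rng = random.Random(2010)  # fixed seed so the sketch is repeatable

# Hypothetical data: x is uniform on [0, 1) and y increases with x.
full = []
for _ in range(5000):
    x = rng.random()
    full.append((x, 10 * x + rng.gauss(0, 1)))

with_gaps = induce_mar(full, rng)

missing_share = sum(y is None for _, y in with_gaps) / len(with_gaps)
full_mean = mean(y for _, y in full)
complete_case_mean = mean(y for _, y in with_gaps if y is not None)

print(f"missing share: {missing_share:.2f}")
print(f"mean of y, full data: {full_mean:.2f}")
print(f"mean of y, complete cases: {complete_case_mean:.2f}")
```

Because cases with high `x` are deleted more often and `y` rises with `x`, roughly half of the `y` values go missing and the complete-case mean falls clearly below the full-data mean. This is the kind of bias that makes complete case analysis unreliable when data are MAR rather than missing completely at random.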

RESULTS

In the performed simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others, led to the smallest deviation from the reference value (Cohen's d = 0.06), and had the largest coverage percentage of the reference confidence interval (96%).

CONCLUSIONS

The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (last observation carried forward [LOCF], complete case analysis) did not perform well, and we therefore recommend against using them. Recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, which makes multiple imputation more readily available to less mathematically inclined researchers.
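Analyzing multiply imputed datasets usually means running the same analysis on each imputed dataset and then pooling the results. The sketch below shows the standard pooling step known as Rubin's rules; the abstract does not say how the tested programs pool their results, so this is an illustrative assumption, and the function name and the input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their squared standard errors.

    Returns the pooled estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed datasets.
estimates = [10.0, 12.0, 11.0]
variances = [1.0, 1.0, 1.0]

pooled, total = pool_rubin(estimates, variances)
print(pooled)           # 11.0
print(round(total, 4))  # 2.3333
```

The between-imputation term is what a single imputation leaves out: it carries the uncertainty about the missing values themselves into the final standard error.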

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7590/3057309/789825768368/jmir_v12i5e54_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7590/3057309/2ca81b3aa675/jmir_v12i5e54_fig2.jpg
