
A cautionary tale on using imputation methods for inference in matched-pairs design.

Affiliation

Faculty of Statistics, Institute of Mathematical Statistics and Applications in Industry, Technical University of Dortmund, Dortmund 44227, Germany.

Publication information

Bioinformatics. 2020 May 1;36(10):3099-3106. doi: 10.1093/bioinformatics/btaa082.

DOI:10.1093/bioinformatics/btaa082
PMID:32049320
Abstract

MOTIVATION

Imputation procedures in biomedical fields have turned into statistical practice, since further analyses can be conducted ignoring the former presence of missing values. In particular, non-parametric imputation schemes like the random forest have shown favorable imputation performance compared to the more traditionally used MICE procedure. However, their effect on valid statistical inference has not been analyzed so far. This article closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs while opposing them to a recent approach that only works with the given observations at hand.

RESULTS

Our findings indicate that machine-learning schemes for (multiply) imputing missing values may inflate type I error or result in comparably low power in small-to-moderate matched pairs, even after modifying the test statistics using Rubin's multiple imputation rule. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study has been considered.
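
For readers unfamiliar with Rubin's multiple imputation rule mentioned above, the following is a minimal sketch of how estimates from several imputed datasets are commonly pooled into one test statistic. It is not the authors' R code, which the abstract says is available from the authors. The function name `pool_rubin` and the example numbers are hypothetical and chosen only for illustration.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: mean-difference estimate from each imputed dataset
    variances: squared standard error of each of those estimates
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate

    if b == 0:
        df = math.inf                     # no spread between imputations
    else:
        # Rubin's classical degrees of freedom for the reference t distribution
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical inputs: three imputations of a paired mean difference.
q_bar, total, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
t_stat = q_bar / math.sqrt(total)         # referred to a t distribution with df degrees of freedom
```

The pooled statistic is compared against a t distribution with `df` degrees of freedom. The abstract reports that, even with this adjustment, machine-learning imputation may inflate type I error or lose power in small-to-moderate matched pairs.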

AVAILABILITY AND IMPLEMENTATION

The corresponding R-code can be accessed through the authors and the gene expression data can be downloaded at www.gdac.broadinstitute.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
A cautionary tale on using imputation methods for inference in matched-pairs design.
Bioinformatics. 2020 May 1;36(10):3099-3106. doi: 10.1093/bioinformatics/btaa082.
2
Multiple imputation for missing values through conditional Semiparametric odds ratio models.
Biometrics. 2011 Sep;67(3):799-809. doi: 10.1111/j.1541-0420.2010.01538.x. Epub 2011 Jan 6.
3
On the multiple imputation variance estimator for control-based and delta-adjusted pattern mixture models.
Biometrics. 2017 Dec;73(4):1379-1387. doi: 10.1111/biom.12702. Epub 2017 Apr 13.
4
Comparison of imputation variance estimators.
Stat Methods Med Res. 2016 Dec;25(6):2541-2557. doi: 10.1177/0962280214526216. Epub 2014 Mar 28.
5
Bootstrap inference for multiple imputation under uncongeniality and misspecification.
Stat Methods Med Res. 2020 Dec;29(12):3533-3546. doi: 10.1177/0962280220932189. Epub 2020 Jun 30.
6
The Optimal Machine Learning-Based Missing Data Imputation for the Cox Proportional Hazard Model.
Front Public Health. 2021 Jul 5;9:680054. doi: 10.3389/fpubh.2021.680054. eCollection 2021.
7
Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study.
BMC Med Res Methodol. 2019 Jan 10;19(1):14. doi: 10.1186/s12874-018-0653-0.
8
Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis.
BMC Med Res Methodol. 2017 Aug 22;17(1):129. doi: 10.1186/s12874-017-0404-7.
9
DNA microarray data imputation and significance analysis of differential expression.
Bioinformatics. 2005 Nov 15;21(22):4155-61. doi: 10.1093/bioinformatics/bti638. Epub 2005 Aug 23.
10
The multiple imputation method: a case study involving secondary data analysis.
Nurse Res. 2015 May;22(5):13-9. doi: 10.7748/nr.22.5.13.e1319.

Cited by

1
Effect of eplerenone in acute heart failure using a win ratio approach.
Clin Res Cardiol. 2024 Nov 20. doi: 10.1007/s00392-024-02578-0.
2
Analyzing the Effect of Imputation on Classification Performance under MCAR and MAR Missing Mechanisms.
Entropy (Basel). 2023 Mar 17;25(3):521. doi: 10.3390/e25030521.
3
Estimating Gaussian Copulas with Missing Data with and without Expert Knowledge.
Entropy (Basel). 2022 Dec 19;24(12):1849. doi: 10.3390/e24121849.
4
Differential Effect of Vaginal Microbiota on Spontaneous Preterm Birth among Chinese Pregnant Women.
Biomed Res Int. 2022 Dec 1;2022:3536108. doi: 10.1155/2022/3536108. eCollection 2022.
5
On the Relation between Prediction and Imputation Accuracy under Missing Covariates.
Entropy (Basel). 2022 Mar 9;24(3):386. doi: 10.3390/e24030386.
6
Ranking procedures for repeated measures designs with missing data: Estimation, testing and asymptotic theory.
Stat Methods Med Res. 2022 Jan;31(1):105-118. doi: 10.1177/09622802211046389. Epub 2021 Nov 29.