Simulation study to evaluate when Plasmode simulation is superior to parametric simulation in estimating the mean squared error of the least squares estimator in linear regression.

Affiliations

Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany.

Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany.

Publication information

PLoS One. 2024 May 15;19(5):e0299989. doi: 10.1371/journal.pone.0299989. eCollection 2024.

DOI:10.1371/journal.pone.0299989
PMID:38748677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11095703/

Abstract

Simulation is a crucial tool for the evaluation and comparison of statistical methods. How to design fair and neutral simulation studies is therefore of great interest for both researchers developing new methods and practitioners confronted with the choice of the most suitable method. The term simulation usually refers to parametric simulation, that is, computer experiments using artificial data made up of pseudo-random numbers. Plasmode simulation, that is, computer experiments using the combination of resampling feature data from a real-life dataset and generating the target variable with a known user-selected outcome-generating model, is an alternative that is often claimed to produce more realistic data. We compare parametric and Plasmode simulation for the example of estimating the mean squared error (MSE) of the least squares estimator (LSE) in linear regression. If the true underlying data-generating process (DGP) and the outcome-generating model (OGM) were known, parametric simulation would obviously be the best choice in terms of estimating the MSE well. However, in reality, both are usually unknown, so researchers have to make assumptions: in Plasmode simulation studies for the OGM, in parametric simulation for both DGP and OGM. Most likely, these assumptions do not exactly reflect the truth. Here, we aim to find out how assumptions deviating from the true DGP and the true OGM affect the performance of parametric and Plasmode simulations in the context of MSE estimation for the LSE and in which situations which simulation type is preferable. Our results suggest that the preferable simulation method depends on many factors, including the number of features, and on how and to what extent the assumptions of a parametric simulation differ from the true DGP. Also, the resampling strategy used for Plasmode influences the results. In particular, subsampling with a small sampling proportion can be recommended.
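
The contrast between the two simulation types can be illustrated with a minimal sketch. It is not the authors' study design: it assumes a single feature, a linear OGM with Gaussian noise, a standard normal feature distribution for the parametric simulation, and an exponentially distributed stand-in for the real-life dataset. All function names and parameter values are hypothetical. The Plasmode variant uses subsampling (drawing without replacement) with a small sampling proportion.

```python
import random

def ols_slope(x, y):
    """Least squares slope for the simple linear model y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def generate_outcome(x, beta0, beta1, sigma, rng):
    """Assumed OGM (hypothetical): linear signal plus Gaussian noise."""
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

def mse_parametric(n, beta0, beta1, sigma, reps, rng):
    """Parametric simulation: features are pseudo-random draws from an
    assumed N(0, 1) distribution (an assumption about the DGP)."""
    squared_error = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = generate_outcome(x, beta0, beta1, sigma, rng)
        squared_error += (ols_slope(x, y) - beta1) ** 2
    return squared_error / reps

def mse_plasmode(real_x, proportion, beta0, beta1, sigma, reps, rng):
    """Plasmode simulation: features are subsampled without replacement
    from the real dataset; only the outcome comes from the OGM."""
    m = max(3, round(proportion * len(real_x)))
    squared_error = 0.0
    for _ in range(reps):
        x = rng.sample(real_x, m)
        y = generate_outcome(x, beta0, beta1, sigma, rng)
        squared_error += (ols_slope(x, y) - beta1) ** 2
    return squared_error / reps

rng = random.Random(0)
# Stand-in for real-life feature data (skewed on purpose, so the
# parametric N(0, 1) assumption deviates from it).
real_x = [rng.expovariate(1.0) for _ in range(500)]

# Both simulations use 50 observations per replication (0.1 * 500 = 50).
mse_par = mse_parametric(n=50, beta0=1.0, beta1=2.0, sigma=1.0, reps=2000, rng=rng)
mse_pla = mse_plasmode(real_x, proportion=0.1, beta0=1.0, beta1=2.0,
                       sigma=1.0, reps=2000, rng=rng)
print(f"parametric MSE estimate of the slope: {mse_par:.4f}")
print(f"Plasmode MSE estimate of the slope:   {mse_pla:.4f}")
```

In this sketch the two estimates differ only because the feature distributions differ: the parametric run depends on how well the assumed normal distribution matches the real features, whereas the Plasmode run reuses the real features directly.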


Cited by

1
Simulation study to evaluate when Plasmode simulation is superior to parametric simulation in comparing classification methods on high-dimensional data.
PLoS One. 2025 Jun 2;20(6):e0322887. doi: 10.1371/journal.pone.0322887. eCollection 2025.
2
Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting.
Psychol Methods. 2024 Nov 14. doi: 10.1037/met0000695.
