Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
Department of Neurology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands.
PLoS One. 2020 May 12;15(5):e0232970. doi: 10.1371/journal.pone.0232970. eCollection 2020.
Pooling individual participant data across studies is often complicated by diversity in variables between the available datasets. Recoding original variables is therefore often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this loss jeopardizes the validity of analysis results.
Data were derived from a platform developed to pool data from three randomized controlled trials on the effect of treating cardiovascular risk factors on cognitive decline or dementia. We quantified loss of information using the R-squared of linear regression models that express each pooled variable as a function of its original variable(s). Where the R-squared was below 0.8, we additionally explored the potential impact of this loss of information on future analyses: we tested whether the beta coefficient of a predictor changed by more than 10% when the original versus the recoded variable was added as a confounder in a linear regression model. In a simulation, we randomly sampled numbers, recoded those ≤1000 to 0 and those >1000 to 1, varied the range of the continuous variable, the ratio of recoded zeroes to recoded ones, or both, and again extracted the R-squared from linear models to quantify information loss.
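The simulation described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code; the sample size, seed, and use of an ordinary least-squares fit via `np.polyfit` are assumptions, since the abstract does not specify the implementation.

```python
import numpy as np

def r2_after_recoding(low, high, n=10_000, cutoff=1000, seed=0):
    """Simulate information loss when a continuous variable is
    dichotomized at `cutoff` (values <= cutoff -> 0, > cutoff -> 1).

    Returns the R-squared of a linear regression of the recoded
    variable on the original one, mirroring the paper's approach of
    modeling the pooled variable as a function of the original."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, n)        # original continuous variable
    y = (x > cutoff).astype(float)       # recoded binary variable
    slope, intercept = np.polyfit(x, y, 1)   # OLS fit y ~ x
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Varying the sampling range changes the ratio of recoded zeroes to ones: `r2_after_recoding(0, 2000)` yields a balanced 1:1 split and a relatively high R-squared, whereas `r2_after_recoding(0, 10000)` yields a skewed split and a markedly lower R-squared.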
The R-squared was below 0.8 for 8 of 91 recoded variables. In 4 of these cases, the loss of information had a substantial impact on the regression models, particularly when a continuous variable had been recoded into a discrete one. Our simulation showed that the least information is lost when the ratio of recoded zeroes to ones is 1:1.
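The confounder sensitivity check behind these results can be sketched as below. This is a hypothetical illustration, not the authors' implementation: the variable names and the simulated data in the usage example are assumptions, and only the 10% criterion on the predictor's beta coefficient comes from the paper.

```python
import numpy as np

def beta_shift(y, pred, conf_orig, conf_recoded):
    """Fit y ~ predictor + confounder twice, once with the original
    confounder and once with its recoded version, and return the
    relative change in the predictor's beta coefficient.

    A shift above 0.10 (10%) flags a substantial impact of recoding,
    following the criterion used in the paper."""
    def beta_of_pred(conf):
        X = np.column_stack([np.ones_like(pred), pred, conf])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[1]                      # coefficient of the predictor
    b_orig = beta_of_pred(conf_orig)
    b_recoded = beta_of_pred(conf_recoded)
    return abs(b_recoded - b_orig) / abs(b_orig)
```

For example, simulating an outcome that depends on both a predictor and a continuous confounder, then dichotomizing the confounder at 1000, leaves residual confounding that shifts the predictor's beta well past the 10% threshold.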
Large, pooled datasets provide great opportunities, justifying the effort of data harmonization. Still, caution is warranted when using recoded variables whose variance is only partly explained by their original variables, as this may jeopardize the validity of study results.