Xiong Chengjie, Lu Ruijin, Wolk David, Shaw Leslie M, Gleason Carey E, Johnson Sterling C, Agboola Folasade, Schindler Suzanne E, Morris John C, Luo Jingqin
Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, USA.
Knight Alzheimer Disease Research Center, Washington University School of Medicine, St. Louis, Missouri, USA.
Stat Med. 2025 Jul;44(15-17):e70200. doi: 10.1002/sim.70200.
A major challenge in biomedical research is that large sample sizes are necessary for sufficient statistical power to detect subtle but potentially important associations between biomarkers and clinical outcomes. Large sample sizes can be achieved by combining biomarker data from multiple studies, but because fluid biomarker platforms and imaging protocols often vary across studies, data from different studies must be bridged or harmonized. We posit that, for a biomarker measured by different studies, a true latent biomarker exists and underlies the different versions of the observed biomarker through a measurement error model. We then examine the true biological correlation of the latent biomarker with a standard clinical outcome by leveraging biomarker values from a subset of "bridging" samples or scans across studies. Because the true biological correlation with the clinical outcome is related both to the correlations of the observed versions of the biomarker with the same clinical outcome and to the intraclass correlation coefficient (ICC) of the biomarker across studies, we propose a general linear mixed effects model that estimates the true biological correlation by integrating these correlations as estimated across the studies and the bridging cohorts. The proposed model accounts for study heterogeneity through a random effect and incorporates both study-specific and test-retest biomarker data in a joint model for estimation of, and inference on, the true biological correlation. We apply the model to a real-world multi-center biomarker study in Alzheimer disease to correlate concentrations of cerebrospinal fluid biomarkers with a standard functional and cognitive outcome. Our simulations and real-world applications indicate that the proposed meta-analytic model leads to a bias of no more than 0.03 in the estimated biological correlation of a biomarker with a clinical outcome, even with small to moderate ICC. When the ICC is large, only 10% of bridging samples may be needed to obtain unbiased estimates of the correlation, with coverage of the proposed 95% confidence intervals close to the nominal level. Our proposed methodologies hence provide a novel approach to harmonizing retrospectively obtained biomarker data across studies, offer guidance on the size of the bridging sample when ICCs are known, and may also be used in a single study to account for batch effects.
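The link between the observed correlations, the ICC, and the true biological correlation can be illustrated with a small simulation. The sketch below is not the authors' mixed effects model. It assumes a classical measurement error setup (observed value = latent value + independent noise, with equal error variance on both platforms), under which each observed correlation is attenuated by a factor of sqrt(ICC), and it applies the standard correction for attenuation. The function names, sample sizes, and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_two_studies(n=20000, rho_true=0.5, icc=0.6, bridge_frac=0.10, seed=0):
    """Simulate two studies measuring one latent biomarker on different platforms.

    Assumption (not from the abstract): observed = latent + independent noise,
    with the same error variance on both platforms.
    """
    rng = np.random.default_rng(seed)
    # Error s.d. chosen so that var(T) / (var(T) + var(e)) equals the target ICC.
    sigma_e = np.sqrt((1.0 - icc) / icc)

    def latent_and_outcome():
        t = rng.standard_normal(n)  # latent (true) biomarker
        y = rho_true * t + np.sqrt(1.0 - rho_true**2) * rng.standard_normal(n)
        return t, y

    t_a, y_a = latent_and_outcome()
    t_b, y_b = latent_and_outcome()
    x_a = t_a + sigma_e * rng.standard_normal(n)  # study A, platform A
    x_b = t_b + sigma_e * rng.standard_normal(n)  # study B, platform B

    # Bridging subset: a fraction of study A samples re-assayed on platform B.
    m = int(bridge_frac * n)
    bridge_a = x_a[:m]
    bridge_b = t_a[:m] + sigma_e * rng.standard_normal(m)
    return (x_a, y_a), (x_b, y_b), (bridge_a, bridge_b)


def corrected_correlation(study_a, study_b, bridge):
    """Return (mean observed correlation, estimated ICC, corrected correlation)."""
    r_a = np.corrcoef(*study_a)[0, 1]
    r_b = np.corrcoef(*study_b)[0, 1]
    # With equal error variances and no platform shift, the Pearson correlation
    # between paired bridging measurements estimates the ICC.
    icc_hat = np.corrcoef(*bridge)[0, 1]
    r_obs = 0.5 * (r_a + r_b)  # simple average, standing in for a pooled estimate
    return r_obs, icc_hat, r_obs / np.sqrt(icc_hat)


if __name__ == "__main__":
    a, b, bridge = simulate_two_studies()
    r_obs, icc_hat, r_corr = corrected_correlation(a, b, bridge)
    print(f"observed r = {r_obs:.3f}, ICC = {icc_hat:.3f}, corrected r = {r_corr:.3f}")
```

In this simplified setting the observed correlation understates the true one, and dividing by the square root of the estimated ICC recovers it approximately. A joint model such as the one proposed in the abstract would additionally handle between-study heterogeneity and provide interval estimates, which this sketch does not attempt.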