Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, 21201, United States.
Department of Neurosurgery, University of Maryland School of Medicine, Baltimore, 21201, United States.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae090.
Information integration without sharing raw data has grown in popularity in recent years. By incorporating summary information from external sources, internal studies can achieve greater estimation efficiency and prediction accuracy. However, a notable challenge in using summary-level information is accommodating the inherent heterogeneity across data sources. In this study, we examine prior probability shift between two cohorts, wherein the difference between the two data distributions depends on the outcome. We introduce a novel semi-parametric constrained optimization-based approach to integrate information within this framework, which has not been extensively explored in the existing literature. Our proposed method handles the prior probability shift by introducing an outcome-dependent selection function and accounts for the estimation uncertainty associated with summary information from the external source. Our approach facilitates valid inference even in the absence of a known variance-covariance estimate from the external source. In extensive simulation studies, our method outperforms existing ones, showing minimal estimation bias and reduced variance for both binary and continuous outcomes. We further demonstrate the utility of our method by applying it to an investigation of risk factors related to essential hypertension, where reduced estimation variability is observed after integrating summary information from an external data source.
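As a minimal illustration of what prior probability shift means for a binary outcome, the sketch below assumes a logistic model in the internal cohort and an external cohort that differs only through outcome-dependent selection. It is not the proposed semi-parametric estimator. The coefficients `alpha` and `beta` and the selection probabilities `w1` and `w0` are hypothetical values chosen for illustration. Under these assumptions the external log-odds equal the internal log-odds plus the constant `log(w1 / w0)`, so the intercept shifts while the slope is unchanged.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def external_risk(p_internal, w1, w0):
    """P(Y=1 | x) in the external cohort when subjects are selected with
    probability w1 if Y=1 and w0 if Y=0 (selection depends on Y only)."""
    return w1 * p_internal / (w1 * p_internal + w0 * (1.0 - p_internal))


# Hypothetical internal-cohort logistic model: logit P(Y=1 | x) = alpha + beta * x
alpha, beta = -1.0, 0.8
# Hypothetical outcome-dependent selection probabilities for the external cohort
w1, w0 = 0.9, 0.3

rows = []
for x in (0.0, 1.0, 2.0):
    p_int = expit(alpha + beta * x)
    p_ext = external_risk(p_int, w1, w0)
    rows.append((x, logit(p_int), logit(p_ext)))

# Intercept difference between cohorts (evaluated at x = 0)
intercept_shift = rows[0][2] - rows[0][1]
# Slopes (change in log-odds per unit of x) in each cohort
slope_int = rows[1][1] - rows[0][1]
slope_ext = rows[1][2] - rows[0][2]

for x, lo_int, lo_ext in rows:
    print(f"x={x:.0f}  internal log-odds={lo_int:.4f}  external log-odds={lo_ext:.4f}")
```

In this toy setting, external summary estimates of the slope could be borrowed directly, whereas the intercept could not. The general case, including continuous outcomes and an unknown selection function, is what the constrained optimization approach described above is meant to handle.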