Huang Chiung-Yu, Qin Jing, Tsai Huei-Ting
Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205.
Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.
J Am Stat Assoc. 2016;111(514):787-799. doi: 10.1080/01621459.2015.1044090. Epub 2016 Aug 18.
With the rapidly increasing availability of data in the public domain, combining information from different sources to draw inferences about associations or differences of interest has become an emerging challenge for researchers. This paper presents a novel approach to improving efficiency in estimating the survival time distribution by synthesizing individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both the choice of treatment and clinical outcomes are usually not available in registry databases. To combine the individual-level data with the published information, we propose to summarize the external survival information via a system of nonlinear population moments and to estimate the survival time model using empirical likelihood methods. The proposed approach is more flexible than conventional meta-analysis in the sense that it can automatically combine survival information for different subgroups, and the information may be derived from different studies. Moreover, an extended estimator that allows for a different baseline risk in the aggregate data is also studied. Empirical likelihood ratio tests are proposed to examine whether the auxiliary survival information is consistent with the individual-level data. Simulation studies show that the proposed estimators yield a substantial gain in efficiency over the conventional partial likelihood approach. Two data analyses are conducted to illustrate the methods and theory.
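As a rough illustration of the "population moments plus empirical likelihood" idea, the following is a minimal sketch, not the authors' estimator. It assumes a Cox proportional hazards model (suggested by the comparison with partial likelihood) and assumes that an external t-year survival rate phi_k for subgroup k can be written as the moment E[I(Z in subgroup k) * (exp(-Lambda0(t) * exp(beta'Z)) - phi_k)] = 0. Unlike the paper, it treats beta and the baseline cumulative hazard Lambda0(t) as fixed inputs rather than estimating them jointly, and the function names `subgroup_survival_moments` and `el_weights` are hypothetical. It computes the empirical likelihood weights p_i that satisfy the moment constraints, together with the statistic -2 log R that an empirical likelihood ratio test of consistency would be built on.

```python
import numpy as np


def subgroup_survival_moments(z, groups, beta, cumhaz_t, phi):
    """Assumed moments g_ik = I(Z_i in group k) * (S(t | Z_i) - phi_k) under a Cox model.

    z: (n, p) covariates; groups: (n, K) 0/1 subgroup indicators;
    beta: (p,) regression coefficients; cumhaz_t: baseline cumulative hazard
    Lambda0(t) at horizon t; phi: (K,) external t-year survival probabilities.
    """
    surv_t = np.exp(-cumhaz_t * np.exp(z @ beta))  # model-based S(t | Z_i)
    return groups * (surv_t[:, None] - phi[None, :])


def el_weights(g, tol=1e-9, max_iter=50):
    """Maximize sum_i log p_i subject to sum_i p_i = 1 and sum_i p_i g_i = 0.

    Uses the Lagrangian solution p_i = 1 / (n (1 + lam'g_i)), where lam maximizes
    the concave dual F(lam) = sum_i log(1 + lam'g_i), found by damped Newton.
    Returns (p, lam, stat) with stat = -2 log R = 2 F(lam).
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(max_iter):
        w = 1.0 + g @ lam
        u = g / w[:, None]
        grad = u.sum(axis=0)  # sum_i g_i / (1 + lam'g_i); zero at the solution
        if np.max(np.abs(grad)) < tol:
            break
        step = np.linalg.solve(u.T @ u, grad)  # Newton direction (Hessian is -u'u)
        f_old = np.sum(np.log(w))
        t = 1.0
        while True:  # step halving: keep every p_i in (0, 1) and do not decrease F
            lam_new = lam + t * step
            w_new = 1.0 + g @ lam_new
            if np.all(w_new > 1.0 / n) and np.sum(np.log(w_new)) >= f_old - 1e-10:
                break
            t *= 0.5
            if t < 1e-12:
                raise RuntimeError("line search failed; phi may be infeasible for these data")
        lam = lam_new
    else:
        raise RuntimeError("Lagrange multiplier did not converge")
    w = 1.0 + g @ lam
    p = 1.0 / (n * w)
    stat = 2.0 * np.sum(np.log(w))
    return p, lam, stat


# Illustrative use with simulated covariates and two hypothetical subgroups.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
groups = np.column_stack([z[:, 0] < 0, z[:, 0] >= 0]).astype(float)
g = subgroup_survival_moments(z, groups, beta=np.array([0.5]), cumhaz_t=0.3,
                              phi=np.array([0.80, 0.64]))
p, lam, stat = el_weights(g)
print("multiplier:", lam, " -2 log R:", stat)
```

With beta and Lambda0(t) held fixed, standard empirical likelihood theory would compare -2 log R with a chi-square distribution on K degrees of freedom; when those parameters are estimated from the same data, the reference distribution may differ, so this calibration should not be taken as the paper's test.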