Xu Zhichao, Choi Jaihee, Sun Ryan
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Houston, 77030, Texas, USA.
Department of Statistics, Rice University, 6100 Main St., Houston, 77030, Texas, USA.
Stat Biosci. 2024 Jul 13. doi: 10.1007/s12561-024-09448-3.
Over the past decade, massive genetic compendiums such as the UK Biobank have gathered extensive genetic and phenotypic data that hold the potential to provide unparalleled insight into the genetic etiologies of various complex diseases. However, much of the disease information is collected as time-to-event outcomes in interval-censored form, and conventional tools for genetic association analysis are often not available for this type of data. For example, set-based inference on common and rare variants is a fundamental analysis in germline genetics studies, but there is a lack of approaches that can perform set-based testing when the interval-censored outcome of interest is subject to the competing risk of another event. To address this need, this work proposes two set-based inference procedures for interval-censored data with competing risks, applicable to both rare-variant sets and general genotype sets. The interval-censored competing risks sequence kernel association test (crSKAT) is a variance components approach that is powerful when genetic variants in a set demonstrate heterogeneous signals. The interval-censored competing risks Burden (crBurden) test is more powerful when variant signals are homogeneous. Simulation studies show that the newly developed methods outperform ad hoc alternatives, both controlling the type I error rate and improving power. The proposed tests are applied to the UK Biobank to search for genes associated with fracture risk while accounting for death as a competing outcome.
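The contrast between the variance components (SKAT-type) and burden-type tests can be illustrated with the generic forms of the two statistics. The following is a minimal sketch, not the crSKAT or crBurden implementation: it assumes that per-variant score statistics have already been computed under some null model, and the function names `burden_stat` and `skat_stat`, the unit weights, and the toy score vectors are all hypothetical. The interval-censored competing risks likelihood and the null distributions needed for p-values are omitted entirely.

```python
import numpy as np


def burden_stat(scores, weights):
    """Burden-type statistic: square of the weighted sum of per-variant scores."""
    return float(np.dot(weights, scores) ** 2)


def skat_stat(scores, weights):
    """SKAT-type statistic: sum of squared weighted per-variant scores."""
    return float(np.sum((weights * scores) ** 2))


# Hypothetical toy inputs: four variants with unit weights.
weights = np.ones(4)
homogeneous = np.array([1.0, 1.0, 1.0, 1.0])      # all signals in the same direction
heterogeneous = np.array([1.0, -1.0, 1.0, -1.0])  # signals in opposing directions

print("burden, homogeneous:  ", burden_stat(homogeneous, weights))
print("burden, heterogeneous:", burden_stat(heterogeneous, weights))
print("SKAT,   homogeneous:  ", skat_stat(homogeneous, weights))
print("SKAT,   heterogeneous:", skat_stat(heterogeneous, weights))
```

In this toy setting the burden-type statistic collapses to zero when the scores cancel, while the SKAT-type statistic is unchanged by their signs. This is the intuition behind the statement that a variance components test is more robust to heterogeneous signals, whereas a burden test concentrates its power on homogeneous ones.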