Division of Vaccine and Infectious Disease, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA.
Biostatistics. 2022 Dec 12;24(1):32-51. doi: 10.1093/biostatistics/kxab014.
Assessing disease comorbidity patterns in families represents the first step in gene mapping for diseases and is central to the practice of precision medicine. One way to evaluate the relative contributions of genetic risk factors and environmental determinants to a complex trait (e.g., Alzheimer's disease [AD]) and its comorbidities (e.g., cardiovascular diseases [CVD]) is through familial studies, in which an initial cohort of subjects is recruited, genotyped for specific loci, and interviewed to provide extensive disease history in family members. Because disease phenotypes in family members are obtained retrospectively, the exact time of disease onset may not be available, so that only current status data or interval-censored data are observed. All existing methods for analyzing these family study data assume a single event subject to right censoring and are therefore not applicable. In this article, we propose a semiparametric regression model for family history data that includes a family-specific random effect and individual random effects to account for the dependence due to shared environmental exposures and unobserved genetic relatedness, respectively. To incorporate multiple events, we jointly model the onset of the primary disease of interest and a secondary disease outcome that is subject to interval censoring. We propose nonparametric maximum likelihood estimation and develop a stable Expectation-Maximization (EM) algorithm for computation. We establish the asymptotic properties of the resulting estimators and examine the performance of the proposed methods through simulation studies. Our application to a real-world study reveals that the comorbidity between AD and CVD is attributable mainly to genetic factors rather than environmental factors.
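The observation scheme mentioned in the abstract (current status and interval-censored data) can be illustrated with a minimal Python sketch. This is not the authors' model or estimator: the function names, the visit ages, and the exponential survival function with rate 0.01 are all hypothetical choices made only to show how an unobserved onset time is reduced to an interval and how that interval enters a likelihood.

```python
import math
from typing import Callable, List, Tuple


def censor_interval(onset: float, visits: List[float]) -> Tuple[float, float]:
    """Return (left, right] such that left < onset <= right.

    `visits` are the ages at which disease status was ascertained
    (hypothetical values). If onset occurs after the last visit,
    right is infinity, i.e. the observation is right-censored.
    """
    left, right = 0.0, math.inf
    for v in sorted(visits):
        if v < onset:
            left = v          # still disease-free at this visit
        else:
            right = v         # first visit at which disease is present
            break
    return left, right


def current_status(onset: float, visit: float) -> int:
    """Current status data: only whether onset occurred by a single visit."""
    return int(onset <= visit)


def interval_likelihood(left: float, right: float,
                        surv: Callable[[float], float]) -> float:
    """Likelihood contribution P(left < T <= right) = S(left) - S(right)."""
    s_right = 0.0 if math.isinf(right) else surv(right)
    return surv(left) - s_right


def exp_surv(t: float, rate: float = 0.01) -> float:
    """Placeholder survival function (assumption, for illustration only)."""
    return math.exp(-rate * t)


if __name__ == "__main__":
    left, right = censor_interval(62.0, [50.0, 60.0, 70.0])
    print(left, right)
    print(current_status(62.0, 65.0))
    print(interval_likelihood(left, right, exp_surv))
```

In the setting the abstract describes, the survival function would additionally depend on covariates and on the family-specific and individual random effects; the sketch omits these to keep the censoring mechanism visible.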