Lee Annie J, Marder Karen, Alcalay Roy N, Mejia-Santana Helen, Orr-Urtreger Avi, Giladi Nir, Bressman Susan, Wang Yuanjia
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A.
Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY, U.S.A.
Stat Med. 2017 Sep 30;36(22):3533-3546. doi: 10.1002/sim.7376. Epub 2017 Jun 27.
In genetic epidemiological studies, family history data are collected on relatives of study participants and used to estimate the age-specific risk of disease for individuals who carry a causal mutation. However, a family member's genotype may go uncollected, either because in-person interviews to obtain blood samples are costly or because the relative has died. Efficient nonparametric estimation of genotype-specific risk in censored mixture data has been proposed previously, but without considering covariates. When multiple predictive risk factors are available, risk estimation requires a multivariate model that accounts for additional covariates that may simultaneously affect disease risk. It is therefore important to consider the role of covariates in genotype-specific distribution estimation using family history data. We propose an estimation method that permits more precise risk prediction by controlling for individual characteristics and incorporating interaction effects with missing genotypes in relatives, so that gene-gene and gene-environment interactions can be handled within a single model. We examine the performance of the proposed methods by simulation and apply them to estimate the age-specific cumulative risk of Parkinson's disease (PD) in carriers of the LRRK2 G2019S mutation, using first-degree relatives who are at genetic risk for PD. The utility of the estimated carrier risk is demonstrated by designing a future clinical trial under various assumptions. Such sample size estimation appears in the Huntington's disease literature, where it uses the length of the abnormal CAG repeat expansion in the HTT gene, but it is less common in the PD literature. Copyright © 2017 John Wiley & Sons, Ltd.
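As a rough illustration of how an estimated carrier risk could feed into trial design, the sketch below computes a per-arm sample size with the standard normal-approximation formula for comparing two proportions. This is not the procedure used in the paper, which the abstract does not specify. The function name `sample_size_per_arm`, the 10% cumulative risk, and the 50% relative risk reduction are all hypothetical values chosen for illustration, not estimates from the study.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm trial comparing two proportions.

    p_control          -- assumed cumulative risk of disease in untreated
                          carriers over the trial period (hypothetical input)
    relative_reduction -- assumed relative risk reduction under treatment
    alpha, power       -- two-sided type I error rate and target power
    """
    p_treat = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances in the two arms (unpooled approximation).
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2
    return math.ceil(n)


# Hypothetical scenario: 10% risk in untreated carriers, halved by treatment.
n_per_arm = sample_size_per_arm(p_control=0.10, relative_reduction=0.50)
print(n_per_arm)
```

Because the required sample size grows quickly as the assumed risk or the treatment effect shrinks, a more precise covariate-adjusted estimate of carrier risk translates directly into a more reliable design.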