Lin D Y
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.
Genet Epidemiol. 2004 May;26(4):255-64. doi: 10.1002/gepi.10317.
Exploring the associations between haplotypes and disease phenotypes is an important step toward the discovery of genes that influence complex human diseases. When unrelated subjects are sampled, haplotypes are often ambiguous because of the unknown gametic phase of the measured sites along a chromosome. We consider cohort studies of unrelated subjects which collect data on potentially censored ages of onset of disease along with unphased genotypes and possibly time-varying environmental factors. We formulate the effects of haplotypes and environmental variables on the time to disease occurrence through a semiparametric Cox proportional hazards model, which can accommodate a variety of genetic mechanisms as well as gene-environment interactions. We develop a simple and fast expectation-maximization algorithm to maximize the likelihood for the relative risks and other parameters based on the observable data of unphased genotypes and potentially censored ages of onset. The resultant estimators are consistent, efficient, and asymptotically normal. Simulation studies show that, for practical situations, the parameter estimators are virtually unbiased, the association tests maintain type I errors near nominal levels, the confidence intervals have proper coverage probabilities, and the efficiency loss due to unknown gametic phase is small.
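The phase ambiguity mentioned above can be made concrete with a small sketch. It enumerates the haplotype pairs consistent with one unphased genotype and assigns each pair a weight from haplotype frequencies. This is an illustration only, not the algorithm of the paper: the function names, the 0/1/2 genotype coding, the example frequencies, and the Hardy-Weinberg assumption used for the weights are all assumptions introduced here. In the method described above, the weights would presumably also depend on the censored age-of-onset data through the Cox likelihood, which is omitted here.

```python
from itertools import product


def compatible_diplotypes(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1, or 2) per site (hypothetical coding).
    Each haplotype is a tuple of 0/1 alleles, one per site.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: count 0 -> allele 0 on both, count 2 -> allele 1 on both.
    base = [g // 2 for g in genotype]
    pairs = set()
    # Each heterozygous site can place its "1" allele on either chromosome.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        # Sort within the pair so (h1, h2) and (h2, h1) are counted once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def phase_weights(genotype, freqs):
    """Weight each compatible pair by its probability under Hardy-Weinberg
    equilibrium (an assumption of this sketch), ignoring the phenotype."""
    raw = {}
    for h1, h2 in compatible_diplotypes(genotype):
        p = freqs.get(h1, 0.0) * freqs.get(h2, 0.0)
        if h1 != h2:
            p *= 2.0  # two orderings of a heterozygous pair
        raw[(h1, h2)] = p
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("genotype has zero probability under the given frequencies")
    return {pair: p / total for pair, p in raw.items()}


# Made-up haplotype frequencies for two biallelic sites.
example_freqs = {(0, 0): 0.4, (1, 1): 0.3, (0, 1): 0.2, (1, 0): 0.1}
double_het_weights = phase_weights((1, 1), example_freqs)
```

A subject heterozygous at both sites has two possible phasings, while a subject heterozygous at no more than one site has exactly one, so only the former contributes any phase uncertainty.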