Zeng D, Lin DY, Avery CL, North KE, Bray MS
Department of Biostatistics, CB# 7420, University of North Carolina, Chapel Hill, NC 27599-7420, USA.
Biostatistics. 2006 Jul;7(3):486-502. doi: 10.1093/biostatistics/kxj021. Epub 2006 Feb 24.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.
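The abstract notes that a haplotype can only be measured indirectly through the genotype. The short sketch below makes that ambiguity concrete. It assumes biallelic SNPs coded as minor-allele counts (0/1/2), with haplotypes coded as 0/1 alleles. For the probability function, it also assumes Hardy-Weinberg equilibrium with known haplotype frequencies. The function names are hypothetical illustrations, not the authors' algorithm. The paper's nonparametric likelihood additionally involves the age-of-onset regression model and the case-cohort or nested case-control sampling design.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs (h1, h2) consistent with an unphased genotype.

    genotype: tuple of per-SNP minor-allele counts in {0, 1, 2} (assumed coding).
    Each haplotype is a tuple of alleles in {0, 1}.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    # At each heterozygous site the minor allele may sit on either chromosome.
    for phase in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: count 0 -> allele 0, count 2 -> allele 1
        h2 = list(h1)
        for site, allele in zip(het_sites, phase):
            h1[site] = allele
            h2[site] = 1 - allele
        # Store pairs in sorted order so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def genotype_probability(genotype, freqs):
    """P(genotype) as a sum over compatible haplotype pairs.

    Assumes Hardy-Weinberg equilibrium and known haplotype frequencies `freqs`
    (a dict mapping haplotype tuple -> frequency; missing haplotypes count as 0).
    """
    total = 0.0
    for h1, h2 in compatible_haplotype_pairs(genotype):
        p1, p2 = freqs.get(h1, 0.0), freqs.get(h2, 0.0)
        # Ordered pairs (h1, h2) and (h2, h1) both occur when h1 != h2.
        total += p1 * p2 if h1 == h2 else 2 * p1 * p2
    return total


if __name__ == "__main__":
    g = (1, 1, 0)  # heterozygous at the first two SNPs
    print(compatible_haplotype_pairs(g))
    freqs = {(0, 0, 0): 0.4, (1, 1, 0): 0.3, (0, 1, 0): 0.2, (1, 0, 0): 0.1}
    print(genotype_probability(g, freqs))
```

With k heterozygous sites (k ≥ 1) there are 2^(k-1) compatible pairs. This is why haplotype effects enter the likelihood as a sum over unobserved phase configurations rather than as a directly observed covariate.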