Lin D Y
Department of Biostatistics, University of North Carolina, CB#7420, Chapel Hill, NC 27599-7420, USA.
Lifetime Data Anal. 2014 Jan;20(1):16-22. doi: 10.1007/s10985-013-9262-8. Epub 2013 May 31.
Genetic data are now collected frequently in clinical studies and epidemiological cohort studies. For a large study, it may be prohibitively expensive to genotype all study subjects, especially with the next-generation sequencing technology. Two-phase sampling, such as case-cohort and nested case-control sampling, is cost-effective in such settings but entails considerable analysis challenges, especially if efficient estimators are desired. Another type of missing data arises when the investigators are interested in the haplotypes or the genetic markers that are not on the genotyping platform used for the current study. Valid and efficient analysis of such missing data is also interesting and challenging. This article provides an overview of these issues and outlines some directions for future research.
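To make the case-cohort idea concrete, here is a minimal sketch of the sampling step, under stated assumptions. All cases are genotyped. Non-cases are genotyped only if they land in a random subcohort. Each genotyped subject carries an inverse-probability weight that a weighted analysis could use. The function and class names are hypothetical. The subcohort is drawn by independent Bernoulli selection, which is an assumption; many studies instead draw a fixed-size simple random sample. This simple weighting is only one option and is not the efficient estimation that the overview discusses.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record for one cohort member.
    id: int
    is_case: bool  # True if the event was observed during follow-up


def case_cohort_sample(cohort, subcohort_fraction, seed=0):
    """Hypothetical case-cohort selection with inverse-probability weights.

    Assumption: each subject enters the subcohort independently with
    probability `subcohort_fraction` (Bernoulli sampling).
    Returns {subject id: sampling weight} for the subjects to genotype.
    """
    if not 0 < subcohort_fraction <= 1:
        raise ValueError("subcohort_fraction must be in (0, 1]")
    rng = random.Random(seed)
    weights = {}
    for s in cohort:
        # Draw for every subject so the random stream does not depend on case status.
        in_subcohort = rng.random() < subcohort_fraction
        if s.is_case:
            # Cases are always genotyped, so their selection probability is 1.
            weights[s.id] = 1.0
        elif in_subcohort:
            # Non-cases are genotyped with probability subcohort_fraction.
            weights[s.id] = 1.0 / subcohort_fraction
    return weights


cohort = [Subject(i, i < 50) for i in range(1000)]  # 50 cases, 950 non-cases
w = case_cohort_sample(cohort, 0.1, seed=1)
print(len(w), "of", len(cohort), "subjects selected for genotyping")
```

Only about 15% of the cohort is genotyped here (all 50 cases plus roughly 95 subcohort non-cases). Each sampled non-case stands in for about ten cohort members, which is where the cost savings and the analysis challenges both come from.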