Li Yan, Li Zhaohai, Graubard Barry I
Department of Mathematics, University of Texas at Arlington, TX 76019, USA.
Ann Hum Genet. 2011 Nov;75(6):732-41. doi: 10.1111/j.1469-1809.2011.00680.x.
In population-based household surveys such as the National Health and Nutrition Examination Survey (NHANES), blood-related individuals are often sampled from the same household. Genetic data collected from national household surveys are therefore often correlated through two levels of clustering: one induced by the multistage geographical cluster sampling, and the other induced by biological inheritance among multiple participants within the same sampled household. In this paper, we develop efficient statistical methods that account for the weighting effect induced by the differential selection probabilities in complex sample designs, as well as the two clustering (correlation) effects described above. We examine and compare the magnitude of each level of clustering effect under different scenarios and identify the scenarios under which one level dominates the other. The proposed method is evaluated via Monte Carlo simulation studies and illustrated using the Hispanic Health and Nutrition Examination Survey (HHANES) with simulated genotype data.
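As a rough illustration of the weighting and clustering effects described above, the sketch below computes a survey-weighted allele frequency and a design-based variance from PSU-level totals. It is not the authors' method: it uses the standard Taylor-linearization (with-replacement, "ultimate cluster") variance estimator and assumes a single stratum. The function names, the record layout `(psu_id, weight, genotype)`, and the toy data are all hypothetical. Because households are nested within PSUs, aggregating linearized scores to the PSU level absorbs within-household correlation as well, but it does not separate the two levels the way the paper sets out to do.

```python
from collections import defaultdict


def weighted_allele_freq(records):
    """Weighted allele frequency and its linearization variance.

    records: iterable of (psu_id, weight, genotype), where genotype is the
    count (0, 1, or 2) of the allele of interest. Hypothetical layout.
    Assumes one stratum and PSUs treated as sampled with replacement.
    """
    records = list(records)
    total_w = sum(w for _, w, _ in records)
    # Each person carries two alleles, hence the factor of 2.
    p_hat = sum(w * g for _, w, g in records) / (2.0 * total_w)

    # Linearized score of the ratio estimator, summed within each PSU.
    psu_scores = defaultdict(float)
    for psu, w, g in records:
        psu_scores[psu] += w * (g - 2.0 * p_hat) / (2.0 * total_w)

    n_psu = len(psu_scores)
    if n_psu < 2:
        raise ValueError("need at least two PSUs to estimate variance")

    mean_score = sum(psu_scores.values()) / n_psu
    var_hat = n_psu / (n_psu - 1) * sum(
        (z - mean_score) ** 2 for z in psu_scores.values()
    )
    return p_hat, var_hat


def design_effect(records):
    """Ratio of the design-based variance to a naive binomial variance
    that treats all 2N sampled alleles as independent."""
    records = list(records)
    p_hat, var_hat = weighted_allele_freq(records)
    naive_var = p_hat * (1.0 - p_hat) / (2.0 * len(records))
    return var_hat / naive_var


if __name__ == "__main__":
    # Toy data: two PSUs with perfectly clustered genotypes.
    toy = [("A", 1.0, 2), ("A", 1.0, 2), ("B", 1.0, 0), ("B", 1.0, 0)]
    print(weighted_allele_freq(toy))
    print(design_effect(toy))
```

A design effect well above 1, as in the toy data, indicates that ignoring the clustering would understate the variance of the allele frequency estimate.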