Naj Adam C, Park Yo Son, Beaty Terri H
John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.
Methods Mol Biol. 2012;850:119-50. doi: 10.1007/978-1-61779-555-8_8.
Beyond calculating parameter estimates to characterize the distribution of genetic features of populations (frequencies of mutations in various regions of the genome, allele frequencies, measures of Hardy-Weinberg disequilibrium), genetic epidemiology aims to identify correlations between genetic variants and phenotypic traits, with considerable emphasis placed on finding genetic variants that increase susceptibility to disease and disease-related traits. However, determining correlation alone does not suffice: genetic variants common in an isolated ethnic group with a high burden of a given disease may show relatively high correlation with disease but, as markers of ethnicity, may not have any functional role in disease. To establish a causal relationship between genetic variants and disease (or disease-related traits), proper statistical analyses of human data must incorporate epidemiologic approaches to examining sets of families or unrelated individuals with information available on individuals' disease status or related traits.

Through different analytical approaches, statistical analysis of human data can answer several important questions about the relationship between genes and disease:
1. Does the disease tend to cluster in families more than expected by chance alone?
2. Does the disease appear to follow a particular genetic model of transmission in families?
3. Do variants at a particular genetic marker tend to cosegregate with disease in families?
4. Do specific genetic markers tend to be carried more frequently by those with disease than by those without, in a given population (or across families)?

The first question can be examined using studies of familial aggregation or correlation. An ancillary question, "how much of the susceptibility to disease (or variation in disease-related traits) might be accounted for by genetic factors?", is typically answered by estimating heritability, the proportion of disease susceptibility or trait variation attributable to genetics. The second question can be formally tested through a modeling approach known as segregation analysis, using pedigrees for which disease affection status or trait values are available. The third question can be answered using linkage analysis, with data on pedigrees with affected members and genotype information at markers of interest. The fourth question can be answered using association analysis, with genotype information at markers on unrelated affected and unaffected individuals and/or families with affected and unaffected members. All of these questions can also be explored for quantitative (or continuously distributed) traits by examining variation in trait values between family members or between unrelated individuals. While each of these questions and the analytical approaches for answering them are explored extensively in subsequent chapters (heritability in Chapters 9 and 10, segregation in Chapter 12, linkage in Chapters 13-17, and association in Chapters 18-21 and 23), this chapter focuses on statistical methods to answer questions of familial aggregation.
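As a rough illustration of the first question and its ancillary question, the sketch below computes two summary measures commonly used for them: a recurrence risk ratio (risk of disease among relatives of affected individuals divided by the population prevalence) and heritability as a ratio of genetic to total phenotypic variance. The function names, parameter names, and all numbers are hypothetical and chosen only for illustration; the abstract does not specify which estimators the chapter uses, so this is a minimal sketch under those assumptions rather than the authors' method.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_prevalence):
    """Risk to relatives of affected individuals divided by population prevalence.

    A ratio well above 1 suggests the disease clusters in families more
    than expected by chance. All inputs here are hypothetical.
    """
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    if not 0 <= affected_relatives <= total_relatives:
        raise ValueError("affected_relatives must be between 0 and total_relatives")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_prevalence


def heritability(genetic_variance, phenotypic_variance):
    """Proportion of total trait variance attributable to genetic variance."""
    if phenotypic_variance <= 0:
        raise ValueError("phenotypic_variance must be positive")
    if not 0 <= genetic_variance <= phenotypic_variance:
        raise ValueError("genetic_variance must be between 0 and phenotypic_variance")
    return genetic_variance / phenotypic_variance


# Hypothetical example: 30 affected among 500 siblings of affected
# individuals, with an assumed population prevalence of 1%.
sibling_ratio = recurrence_risk_ratio(30, 500, 0.01)

# Hypothetical variance components for a quantitative trait.
h2 = heritability(genetic_variance=4.0, phenotypic_variance=10.0)

print(f"sibling recurrence risk ratio: {sibling_ratio:.2f}")
print(f"heritability: {h2:.2f}")
```

In practice, both quantities are estimated from family data with sampling error and ascertainment considerations that this sketch ignores.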