Naj Adam C, Beaty Terri H
Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, 229 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, 229 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
Methods Mol Biol. 2017;1666:133-169. doi: 10.1007/978-1-4939-7274-6_8.
In addition to characterizing the distribution of genetic features of populations (mutation and allele frequencies; measures of Hardy-Weinberg equilibrium), genetic epidemiology and statistical genetics aim to explore and define the role of genomic variation in risk of disease or variation in traits of interest. To facilitate this kind of exploration, genetic epidemiology and statistical genetics address a series of questions: 1. Does the disease tend to cluster in families more than expected by chance alone? 2. Does the disease appear to follow a particular genetic model of transmission in families? 3. Does variation at a particular genomic position tend to cosegregate with disease in families? 4. Do specific genetic variants tend to be carried more frequently by those with disease than by those without disease in a given population (or across families)? The first question can be examined using studies of familial aggregation or correlation. An ancillary question, "How much of the susceptibility to disease (or variation in disease-related traits) might be accounted for by genetic factors?", is typically answered by estimating heritability, the proportion of variance in a trait or in risk of a disease attributable to genetics. The second question can be formally tested through a modeling approach known as segregation analysis, using pedigrees for which disease affection status or trait values are available. The third question can be answered with data on genomic markers in pedigrees with affected members informative for linkage, where meiotic cross-over events are estimated or assessed. The fourth question is answerable using genotype data on genomic markers in unrelated affected and unaffected individuals and/or in families with affected and unaffected members. All of these questions can also be explored for quantitative (or continuously distributed) traits by examining variation in trait values between family members or between unrelated individuals.
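As a brief illustration of the heritability definition above, the proportion of trait variance attributable to genetics is often written as follows. The notation is an assumption for illustration and is not taken from the chapter: V_P is total phenotypic variance, V_G the genetic component, and V_E the environmental component, with the simplifying assumption that the two components are independent and do not interact.

```latex
H^2 = \frac{V_G}{V_P}, \qquad V_P = V_G + V_E
```

Under these assumptions, a trait with V_G = 3 and V_E = 7 (hypothetical values, arbitrary units) would have H^2 = 0.3.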
While each of these questions and the analytical approaches for answering them is explored extensively in subsequent chapters (heritability in Chapters 8 and 9; segregation in Chapter 12; linkage in Chapters 13-17; and association in Chapters 18-20), this chapter focuses on statistical methods to address questions of familial aggregation of qualitative phenotypes (e.g., disease status) or quantitative phenotypes. While studies exploring genotype-phenotype correlations are arguably the most important and common type of statistical genetic study performed, these studies are performed under the assumption that genetic contributors at least partially explain risk of a disease or variation in a trait of interest. This may not always be the case, especially with diseases or traits known to be strongly influenced by environmental factors. For this reason, before any of the last three questions described above can be answered, it is important to ask first whether the disease clusters among family members more than among unrelated persons, as this constitutes evidence of a possible heritable contribution to disease, justifying the pursuit of studies answering the other questions. In this chapter, the underlying principles of familial aggregation studies are addressed to provide an understanding and a set of analytical tools to help answer the question of whether diseases or traits of interest are likely to be heritable and therefore justify subsequent statistical genetic studies to identify specific genetic causes.
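One common way to quantify whether a disease clusters among relatives more than among unrelated persons is a recurrence risk ratio: the risk of disease in relatives of affected individuals divided by the population prevalence. The abstract does not name this measure, so the sketch below is an illustration under stated assumptions, not the chapter's method. The function name, its parameters, and the counts are hypothetical.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_prevalence):
    """Ratio of disease risk in relatives of cases to the population prevalence.

    A value well above 1 suggests familial aggregation. It does not by itself
    separate genetic causes from shared environment.
    """
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    # Risk observed among relatives (e.g., siblings) of affected individuals
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_prevalence


# Hypothetical counts: 30 affected among 400 siblings of cases,
# with an assumed population prevalence of 1.5%.
lambda_s = recurrence_risk_ratio(30, 400, 0.015)
print(round(lambda_s, 2))
```

A ratio near 1 would indicate no excess risk in relatives; in that case the follow-up studies described above (segregation, linkage, association) would be harder to justify.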