Conomos Matthew P, Reiner Alexander P, Weir Bruce S, Thornton Timothy A
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Department of Epidemiology, University of Washington, Seattle, WA 98195, USA; Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Am J Hum Genet. 2016 Jan 7;98(1):127-48. doi: 10.1016/j.ajhg.2015.11.022.
Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.
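The abstract does not state the estimator itself, so the following is only a minimal sketch of the general idea it describes: use principal components to absorb distant ancestry, then measure the genetic correlation that remains. It assumes genotypes coded as 0/1/2 allele counts, estimates individual-specific allele frequencies by linear regression of genotypes on the PCs, and standardizes the residual cross-products to obtain a kinship-like matrix. The function names (top_pcs, pc_adjusted_kinship), the simulated two-population data, the use of PCs computed from all samples, and the clipping of fitted frequencies are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def top_pcs(G, k):
    """Top-k principal components of the column-centered genotype matrix.

    Assumption: PCs are computed from all samples, which is a simplification.
    """
    Gc = G - G.mean(axis=0)
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * s[:k]

def pc_adjusted_kinship(G, pcs, eps=0.01):
    """Kinship-like estimates after regressing genotypes on PCs.

    G   : (n, m) array of allele counts in {0, 1, 2}
    pcs : (n, k) array of principal components
    eps : clipping bound for fitted frequencies (an assumption of this sketch)
    """
    n = G.shape[0]
    X = np.column_stack([np.ones(n), pcs])
    # Least-squares fit of G/2 on the PCs gives individual-specific
    # allele frequency estimates, one per individual and SNP.
    beta, *_ = np.linalg.lstsq(X, G / 2.0, rcond=None)
    mu = np.clip(X @ beta, eps, 1.0 - eps)
    resid = G - 2.0 * mu              # genotype residuals after removing ancestry
    sd = np.sqrt(mu * (1.0 - mu))     # per-SNP scale for each individual
    return (resid @ resid.T) / (4.0 * (sd @ sd.T))

# Simulated example: two hypothetical populations with different allele frequencies.
rng = np.random.default_rng(0)
m = 2000
p1 = rng.uniform(0.1, 0.9, m)
p2 = np.clip(p1 + rng.normal(0.0, 0.15, m), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, p1, size=(20, m)),
    rng.binomial(2, p2, size=(20, m)),
]).astype(float)
# Append a duplicate of sample 0 as a sanity check: its kinship with the
# original should match the original's self-kinship.
G = np.vstack([G, G[0]])

pcs = top_pcs(G, k=1)
phi = pc_adjusted_kinship(G, pcs)
print(phi[0, 0], phi[0, 40], phi[0, 20])
```

In this sketch the diagonal entries estimate self-kinship and the off-diagonal entries estimate pairwise kinship after the adjustment for ancestry. Clipping the fitted frequencies is a shortcut chosen here to avoid division by values near zero; a careful implementation would handle poorly fitted SNPs more deliberately.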