Hinrichs Anthony L, Culverhouse Robert C, Suarez Brian K
Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA.
Department of Medicine and Division of Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, USA.
BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S17. doi: 10.1186/1753-6561-8-S1-S17. eCollection 2014.
The ideal genetic analysis of family data would include whole genome sequence on all family members. A strategy of combining sequence data from a subset of key individuals with inexpensive, genome-wide association study (GWAS) chip genotypes on all individuals to infer sequence level genotypes throughout the families has been suggested as a highly accurate alternative. This strategy was followed by the Genetic Analysis Workshop 18 data providers. We examined the quality of the imputation to identify potential consequences of this strategy by comparing discrepancies between GWAS genotype calls and imputed calls for the same variants. Overall, the inference and imputation process worked very well. However, we find that discrepancies occurred at an increased rate when imputation was used to infer missing data in sequenced individuals. Although this may be an artifact of this particular instantiation of these analytic methods, there may be general genetic or algorithmic reasons to avoid trying to fill in missing sequence data. This is especially true given the risk of false positives and reduction in power for family-based transmission tests when founders are incorrectly imputed as heterozygotes. Finally, we note a higher rate of discrepancies when unsequenced individuals are inferred using sequenced individuals from other pedigrees drawn from the same admixed population.
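The two calculations behind this abstract's argument can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. Genotypes are coded as allele counts (0, 1, 2) with `None` for a missing call. The group labels and all counts are hypothetical toy values, not GAW18 data. The classic transmission/disequilibrium test (TDT), with statistic (b − c)² / (b + c), stands in for "family-based transmission tests" in general. The first function tallies GWAS-versus-imputed discordance per group of individuals. The second shows why a homozygous founder miscalled as heterozygous matters. The TDT counts only transmissions from parents called heterozygous. A parent who is truly AA can only pass on A, so each such miscall adds a transmission that always falls on the same side.

```python
from collections import defaultdict

# Toy sketch: genotypes are allele counts (0, 1, 2); None marks a missing call.
# The coding, group labels, and counts below are illustrative assumptions.


def discrepancy_rates(records):
    """Per-group discordance between GWAS chip calls and imputed calls.

    records: iterable of (group, gwas_call, imputed_call) tuples for the same
    variant and individual. Pairs with a missing call on either side are skipped.
    Returns {group: (n_discordant, n_compared, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [discordant, compared]
    for group, gwas, imputed in records:
        if gwas is None or imputed is None:
            continue
        counts[group][1] += 1
        if gwas != imputed:
            counts[group][0] += 1
    return {g: (d, n, d / n) for g, (d, n) in counts.items()}


def tdt_chi2(b, c):
    """Classic TDT statistic (1-df chi-square).

    b: transmissions of allele A from parents called heterozygous;
    c: transmissions of the other allele from those parents.
    """
    return (b - c) ** 2 / (b + c) if b + c else 0.0


if __name__ == "__main__":
    toy = [
        ("sequenced, gap-filled", 0, 0),
        ("sequenced, gap-filled", 1, 2),
        ("sequenced, gap-filled", 2, 2),
        ("sequenced, gap-filled", 1, 1),
        ("unsequenced, inferred", 1, 1),
        ("unsequenced, inferred", 0, 0),
        ("unsequenced, inferred", 2, 2),
        ("unsequenced, inferred", 1, None),  # skipped: no imputed call
    ]
    print(discrepancy_rates(toy))

    # Null locus: 50 vs 50 true transmissions gives no signal.
    print(tdt_chi2(50, 50))
    # 30 AA parents wrongly called heterozygous, one affected child each:
    # every spurious transmission adds to b, pushing the statistic past 3.84.
    print(round(tdt_chi2(50 + 30, 50), 3))
    # A real 60 vs 40 effect diluted by 15 spurious transmissions each way.
    print(tdt_chi2(60, 40), round(tdt_chi2(60 + 15, 40 + 15), 3))
```

With these toy inputs, the one-sided miscalls turn a null locus (statistic 0.0) into an apparent association (about 6.92). Miscalls split evenly between alleles pull a real effect from 4.0 down to about 3.08. These two cases mirror the false-positive and power-loss risks the abstract names. The direction and size of any real bias would depend on how imputation errors are distributed, which this sketch does not model.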