Nafikov Rafael A, Nato Alejandro Q, Sohi Harkirat, Wang Bowen, Brown Lisa, Horimoto Andrea R, Vardarajan Badri N, Barral Sandra M, Tosto Giuseppe, Mayeux Richard P, Thornton Timothy A, Blue Elizabeth, Wijsman Ellen M
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington.
Department of Statistics, University of Washington, Seattle, Washington.
Genet Epidemiol. 2018 Sep;42(6):500-515. doi: 10.1002/gepi.22133. Epub 2018 Jun 3.
Multipoint linkage analysis is an important approach for localizing disease-associated loci in pedigrees. Linkage analysis, however, is sensitive to misspecification of marker allele frequencies. Pedigrees from recently admixed populations are particularly susceptible to this problem because of the challenge of accurately accounting for population structure. Therefore, increasing emphasis on use of multiethnic samples in genetic studies requires reevaluation of best practices, given data currently available. Typical strategies have been to compute allele frequencies from the sample, or to use marker allele frequencies determined by admixture proportions averaged over the entire sample. However, admixture proportions vary among pedigrees and throughout the genome in a family-specific manner. Here, we evaluate several approaches to model admixture in linkage analysis, providing different levels of detail about ancestral origin. To perform our evaluations, for specification of marker allele frequencies, we used data on 67 Caribbean Hispanic admixed families from the Alzheimer's Disease Sequencing Project. Our results show that choice of admixture model has an effect on the linkage analysis results. Variant-specific admixture proportions, computed for individual families, provide the most detailed regional admixture estimates, and, as such, are the most appropriate allele frequencies for linkage analysis. This likely decreases the number of false-positive results, and is straightforward to implement.
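The contrast between sample-averaged and family-specific, variant-specific admixture proportions can be illustrated with a small sketch. It assumes the common mixture model in which the expected allele frequency in an admixed group is the admixture-weighted average of the ancestral population frequencies; the abstract does not spell out the authors' exact computation, so this is an illustration rather than their method. The function name, the three unnamed ancestral populations, and all numbers are hypothetical.

```python
from typing import Sequence


def admixed_allele_frequency(
    ancestral_freqs: Sequence[float],
    proportions: Sequence[float],
) -> float:
    """Admixture-weighted average of ancestral allele frequencies.

    ancestral_freqs[k] is the allele frequency in ancestral population k;
    proportions[k] is the admixture proportion assigned to population k.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(ancestral_freqs) != len(proportions):
        raise ValueError("need one admixture proportion per ancestral population")
    if abs(sum(proportions) - 1.0) > 1e-8:
        raise ValueError("admixture proportions must sum to 1")
    return sum(p * q for p, q in zip(ancestral_freqs, proportions))


# Hypothetical allele frequencies at one marker in three ancestral populations.
ancestral = [0.10, 0.60, 0.30]

# Hypothetical admixture proportions averaged over the entire sample.
sample_average = [0.6, 0.3, 0.1]

# Hypothetical proportions for one family at this specific variant.
family_at_variant = [0.2, 0.7, 0.1]

freq_sample = admixed_allele_frequency(ancestral, sample_average)
freq_family = admixed_allele_frequency(ancestral, family_at_variant)

print(f"sample-averaged frequency:  {freq_sample:.2f}")
print(f"family-specific frequency:  {freq_family:.2f}")
```

With these made-up inputs the two models give clearly different frequencies for the same marker (about 0.27 versus 0.47), which is the kind of misspecification the abstract says linkage analysis is sensitive to.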