Department of Evolutionary Anthropology, Duke University, Durham, NC.
Department of Genetics, University of North Carolina, Chapel Hill, NC.
Mol Biol Evol. 2023 Apr 4;40(4). doi: 10.1093/molbev/msad074.
Gene flow between previously differentiated populations during the founding of an admixed or hybrid population has the potential to introduce adaptive alleles into the new population. If the adaptive allele is common in one source population, but not the other, then as the adaptive allele rises in frequency in the admixed population, genetic ancestry from the source containing the adaptive allele will increase nearby as well. Patterns of genetic ancestry have therefore been used to identify post-admixture positive selection in humans and other animals, including examples in immunity, metabolism, and animal coloration. A common method identifies regions of the genome that have local ancestry "outliers" compared with the distribution across the rest of the genome, considering each locus independently. However, we lack theoretical models for expected distributions of ancestry under various demographic scenarios, resulting in potential false positives and false negatives. Further, ancestry patterns between distant sites are often not independent. As a result, current methods tend to infer wide genomic regions containing many genes as under selection, limiting biological interpretation. Instead, we develop a deep learning object detection method applied to images generated from local ancestry-painted genomes. This approach preserves information from the surrounding genomic context and avoids potential pitfalls of user-defined summary statistics. We find the method is robust to a variety of demographic misspecifications using simulated data. Applied to human genotype data from Cabo Verde, we localize a known adaptive locus to a single narrow region compared with multiple or long windows obtained using two other ancestry-based methods.
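The per-locus ancestry "outlier" scan described above can be sketched as follows. This is a minimal illustration, not the method developed in this paper and not any published pipeline. It assumes phased local ancestry calls coded 0/1 for two sources, and the function name `ancestry_outliers`, the `n_sd` threshold, and the toy data are all hypothetical.

```python
import numpy as np


def ancestry_outliers(local_ancestry, n_sd=3.0):
    """Flag loci whose mean ancestry deviates from the genome-wide mean.

    local_ancestry: array of shape (n_haplotypes, n_loci), with 1 where a
        haplotype carries ancestry from the source of interest and 0 otherwise
        (an assumed encoding for this sketch).
    n_sd: hypothetical cutoff, in standard deviations of per-locus ancestry.

    Returns (per-locus ancestry proportion, indices of outlier loci).
    """
    local_ancestry = np.asarray(local_ancestry, dtype=float)
    # Ancestry proportion at each locus, averaged over haplotypes.
    freq = local_ancestry.mean(axis=0)
    mean, sd = freq.mean(), freq.std()
    if sd == 0:
        # No variation across loci, so nothing can be called an outlier.
        return freq, np.array([], dtype=int)
    # Each locus is scored independently against the genome-wide distribution.
    z = (freq - mean) / sd
    return freq, np.flatnonzero(np.abs(z) > n_sd)


# Toy data: 200 haplotypes, 500 loci, with elevated ancestry at loci 240-259.
rng = np.random.default_rng(0)
toy = rng.binomial(1, 0.5, size=(200, 500))
toy[:, 240:260] = rng.binomial(1, 0.9, size=(200, 20))
freq, hits = ancestry_outliers(toy)
print(hits)
```

Because every locus is scored on its own, the scan ignores correlation between neighboring sites and relies on an empirical cutoff rather than a modeled expectation, which are the two limitations the abstract raises.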