O'Reilly Gabe D, Manlik Oliver, Vardeh Sandra, Sinclair Jennifer, Cannell Belinda, Lawler Zachary P, Sherwin William B
Evolution and Ecology Research Centre, School of Biological Earth and Environmental Science University of New South Wales Sydney New South Wales Australia.
Department of Bioinformatics University of North Carolina at Charlotte Charlotte North Carolina USA.
Ecol Evol. 2024 Jul 23;14(7):e11561. doi: 10.1002/ece3.11561. eCollection 2024 Jul.
The fixation index, F_IS, has been a staple measure to detect selection or departures from random mating in populations. However, current Next Generation Sequencing (NGS) cannot easily estimate F_IS in multi-locus gene families, which contain multiple loci having similar or identical arrays of variant sequences of ≥1 kilobase (kb) that differ at multiple positions. In these families, high-quality short-read NGS data typically identify variants but not their genomic location, which is required to calculate F_IS (based on locus-specific observed and expected heterozygosity). Thus, to assess assortative mating, or selection on heterozygotes, from NGS of multi-locus gene families, we need a method that does not require knowledge of which variants are alleles at which locus in the genome. We developed such a method. Like F_IS, our novel measure is based on the principle that positive assortative mating, selection against heterozygotes, and some other processes reduce within-individual variability relative to the population. We demonstrate high accuracy of the new measure on a wide range of simulated scenarios and two datasets from natural populations of penguins and dolphins. The measure is important because multi-locus gene families are often involved in assortative mating or selection on heterozygotes. It is particularly useful for multi-locus gene families such as toll-like receptors, the major histocompatibility complex in animals, homeobox genes in fungi and self-incompatibility genes in plants.
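For readers unfamiliar with why genomic location matters, the conventional single-locus fixation index (commonly written F_IS) can be sketched as 1 minus the ratio of observed to expected heterozygosity at one locus. The Python below is only an illustration of that textbook calculation, not the authors' new measure. The function name `f_is` and the genotype encoding (one pair of allele labels per diploid individual) are assumptions made for this example, and no small-sample correction is applied.

```python
from collections import Counter


def f_is(genotypes):
    """Textbook single-locus F_IS = 1 - H_obs / H_exp.

    `genotypes` is a list of (allele, allele) pairs, one per diploid
    individual, all scored at the SAME locus (hypothetical encoding).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies pooled over all 2n gene copies at this locus.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under random mating (uncorrected estimator).
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    return 1.0 - h_obs / h_exp


# Hardy-Weinberg proportions at p = 0.5 give no excess or deficit.
print(f_is([("A", "A"), ("A", "B"), ("A", "B"), ("B", "B")]))
```

Every step assumes that each allele has already been assigned to a known locus. That assignment is exactly what short-read data cannot provide for multi-locus gene families, which is the gap the abstract describes.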