School of Aquatic and Fishery Sciences, University of Washington, 1122 NE Boat Street, Box 355020, Seattle, WA, 98195-5020, USA.
Mol Ecol Resour. 2017 Jul;17(4):656-669. doi: 10.1111/1755-0998.12613. Epub 2016 Nov 20.
Whole-genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in pervasive paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to genotype reliably and are often excluded from genotyping-by-sequencing (GBS) analyses; however, excluding them requires first identifying them, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: (i) the expected frequency of heterozygotes exceeds that for singleton loci, and (ii) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low, but we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole-genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely rediploidized following an ancient whole-genome duplication. Importantly, this approach requires only the genotype and allele-specific read counts for each individual, information that is readily obtained from most GBS analysis pipelines.
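The two per-locus quantities described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `locus_stats`, the 0/1/2 genotype coding, and the use of a binomial z-score to express the deviation from a 1:1 read ratio are all assumptions made here, and the abstract gives no thresholds for calling a locus a paralog.

```python
from math import sqrt, nan

def locus_stats(genotypes, ref_reads, alt_reads):
    """Summarise one locus across a population (hypothetical helper).

    genotypes : per-individual calls, 0 = hom ref, 1 = het, 2 = hom alt,
                None = missing (coding assumed for this sketch)
    ref_reads, alt_reads : per-individual allele-specific read counts
    Returns (heterozygote frequency, pooled ref read ratio, z-score).
    """
    called = [g for g in genotypes if g is not None]
    het_idx = [i for i, g in enumerate(genotypes) if g == 1]

    # Property (i): proportion of genotyped individuals that are heterozygous.
    het_freq = len(het_idx) / len(called) if called else nan

    # Property (ii): sum allele reads over heterozygotes only, so that the
    # pooled depth gives power that a single low-coverage individual lacks.
    ref = sum(ref_reads[i] for i in het_idx)
    alt = sum(alt_reads[i] for i in het_idx)
    total = ref + alt
    if total == 0:
        return het_freq, nan, nan

    ratio = ref / total
    # Assumed test statistic: deviation of the pooled ref count from the
    # binomial expectation under a 1:1 ratio (mean n/2, variance n/4).
    z = (ref - 0.5 * total) / sqrt(total * 0.25)
    return het_freq, ratio, z

# A balanced locus: heterozygote reads pool to 30:30.
balanced = locus_stats([1, 1, 0, 1, 2, None],
                       [10, 12, 20, 8, 0, 0],
                       [10, 8, 0, 12, 15, 0])

# A skewed locus: every individual is heterozygous and reads pool to 120:40.
skewed = locus_stats([1, 1, 1, 1], [30, 28, 32, 30], [10, 12, 8, 10])
```

In this toy input the second locus shows both signals at once, an excess of heterozygotes and a pooled read ratio far from 1:1, which is the combination the method uses to flag candidate paralogs. Any cutoffs applied to these values would need to be chosen for the data set at hand.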