Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA.
Biology Department, Colorado State University, Fort Collins, CO 80521, USA.
G3 (Bethesda). 2021 Aug 7;11(8). doi: 10.1093/g3journal/jkab170.
With the rapid rise in the availability of high-quality genomes for closely related species, methods for orthology inference that incorporate synteny are increasingly useful. Polyploidy perturbs the expected 1:1 frequency of orthologs between two species, complicating the identification of orthologs. Here we present a method of ortholog inference, Ploidy-aware Syntenic Orthologous Networks Identified via Collinearity (pSONIC). We demonstrate the utility of pSONIC using four species in the cotton tribe (Gossypieae), including one allopolyploid, and place between 75% and 90% of genes from each species into nearly 32,000 orthologous groups, 97% of which consist of at most singletons or tandemly duplicated genes, 58.8% more than comparable methods that do not incorporate synteny. We show that 99% of singleton gene groups follow the expected tree topology and that our ploidy-aware algorithm recovers 97.5% identical groups compared with splitting the allopolyploid into its two subgenomes and treating each as a separate "species."