Ruggiero Robert P, Bourgeois Yann, Boissinot Stéphane
New York University Abu Dhabi, Abu Dhabi, United Arab Emirates.
Front Genet. 2017 Apr 13;8:44. doi: 10.3389/fgene.2017.00044. eCollection 2017.
Vertebrate genomes differ considerably in size and structure. Among the features that show the most variation is the abundance of Long Interspersed Nuclear Elements (LINEs). Mammalian genomes contain hundreds of thousands of LINEs that belong to a single clade, L1, and in most species only a single family is active at a time. In contrast, non-mammalian vertebrates (fish, amphibians and reptiles) contain multiple active families belonging to several clades, but each family is represented by a small number of recently inserted copies. It is unclear why vertebrate genomes harbor such drastic differences in LINE composition. To address this issue, we conducted whole genome resequencing to investigate the population genomics of LINEs across 13 genomes of the lizard Anolis carolinensis sampled from two geographically and genetically distinct populations in the Eastern Florida and Gulf Atlantic regions of the United States. We used the Mobile Element Locator Tool to identify and genotype polymorphic insertions from five major clades of LINEs (CR1, L1, L2, RTE and R4) and the 41 subfamilies that constitute them. Across these groups we found large variation in the frequency of polymorphic insertions and in the observed length distributions of these insertions, suggesting that these groups vary in their activity and in how frequently they successfully generate full-length, potentially active copies. Though we found an abundance of polymorphic insertions (over 45,000), most of these were observed at low frequencies and typically appeared as singletons. Site frequency spectra for most LINEs showed a significant shift toward low frequency alleles compared to the spectra observed for total genomic single nucleotide polymorphisms. Using Tajima's D and the mean number of pairwise differences in LINE insertion polymorphisms, we found evidence that negative selection is acting on LINE families in a length-dependent manner, its effects being stronger in the larger Eastern Florida population.
Our results suggest that a large effective population size and negative selection limit the expansion of polymorphic LINE insertions across these populations and that the probability of LINE polymorphisms reaching fixation is extremely low.
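The two summary statistics named in the abstract, the mean number of pairwise differences and Tajima's D, can both be computed directly from a site frequency spectrum. The sketch below is a minimal illustration using the standard Tajima (1989) formulas; it is not the authors' pipeline. The function name `tajimas_d`, the sample size of 10 chromosomes, and the example spectra are hypothetical values chosen for illustration, not data from the study.

```python
from math import sqrt


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of polymorphic sites (here, LINE insertions)
    at which the derived allele is carried by i of the n sampled
    chromosomes, for i = 1 .. n - 1. Returns None when D is undefined.
    """
    S = sum(sfs)  # number of segregating sites
    if S == 0 or n < 3:
        return None

    # Mean number of pairwise differences (pi): a site carried by i of n
    # chromosomes differs in i * (n - i) of the n * (n - 1) / 2 pairs.
    pairs = n * (n - 1) / 2
    pi = sum(c * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * S + e2 * S * (S - 1)
    if variance <= 0:
        return None

    # D compares pi with Watterson's estimator S / a1.
    return (pi - S / a1) / sqrt(variance)


if __name__ == "__main__":
    n = 10  # hypothetical number of sampled chromosomes
    singleton_heavy = [20] + [0] * (n - 2)  # every insertion is a singleton
    print(tajimas_d(singleton_heavy, n))
```

A spectrum dominated by singletons, like the hypothetical one above, yields a negative D, which is the direction of shift the abstract describes for most LINEs relative to genomic SNPs. A spectrum proportional to 1/i, the neutral expectation at constant population size, yields a D of approximately zero.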