Animal Breeding and Genomics Centre, Wageningen University & Research, Droevendaalsesteeg 1, Wageningen, 6708PB, The Netherlands.
Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, Wageningen, 6708PB, The Netherlands.
BMC Genomics. 2018 Mar 13;19(1):195. doi: 10.1186/s12864-018-4577-1.
Understanding variation in genome structure is essential for explaining phenotypic differences within populations and the evolutionary history of species. A promising form of this structural variation is copy number variation (CNV). CNVs can be generated by different recombination mechanisms, such as non-allelic homologous recombination, that rely on specific characteristics of the genome architecture. These structural variants can therefore be more abundant at particular genes, ultimately leading to variation in phenotypes under selection. Detailed characterization of CNVs can therefore reveal evolutionary footprints of selection and provide insight into their contribution to phenotypic variation in wild populations.
Here we use genotypic data from a long-term study population of great tits (Parus major), a passerine bird widely studied in ecology and evolution, to detect CNVs and identify the genomic features prevailing within these regions. We used allele intensities and frequencies from high-density SNP array data for 2,175 birds. We detected 41,029 CNVs, which were concatenated into 8,008 distinct CNV regions (CNVRs). Of the CNVs tested by qPCR, which were sampled across different frequencies and sizes, 93.75% were successfully validated. A mother-daughter family structure allowed us to evaluate the inheritance of a number of these CNVs. Only CNVs with 40 probes or more displayed segregation in accordance with Mendelian inheritance, suggesting a high rate of false negative calls for smaller CNVs. As CNVRs are a coarse-grained map of CNV loci, we also inferred the frequency of coincident CNV start and end breakpoints. We observed frequency-dependent enrichment of these breakpoints at homologous regions, CpG sites and AT-rich intervals. A gene ontology enrichment analysis showed that CNVs are enriched in genes underpinning neural, cardiac and ion transport pathways.
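The concatenation of individual CNV calls into CNVRs can be illustrated with a minimal sketch. The abstract does not state the merging rule that was used, so the code below assumes the simplest one: calls on the same chromosome are merged whenever their coordinates overlap by at least one base. The names `Interval`, `merge_cnvs_into_regions` and the toy coordinates are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive start coordinate
    end: int    # inclusive end coordinate


def merge_cnvs_into_regions(cnvs):
    """Merge overlapping CNV calls on the same chromosome into CNVRs.

    Assumption: two calls belong to the same region if they share at
    least one base. The study may have used a different criterion.
    """
    by_chrom = defaultdict(list)
    for cnv in cnvs:
        by_chrom[cnv.chrom].append(cnv)

    regions = []
    for chrom in sorted(by_chrom):
        # Sort calls by start so one left-to-right sweep is enough.
        calls = sorted(by_chrom[chrom], key=lambda c: (c.start, c.end))
        cur_start, cur_end = calls[0].start, calls[0].end
        for call in calls[1:]:
            if call.start <= cur_end:
                # Overlaps the current region: extend it if needed.
                cur_end = max(cur_end, call.end)
            else:
                # Gap found: close the current region and open a new one.
                regions.append(Interval(chrom, cur_start, cur_end))
                cur_start, cur_end = call.start, call.end
        regions.append(Interval(chrom, cur_start, cur_end))
    return regions


# Toy calls (hypothetical coordinates, not data from the study).
toy_calls = [
    Interval("chr1", 100, 500),
    Interval("chr1", 400, 900),
    Interval("chr1", 2000, 2500),
    Interval("chr2", 100, 300),
]
toy_regions = merge_cnvs_into_regions(toy_calls)
```

In this toy input the first two calls on `chr1` overlap and collapse into one region, while the remaining calls stay separate. A merged region hides where the individual calls begin and end, which is why coincident breakpoints are examined separately.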
Great tit CNVs are present in almost half of the genes and are prominent at repetitive-homologous and regulatory regions. Although they overlap genes under selection, the high number of false negatives makes neutrality or association tests on the CNVs detected here difficult. Therefore, CNVs should be further examined in light of their false negative rate and architecture to improve understanding of their association with phenotypes and evolutionary history.