Vicente-Salvador David, Puig Marta, Gayà-Vidal Magdalena, Pacheco Sarai, Giner-Delgado Carla, Noguera Isaac, Izquierdo David, Martínez-Fundichely Alexander, Ruiz-Herrera Aurora, Estivill Xavier, Aguado Cristina, Lucas-Lledó José Ignacio, Cáceres Mario
Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain.
Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain.
Hum Mol Genet. 2017 Feb 1;26(3):567-581. doi: 10.1093/hmg/ddw415.
The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions.