Glunčić Matko, Barić Domjan, Paar Vladimir
Faculty of Science, University of Zagreb, Zagreb 10000, Croatia.
Department of Mathematical, Physical and Chemical Sciences, Croatian Academy of Sciences and Arts, Zagreb 10000, Croatia.
Bioinform Adv. 2024 Nov 28;4(1):vbae191. doi: 10.1093/bioadv/vbae191. eCollection 2024.
Tandem monomeric units, integral components of eukaryotic genomes, form higher-order repeat (HOR) structures that play crucial roles in maintaining chromosome integrity and regulating gene expression and protein abundance. Given their significant influence on processes such as evolution, chromosome segregation, and disease, developing a sensitive and automated tool for identifying HORs across diverse genomic sequences is essential.
In this study, we applied the GRMhor (Global Repeat Map hor) algorithm to analyse the centromeric region of chromosome 20 in three individual human genomes, as well as the centromeric regions of three higher primates. In all three human genomes, we identified six distinct HOR arrays. These arrays differed in the number of canonical and variant copies, and in their overall structure, significantly more than would be expected given the 99.9% genetic similarity among humans. Furthermore, our analysis of higher primate genomes revealed entirely different HOR sequences, indicating a much larger genomic divergence between humans and higher primates than previously recognized. These results underscore the suitability of the GRMhor algorithm for studying features specific to individual genomes, particularly those involving repetitive monomers in centromere structure, which is essential for proper chromosome segregation during cell division. They also highlight its utility in exploring centromere evolution and other repetitive genomic regions.
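The abstract distinguishes canonical from variant HOR copies. As a rough illustration of that terminology only (it is not the GRMhor algorithm, whose internals are not described here), the Python sketch below assumes the monomers of an array have already been assigned labels and that the canonical HOR monomer order is known. It also assumes, as a simplification, that a new copy starts at each occurrence of the first canonical monomer. A copy that matches the canonical order exactly is counted as canonical; any other copy (with a missing, duplicated, or reordered monomer) is counted as a variant. The labels `m1` to `m4` and the function names are hypothetical.

```python
from collections import Counter


def split_into_copies(labels, start_label):
    """Split a monomer-label sequence into HOR copies.

    Hypothetical simplification: a new copy begins at every occurrence
    of start_label (the first monomer of the canonical HOR).
    """
    copies, current = [], []
    for label in labels:
        if label == start_label and current:
            copies.append(current)
            current = []
        current.append(label)
    if current:
        copies.append(current)
    return copies


def classify_copies(labels, canonical):
    """Return (canonical_count, variant_count) for a labelled HOR array.

    A copy is canonical if it equals the canonical monomer order exactly;
    every other copy is treated as a variant.
    """
    copies = split_into_copies(labels, canonical[0])
    counts = Counter("canonical" if c == canonical else "variant" for c in copies)
    return counts["canonical"], counts["variant"]


# Hypothetical 4-monomer HOR; the third copy lacks monomer m3.
canonical = ["m1", "m2", "m3", "m4"]
array = (["m1", "m2", "m3", "m4"] * 2
         + ["m1", "m2", "m4"]
         + ["m1", "m2", "m3", "m4"])
print(classify_copies(array, canonical))  # (3, 1)
```

Real HOR analyses must first identify and classify monomers from the DNA sequence itself, which this sketch takes as given.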
Source code and example binaries are freely available for download at github.com/gluncic/GRM2023.