Faculty of Science, University of Zagreb, 10000, Zagreb, Croatia.
Algebra University College, Ilica 242, 10000, Zagreb, Croatia.
Sci Rep. 2019 Sep 2;9(1):12629. doi: 10.1038/s41598-019-49022-2.
The centromere is important for segregation of chromosomes during cell division in eukaryotes. Its destabilization results in chromosomal missegregation and aneuploidy, hallmarks of cancers and birth defects. In primate genomes, centromeres contain tandem repeats of ~171 bp alpha satellite DNA, commonly organized into higher order repeats (HORs). Despite their crucial importance, satellites have been understudied because of gaps in sequencing, the genomic "black holes". Bioinformatic studies of genomic sequences open possibilities to revolutionize the understanding of repetitive DNA datasets. Here, using the robust Global Repeat Map (GRM) algorithm, we identified in the hg38 sequence of human chromosome 21 a complete ensemble of alpha satellite HORs with six long repeat units (≥20mers), five of them novel. The novel 33mer HOR has the longest HOR unit identified so far among all somatic chromosomes, and the novel 23mer reverse HOR is located far from the centromere. We also found that in the hg38 assembly the 33mer sequences in chromosomes 21, 13, 14, and 22 are 100% identical, but gaps are present nearby, which suggests that additional, more precise sequencing is needed. Chromosome 21 is of significant interest for deciphering the molecular basis of Down syndrome and of aneuploidies in general. Because chromosome identifier probes are largely based on the detection of higher order alpha satellite repeats, the distinctions identified here between alpha satellite HORs in chromosomes 21 and 13 might lead to a unique chromosome 21 probe for molecular cytogenetics, which would be useful in diagnostics. Complete sequence analysis of chromosome 21 is expected to have profound implications for understanding the pathogenesis of diseases and for developing new therapeutic approaches.
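As a rough illustration of the scale involved: an n-mer HOR unit consists of n alpha satellite monomers of about 171 bp each, so its approximate length is n × 171 bp. The sketch below computes this estimate. The function name and the fixed 171 bp monomer length are assumptions made for illustration (real monomers vary slightly in length), and the code is not part of the Global Repeat Map algorithm.

```python
# Nominal alpha satellite monomer length in base pairs. The abstract gives
# ~171 bp; treating it as a fixed value is an assumption for illustration.
ALPHA_MONOMER_BP = 171


def approx_hor_unit_length(n_monomers: int, monomer_bp: int = ALPHA_MONOMER_BP) -> int:
    """Estimate the length in bp of an n-mer HOR unit (hypothetical helper)."""
    if n_monomers < 1:
        raise ValueError("an HOR unit needs at least one monomer")
    return n_monomers * monomer_bp


if __name__ == "__main__":
    # Unit sizes mentioned in the abstract: the >=20mer threshold,
    # the 23mer reverse HOR, and the 33mer HOR.
    for n in (20, 23, 33):
        print(f"{n}mer HOR unit: ~{approx_hor_unit_length(n)} bp")
```

Under this assumption, the 33mer HOR unit spans roughly 5.6 kb, which gives a sense of why such long units are difficult to resolve near assembly gaps.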