McGinty Ryan, Lyskova Alisa, Mirkin Sergei M
Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States.
Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119234, Russia.
Nucleic Acids Res. 2025 Jun 20;53(12). doi: 10.1093/nar/gkaf619.
Mirror DNA repeats were found in genomic DNA several decades ago, but their role and the mechanisms leading to their abundance have remained a mystery. The only firmly established functional property was that the subset of long homopurine-homopyrimidine mirror repeats (H-motifs) can form a triple-helical DNA secondary structure (H-DNA). Here, we analyzed the sequence content of mirror repeats in the telomere-to-telomere human genome sequence. Our findings suggest that long mirror repeats in genomic DNA originate exclusively from the expansion of simple tandem repeats (STRs). Strikingly, long H-motifs are highly overrepresented compared to all other mirror repeats and STRs. We hypothesize that long H-motif STRs could be particularly expansion-prone owing to H-DNA-mediated genome instability, pointing to the length at which this structure becomes a significant hindrance.
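To make the definitions in the abstract concrete, here is a minimal Python sketch of what "mirror repeat" and "H-motif" mean at the sequence level. It is an illustration only, not the authors' analysis pipeline: the function names, the `min_len` threshold, and the example sequences are all hypothetical, and it handles only perfect mirror repeats with no spacer. A mirror repeat reads the same in both directions on one strand, which differs from an inverted repeat, where the match is to the reverse complement. An H-motif is a mirror repeat whose strand is made up entirely of purines (A/G) or entirely of pyrimidines (C/T).

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_mirror_repeat(seq: str) -> bool:
    """True if the sequence reads the same left-to-right and right-to-left
    on a single strand (no complementing involved)."""
    s = seq.upper()
    return len(s) > 1 and s == s[::-1]


def is_h_motif(seq: str) -> bool:
    """True if the sequence is a mirror repeat made only of purines
    or only of pyrimidines (homopurine-homopyrimidine)."""
    s = seq.upper()
    bases = set(s)
    return is_mirror_repeat(s) and (bases <= PURINES or bases <= PYRIMIDINES)


def find_mirror_repeats(seq: str, min_len: int = 8):
    """Return (start, end, subsequence) for each maximal perfect mirror
    repeat of at least min_len bases, found by expanding around every
    possible center. min_len is an arbitrary illustrative threshold."""
    s = seq.upper()
    n = len(s)
    hits = []
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2  # odd centers sit between two bases
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open interval [start, end)
        if end - start >= min_len:
            hits.append((start, end, s[start:end]))
    return hits
```

For example, `AAGGAAGGAA` is both a mirror repeat and an H-motif (all purines), `ACGTTGCA` is a mirror repeat but not an H-motif, and the EcoRI site `GAATTC` is an inverted repeat rather than a mirror repeat. The expand-around-center scan is quadratic in the worst case, so a genome-scale search would need a more efficient method.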