Maestri Simone, Scalzo Davide, Damaggio Gianluca, Zobel Martina, Besusso Dario, Cattaneo Elena
Department of Biosciences, University of Milan, Via Giovanni Celoria, 26, 20133, Milan, Italy.
INGM, Istituto Nazionale Genetica Molecolare 'Romeo ed Enrica Invernizzi', Via Francesco Sforza, 35, 20122, Milan, Italy.
Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1155.
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.