Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA.
Hum Mol Genet. 2013 Aug 15;22(16):3227-38. doi: 10.1093/hmg/ddt176. Epub 2013 Apr 16.
In Huntington's disease (HD), the size of the expanded HTT CAG repeat mutation is the primary driver of the processes that determine age at onset of motor symptoms. However, correlations between CAG size and cellular biochemical parameters also extend across the normal repeat range, supporting the view that the CAG repeat is a functional polymorphism whose dominant effects are determined by the longer allele. A central challenge in defining the functional consequences of this single polymorphism is distinguishing its subtle effects from the multitude of other sources of biological variation. We demonstrate that an analytical approach based on continuous correlation with CAG size captured the modest (∼21%) contribution of the repeat to the variation in genome-wide gene expression in 107 lymphoblastoid cell lines, with alleles ranging from 15 to 92 CAGs. Furthermore, a mathematical model derived from an iterative strategy yielded predicted CAG repeat lengths that were significantly positively correlated with true CAG allele size and negatively correlated with age at onset of motor symptoms. Genes negatively correlated with repeat size were also enriched in a set of genes whose expression was CAG-correlated in human HD cerebellum. These findings reveal the relatively small but detectable impact of variation in the CAG allele on global expression data in these peripheral cells. They also provide a strategy for building multi-dimensional, data-driven models of the biological network that drives the HD disease process, through continuous analysis across allelic panels of neuronal cells vulnerable to the dominant effects of the HTT CAG repeat.
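The core idea in the abstract is to treat CAG length as a continuous variable and correlate it with expression across an allelic series, rather than comparing an HD group against a control group. A minimal sketch of that idea follows, assuming a simple per-gene Pearson correlation. The function names, gene names, and toy values are hypothetical; the authors' actual pipeline (normalization, covariates, the iterative modelling step, and how the ∼21% figure was estimated) is not described here and is not reproduced.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: correlation is undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_cag_correlation(
    cag_sizes: List[float],
    expression: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: correlate each gene's expression with CAG size.

    cag_sizes[i] is the longer-allele CAG length of cell line i, used as a
    continuous variable rather than an HD/control label; expression[gene][i]
    is that gene's expression in cell line i. Returns (gene, r) pairs sorted
    from most negative to most positive r.
    """
    scores = [(gene, pearson(cag_sizes, values)) for gene, values in expression.items()]
    return sorted(scores, key=lambda item: item[1])


# Toy example with made-up values for five cell lines.
cag = [15, 20, 40, 60, 92]
expr = {
    "GENE_UP": [1.0, 1.2, 2.0, 3.1, 4.5],
    "GENE_DOWN": [5.0, 4.8, 3.9, 3.0, 1.5],
    "GENE_FLAT": [2.0, 2.0, 2.0, 2.0, 2.0],
}
ranked = rank_genes_by_cag_correlation(cag, expr)
```

Here `ranked` lists `GENE_DOWN` first and `GENE_UP` last. Genes at the negative end would correspond to the "negatively correlated with repeat size" set mentioned in the abstract; a real analysis would also need multiple-testing control and adjustment for other sources of variation, which this sketch omits.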