School of Mathematical Sciences, Monash University, Clayton, Victoria, Australia.
PLoS One. 2012;7(3):e33565. doi: 10.1371/journal.pone.0033565. Epub 2012 Mar 30.
Glial fibrillary acidic protein (GFAP) is an intermediate filament (IF) protein specific to central nervous system (CNS) astrocytes. It has been the subject of intense interest due to its association with neurodegenerative diseases, and because of growing evidence that IF proteins modulate not only cellular structure but also cellular function. Moreover, GFAP has a family of splicing isoforms apparently more complex than that of other CNS IF proteins, consistent with it possessing a range of functional and structural roles. The gene consists of 9 exons, and to date all isoforms associated with 3' end splicing have been identified from modifications within intron 7, resulting in the generation of exon 7a (GFAPδ/ε) and 7b (GFAPκ). To better understand the nature and functional significance of variation in this region, we used a Bayesian multiple change-point approach to identify conserved regions. This is the first successful application of this method to a single gene; it has previously been used only in whole-genome analyses. We identified several highly or moderately conserved regions throughout the intron 7/7a/7b regions, including untranslated regions and regulatory features, consistent with the biology of GFAP. Several putative unconfirmed features were also identified, including a possible new isoform. We then integrated multiple computational analyses on both the DNA and protein sequences from the mouse, rat and human, showing that the major isoform, GFAPα, has highly conserved structure and features across the three species, whereas the minor isoforms GFAPδ/ε and GFAPκ have low conservation of structure and features at the distal 3' end, both relative to each other and relative to GFAPα. The overall picture suggests distinct and tightly regulated functions for the 3' end isoforms, consistent with complex astrocyte biology. The results illustrate a computational approach for characterising splicing isoform families, using both DNA and protein sequences.