Center for Evolutionary Medicine and Informatics, The Biodesign Institute and School of Life Sciences, Arizona State University, USA.
Mol Biol Evol. 2011 Jan;28(1):533-42. doi: 10.1093/molbev/msq221. Epub 2010 Aug 19.
Collagen type I alpha 1 (COL1a1) encodes the primary subunit of type I collagen, the main structural protein and the most abundant protein in vertebrates, and harbors hundreds of mutations linked to human diseases such as osteoporosis and osteogenesis imperfecta. Previous studies have attempted to predict the phenotypic severity associated with type I collagen mutations, yet an evolutionary analysis that compares historical and recent selective pressures, including across noncoding regions, has never been conducted. Here, we use a comparative genomic and species evolutionary analysis representing ∼450 My of vertebrate history to investigate functional constraints associated with both exons and introns of the >17-kb COL1a1 gene. We find that although the COL1a1 amino acid sequence is highly conserved, there are both spatial and temporal signatures of varying selective constraint across protein domains. Furthermore, sites of high evolutionary constraint significantly correlate with the locations of disease-associated mutations, which in turn cluster according to the severity classes typically used in clinical studies. Finally, we find that COL1a1 introns are significantly short and have high GC content, patterns that are shared across highly diverged vertebrates and that may be a signature of strong stabilizing selection for high COL1a1 gene expression. In conclusion, although previous studies focused on COL1a1 coding regions, the current results implicate introns as areas of high selective constraint and targets of bone-related phenotypic variation. From a broader perspective, our comparative evolutionary approach provides further resolution to models predicting mutations associated with bone-related function and disease severity.
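The intron measurements mentioned in the abstract (length and GC content) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not describe. The function names and the toy sequences below are hypothetical and serve only to show how the two quantities are typically computed.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguous symbols such as N are ignored; an empty or fully
    ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def summarize_introns(introns: dict[str, str]) -> dict[str, float]:
    """Summarize a non-empty mapping of intron name -> sequence."""
    lengths = [len(s) for s in introns.values()]
    gcs = [gc_content(s) for s in introns.values()]
    return {
        "n": len(introns),
        "mean_length": mean(lengths),
        "mean_gc": mean(gcs),
    }


# Toy sequences for illustration only; these are NOT real COL1a1 introns.
toy_introns = {
    "intron_1": "GTGAGCGGCCAG",
    "intron_2": "GTAAGTNNCCAG",
}

if __name__ == "__main__":
    print(summarize_introns(toy_introns))
```

In a comparative setting, the same summary would be computed per species and the distributions compared, for example against the intron lengths and GC content of other genes in each genome.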