Fuller F, Boedtker H
Biochemistry. 1981 Feb 17;20(4):996-1006. doi: 10.1021/bi00507a054.
Three pro-alpha 1 collagen cDNA clones, pCg1, pCg26, and pCg54, and two pro-alpha 2 collagen cDNA clones, pCg13 and pCg45, were subjected to extensive DNA sequence determination. The combined sequences specified the amino acid sequences for chicken pro-alpha 1 and pro-alpha 2 type I collagens starting at residue 814 in the collagen triple-helical region and continuing to the procollagen C-termini, as determined by the first in-phase termination codon. Thus, the sequences of 272 pro-alpha 1 C-terminal, 260 pro-alpha 2 C-terminal, 201 pro-alpha 1 helical, and 201 pro-alpha 2 helical amino acids were established. In addition, the sequences of several hundred nucleotides corresponding to noncoding regions of both procollagen mRNAs were determined. In total, 1589 pro-alpha 1 base pairs and 1691 pro-alpha 2 base pairs were sequenced, corresponding to approximately one-third of the total length of each mRNA. Both procollagen mRNA sequences have a high G+C content. The pro-alpha 1 mRNA is 75% G+C in the helical coding region sequenced and 61% G+C in the C-terminal coding region, while the pro-alpha 2 mRNA is 60% and 48% G+C, respectively, in these regions. The dinucleotide sequence pCG occurs at a higher frequency in both sequences than is normally found in vertebrate DNAs and is approximately 5 times more frequent in the pro-alpha 1 sequence than in the pro-alpha 2 sequence. Nucleotide homology in the helical coding regions is very limited given that these sequences code for the repeating Gly-X-Y tripeptide in a region where X and Y residues are 50% conserved. These differences are clearly reflected in the preferred codon usages of the two mRNAs.
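The G+C percentages and the pCG dinucleotide frequency reported above are simple counting statistics. The following is a minimal sketch of how such figures could be computed for any nucleotide sequence; it is not the authors' method. The function names and the short `toy` sequence are hypothetical, and the observed/expected ratio shown is one commonly used normalization (CG count times length, divided by the product of the C and G counts), assumed here for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cg_obs_exp(seq: str) -> float:
    """Return the observed/expected ratio for the CG dinucleotide.

    Expected CG count is estimated from base composition alone:
    (number of C) * (number of G) / length.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    observed = seq.count("CG")
    return observed * len(seq) / (c * g)


if __name__ == "__main__":
    # Hypothetical fragment, not taken from the sequenced clones.
    toy = "GGCCCCGCCGGTCCCCGC"
    print(f"G+C fraction: {gc_content(toy):.2f}")
    print(f"CG obs/exp:   {cg_obs_exp(toy):.2f}")
```

Applying the same two functions separately to the helical and C-terminal coding regions of each mRNA would give region-by-region figures comparable in kind to those quoted in the abstract.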