Casadei Raffaella, Strippoli Pierluigi, D'Addabbo Pietro, Canaider Silvia, Lenzi Luca, Vitale Lorenza, Giannone Sandra, Frabetti Flavia, Facchin Federica, Carinci Paolo, Zannotti Maria
Center for Research into Molecular Genetics Fondazione CARISBO, Institute of Histology and General Embryology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy.
Gene. 2003 Dec 4;321:185-93. doi: 10.1016/s0378-1119(03)00835-7.
The amino acid sequence of a gene product is routinely deduced from the nucleotide sequence of the corresponding cloned cDNA, according to the rules for start codon recognition (first-AUG rule, optimal sequence context) and the genetic code. Most subsequent analyses of the product stem from this prediction, although all standard cDNA cloning methods are prone to incomplete cloning of the mRNA 5' region. A revision, by bioinformatics and cloning methods, of 109 known genes located on human chromosome 21 (HC 21) shows that 60 mRNAs lack any in-frame stop codon upstream of the first AUG, and that in five cases (DSCR1, KIAA0184, KIAA0539, SON, and TFF3) the coding region at the 5' end was incompletely characterized in the original descriptions. We describe the respective consequences for genomic annotation, domain and ortholog identification, and the design of functional experiments. We also analyzed the sequences of 13,124 human mRNAs (RefSeq databank) and found that in 6,448 cases (49%) an in-frame stop codon is present upstream of the initiation codon, while in the other 6,676 mRNAs (51%) the identification of additional bases at the mRNA 5' end could reveal new upstream in-frame AUG codons in the optimal context. In proportion to the HC 21 data, about 550 known human genes might thus be affected by this 5' end mRNA artifact.
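The screening criterion described in the abstract, whether an in-frame stop codon lies upstream of the annotated start codon, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0-based `cds_start` convention, and the example sequences are hypothetical. The final estimate reflects one plausible reading of "proportionally" (5 revised genes out of the 60 HC 21 mRNAs lacking an upstream stop, applied to the 6,676 RefSeq mRNAs in the same situation); the abstract does not state the exact calculation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_inframe_stop(mrna: str, cds_start: int) -> bool:
    """Return True if a stop codon lies in frame upstream of the start codon.

    mrna      -- mRNA/cDNA sequence (U is treated as T, case-insensitive)
    cds_start -- 0-based index of the A in the annotated start AUG
                 (hypothetical convention chosen for this sketch)
    """
    seq = mrna.upper().replace("U", "T")
    # Step backwards from the start codon one codon (3 bases) at a time,
    # so only triplets in the same reading frame are inspected.
    for i in range(cds_start - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    # No in-frame stop found: the true start codon could lie further 5'.
    return False


def estimate_affected(n_without_stop: int, n_revised: int, n_sample: int) -> float:
    """Scale a revision rate observed in a sample to a larger set (assumed reading)."""
    return n_without_stop * n_revised / n_sample


if __name__ == "__main__":
    # Made-up sequences: TAG in frame with the ATG at index 6, then out of frame.
    print(has_upstream_inframe_stop("TAGGCCATGAAA", 6))
    print(has_upstream_inframe_stop("CTAGCCATGAAA", 6))
    # Assumed proportional estimate: 6676 * 5 / 60.
    print(round(estimate_affected(6676, 5, 60)))
```

An mRNA for which the first function returns `False` is one whose 5' end may be incomplete, since nothing rules out a further upstream in-frame AUG. Under the assumed reading, the estimate comes to roughly 556, in line with the "about 550" quoted in the abstract.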