Hurowitz Evan H, Brown Patrick O
Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305-5307, USA.
Genome Biol. 2003;5(1):R2. doi: 10.1186/gb-2003-5-1-r2. Epub 2003 Dec 22.
Although the protein-coding sequences in the Saccharomyces cerevisiae genome have been studied and annotated extensively, much less is known about the extent and characteristics of the untranslated regions of yeast mRNAs.
We developed a 'Virtual Northern' method, using DNA microarrays for genome-wide systematic analysis of mRNA lengths. We used this method to measure mRNAs corresponding to 84% of the annotated open reading frames (ORFs) in the S. cerevisiae genome, with high precision and accuracy (measurement errors of ±6-7%). We found a close linear relationship between mRNA lengths and the lengths of known or predicted translated sequences; mRNAs were typically around 300 nucleotides longer than the translated sequences. Analysis of genes deviating from that relationship identified ORFs with annotation errors, ORFs that appear not to be bona fide genes, and potentially novel genes. Interestingly, we found that systematic differences in the total length of the untranslated sequences in mRNAs were related to the functions of the encoded proteins.
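The linear relationship and the search for deviating genes can be illustrated with a minimal sketch. It is not the authors' analysis: the abstract does not state the fitting procedure or the cutoff used, so the function names, the 25% tolerance, and the example lengths below are all hypothetical. The sketch fits a least-squares line on a set of well-behaved ORFs and then flags any ORF whose measured mRNA length departs strongly from the fitted value.

```python
from statistics import mean

def fit_line(orf_lengths, mrna_lengths):
    """Ordinary least-squares fit: mrna_length = slope * orf_length + intercept."""
    mx, my = mean(orf_lengths), mean(mrna_lengths)
    sxx = sum((x - mx) ** 2 for x in orf_lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(orf_lengths, mrna_lengths))
    slope = sxy / sxx
    return slope, my - slope * mx

def flag_deviating(orfs, slope, intercept, tolerance=0.25):
    """Return names of ORFs whose measured mRNA length differs from the
    fitted length by more than `tolerance` (a fraction of the fitted length).
    The tolerance is an illustrative assumption, not a value from the paper."""
    flagged = []
    for name, orf_len, mrna_len in orfs:
        expected = slope * orf_len + intercept
        if abs(mrna_len - expected) / expected > tolerance:
            flagged.append(name)
    return flagged

# Hypothetical records: (name, ORF length in nt, measured mRNA length in nt).
# These are made-up values for illustration, not data from the study.
ORFS = [
    ("geneA", 600, 900),
    ("geneB", 1200, 1500),
    ("geneC", 1800, 2100),
    ("geneD", 2400, 2700),
    ("geneE", 3000, 3300),
    ("geneF", 1500, 700),  # mRNA much shorter than the annotated ORF
]

# Fit on the first five (well-behaved) records, then screen all of them.
trusted = ORFS[:5]
slope, intercept = fit_line([o[1] for o in trusted], [o[2] for o in trusted])
print(slope, intercept)
print(flag_deviating(ORFS, slope, intercept))
```

In this toy data the intercept plays the role of the typical total untranslated length, and an ORF such as `geneF`, whose transcript is far shorter than its annotated coding sequence, is the kind of case that might prompt a second look at the annotation.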
The Virtual Northern method is a practical and efficient approach to genome-scale analysis of transcript lengths. Approximately 12-15% of the yeast genome is represented in untranslated sequences of mRNAs. A systematic relationship between the lengths of the untranslated regions in yeast mRNAs and the functions of the proteins they encode may point to an important regulatory role for these sequences.
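As a rough plausibility check on the 12-15% figure, the back-of-envelope calculation below multiplies an assumed number of transcribed genes by the roughly 300 nucleotides of untranslated sequence per mRNA and divides by the genome size. This is not how the authors derived their estimate: the gene counts (5,000 to 6,000) and the genome size of about 12.1 Mb are assumptions introduced here for illustration.

```python
def utr_fraction(n_genes, mean_utr_nt, genome_nt):
    """Fraction of the genome covered by untranslated mRNA sequence,
    assuming non-overlapping transcripts of uniform UTR length."""
    return n_genes * mean_utr_nt / genome_nt

GENOME_NT = 12_100_000  # assumed approximate S. cerevisiae genome size
MEAN_UTR_NT = 300       # typical total untranslated length from the abstract

for n_genes in (5000, 6000):  # assumed range of transcribed genes
    fraction = utr_fraction(n_genes, MEAN_UTR_NT, GENOME_NT)
    print(f"{n_genes} genes: {fraction:.1%}")
```

Under these assumptions the result falls at about 12% and 15%, consistent with the range quoted in the abstract.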