Tying Down Loose Ends in the Chlamydomonas Genome: Functional Significance of Abundant Upstream Open Reading Frames.

Author information

Cross Frederick R

Affiliation

The Rockefeller University, New York, New York 10065

Publication information

G3 (Bethesda). 2015 Dec 23;6(2):435-46. doi: 10.1534/g3.115.023119.

Abstract

The Chlamydomonas genome has been sequenced, assembled, and annotated to produce a rich resource for genetics and molecular biology in this well-studied model organism. The annotated genome is very rich in open reading frames upstream of the annotated coding sequence ('uORFs'): almost three quarters of the assigned transcripts have at least one uORF, and frequently more than one. This is problematic with respect to the standard 'scanning' model for eukaryotic translation initiation. These uORFs can be grouped into three classes: class 1, initiating in-frame with the coding sequence (CDS) (thus providing a potential in-frame N-terminal extension); class 2, initiating in the 5' untranslated sequences (5UT) and terminating out-of-frame in the CDS; and class 3, initiating and terminating within the 5UT. Multiple bioinformatics criteria (including analysis of Kozak consensus sequence agreement and BLASTP comparisons to the closely related Volvox genome, and statistical comparison to CDS and to random sequence controls) indicate that of ∼4000 class 1 uORFs, approximately half are likely in vivo translation initiation sites. The proposed resulting N-terminal extensions in many cases will sharply alter the predicted biochemical properties of the encoded proteins. These results suggest significant modifications in ∼2000 of the ∼20,000 transcript models with respect to translation initiation and encoded peptides. In contrast, class 2 uORFs may be subject to purifying selection, and the existent ones (surviving selection) are likely inefficiently translated. Class 3 uORFs are found in more than half of transcripts, frequently multiple times per transcript; however, they are remarkably similar to random sequence expectations with respect to size, number, and composition, and therefore may in most cases be selectively neutral.
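
The three uORF classes defined in the abstract can be illustrated with a minimal Python sketch. This is not the author's pipeline. The function name `classify_uorfs`, the 0-based `cds_start` coordinate, and the simplifications are assumptions made here for illustration: an upstream ATG whose first in-frame stop codon lies entirely within the 5UT is called class 3; otherwise it is class 1 if it is in-frame with the annotated start codon and class 2 if it is out-of-frame. An out-of-frame uORF with no stop codon before the end of the transcript is also reported as class 2.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def classify_uorfs(transcript: str, cds_start: int) -> list[tuple[int, int]]:
    """Return (atg_position, uorf_class) for each ATG starting in the 5' UT.

    transcript: transcript sequence written with DNA letters (A, C, G, T).
    cds_start:  0-based index of the annotated start codon (hypothetical
                coordinate convention chosen for this sketch).
    """
    seq = transcript.upper()
    results = []
    for pos in range(cds_start):
        if seq[pos:pos + 3] != "ATG":
            continue
        # Walk codon by codon from the upstream ATG to its first in-frame stop.
        stop = None
        for i in range(pos + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                stop = i
                break
        if stop is not None and stop + 3 <= cds_start:
            uorf_class = 3  # initiates and terminates within the 5' UT
        elif (cds_start - pos) % 3 == 0:
            uorf_class = 1  # in-frame: potential N-terminal extension
        else:
            uorf_class = 2  # out-of-frame and running into the CDS
        results.append((pos, uorf_class))
    return results
```

For example, `classify_uorfs("CATGCCATGAAATGA", cds_start=6)` finds the upstream ATG at position 1, which is out-of-frame with the annotated start codon and terminates at a TGA inside the CDS, so it is reported as class 2.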

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75c2/4751561/91ed5a136af2/435f1.jpg