Li Qian-Ru, Carvunis Anne-Ruxandra, Yu Haiyuan, Han Jing-Dong J, Zhong Quan, Simonis Nicolas, Tam Stanley, Hao Tong, Klitgord Niels J, Dupuy Denis, Mou Danny, Wapinski Ilan, Regev Aviv, Hill David E, Cusick Michael E, Vidal Marc
Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA.
Genome Res. 2008 Aug;18(8):1294-303. doi: 10.1101/gr.076661.108. Epub 2008 May 23.
Accurately defining the coding potential of an organism, i.e., all protein-encoding open reading frames (ORFs) or "ORFeome," is a prerequisite to fully understanding its biology. ORFeome annotation involves iterative computational predictions from genome sequences combined with experimental verifications. Here we reexamine a set of Saccharomyces cerevisiae "orphan" ORFs recently removed from the original ORFeome annotation due to lack of conservation across evolutionarily related yeast species. We show that many orphan ORFs produce detectable transcripts and/or translated products in various functional genomics and proteomics experiments. By combining a naïve Bayes model that predicts the likelihood that an ORF encodes a functional product with experimental verification of strand-specific transcripts, we argue that orphan ORFs should remain candidates for functional ORFs. In support of this model, interstrain intraspecies genome sequence variation is lower across orphan ORFs than in intergenic regions, indicating that orphan ORFs endure functional constraints and resist deleterious mutations. We conclude that ORFs should be evaluated based on multiple levels of evidence and not be removed from ORFeome annotation solely based on low sequence conservation in other species. Rather, such ORFs might be important for micro-evolutionary divergence between species.
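The abstract does not state which features or training sets the naïve Bayes model used, so the following is only a minimal sketch of the general idea: binary lines of evidence are combined, under a conditional-independence assumption, into a posterior probability that an ORF is functional. The three feature names, the function names, and the toy training rows are hypothetical and made up for illustration; they are not data or methods from the paper.

```python
import math

# Hypothetical binary evidence per ORF (an assumption, not the paper's feature set):
# (transcript detected, protein product detected, low interstrain variation)
FEATURES = ("transcript", "protein", "low_variation")


def train_naive_bayes(examples, labels, alpha=1.0):
    """Estimate the class prior and P(feature=1 | class) with Laplace smoothing."""
    n_features = len(examples[0])
    counts = {0: [0] * n_features, 1: [0] * n_features}
    totals = {0: 0, 1: 0}
    for x, y in zip(examples, labels):
        totals[y] += 1
        for j, v in enumerate(x):
            counts[y][j] += v
    probs = {
        c: [(counts[c][j] + alpha) / (totals[c] + 2 * alpha)
            for j in range(n_features)]
        for c in (0, 1)
    }
    prior = (totals[1] + alpha) / (totals[0] + totals[1] + 2 * alpha)
    return prior, probs


def posterior_functional(x, prior, probs):
    """Return P(functional | evidence) by summing per-feature log-likelihood ratios."""
    log_odds = math.log(prior / (1 - prior))
    for j, v in enumerate(x):
        p1 = probs[1][j] if v else 1 - probs[1][j]
        p0 = probs[0][j] if v else 1 - probs[0][j]
        log_odds += math.log(p1 / p0)
    return 1 / (1 + math.exp(-log_odds))


# Toy training rows, invented for illustration only.
# Label 1 = assumed functional ORF, label 0 = assumed non-functional sequence.
train_x = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1),
           (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 0)]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

prior, probs = train_naive_bayes(train_x, train_y)
well_supported = posterior_functional((1, 1, 1), prior, probs)
unsupported = posterior_functional((0, 0, 0), prior, probs)
print(f"all evidence present: {well_supported:.3f}")
print(f"no evidence present:  {unsupported:.3f}")
```

Working in log-odds keeps each line of evidence additive, which mirrors the abstract's point that ORFs should be judged on multiple levels of evidence instead of on cross-species conservation alone.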