Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA.
Nucleic Acids Res. 2020 Feb 20;48(3):1029-1042. doi: 10.1093/nar/gkz734.
Traditional annotation of protein-encoding genes relied on assumptions such as that one open reading frame (ORF) encodes one protein and that translated proteins exceed a minimal length. With the serendipitous discovery of translated ORFs located upstream and downstream of annotated ORFs, initiated from alternative start sites nested within annotated ORFs, and encoded by RNAs previously considered noncoding, it is becoming clear that these initial assumptions are incorrect. These findings have led to the realization that genetic information is more densely coded and that the proteome is more complex than previously anticipated. As a result, interest in identifying and characterizing the previously ignored 'dark proteome' is increasing, though research in eukaryotes and bacteria has largely progressed in isolation. To bridge this gap and illustrate the findings emerging from studies of the dark proteome, we highlight recent advances in both eukaryotic and bacterial cells. We discuss progress in detecting alternative ORFs and in understanding their functions and the regulation of their expression, and we pose questions for future work.