Frith Martin C, Forrest Alistair R, Nourbakhsh Ehsan, Pang Ken C, Kai Chikatoshi, Kawai Jun, Carninci Piero, Hayashizaki Yoshihide, Bailey Timothy L, Grimmond Sean M
Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan.
PLoS Genet. 2006 Apr;2(4):e52. doi: 10.1371/journal.pgen.0020052. Epub 2006 Apr 28.
Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at a length of 100 aa. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a "dark matter" subset of proteins that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.
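The length statistics quoted in the abstract (the share of proteins shorter than 100 aa, and the shape of the distribution around that length) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and does not involve CRITICA. The function names, the bin width, and the toy lengths are all assumptions made for illustration.

```python
from collections import Counter


def short_fraction(lengths, cutoff=100):
    """Return the fraction of proteins strictly shorter than `cutoff` aa."""
    if not lengths:
        raise ValueError("no protein lengths supplied")
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


def length_histogram(lengths, bin_width=20):
    """Count proteins per length bin; each key is the lower bound of its bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


# Hypothetical toy lengths (in aa), NOT taken from the FANTOM data.
toy_lengths = [45, 72, 98, 100, 101, 150, 230, 310, 420, 88]

fraction = short_fraction(toy_lengths)
histogram = length_histogram(toy_lengths)

print(f"fraction shorter than 100 aa: {fraction:.2f}")
for lower in sorted(histogram):
    print(f"{lower:>4}-{lower + 19:<4} aa: {histogram[lower]}")
```

In a catalogue with an artefactual cutoff, the bins just below 100 aa would be expected to hold far fewer entries than the bins just above it. A distribution without such a step is what the abstract describes for the CRITICA-based predictions.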