Klee Eric W, Carlson Daniel F, Fahrenkrug Scott C, Ekker Stephen C, Ellis Lynda B M
Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN 55455, USA.
Nucleic Acids Res. 2004 Feb 27;32(4):1414-21. doi: 10.1093/nar/gkh286. Print 2004.
The proteins processed by the secretory pathway (the secretome) are critical players in the development of multicellular eukaryotic organisms but have yet to be comprehensively studied at the genomic level. In this study, we use the TargetP algorithm to predict the human (13-20% of proteins found in individual datasets) and Fugu (14%) secretomes based on analysis of their nearly complete proteomes. We combine internal processing with prediction software to automate secreted protein identification and overcome one of the major challenges associated with EST data: identification of the minority of clones that encode N-terminally complete proteins. We discuss the use of these methods to predict secreted proteins in EST-based consensus sequence sets, and we validate these predictions using an assay for cell-free cotranslational translocation. Analysis of TIGR Porcine Gene Index 4.0 as a test dataset resulted in the identification of 352 N-terminally complete, putative secreted proteins. In functional agreement with our predictions, 34 of the 40 cDNAs tested (85%) were verified to be cotranslationally translocated in an in vitro translation system. The methods developed here are specifically designed to accept partial open reading frames and improve secreted protein predictions in eukaryotic transcriptomes, and are valuable for the analysis and annotation of eukaryotic EST databases.
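The abstract does not spell out how N-terminally complete clones are recognized. As an illustration only, the sketch below applies one common heuristic: a forward-frame open reading frame is treated as N-terminally complete when its ATG is the first one downstream of an in-frame stop codon, and the ORF may still run off the 3' end of the sequence (a partial ORF). The function name `n_terminally_complete_orfs`, the `min_codons` threshold, and the heuristic itself are assumptions made for this example, not the authors' pipeline, and no signal peptide prediction (the TargetP step) is attempted here.

```python
# Illustrative heuristic only; not the method used in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame):
    """Yield (position, codon) pairs for one forward reading frame."""
    for i in range(frame, len(seq) - 2, 3):
        yield i, seq[i:i + 3]


def n_terminally_complete_orfs(seq, min_codons=3):
    """Return half-open (start, end) spans of ORFs whose ATG follows an in-frame stop.

    `end` includes the stop codon when one is present. ORFs that reach the
    3' end of the sequence without a stop are kept as partial ORFs.
    `min_codons` is a hypothetical length cutoff (counted from the ATG).
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        seen_stop = False  # an in-frame stop codon has been seen upstream
        start = None       # position of the current candidate ATG
        for pos, codon in codons(seq, frame):
            if codon in STOP_CODONS:
                if start is not None and (pos - start) // 3 >= min_codons:
                    found.append((start, pos + 3))
                seen_stop = True
                start = None
            elif codon == "ATG" and seen_stop and start is None:
                start = pos
        # ORF still open at the 3' end: complete at the N-terminus, partial at the C-terminus
        if start is not None and (len(seq) - start) // 3 >= min_codons:
            end = start + ((len(seq) - start) // 3) * 3
            found.append((start, end))
    return found
```

In this sketch, a sequence such as `TAAATGGCTGCTGCTTGA` yields one span covering the ATG through the stop codon, whereas the same ORF without the upstream stop is rejected because the true start could lie beyond the 5' end of the clone. Protein translations of the retained ORFs would then be passed to a predictor such as TargetP.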