Georg-August University of Goettingen, Institute for Microbiology, Department of Bioinformatics, D-37077 Goettingen, Germany.
Plant Cell. 2011 Apr;23(4):1556-72. doi: 10.1105/tpc.111.084095. Epub 2011 Apr 12.
In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.
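As a rough illustration of the sequence features the abstract refers to (the C-terminal PTS1 tripeptide and the region immediately upstream of it), the following minimal Python sketch splits a protein sequence into those two parts and checks the tripeptide against a lookup set. It is not the discriminative machine learning approach described above. The function names, the `upstream_len` default, and the one-element tripeptide set (SKL, the prototypical PTS1) are assumptions made for illustration only.

```python
# Illustrative sketch only: a naive C-terminal lookup, NOT the trained
# predictors described in the abstract. All names here are hypothetical.

# Placeholder set holding only the prototypical PTS1 tripeptide (SKL).
# The tripeptides identified in the study would have to come from the paper.
KNOWN_PTS1 = frozenset({"SKL"})


def split_c_terminus(seq, upstream_len=11):
    """Return (upstream_region, tripeptide) taken from the C-terminus.

    upstream_len is an arbitrary illustrative window size (an assumption),
    not a value taken from the abstract.
    """
    # Normalise case and drop a trailing stop-codon marker if present.
    seq = seq.strip().upper().rstrip("*")
    if len(seq) < 3:
        raise ValueError("sequence must contain at least three residues")
    tripeptide = seq[-3:]
    # Clamp the start index so short sequences yield a shorter upstream region.
    start = max(0, len(seq) - 3 - upstream_len)
    upstream = seq[start:-3]
    return upstream, tripeptide


def has_known_pts1(seq, known=KNOWN_PTS1):
    """True if the C-terminal tripeptide is in the given lookup set."""
    _, tripeptide = split_c_terminus(seq)
    return tripeptide in known
```

A lookup of this kind can only recognise tripeptides that are already listed. The abstract reports that the trained methods also inferred novel PTS1 tripeptides and made use of patterns in the upstream region, which a plain set membership test cannot do.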