Wang Jue, Wang Yejun, Gao Caiji, Jiang Liwen, Guo Dianjing
School of Life Sciences and State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong.
Department of Medical Genetics, Shenzhen University Health Science Center, Shenzhen, China.
PLoS One. 2017 Jan 3;12(1):e0168912. doi: 10.1371/journal.pone.0168912. eCollection 2017.
Well-defined motifs often make it easy to investigate protein function and localization. In plants, peroxisomal proteins are guided to peroxisomes mainly by a conserved peroxisomal targeting signal of type 1 (PTS1) or type 2 (PTS2), and the PTS1 motif is commonly used to predict peroxisome-targeted proteins. Currently, computational prediction of PTS1-type peroxisomal proteins is mostly based on the three-amino-acid PTS1 motif and an adjacent sequence of fewer than 14 amino acid residues. The potential contribution of adjacent sequences beyond this short region has never been thoroughly investigated in plants. In this work, we developed a bi-profile Bayesian SVM method to extract and learn position-based amino acid features of both PTS1 motifs and their extended adjacent sequences in plants. Our proposed model outperformed other tools designed for similar applications and achieved the highest accuracies: 93.6% for Arabidopsis and 92.6% for other plant species. A large-scale analysis of the Arabidopsis, rice, maize, potato, wheat, and soybean proteomes was conducted with the proposed model, and a batch of candidate PTS1 proteins was predicted. DNA segments encoding the C-terminal sequences of 9 selected candidates were cloned and transformed into Arabidopsis for experimental validation, and 5 of them demonstrated peroxisome targeting.
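The abstract does not spell out how the position-based features are encoded. As a minimal sketch, assuming the standard bi-profile Bayes (BPB) scheme, each position of a fixed-length C-terminal window is scored twice: by the frequency of its residue at that position among known positive (peroxisome-targeted) windows, and by the same frequency among negative windows. The function names, the toy training windows, and the 5-residue window length below are hypothetical illustrations, not the authors' data, window size, or implementation. The resulting vectors would then be used to train an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def c_terminal_window(protein, n):
    """Return the last n residues (hypothetical helper; n is illustrative)."""
    if len(protein) < n:
        raise ValueError("protein shorter than window")
    return protein[-n:]


def positional_profile(windows):
    """Per-position amino acid frequencies over equal-length windows.

    Profiles should be built from training windows only, so that test
    sequences do not leak into the features.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        profile.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return profile


def bi_profile_features(window, pos_profile, neg_profile):
    """Encode a window as positive-profile scores followed by negative-profile scores."""
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    return pos_part + neg_part


# Toy example with made-up 5-residue windows (not real training data).
positives = ["AASKL", "GGSKL", "LLARL", "PPSRL"]
negatives = ["AAEEK", "GGKDE", "LLAAA", "PPSRV"]
pos_prof = positional_profile(positives)
neg_prof = positional_profile(negatives)
vec = bi_profile_features(c_terminal_window("MSTNVSKL", 5), pos_prof, neg_prof)
# vec == [0.0, 0.0, 0.75, 0.5, 1.0, 0.0, 0.0, 0.25, 0.0, 0.0]
```

Each window of length n yields a 2n-dimensional vector, so a residue that is common at a given position in targeted proteins but rare in non-targeted ones contributes a high first-half score and a low second-half score.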