Jiao Shihu, Zou Quan
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China.
Comput Struct Biotechnol J. 2022 Jun 8;20:2921-2927. doi: 10.1016/j.csbj.2022.06.002. eCollection 2022.
Plant vacuoles are the most important organelles for plant growth, development, and defense, and they play an important role in many types of stress responses. An important function of vacuole proteins is the transport of amino acids, ions, sugars, and other molecules. Accurate identification of vacuole proteins is therefore crucial for revealing their biological functions. Several fast, automated computational tools have been proposed for predicting the subcellular localization of proteins, but they are not specific to plant vacuole proteins. To the best of our knowledge, only one computational tool has been trained specifically for plant vacuolar proteins. Although its accuracy is acceptable, its prediction performance and stability in practical applications can still be improved. Hence, in this study, we developed a new predictor, iPVP-DRLF, to identify plant vacuole proteins specifically and effectively. The predictor is built on the light gradient boosting machine (LGBM) algorithm and hybrid features that combine classic sequence features with deep representation learning features. iPVP-DRLF achieved accuracies of 88.25% in fivefold cross-validation and 87.16% on the independent test, both outperforming previous state-of-the-art predictors. Moreover, the blind dataset test showed that iPVP-DRLF performed significantly better than existing tools. Comparative experiments confirmed that deep representation learning features have an advantage over classic sequence features in identifying plant vacuole proteins. We believe that iPVP-DRLF will serve as an effective computational tool for plant vacuole protein prediction and facilitate related future research. The online server is freely accessible at https://lab.malab.cn/~acy/iPVP-DRLF. The source code and datasets are available at https://github.com/jiaoshihu/iPVP-DRLF.
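The hybrid-feature and fivefold cross-validation setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes amino acid composition as a stand-in for the "classic sequence features" (the abstract does not name which descriptors were used), and it treats the deep representation as a precomputed numeric vector supplied by the caller. The function names (amino_acid_composition, hybrid_features, five_fold_indices, accuracy) are hypothetical. The classifier itself is omitted to keep the example dependency-free; in practice the resulting vectors could be passed to a gradient boosting model such as LightGBM's LGBMClassifier.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue (a 20-dimensional vector).

    Assumption: this stands in for the unspecified classic sequence features.
    Non-standard characters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_features(sequence, embedding):
    """Concatenate classic features with a precomputed embedding vector.

    Assumption: `embedding` comes from some pretrained protein language model;
    how it is produced is outside the scope of this sketch.
    """
    return amino_acid_composition(sequence) + list(embedding)


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For each (train, test) pair, a model would be fitted on the training vectors and scored with accuracy on the held-out fold. Averaging the five scores gives a cross-validated estimate comparable in kind, though not in value, to the figure reported above.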