School of Computer Science and Technology, Hainan University, Haikou 570228, China.
Int J Biol Macromol. 2024 Oct;277(Pt 3):134317. doi: 10.1016/j.ijbiomac.2024.134317. Epub 2024 Jul 31.
Plant vacuoles play a crucial role in maintaining cellular stability, adapting to environmental changes, and responding to external stresses. Accurate identification of plant vacuolar proteins (PVPs) is essential for understanding the biosynthetic mechanisms of intracellular vacuoles and the adaptive mechanisms of plants. To identify vacuolar proteins more accurately, this study developed PEL-PVP, a new predictive model based on ESM-2. The study demonstrates the feasibility and effectiveness of applying advanced pre-trained models and fine-tuning techniques to bioinformatics tasks, providing new methods and ideas for plant vacuolar protein research. In addition, previous datasets for vacuolar proteins were balanced, whereas imbalanced data more closely reflect real-world conditions. Therefore, this study constructed an imbalanced dataset, UB-PVP, from the UniProt database, helping the model better adapt to the complexity and uncertainty of real environments and thereby improving its generalization ability and practicality. Experimental results show that, compared with existing identification methods, PEL-PVP achieves significant improvements on multiple metrics, with gains of 6.08%, 13.51%, 11.9%, and 5% in ACC, SP, MCC, and AUC, respectively. Its accuracy reaches 94.59%, significantly higher than that of the previous best model, GraphIdn. This provides an efficient and precise tool for the study of plant vacuolar proteins.
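The abstract reports ACC, SP, and MCC on an imbalanced dataset. As a minimal sketch of why several metrics are reported together, the Python below computes them from confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not taken from the paper or from UB-PVP.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total                       # accuracy (ACC)
    sn = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0     # specificity (SP)
    # Matthews correlation coefficient (MCC); defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Made-up imbalanced example: 10 positives, 90 negatives.
# A classifier that labels everything negative still scores ACC = 0.9,
# but its MCC is 0.0, which exposes the lack of real predictive power.
all_negative = binary_metrics(tp=0, fp=0, tn=90, fn=10)

# A classifier that finds most positives at the cost of a few false alarms.
useful = binary_metrics(tp=8, fp=5, tn=85, fn=2)

print(all_negative)
print(useful)
```

On imbalanced data, accuracy alone can look high for a trivial classifier, so MCC and specificity give a more informative picture of performance.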