Wenzhou University of Technology, 325000 Wenzhou, China.
Central South University, 410083 Changsha, China.
J Chem Inf Model. 2024 Apr 8;64(7):2798-2806. doi: 10.1021/acs.jcim.3c00868. Epub 2023 Aug 29.
Plant small secretory peptides (SSPs) play an important role in regulating biological processes in plants, and accurate prediction of SSPs enables efficient exploration of their functions. Traditional experimental verification is reliable and accurate, but it requires expensive equipment and considerable time. Machine learning speeds up SSP prediction, but unstable feature extraction limits this class of methods. This paper therefore proposes a new feature-correction-based model for SSP recognition in plants, abbreviated as SE-SSP. The model has three main advantages. First, transformer encoders can better reveal implicit features. Second, a feature correction module suited to sequences, named 2-D SENET, adaptively adjusts the features to obtain a more robust feature representation. Third, multiple stacked linear modules further extract deep information from each sample. In addition, training with a contrastive learning strategy can alleviate the problem of sparse samples. We conduct experiments on publicly available data sets, and the results verify that our model performs excellently. The proposed model can serve as a convenient and effective SSP prediction tool in the future. Our data and code are publicly available at https://github.com/wrab12/SE-SSP/.
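The abstract names the feature correction module but does not describe its internals. As a rough illustration of the general squeeze-and-excitation idea that the name 2-D SENET suggests, the sketch below reweights the channels of a per-position feature matrix. It is not the authors' implementation: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and all sizes are hypothetical, and the weights are random here whereas a real model would learn them.

```python
import math
import random


def squeeze_excite(features, w1, w2):
    """Rescale each channel of a (positions x channels) feature matrix.

    features: list of rows, one per sequence position, each holding C floats.
    w1: C x H bottleneck weights (hypothetical; learned in a real model).
    w2: H x C expansion weights (hypothetical; learned in a real model).
    Returns the rescaled matrix and the per-channel gates.
    """
    n_pos = len(features)
    n_ch = len(features[0])
    n_hidden = len(w1[0])

    # Squeeze: average each channel over all sequence positions.
    squeezed = [sum(row[c] for row in features) / n_pos for c in range(n_ch)]

    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [
        max(0.0, sum(squeezed[c] * w1[c][h] for c in range(n_ch)))
        for h in range(n_hidden)
    ]

    # Excitation, step 2: expand back to C channels, then sigmoid to (0, 1).
    gates = []
    for c in range(n_ch):
        z = sum(hidden[h] * w2[h][c] for h in range(n_hidden))
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: multiply every position's channel value by that channel's gate.
    rescaled = [[row[c] * gates[c] for c in range(n_ch)] for row in features]
    return rescaled, gates


if __name__ == "__main__":
    rng = random.Random(0)
    length, channels, hidden_size = 5, 4, 2  # illustrative sizes only
    feats = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(length)]
    w1 = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(channels)]
    w2 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden_size)]
    out, gates = squeeze_excite(feats, w1, w2)
    print("per-channel gates:", [round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the module can only damp channels relative to one another. That is one plausible reading of "adaptively adjust the features", under the assumptions stated above.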