Yue Yishan, Fan Henghui, Zhao Jianping, Xia Junfeng
College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang, China.
Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China.
PeerJ Comput Sci. 2025 Mar 18;11:e2733. doi: 10.7717/peerj-cs.2733. eCollection 2025.
Plant miRNA-encoded peptides (miPEPs), short peptides derived from small open reading frames within primary miRNAs, play a crucial role in regulating diverse plant traits. Identifying plant miPEPs is challenging because few known miPEPs are available for training. Existing prediction methods, including miPEPPred-FRL, rely on manually encoded features to infer plant miPEPs. Recent advances in deep learning models of protein sequences, which leverage large protein sequence datasets, offer an opportunity to improve the representation of key features. In this study, we propose an accurate prediction model, pLM4PEP, which integrates ESM2 peptide embeddings with machine learning methods. Our model not only identifies plant miPEPs precisely but also achieves remarkable results across diverse datasets that include other bioactive peptides. The source code and datasets of pLM4PEP are available at https://github.com/xialab-ahu/pLM4PEP.
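The general idea of pairing a sequence embedding with a conventional classifier can be illustrated with a minimal sketch. It is not the pLM4PEP implementation: the per-residue embedding function is a hypothetical stand-in that returns pseudo-random vectors instead of calling ESM2, the nearest-centroid classifier is chosen only for brevity, and all sequences, labels, and function names are made up for illustration. What the sketch does show is how mean pooling turns peptides of different lengths into fixed-length feature vectors that a classifier can consume.

```python
import math
import random

EMBED_DIM = 8  # toy size; real protein language model embeddings are far larger


def toy_residue_embeddings(sequence, dim=EMBED_DIM):
    """Hypothetical stand-in for a protein language model (NOT ESM2).

    Returns one deterministic pseudo-random vector per residue.
    """
    vectors = []
    for residue in sequence:
        rng = random.Random(ord(residue))  # same residue -> same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def embed_peptide(sequence):
    """Map a peptide of any length to one fixed-length feature vector."""
    return mean_pool(toy_residue_embeddings(sequence))


def fit_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class label."""
    grouped = {}
    for feature, label in zip(features, labels):
        grouped.setdefault(label, []).append(feature)
    return {label: mean_pool(group) for label, group in grouped.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], feature))


# Made-up sequences and labels, for illustration only.
train_sequences = ["MKRLLKKA", "MRKKLAR", "MDEESDDE", "MEDDSEE"]
train_labels = ["miPEP", "miPEP", "non-miPEP", "non-miPEP"]

centroids = fit_centroids([embed_peptide(s) for s in train_sequences], train_labels)
prediction = predict(centroids, embed_peptide("MKKRLA"))
print(prediction)
```

In a realistic setting, `toy_residue_embeddings` would be replaced by per-residue representations from a pretrained protein language model, and the centroid rule by a trained classifier; the pooling step would stay the same.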