Identification of Multi-functional Therapeutic Peptides Based on Prototypical Supervised Contrastive Learning.

Authors

Niu Sitong, Fan Henghui, Wang Fei, Yang Xiaomei, Xia Junfeng

Affiliations

College of Mathematics and System Sciences, Xinjiang University, Urumqi, 830046, Xinjiang, China.

Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.

Publication information

Interdiscip Sci. 2025 Jun;17(2):332-343. doi: 10.1007/s12539-024-00674-3. Epub 2024 Dec 23.

Abstract

High-throughput sequencing has exponentially increased the number of known peptide sequences, creating a need for computational methods that identify multi-functional therapeutic peptides (MFTP) directly from sequence. Existing computational methods, however, struggle with class imbalance, particularly when learning effective sequence representations. To address this, we propose PSCFA, a method for MFTP prediction that combines prototypical supervised contrastive learning with feature augmentation. We adopt a two-stage training scheme that trains the feature extractor and the classifier separately, on the principle that better feature representations improve classification accuracy. In the first stage, a prototypical supervised contrastive learning strategy improves the uniformity of the feature space distribution, so that features of samples in the same category cluster tightly while those of different categories are more dispersed. In the second stage, a feature augmentation strategy focused on infrequent labels (tail labels) refines classifier training. A prototype-based variational autoencoder captures the semantic links between common labels (head labels) and their prototypes; this knowledge is then transferred to tail labels to generate augmented features for classifier training. Experiments show that PSCFA significantly outperforms existing methods for MFTP prediction, marking a notable advance in therapeutic peptide identification.
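
The first-stage idea, pulling each sample's features toward the prototypes of its own labels and away from the others, can be illustrated with a small NumPy sketch. This is not the PSCFA loss itself, which the abstract does not specify; it is a simplified prototype-based contrastive objective written under stated assumptions: one prototype vector per label, cosine similarity, a softmax over prototypes, and a hypothetical `temperature` value. The function name `prototype_contrastive_loss` and all parameter names are illustrative only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def prototype_contrastive_loss(features, prototypes, labels, temperature=0.1):
    """Illustrative prototype-based contrastive loss (an assumption, not PSCFA's exact loss).

    features:   (N, D) sample embeddings from a feature extractor
    prototypes: (C, D) one prototype vector per label
    labels:     (N, C) multi-hot matrix, 1 where a sample carries a label
    """
    labels = np.asarray(labels, dtype=float)
    z = l2_normalize(np.asarray(features, dtype=float))
    p = l2_normalize(np.asarray(prototypes, dtype=float))

    # Cosine similarity between every sample and every prototype, scaled by temperature.
    logits = z @ p.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average the negative log-probability over each sample's positive labels.
    pos_count = np.maximum(labels.sum(axis=1), 1.0)
    per_sample = -(labels * log_prob).sum(axis=1) / pos_count
    return float(per_sample.mean())
```

Under these assumptions, the loss is small when each embedding lies close to the prototypes of its own labels and grows when embeddings sit near the wrong prototypes, which matches the "tight within a category, dispersed across categories" behaviour the abstract describes. A real implementation would also need a rule for updating the prototypes during training, which is not covered here.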
