Yao Yalin, Chen Hao, Wang Jianxin, Wang Yeru
School of Information, Beijing Forestry University, Beijing 100083, China.
Risk Assessment Division 1, China National Center for Food Safety Risk Assessment, Beijing 100022, China.
Microorganisms. 2025 Jul 10;13(7):1635. doi: 10.3390/microorganisms13071635.
Virulence factors (VFs), produced by pathogens, enable pathogenic microorganisms to invade, colonize, and damage host cells. Accurate VF identification advances the understanding of pathogenic mechanisms and provides novel anti-virulence targets. Existing models primarily use protein sequence features and overlook systematic protein-protein interaction (PPI) information, even though pathogenesis typically results from coordinated interactions among proteins. Moreover, a severe class imbalance exists between virulence and non-virulence proteins. Existing models are trained on datasets that were balanced by sampling, so they fail to incorporate the proteins' inherent distributional characteristics, which restricts their generalization to real-world imbalanced data. To address these challenges, we propose a novel Generative and Contrastive self-supervised learning framework for Virulence Factor identification (GC-VF) that recasts VF identification as an imbalanced node classification task on graphs generated from PPI networks. The framework comprises two core modules. The generative attribute reconstruction module learns attribute-space representations via feature reconstruction, capturing intrinsic data patterns and reducing noise. The local contrastive learning module applies node-level contrastive learning to capture local features and contextual information, avoiding the information loss of global aggregation while ensuring that node representations reflect the nodes' inherent characteristics. Comprehensive benchmark experiments demonstrate that GC-VF outperforms baseline methods on naturally imbalanced datasets, with higher accuracy and stability, offering a potential solution for accurate VF identification.
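The two self-supervised objectives described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-layer graph encoder, the masked mean-squared-error reconstruction, the InfoNCE-style loss that treats the mean of a node's neighbours as its positive, the temperature `tau`, the unweighted sum of the two losses, and the six-node toy ring graph are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def normalize_adj(adj):
    """Symmetrically normalize A + I (assumed GCN-style propagation)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def encode(a_norm, x, w):
    """One graph-convolution layer with tanh (hypothetical encoder)."""
    return np.tanh(a_norm @ x @ w)


def reconstruction_loss(x, mask, a_norm, w_enc, w_dec):
    """Generative objective: zero out attributes of masked nodes,
    then measure how well a linear decoder rebuilds them (MSE)."""
    x_masked = x.copy()
    x_masked[mask] = 0.0
    h = encode(a_norm, x_masked, w_enc)
    x_rec = h @ w_dec
    return float(np.mean((x_rec[mask] - x[mask]) ** 2))


def local_contrastive_loss(h, adj, tau=0.5):
    """Node-level InfoNCE: the positive for node i is the mean embedding
    of its neighbours; the neighbourhood means of other nodes are negatives."""
    deg = adj.sum(axis=1, keepdims=True)
    ctx = (adj @ h) / np.maximum(deg, 1.0)
    h_n = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-12)
    c_n = ctx / (np.linalg.norm(ctx, axis=1, keepdims=True) + 1e-12)
    logits = (h_n @ c_n.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy stand-in for a PPI graph: 6 proteins connected in a ring.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
for i in range(6):
    adj[i, (i + 1) % 6] = adj[(i + 1) % 6, i] = 1.0

x = rng.normal(size=(6, 4))                  # hypothetical protein attributes
w_enc = rng.normal(scale=0.1, size=(4, 3))   # untrained encoder weights
w_dec = rng.normal(scale=0.1, size=(3, 4))   # untrained decoder weights
mask = np.array([True, False, False, True, False, False])

a_norm = normalize_adj(adj)
h = encode(a_norm, x, w_enc)
rec = reconstruction_loss(x, mask, a_norm, w_enc, w_dec)
con = local_contrastive_loss(h, adj)
total = rec + con                            # assumed equal weighting
print(f"reconstruction={rec:.4f} contrastive={con:.4f} total={total:.4f}")
```

In a full pipeline these losses would be minimized by gradient descent to pretrain the encoder, and the resulting node embeddings would then feed a classifier evaluated on the naturally imbalanced labels; the sketch only evaluates the losses once with random weights.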