PHIStruct: improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings.

Authors

Gonzales Mark Edward M, Ureta Jennifer C, Shrestha Anish M S

Affiliations

Bioinformatics Lab, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila 1004, Philippines.

College of Computer Studies, De La Salle University, Manila 1004, Philippines.

Publication information

Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf016.

DOI: 10.1093/bioinformatics/btaf016
PMID: 39804673
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11783280/

Abstract

MOTIVATION

Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.

RESULTS

We present PHIStruct, a multilayer perceptron that takes in structure-aware embeddings of receptor-binding proteins, generated via the structure-aware protein language model SaProt, and then predicts the host from among the ESKAPEE genera. Compared against recent tools, PHIStruct exhibits the best balance of precision and recall, with the highest and most stable F1 score across a wide range of confidence thresholds and sequence similarity settings. The margin in performance is most pronounced when the sequence similarity between the training and test sets drops below 40%, wherein, at a relatively high-confidence threshold of above 50%, PHIStruct presents a 7%-9% increase in class-averaged F1 over machine learning tools that do not directly incorporate structure information, as well as a 5%-6% increase over BLASTp.

AVAILABILITY AND IMPLEMENTATION

The data and source code for our experiments and analyses are available at https://github.com/bioinfodlsu/PHIStruct.
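
The thresholded evaluation mentioned in the Results paragraph (class-averaged F1 at a given confidence threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `ABSTAIN` label, the toy probabilities, and the choice to count a below-threshold prediction as a miss for its true class are all assumptions made for illustration. The abstract does not state how PHIStruct handles below-threshold predictions.

```python
from typing import List, Sequence

ABSTAIN = -1  # hypothetical label meaning "no prediction above the threshold"


def predict_with_threshold(probs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return the most probable class per row, or ABSTAIN if its probability
    does not exceed the confidence threshold."""
    preds = []
    for row in probs:
        best = max(range(len(row)), key=lambda k: row[k])
        preds.append(best if row[best] > threshold else ABSTAIN)
    return preds


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Class-averaged (macro) F1. An abstention counts as a false negative
    for the true class (an assumption of this sketch)."""
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / n_classes


# Toy example: 4 proteins, 3 hypothetical host classes (made-up numbers).
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],  # most probable class is below 0.5, so it is dropped
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
]
toy_true = [0, 0, 1, 2]

toy_pred = predict_with_threshold(toy_probs, threshold=0.5)
toy_score = macro_f1(toy_true, toy_pred, n_classes=3)
```

Raising the threshold trades recall for precision: in the toy data the second protein is correctly classified at a threshold of 0 but is dropped at 0.5, which lowers the recall of class 0 and therefore the macro F1.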

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/171f806ba809/btaf016f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/3f8a6efba801/btaf016f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/78d0d33e8870/btaf016f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/efb771726d02/btaf016f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/2985c5e3cefb/btaf016f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/5baa8a4694c5/btaf016f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/22b9fa7d1830/btaf016f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cbc/11783280/4b486cb122dd/btaf016f8.jpg
