PHIStruct: improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings.

Authors

Gonzales Mark Edward M, Ureta Jennifer C, Shrestha Anish M S

Affiliations

Bioinformatics Lab, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila 1004, Philippines.

College of Computer Studies, De La Salle University, Manila 1004, Philippines.

Publication information

Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf016.

Abstract

MOTIVATION

Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.

RESULTS

We present PHIStruct, a multilayer perceptron that takes in structure-aware embeddings of receptor-binding proteins, generated via the structure-aware protein language model SaProt, and then predicts the host from among the ESKAPEE genera. Compared against recent tools, PHIStruct exhibits the best balance of precision and recall, with the highest and most stable F1 score across a wide range of confidence thresholds and sequence similarity settings. The margin in performance is most pronounced when the sequence similarity between the training and test sets drops below 40%, wherein, at a relatively high-confidence threshold of above 50%, PHIStruct presents a 7%-9% increase in class-averaged F1 over machine learning tools that do not directly incorporate structure information, as well as a 5%-6% increase over BLASTp.
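To make the evaluation metric concrete, the following is a minimal sketch of how a class-averaged F1 score at a confidence threshold could be computed. It is not taken from the PHIStruct repository. The function names are hypothetical, and treating a prediction whose top score is not above the threshold as "no ESKAPEE host" (`None`) is an assumption, as is assigning an F1 of 0 to a class with no true or predicted samples.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the PHIStruct implementation.

# The seven ESKAPEE genera, used as the class labels.
ESKAPEE_GENERA = [
    "Enterococcus", "Staphylococcus", "Klebsiella", "Acinetobacter",
    "Pseudomonas", "Enterobacter", "Escherichia",
]


def predict_with_threshold(probs, threshold=0.5):
    """Return the top-scoring genus, or None if its score is not above threshold.

    `probs` holds one score per genus, in the order of ESKAPEE_GENERA.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ESKAPEE_GENERA[best] if probs[best] > threshold else None


def class_averaged_f1(y_true, y_pred, classes=ESKAPEE_GENERA):
    """Compute the unweighted mean of per-class F1 scores (macro F1)."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted samples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Made-up scores for two phage proteins, purely for illustration.
    scores = [
        [0.05, 0.05, 0.70, 0.05, 0.05, 0.05, 0.05],
        [0.30, 0.30, 0.10, 0.10, 0.10, 0.05, 0.05],
    ]
    truth = ["Klebsiella", "Escherichia"]
    preds = [predict_with_threshold(s, threshold=0.5) for s in scores]
    print(preds, class_averaged_f1(truth, preds))
```

Raising the threshold turns low-confidence predictions into abstentions, which tends to trade recall for precision; averaging F1 over classes weights each genus equally regardless of how many samples it has.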

AVAILABILITY AND IMPLEMENTATION

The data and source code for our experiments and analyses are available at https://github.com/bioinfodlsu/PHIStruct.
