Gonzales Mark Edward M, Ureta Jennifer C, Shrestha Anish M S
Bioinformatics Lab, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila 1004, Philippines.
College of Computer Studies, De La Salle University, Manila 1004, Philippines.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf016.
Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.
We present PHIStruct, a multilayer perceptron that takes as input structure-aware embeddings of receptor-binding proteins, generated with the structure-aware protein language model SaProt, and predicts the host from among the ESKAPEE genera. Compared with recent tools, PHIStruct exhibits the best balance of precision and recall, with the highest and most stable F1 score across a wide range of confidence thresholds and sequence similarity settings. The performance margin is most pronounced when the sequence similarity between the training and test sets drops below 40%: at relatively high confidence thresholds (above 50%), PHIStruct achieves a 7%-9% increase in class-averaged F1 over machine learning tools that do not directly incorporate structure information, as well as a 5%-6% increase over BLASTp.
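The comparison above rests on class-averaged (macro) F1 computed at a given confidence threshold. The sketch below illustrates that metric under stated assumptions; it is not PHIStruct's evaluation code. The function name `thresholded_macro_f1` is hypothetical, and it assumes that a prediction whose confidence is not strictly above the threshold counts as an abstention (no host predicted), so it becomes a false negative for the true genus rather than a false positive for another one. Scores are averaged over the labels that appear in the ground truth or among the retained predictions.

```python
from typing import List, Optional, Sequence


def thresholded_macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    confidence: Sequence[float],
    threshold: float,
) -> float:
    """Class-averaged (macro) F1 with confidence-based abstention.

    Assumption (not taken from PHIStruct): a prediction whose confidence is
    not strictly above `threshold` is replaced by None (no host predicted).
    """
    # Keep a prediction only if its confidence exceeds the threshold.
    kept: List[Optional[str]] = [
        p if c > threshold else None for p, c in zip(y_pred, confidence)
    ]
    # Average over labels seen in the ground truth or among kept predictions.
    classes = sorted(set(y_true) | {p for p in kept if p is not None})

    f1_scores = []
    for cls in classes:
        tp = fp = fn = 0
        for t, p in zip(y_true, kept):
            if p == cls and t == cls:
                tp += 1  # correct host predicted
            elif p == cls:
                fp += 1  # another host's phage predicted as cls
            elif t == cls:
                fn += 1  # true host cls missed, or prediction abstained
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


if __name__ == "__main__":
    # Toy labels drawn from the ESKAPEE genera (illustrative data only).
    y_true = ["Klebsiella", "Klebsiella", "Escherichia", "Escherichia"]
    y_pred = ["Klebsiella", "Escherichia", "Escherichia", "Escherichia"]
    conf = [0.9, 0.8, 0.4, 0.95]
    for thr in (0.0, 0.5):
        print(thr, round(thresholded_macro_f1(y_true, y_pred, conf, thr), 4))
```

In this toy run, raising the threshold from 0.0 to 0.5 turns the low-confidence (0.4) Escherichia prediction into an abstention, and the macro F1 falls from 11/15 (about 0.7333) to 7/12 (about 0.5833). A metric that stays high as the threshold rises therefore indicates a model whose correct predictions tend to carry high confidence.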
The data and source code for our experiments and analyses are available at https://github.com/bioinfodlsu/PHIStruct.