Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information.

Author information

Li Binghua, Li Xin, Li Xiaoyu, Wang Li, Lu Jun, Wang Jia

Affiliations

College of Informatics, Huazhong Agricultural University, Wuhan, China.

Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.

Publication information

PeerJ. 2025 Jan 30;13:e18863. doi: 10.7717/peerj.18863. eCollection 2025.

Abstract

Influenza A virus (IAV) is highly infectious and highly pathogenic, which makes IAV infection a serious public health threat. Identifying protein-protein interactions (PPIs) between IAV and human proteins helps elucidate the mechanism of viral infection and supports the design of antiviral drugs. In this article, we developed a sequence-based machine learning method for predicting IAV-human PPIs. First, we applied a new negative sample construction method to build a high-quality IAV-human PPI dataset. We then used conjoint triad (CT) and Moran autocorrelation (Moran) descriptors to encode biologically relevant features. Combining the two encodings exploits the complementary information carried by contiguous and discontinuous amino acids, giving a more comprehensive description of PPI information. After comparing different machine learning models, we selected the eXtreme Gradient Boosting (XGBoost) model as the final predictor. The model achieved an accuracy of 96.89%, a precision of 98.79%, a recall of 94.85%, and an F1-score of 96.78%. Finally, we identified 3,269 potential target proteins. Gene Ontology (GO) and pathway analyses showed that these genes are strongly associated with IAV infection. Analysis of the PPI network further revealed that the predicted proteins are classified as core proteins within the human protein interaction network. This study may facilitate the identification of potential targets for the discovery of more effective anti-influenza drugs. The source code and datasets are available at https://github.com/HVPPIlab/IVA-Human-PPI/.
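The abstract names the two sequence encodings without defining them. The following is a minimal, standard-library-only sketch, not the authors' code (which is in the linked repository). Conjoint triad (CT) counts the class pattern of every contiguous three-residue window (continuous information). Moran autocorrelation correlates a physicochemical property between residues d positions apart (discontinuous information). Assumptions, none of which are taken from the paper: the commonly used seven-class CT grouping by dipole and side-chain volume, simple relative-frequency normalization, Kyte-Doolittle hydrophobicity as the only Moran property, a maximum lag of 3, and concatenation of the virus and human vectors into one pair feature. The helper names are hypothetical.

```python
# Minimal sketch (not the authors' implementation): CT + Moran features for one protein pair.

# Commonly used seven-class CT grouping by dipole and side-chain volume.
CT_CLASSES = {
    **dict.fromkeys("AGV", 0),
    **dict.fromkeys("ILFP", 1),
    **dict.fromkeys("YMTS", 2),
    **dict.fromkeys("HNQW", 3),
    **dict.fromkeys("RK", 4),
    **dict.fromkeys("DE", 5),
    "C": 6,
}

# Kyte-Doolittle hydrophobicity: the single Moran property used here (assumption).
KD_HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def conjoint_triad(seq):
    """Relative frequency of each of the 7**3 = 343 class triads over contiguous windows."""
    # Non-standard residues (e.g. X, U) are simply dropped (assumption).
    classes = [CT_CLASSES[aa] for aa in seq.upper() if aa in CT_CLASSES]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1  # base-7 index of the triad (a, b, c)
    total = max(len(classes) - 2, 1)
    return [n / total for n in counts]


def moran_autocorrelation(seq, max_lag=3, scale=KD_HYDROPHOBICITY):
    """Moran's I(d) for d = 1..max_lag: property correlation between residues d apart."""
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    # Per-residue variance along the sequence. I(d) is unchanged by shifting and positively
    # rescaling the property scale, so standardizing it first would give the same values.
    variance = sum((v - mean) ** 2 for v in values) / n
    features = []
    for d in range(1, max_lag + 1):
        if d >= n or variance < 1e-12:
            features.append(0.0)  # lag too long or (near-)constant property: report 0
            continue
        cov = sum((values[j] - mean) * (values[j + d] - mean) for j in range(n - d)) / (n - d)
        features.append(cov / variance)
    return features


def encode_pair(virus_seq, human_seq, max_lag=3):
    """Hypothetical pair layout: [CT(virus), Moran(virus), CT(human), Moran(human)]."""
    def encode(seq):
        return conjoint_triad(seq) + moran_autocorrelation(seq, max_lag)
    return encode(virus_seq) + encode(human_seq)


# Toy sequences for illustration only.
x = encode_pair("MSLLTEVETPIRNEWGCRCNDSSD", "MKTAYIAKQRQISFVKSHFSRQ")
print(len(x))  # 2 * (343 + 3) = 692
```

Each labeled pair vector (interacting or non-interacting) would then be passed to a classifier. The abstract reports that XGBoost performed best, for example via `xgboost.XGBClassifier`. The paper's full property set, lag range, and hyperparameters are not given here, so the dimensions above are illustrative only.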

[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b614/11787804/a8b840d8034a/peerj-13-18863-g001.jpg]