Ray Sumanta, Alberuni Syed, Schönhuth Alexander
Data Science Unit, The West Bengal National University of Juridical Sciences, Kolkata, West Bengal, India.
Genome Data Science, University of Bielefeld, Bielefeld, Germany.
PLoS One. 2025 Sep 25;20(9):e0332794. doi: 10.1371/journal.pone.0332794. eCollection 2025.
The COVID-19 pandemic has demanded urgent and accelerated action toward developing effective therapeutic strategies. In silico drug repurposing models are in high demand and require accurate and reliable molecular interaction data. While experimentally verified viral-host interaction data (SARS-CoV-2-human interactions published on April 30, 2020) provide an invaluable resource, these datasets include only a limited number of high-confidence interactions. Here, we extend these resources using a deep learning-based multiview graph neural network approach, coupled with optimal transport-based integration. Our comprehensive validation strategy confirms 472 high-confidence predicted interactions between 280 host proteins and 27 SARS-CoV-2 proteins. The proposed model demonstrates robust predictive performance, achieving ROC-AUC scores of 85.9% (PPI network), 83.5% (GO similarity network), and 83.1% (sequence similarity network), with corresponding average precision scores of 86.4%, 82.8%, and 82.3% on independent test sets. Comparative evaluation shows that our multiview approach consistently outperforms conventional single-view and baseline graph learning methods. The model combines features derived from protein sequences, gene ontology terms, and physical interaction information to improve interaction prediction. Furthermore, we systematically map the predicted host factors to FDA-approved drugs and identify several candidates, including lenalidomide and pirfenidone, which have established or emerging roles in COVID-19 therapy. Overall, our framework provides comprehensive and accurate predictions of SARS-CoV-2-host protein interactions and represents a valuable resource for drug repurposing efforts.
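For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how ROC-AUC and average precision can be computed for a link-prediction task. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision`, and the toy labels and scores, are hypothetical and chosen only for illustration. The sketch assumes binary labels (1 = interacting pair, 0 = non-interacting pair) and, for average precision, no tied scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive pair is scored
    above a random negative pair (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values measured at the rank
    of each positive example, with examples ranked by descending score.
    Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical toy data: six candidate virus-host protein pairs.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"ROC-AUC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"Average precision: {average_precision(toy_labels, toy_scores):.3f}")
```

Both metrics are threshold-free, which is presumably why they are reported per network view: they summarize how well predicted scores rank true interactions above non-interactions without fixing a decision cutoff.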