Madan Sumit, Demina Victoria, Stapf Marcus, Ernst Oliver, Fröhlich Holger
Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757 Sankt Augustin, Germany.
Institute of Computer Science, University of Bonn, 53115 Bonn, Germany.
Patterns (N Y). 2022 Jul 31;3(9):100551. doi: 10.1016/j.patter.2022.100551. eCollection 2022 Sep 9.
Prediction and understanding of virus-host protein-protein interactions (PPIs) are relevant to the development of novel therapeutic interventions. In addition, virus-like particles open novel opportunities to deliver therapeutics to targeted cell types and tissues. Given our incomplete knowledge of PPIs on the one hand and the cost and time associated with experimental procedures on the other, we propose a deep learning approach to predict virus-host PPIs. Our method (Siamese Tailored deep sequence Embedding of Proteins [STEP]) is based on recent deep protein sequence embedding techniques, which we integrate into a Siamese neural network. After demonstrating the state-of-the-art performance of STEP on external datasets, we apply it to two use cases, severe acute respiratory syndrome coronavirus 2 and John Cunningham polyomavirus, to predict virus-host PPIs. Altogether, our work highlights the potential of deep sequence embedding techniques originating from the field of natural language processing (NLP), as well as of explainable artificial intelligence methods, for the analysis of biological sequences.
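The abstract describes STEP as integrating protein sequence embeddings into a Siamese neural network. The sketch below illustrates only the general Siamese idea and is not STEP's actual architecture. It assumes each protein is already represented by a fixed-length embedding vector (random stand-ins here). The embedding and hidden sizes, the single ReLU encoder layer, the element-wise product used to merge the two branches, and the sigmoid output are all illustrative choices. The names `EMBED_DIM`, `HIDDEN_DIM`, `encode`, and `interaction_probability` are hypothetical, and the weights are untrained.

```python
import numpy as np

# Hypothetical sizes: per-protein embedding dimension and hidden width.
EMBED_DIM = 16
HIDDEN_DIM = 8

rng = np.random.default_rng(0)

# Shared ("Siamese") encoder weights: the SAME parameters process both proteins.
W_enc = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN_DIM))
b_enc = np.zeros(HIDDEN_DIM)

# Classifier head applied to the merged pair representation (illustrative choice).
w_out = rng.normal(scale=0.1, size=HIDDEN_DIM)
b_out = 0.0


def encode(embedding: np.ndarray) -> np.ndarray:
    """Map a fixed-length protein embedding to a hidden vector (one ReLU layer)."""
    return np.maximum(0.0, embedding @ W_enc + b_enc)


def interaction_probability(virus_emb: np.ndarray, host_emb: np.ndarray) -> float:
    """Score a virus-host pair with a shared encoder and a sigmoid output."""
    h_virus = encode(virus_emb)
    h_host = encode(host_emb)
    # Element-wise product: one simple, order-invariant way to merge the branches.
    combined = h_virus * h_host
    logit = combined @ w_out + b_out
    return float(1.0 / (1.0 + np.exp(-logit)))


# Usage with random stand-ins for pooled protein embeddings (assumption).
virus = rng.normal(size=EMBED_DIM)  # e.g. a viral protein
host = rng.normal(size=EMBED_DIM)   # e.g. a human protein
p = interaction_probability(virus, host)
print(f"predicted interaction probability (untrained): {p:.3f}")
```

Because both branches share the same encoder weights and the merge operation is symmetric, swapping the two proteins yields the same score. In a real model, the weights would be learned from labeled interacting and non-interacting pairs.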