Tsukiyama Sho, Kurata Hiroyuki
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Comput Struct Biotechnol J. 2022;20:5564-5573. doi: 10.1016/j.csbj.2022.10.012. Epub 2022 Oct 8.
Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein-protein interactions (PPIs) play critical roles. Therefore, identifying PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections, and for discovering effective drugs. Experimental methods, including mass spectrometry-based proteomics and yeast two-hybrid assays, are widely used to identify human-virus PPIs, but they are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, that combines two key technologies: a cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanism was very effective in enhancing prediction and generalization abilities. Applying the 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models on a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human-SARS-CoV-2 PPIs with area under the curve values >0.95. The cross-attention PHV web server and source code are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
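The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that is in the GitHub repository above): the random matrices stand in for word2vec feature matrices, all sizes, the kernel, and the function names are assumptions chosen for illustration, and the learned query/key/value projections of a full attention layer are omitted. The sketch only shows how a strided 1D convolution shortens each sequence before attention, and how each protein then attends over the other.

```python
import numpy as np

def conv1d(x, kernel, stride):
    """Valid 1D convolution along the sequence axis.
    x: (length, d_in), kernel: (width, d_in, d_out) -> (n_windows, d_out)."""
    width = kernel.shape[0]
    starts = range(0, x.shape[0] - width + 1, stride)
    return np.stack([
        np.tensordot(x[s:s + width], kernel, axes=([0, 1], [0, 1]))
        for s in starts
    ])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq):
    """Each position of query_seq attends over all positions of context_seq.
    Simplification: no learned projections, single head."""
    d = query_seq.shape[-1]
    weights = softmax(query_seq @ context_seq.T / np.sqrt(d))
    return weights @ context_seq, weights

rng = np.random.default_rng(0)
d_embed, d_model = 8, 4                      # hypothetical sizes
human = rng.normal(size=(120, d_embed))      # stand-in for a human protein feature matrix
virus = rng.normal(size=(60, d_embed))       # stand-in for a viral protein feature matrix
kernel = rng.normal(size=(5, d_embed, d_model)) * 0.1

# The strided convolution shortens both sequences, so the attention
# matrix is (39 x 19) here rather than (120 x 60).
human_c = conv1d(human, kernel, stride=3)
virus_c = conv1d(virus, kernel, stride=3)

human_ctx, w_hv = cross_attention(human_c, virus_c)  # human attends to virus
virus_ctx, w_vh = cross_attention(virus_c, human_c)  # virus attends to human

# Pool over positions and concatenate into one pair-level feature vector.
pair_features = np.concatenate([human_ctx.max(axis=0), virus_ctx.max(axis=0)])
```

Because the attention weight matrix grows with the product of the two sequence lengths, shortening each sequence before attention is one plausible reading of how the 1D-CNN lowers cost for long proteins; the actual layer configuration should be taken from the source code.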