Chen Chen, Boorla Veda Sheersh, Chowdhury Ratul, Nissly Ruth H, Gontu Abhinay, Chothe Shubhada K, LaBella Lindsey, Jakka Padmaja, Ramasamy Santhamani, Vandegrift Kurt J, Nair Meera Surendran, Kuchipudi Suresh V, Maranas Costas D
Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA.
Animal Diagnostic Laboratory, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA 16802, USA.
bioRxiv. 2022 Mar 23:2022.03.22.485413. doi: 10.1101/2022.03.22.485413.
The cellular entry of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) begins with the binding of its receptor-binding domain (RBD) to human angiotensin-converting enzyme 2 (hACE2). Efficient and reliable prediction of changes in RBD-hACE2 binding affinity upon amino acid substitutions can be valuable for public health surveillance and for monitoring potential spillover and adaptation into non-human species. Here, we introduce a convolutional neural network (CNN) model trained on protein sequence and structural features to predict experimental RBD-hACE2 binding affinities of 8,440 variants carrying single or multiple amino acid substitutions in the RBD or ACE2. In five-fold cross-validation tests, the model achieves a classification accuracy of 83.28% and a Pearson correlation coefficient of 0.85 between predicted and experimentally determined binding affinities, and it predicts improved binding affinity for most circulating variants. We proactively used the CNN model to exhaustively screen novel RBD variants with combinations of up to four single amino acid substitutions and suggested the candidates with the largest improvements in RBD-ACE2 binding affinity for human and animal ACE2 receptors. We found that the binding affinities of RBD variants for animal ACE2s follow trends similar to those for human ACE2. White-tailed deer ACE2 binds to the RBD almost as tightly as human ACE2, while cattle, pig, and chicken ACE2s bind weakly. The model allows testing whether adaptation of the virus for increased binding to other animals' ACE2 would cause concomitant increases in binding to hACE2 or decreased fitness due to adaptation to other hosts.
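The two headline metrics above (classification accuracy and Pearson correlation under five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the folds are simple contiguous splits, and the classification rule (whether a variant's affinity change is above or below a threshold of zero relative to wild type) is an assumption, since the abstract does not state how the classes were defined.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sign_accuracy(predicted, measured, threshold=0.0):
    """Fraction of variants whose predicted and measured affinity changes
    fall on the same side of `threshold` (assumed class definition)."""
    hits = sum((p > threshold) == (m > threshold)
               for p, m in zip(predicted, measured))
    return hits / len(measured)


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

In a real evaluation the data would be shuffled before splitting, a model would be trained on each `train` subset, and both metrics would be computed on the held-out `test` predictions.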