Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China.
Virol Sin. 2022 Jun;37(3):437-444. doi: 10.1016/j.virs.2022.04.006. Epub 2022 May 2.
The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, effective tools for determining the cleavage sites of the 3CL protease are still lacking. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein and found that the cleavage motifs were highly conserved among viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease, with the residues in cleavage motifs represented by amino acid indexes; the model achieved an AUC of 0.96 in cross-validation. The RF model was further tested on an independent test dataset, which was composed of cleavage sites on 99 proteins from multiple coronavirus hosts. On this dataset it achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted that 1,352 human proteins could be cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as microtubule, actin and tubulin. Finally, a webserver named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and offers insights into the molecular mechanisms underlying the pathogenicity of coronaviruses.
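The representation of cleavage motifs by amino acid indexes can be illustrated with a minimal sketch. It is not the authors' implementation: the window size (four residues on each side of the candidate bond), the choice of the Kyte-Doolittle hydropathy scale as the only index, the example peptide, and the function names `candidate_windows` and `encode_window` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale, used here as ONE example amino acid index.
# Assumption: the abstract does not say which indexes the study used.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_windows(sequence, left=4, right=4):
    """Yield (cut, window) for every peptide bond with full flanking context.

    `cut` is the number of residues on the N-terminal side of the bond;
    `window` holds `left` residues before it and `right` residues after it.
    The 4 + 4 window size is a hypothetical choice.
    """
    for cut in range(left, len(sequence) - right + 1):
        yield cut, sequence[cut - left:cut + right]


def encode_window(window, index=KD_HYDROPATHY):
    """Map each residue in the window to its index value (one feature each)."""
    return [index[residue] for residue in window]


if __name__ == "__main__":
    peptide = "TSAVLQSGFRKM"  # illustrative peptide, not data from the study
    for cut, window in candidate_windows(peptide):
        print(cut, window, encode_window(window))
```

Each numeric vector produced this way could then be passed, with a cleaved or not-cleaved label, to a classifier such as scikit-learn's `RandomForestClassifier`. Using several indexes per residue would simply lengthen the feature vector.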