Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China.
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China.
Brief Bioinform. 2021 Mar 22;22(2):1085-1095. doi: 10.1093/bib/bbaa423.
Given the current worldwide outbreak of SARS-CoV-2, there is an urgent need to develop effective therapeutic agents that inhibit the pathogen or treat the related diseases. Antimicrobial peptides (AMPs) with functional activity against coronaviruses could be a promising solution, yet no computational approach for identifying anti-coronavirus (anti-CoV) peptides has been reported. In this study, we first investigated the physicochemical and compositional properties of the collected anti-CoV peptides by comparing them against three negative sets: antivirus peptides without anti-CoV function (antivirus), regular AMPs without antivirus function (non-AVP) and peptides without antimicrobial function (non-AMP). We then built random forest classifiers to distinguish anti-CoV peptides from each negative set. Imbalanced learning strategies were adopted because of the severe class imbalance within the datasets. For discrimination against the antivirus, non-AVP and non-AMP sets, the geometric mean of sensitivity and specificity (GMean) reaches 83.07%, 85.51% and 98.82%, respectively. Next, to identify anti-CoV peptides among broad-spectrum peptides, we designed a two-stage classifier based on the collected datasets. In the first stage, the classifier distinguishes AMPs from regular peptides and achieves an area under the receiver operating characteristic curve (AUCROC) of 97.31%. The second stage identifies anti-CoV peptides against the combined negative set of other AMPs; here, the GMean on the independent test set is 79.42%. The proposed approach offers a practical scheme for assisting the development of novel anti-CoV peptides. The datasets and source code used in this study are available at https://github.com/poncey/PreAntiCoV.
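For readers unfamiliar with the metric, the following is a minimal sketch of how GMean is computed from binary predictions, together with one plausible reading of the two-stage decision rule described above. It is not taken from the PreAntiCoV repository: the function names `gmean` and `two_stage_predict`, the toy labels, and the assumption that a peptide is called anti-CoV only when both stages accept it are all illustrative.

```python
import math
from typing import Callable, Sequence


def gmean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity and specificity (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return math.sqrt(sensitivity * specificity)


def two_stage_predict(
    peptide: str,
    is_amp: Callable[[str], bool],       # hypothetical stage-1 model: AMP vs regular peptide
    is_anti_cov: Callable[[str], bool],  # hypothetical stage-2 model: anti-CoV vs other AMPs
) -> bool:
    """Assumed cascade: stage 2 is consulted only for peptides that pass stage 1."""
    return bool(is_amp(peptide)) and bool(is_anti_cov(peptide))


# Toy imbalanced example: 4 positives, 10 negatives.
y_true = [1, 1, 1, 1] + [0] * 10
y_pred = [1, 1, 1, 0] + [0] * 9 + [1]  # 3 of 4 positives and 9 of 10 negatives correct
score = gmean(y_true, y_pred)          # sqrt(0.75 * 0.90)
```

Unlike plain accuracy, GMean falls to zero whenever either class is missed entirely, which is why it suits severely imbalanced datasets such as the ones described here.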