Liu Mingyou, Liu Hongmei, Wu Tao, Zhu Yingxue, Zhou Yuwei, Huang Ziru, Xiang Changcheng, Huang Jian
School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou, China.
School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, Sichuan, China.
Amino Acids. 2023 Sep;55(9):1121-1136. doi: 10.1007/s00726-023-03300-6. Epub 2023 Jul 4.
The ongoing COVID-19 pandemic has caused dramatic loss of human life, and there is an urgent need for safe and efficient drugs against coronavirus infection. Anti-coronavirus peptides (ACovPs) can inhibit coronavirus infection. With high efficiency, low toxicity, and broad-spectrum inhibitory effects on coronaviruses, they are promising candidates for development into a new type of anti-coronavirus drug. ACovPs have traditionally been identified by experiment, which is slow and expensive. As experimental data on ACovPs accumulate, computational prediction offers a cheaper and faster way to find candidate anti-coronavirus peptides. In this study, we combined several state-of-the-art machine learning methods to build nine classification models for the prediction of ACovPs. These models were pre-trained using deep neural networks, and the performance of our ensemble model, ACP-Dnnel, was evaluated on three datasets and an independent dataset. We followed Chou's 5-step rules: (1) we constructed the benchmark datasets data1, data2, and data3 for training and testing, and introduced the independent validation dataset ACVP-M; (2) we analyzed the peptide sequence composition features of the benchmark datasets; (3) we constructed the ACP-Dnnel model, using a deep convolutional neural network (DCNN) merged with a bi-directional long short-term memory (BiLSTM) network as the base model for pre-training to extract the features embedded in the benchmark datasets, and then combined nine classification algorithms into an ensemble that predicts by voting; (4) tenfold cross-validation was used during training, and the final model performance was evaluated; (5) finally, we constructed a user-friendly web server accessible to the public at http://150.158.148.228:5000/ . The highest accuracy (ACC) of ACP-Dnnel reaches 97%, and the Matthews correlation coefficient (MCC) exceeds 0.9.
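Steps (3) and (4) combine nine classifiers by voting and evaluate with tenfold cross-validation. The abstract does not state the voting scheme or how the folds were drawn, so the following is only a minimal sketch under stated assumptions: plain (hard) majority voting over binary labels and contiguous, unshuffled folds. The function names `majority_vote` and `k_fold_indices` are hypothetical and are not taken from ACP-Dnnel.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the 0/1 label that classifier i assigns to sample j.
    Assumption: hard majority voting; the abstract does not specify the scheme.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    voted = []
    for j in range(n_samples):
        counts = Counter(p[j] for p in predictions)
        # With an odd number of binary voters (e.g. nine) a tie cannot occur.
        voted.append(counts.most_common(1)[0][0])
    return voted


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of k contiguous folds.

    Assumption: no shuffling or stratification, to keep the sketch short.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In practice the sample order would normally be shuffled (and often stratified by class) before splitting, and the nine rows passed to `majority_vote` would be the predictions of the nine trained classifiers on one test fold.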
Across the three datasets, the average accuracy of ACP-Dnnel is 96.0%. In validation on the latest independent dataset, ACP-Dnnel improved the MCC, specificity (SP), and ACC values by 6.2%, 7.5%, and 6.3%, respectively. These results suggest that ACP-Dnnel can help with the laboratory identification of ACovPs and speed up the discovery and development of anti-coronavirus peptide drugs. The web server for anti-coronavirus peptide prediction is available at http://150.158.148.228:5000/ .
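The reported metrics (ACC, SP, and MCC) all follow from the four cells of a binary confusion matrix. As an illustration only, the sketch below computes them with the standard textbook definitions, treating label 1 as ACovP and label 0 as non-ACovP. The function name `binary_metrics` and the toy labels are assumptions made for this example and are not data from the study.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute ACC, SP (specificity), and MCC for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    acc = (tp + tn) / (tp + tn + fp + fn)
    # Specificity: fraction of true negatives among all actual negatives.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SP": sp, "MCC": mcc}


# Toy example: 4 positives and 4 negatives, one error in each class.
scores = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC when the positive and negative classes are not the same size.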