Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac265.
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using three feature types (conventional features, a binary profile and word2vec (W2V) word embeddings), we systematically explored five machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance across different datasets. Two main factors controlled this result: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome, and (ii) a systematic search determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. As a result, our proposed method, named iACVP, consistently provides better prediction performance than existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
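The k-mer step behind a dataset-specific W2V dictionary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each peptide is split into overlapping k-mers with a stride of 1, and the function names (`kmer_tokens`, `build_corpus`, `build_vocabulary`) and the two toy sequences are hypothetical.

```python
from collections import Counter


def kmer_tokens(peptide: str, k: int) -> list[str]:
    """Split a peptide into overlapping k-mers (stride 1, an assumption)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    seq = peptide.strip().upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_corpus(peptides: list[str], k: int) -> list[list[str]]:
    """Turn each peptide into a 'sentence' of k-mer 'words'."""
    return [kmer_tokens(p, k) for p in peptides]


def build_vocabulary(corpus: list[list[str]]) -> Counter:
    """Count k-mers over the dataset itself, not a general proteome."""
    return Counter(token for sentence in corpus for token in sentence)


# Hypothetical toy peptides, used only to show the data flow.
peptides = ["GIGKFLHSAK", "KFLHSAKKFG"]
corpus = build_corpus(peptides, k=3)
vocab = build_vocabulary(corpus)
```

In a full pipeline, a corpus like this would be passed to a word2vec implementation, and the resulting k-mer vectors would be combined into one feature vector per peptide for a classifier such as RF. Repeating the procedure for several values of k and comparing validation performance corresponds to the systematic k-mer search described above.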