Karakaya Onur, Kilimci Zeynep Hilal
Research and Development Inc., Turkcell Technology, İstanbul, Turkey.
Department of Information Systems Engineering, Kocaeli University, Kocaeli, Turkey.
PeerJ Comput Sci. 2024 Feb 20;10:e1831. doi: 10.7717/peerj-cs.1831. eCollection 2024.
Anticancer peptides (ACPs) are a group of peptides that exhibit antineoplastic properties. Using ACPs in cancer prevention can offer a viable alternative to conventional cancer therapeutics, as they provide greater selectivity and safety. Recent scientific advances have generated interest in peptide-based therapies, which can efficiently treat the intended cells without harming normal cells. However, as the number of peptide sequences continues to grow rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, we aim to develop an efficient model for classifying anticancer peptides by combining word embedding and deep learning models. First, Word2Vec, GloVe, FastText, and One-Hot-Encoding are evaluated as embedding techniques for extracting features from peptide sequences. Then, the outputs of the embedding models are fed into the deep learning models CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on two datasets widely used in the literature, ACPs250 and Independent. Experimental results show that the proposed model improves classification accuracy compared with state-of-the-art studies. The proposed combination, FastText+BiLSTM, achieves 92.50% accuracy on the ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby setting a new state of the art.