Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
Faculty of Applied Sciences, Macao Polytechnic University, Macau.
J Mol Biol. 2024 Sep 1;436(17):168687. doi: 10.1016/j.jmb.2024.168687. Epub 2024 Jun 25.
Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely on their primary amino acid sequences remains a major hurdle in immunoinformatics. Several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need for an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings, and evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These hybrid features comprised the seven top-performing NLP-based features and 90 probabilistic features, allowing the model to identify hidden patterns across these diverse features and improving the accuracy of ACP prediction. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of the advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.