Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides.

Affiliations

School of Software, Shandong University, Jinan, China.

Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.

Publication information

Bioinformatics. 2021 Dec 11;37(24):4684-4693. doi: 10.1093/bioinformatics/btab560.

DOI:10.1093/bioinformatics/btab560

PMID:34323948

Abstract

MOTIVATION

Anticancer peptides (ACPs) have recently emerged as effective drugs in cancer therapy. Machine learning-based predictors have been developed to identify ACPs and have achieved satisfactory performance. However, existing methods rely on experience-based feature engineering, which restricts the representation ability of the models and adapts poorly to different data. This limits further improvement in predictive performance and weakens the robustness of the predictive models. To alleviate these problems, we propose a novel deep-learning-based predictor named ACPred-LAF, built on a novel multisense and multiscaled embedding algorithm that automatically learns and extracts the contextual sequence characteristics of ACPs.
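To make the idea of reading a peptide at several scales concrete, here is a minimal Python sketch. It is not the ACPred-LAF implementation: it assumes that "multiscaled" can be illustrated by splitting a sequence into overlapping k-mers of several lengths, each of which would then be mapped to integer ids for a learnable embedding table. The helper names `multiscale_tokens` and `build_vocab`, the scales (1, 2, 3), and the example peptide are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def multiscale_tokens(peptide: str, scales: Sequence[int] = (1, 2, 3)) -> Dict[int, List[str]]:
    """Split a peptide into overlapping k-mers for each scale k.

    Hypothetical helper for illustration only; the paper's actual
    tokenization may differ.
    """
    tokens: Dict[int, List[str]] = {}
    for k in scales:
        if k < 1:
            raise ValueError("scale must be >= 1")
        # A sequence shorter than k yields an empty token list.
        tokens[k] = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
    return tokens


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct token; id 0 is left free for padding."""
    vocab: Dict[str, int] = {}
    for toks in token_lists:
        for tok in toks:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


# Assumed example peptide, not taken from the paper's datasets.
example = multiscale_tokens("FLPKL")
vocab_2mer = build_vocab([example[2]])
ids_2mer = [vocab_2mer[tok] for tok in example[2]]
```

In a full model, each scale's id sequence would typically index its own trainable embedding matrix, so the representation is learned from data rather than hand-crafted.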

RESULTS

Through comparative feature analysis, we demonstrate that our learnable and self-adaptive embedding features capture discriminative information better than hand-crafted features, which improves ACP prediction performance. In addition, benchmarking results show that ACPred-LAF outperforms state-of-the-art methods on both existing benchmark datasets and our newly constructed dataset. We also validate the robustness of the model through a data interference experiment. To avoid potential evaluation bias, we construct a new ACP benchmark dataset named ACP-Mixed by integrating existing datasets, and we expect it to serve as a gold-standard benchmark dataset in this field. To facilitate the use of our model, we provide a web server that implements ACPred-LAF.

AVAILABILITY AND IMPLEMENTATION

ACPred-LAF and the newly constructed benchmark dataset ACP-Mixed are open source and available in the GitHub repository (https://github.com/TearsWaiting/ACPred-LAF). A web server implementing ACPred-LAF can be accessed at http://server.malab.cn/ACPred-LAF.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
