Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Indraprastha Institute of Information Technology, New Delhi, India.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa153.
The increasing use of therapeutic peptides for treating cancer has received considerable attention from the scientific community in recent years. The present study describes an in silico model developed for predicting and designing anticancer peptides (ACPs). Residue composition analysis of ACPs shows a preference for A, F, K, L and W. Positional preference analysis revealed that residues A, F and K are favored at the N-terminus and residues L and K are preferred at the C-terminus. Motif analysis revealed the presence of motifs such as LAKLA, AKLAK, FAKL and LAKL in ACPs. Machine learning models were developed using various input features and different machine learning classifiers on two datasets, a main dataset and an alternate dataset. On the main dataset, a dipeptide composition-based ETree classifier achieved a maximum Matthews correlation coefficient (MCC) of 0.51 and an area under the receiver operating characteristic curve (AUROC) of 0.83 on the training dataset. On the alternate dataset, an amino acid composition-based ETree classifier performed best, achieving the highest MCC of 0.80 and an AUROC of 0.97 on the training dataset. Five-fold cross-validation was used for model training and testing, and model performance was also evaluated on the validation dataset. The best models were implemented in the webserver AntiCP 2.0, which is freely available at https://webs.iiitd.edu.in/raghava/anticp2/. The webserver is compatible with multiple screens, including iPhone, iPad, laptop and Android phones. A standalone version of the software is available on GitHub; a Docker-based container has also been developed.
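The two feature types named above, amino acid composition and dipeptide composition, and the MCC metric can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions used by AntiCP 2.0, so the percentage normalization, the function names and the example peptide `LAKLAK` below are assumptions made for illustration only.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the percentage of each standard residue in seq (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the percentage of each overlapping residue pair (400 features)."""
    seq = seq.upper()
    total = len(seq) - 1  # number of overlapping dipeptides
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {dp: 100.0 * c / total for dp, c in counts.items()}


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    peptide = "LAKLAK"  # hypothetical peptide, not taken from the datasets
    aac = amino_acid_composition(peptide)
    dpc = dipeptide_composition(peptide)
    print({aa: round(v, 2) for aa, v in aac.items() if v > 0})
    print({dp: round(v, 2) for dp, v in dpc.items() if v > 0})
```

Feature vectors of this kind (20 values for amino acid composition, 400 for dipeptide composition) could then be passed to a tree-ensemble classifier; the classifier settings used in the study are not stated in the abstract and are not reproduced here.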