School of Science, Xi'an Polytechnic University, Xi'an, P. R. China.
SAR QSAR Environ Res. 2023 Jan;34(1):1-19. doi: 10.1080/1062936X.2022.2160011. Epub 2022 Dec 23.
Cancer is one of the main diseases threatening human life, accounting for millions of deaths worldwide each year. Traditional physical and chemical methods for cancer treatment are extremely time-consuming, lab-intensive, expensive, inefficient, and difficult to apply in a high-throughput way. Hence, there is an urgent need for automated computational methods that enable fast and accurate identification of anticancer peptides (ACPs). In this paper, we develop a novel model named iACP-GE to identify ACPs. Multiple features are extracted using binary encoding, enhanced grouped amino acid composition, and BLOSUM62 encoding based on the N5C5 sequence, together with detrended forward moving-average auto-cross correlation analysis based on the physicochemical properties of the 20 natural amino acids. This yields 835 features for each sample. To avoid information redundancy, a gradient boosting decision tree is adopted as the feature selection strategy, and the optimal feature subset is then input to an extra tree classifier. With 5-fold cross-validation, the accuracies on the ACP740 and ACP240 datasets are 90.54% and 91.25%, respectively. Experimental results indicate that iACP-GE significantly outperforms several existing models on the ACP740 and ACP240 datasets and can be used as an effective tool for the identification of ACPs. The datasets and source code for iACP-GE are available at https://github.com/yunyunliang88/iACP-GE.
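The selection and classification stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). It assumes scikit-learn, maps "gradient boosting decision tree" to `GradientBoostingClassifier` and "extra tree classifier" to `ExtraTreesClassifier`, and uses a synthetic 835-column matrix in place of the real ACP features. The importance threshold and all hyperparameters are illustrative choices, not values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for an 835-dimensional feature matrix.
# This is NOT the ACP740/ACP240 data; it only mimics the feature count.
X, y = make_classification(
    n_samples=200, n_features=835, n_informative=20, random_state=0
)

# Assumed selection rule: keep features whose GBDT importance exceeds the mean.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    threshold="mean",
)

# Extremely randomized trees trained on the selected subset.
classifier = ExtraTreesClassifier(n_estimators=200, random_state=0)

# Placing the selector inside the pipeline refits it on each training fold,
# so the held-out fold never influences which features are kept.
model = make_pipeline(selector, classifier)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the reported 90.54% and 91.25% figures.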