University of Electronic Science and Technology of China.
Institute of Fundamental and Frontier Sciences at University of Electronic Science and Technology of China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab008.
Anticancer peptides are among the most promising therapeutic agents for combating common human cancers. Verifying through wet-lab experiments whether a peptide displays anticancer activity is time-consuming and costly. Hence, in this study, we propose a computational method named iACP-DRLF (identify anticancer peptides via deep representation learning features), which combines the light gradient boosting machine (LightGBM) algorithm with deep representation learning features. Two sequence embedding techniques were used, soft symmetric alignment (SSA) embedding and unified representation (UniRep) embedding, both of which rely on deep neural network models based on long short-term memory (LSTM) networks and their derived networks. The results showed that deep representation learning features greatly improved the ability of the models to discriminate anticancer peptides from other peptides. In addition, UMAP (uniform manifold approximation and projection for dimension reduction) and SHAP (SHapley Additive exPlanations) analyses showed that UniRep features have an advantage over other features for anticancer peptide identification. The Python script and pretrained models can be downloaded from https://github.com/zhibinlv/iACP-DRLF or from http://public.aibiochem.net/iACP-DRLF/.
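The overall idea of the pipeline described above (concatenate two embedding views of each peptide, then train a gradient-boosted classifier on the combined features) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation and does not use LightGBM, SSA, or UniRep: `toy_embedding` is a hypothetical stand-in that draws random vectors, the dimensions and hyperparameters are arbitrary assumptions, and the booster is plain gradient boosting over decision stumps rather than LightGBM's histogram-based trees.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_embedding(label, rng, dim=4):
    """Hypothetical stand-in for a learned sequence embedding (not SSA or UniRep)."""
    shift = 1.5 if label == 1 else -1.5
    # Assumption: only the first two dimensions carry class signal; the rest are noise.
    return [rng.gauss(shift if i < 2 else 0.0, 1.0) for i in range(dim)]


def make_dataset(n, seed):
    """Build n synthetic 'peptides' with alternating labels (1 = anticancer)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        # Concatenate two embedding views into a single feature vector.
        X.append(toy_embedding(label, rng) + toy_embedding(label, rng))
        y.append(label)
    return X, y


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left)
            err += sum((r - rmean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lmean, rmean)
    return best[1:]


def train(X, y, n_rounds=20, lr=0.3):
    """Gradient boosting on the logistic loss with one stump per round."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the raw score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        # Store leaf values already scaled by the learning rate.
        stumps.append((j, t, lr * lv, lr * rv))
        scores = [s + (lr * lv if row[j] <= t else lr * rv)
                  for row, s in zip(X, scores)]
    return stumps


def predict_proba(stumps, row):
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))


def accuracy(stumps, X, y):
    hits = sum((predict_proba(stumps, row) >= 0.5) == (yi == 1)
               for row, yi in zip(X, y))
    return hits / len(X)


X_train, y_train = make_dataset(60, seed=0)
X_test, y_test = make_dataset(40, seed=1)
model = train(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

In a real setting the synthetic vectors would be replaced by embeddings computed from peptide sequences with the pretrained models, and the hand-written booster by a library implementation; the scripts at the URLs above are the reference for how the authors actually do this.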