Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan.
Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
Sci Rep. 2021 Jun 30;11(1):13594. doi: 10.1038/s41598-021-93124-9.
Anticancer peptides (ACPs) are bioactive peptides that could serve as a novel class of anticancer drug, with several advantages over chemistry-based drugs, including high specificity, strong tumor penetration capacity, and low toxicity to normal cells. As the number of experimentally verified bioactive peptides has increased significantly, in silico approaches have become imperative for investigating the characteristics of ACPs. However, methods for investigating the differences in physicochemical properties among ACPs are lacking. In this study, we compared the N- and C-terminal amino acid composition of each peptide and identified three major subtypes of ACPs, defined by the distribution of positively charged residues. For the first time, we developed a two-step machine learning model for identifying ACP subtypes, which classifies the input data into the corresponding group before applying the classifier. Further, to improve predictive power, hybrid feature sets were considered for prediction. Evaluation by five-fold cross-validation showed that the two-step model trained with sequence-based features and physicochemical properties was the most effective in discriminating between ACPs and non-ACPs. The two-step model trained with the hybrid features performed well, with a sensitivity of 86.75%, a specificity of 85.75%, an accuracy of 86.08%, and a Matthews correlation coefficient (MCC) of 0.703. Furthermore, the model performed consistently on an independent testing set, with a sensitivity of 77.6%, a specificity of 94.74%, an accuracy of 88.99%, and an MCC of 0.75. Finally, the two-step model has been implemented as a web-based tool, iDACP, which is freely available at http://mer.hc.mmh.org.tw/iDACP/ .
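For readers unfamiliar with the four evaluation measures reported in the abstract, the following minimal Python sketch shows how sensitivity, specificity, accuracy, and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: ACPs predicted as ACP / non-ACP.
    tn/fp: non-ACPs predicted as non-ACP / ACP.
    (Hypothetical helper for illustration; not part of iDACP.)
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)             # true positive rate
    specificity = ratio(tn, tn + fp)             # true negative rate
    accuracy = ratio(tp + tn, tp + fn + tn + fp)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Usage with made-up counts (not data from the paper):
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because the MCC uses all four cells of the confusion matrix, it is less affected by class imbalance than accuracy, which may be why the abstract reports it alongside the other three measures.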