Ge Ruiquan, Feng Guanwen, Jing Xiaoyang, Zhang Renfeng, Wang Pu, Wu Qing
Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China.
Xi'an Key Laboratory of Big Data and Intelligent Vision, School of Computer Science and Technology, Xidian University, Xi'an, China.
Front Genet. 2020 Jul 30;11:760. doi: 10.3389/fgene.2020.00760. eCollection 2020.
Cancer remains one of the main threats to human life, so developing efficient cancer treatments is urgent. Anticancer peptides, which could overcome the significant side effects and poor outcomes of traditional cancer treatments, have emerged in recent years as a potential alternative. However, identifying anticancer peptides experimentally is time-consuming and resource-intensive, so it is important to develop effective computational tools that quickly and accurately identify potential anticancer peptides from amino acid sequences. For most current computational methods, feature representation plays a key role in their success. This study proposes a novel, fast, and accurate approach to identifying anticancer peptides using diversified feature representations and an ensemble learning method. For the feature representations, information is encoded from multidimensional feature spaces, including sequence composition, sequence order, and physicochemical properties, among others. To better model the potential relationships among peptides, multiple ensemble classifiers (LightGBM models) are first applied to the different feature sets. The resulting outputs are then used as inputs to a support vector machine classifier, which identifies anticancer peptides. Experimental results from cross-validation and independent test sets demonstrate that our method achieves better or comparable performance compared with other state-of-the-art methods.
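To make the idea of encoding a peptide in several feature spaces concrete, here is a minimal sketch in plain Python. The abstract does not name the exact descriptors used in the study, so the three below (amino acid composition, dipeptide composition, and a coarse charge profile) are common illustrative choices and should be read as assumptions. All function names and the residue grouping are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Assumed coarse charge grouping (illustrative only, not taken from the paper).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def _clean(seq):
    """Upper-case the sequence and keep only standard residues."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return "".join(residues)


def amino_acid_composition(seq):
    """Sequence composition: fraction of each of the 20 residues."""
    seq = _clean(seq)
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Local sequence order: fraction of each of the 400 adjacent pairs."""
    seq = _clean(seq)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def charge_profile(seq):
    """Physicochemical property: fractions of positive and negative residues."""
    seq = _clean(seq)
    pos = sum(r in POSITIVE for r in seq) / len(seq)
    neg = sum(r in NEGATIVE for r in seq) / len(seq)
    return [pos, neg]


def encode(seq):
    """Return one feature vector per feature space, keyed by name."""
    return {
        "composition": amino_acid_composition(seq),
        "sequence_order": dipeptide_composition(seq),
        "physicochemical": charge_profile(seq),
    }
```

In a two-stage design like the one described above, each entry returned by `encode` would feed its own first-stage classifier, and the per-classifier outputs would then be concatenated as the input to the second-stage classifier. Keeping the feature spaces separate, rather than concatenating them up front, is what lets each first-stage model specialise on one view of the peptide.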