Identifying anticancer peptides by using a generalized chaos game representation.

Authors

Ge Li, Liu Jiaguo, Zhang Yusen, Dehmer Matthias

Affiliations

School of Mathematics and Statistics, Shandong University at Weihai, Weihai, 264209, China.

Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria.

Publication

J Math Biol. 2019 Jan;78(1-2):441-463. doi: 10.1007/s00285-018-1279-x. Epub 2018 Oct 5.

Abstract

We generalize chaos game representation (CGR) to higher-dimensional spaces while preserving its bijectivity, keeping the method sufficiently representative and mathematically rigorous compared to previous attempts. We first state and prove the asymptotic property of CGR and of our generalized chaos game representation (GCGR) method. It follows that the dissimilarity between sequences that contain identical subsequences at distinct positions decreases exponentially with the length of the identical subsequence; this effect has been at work unbeknownst to researchers. By drawing attention to it, we show that the effect fundamentally supports (G)CGR as a similarity measure or feature extraction technique. We develop two feature extraction techniques: GCGR-Centroid and GCGR-Variance. We use GCGR-Centroid to analyze the similarity between protein sequences on three datasets: 9 ND5, 24 TF and 50 beta-globin proteins. The results are consistent with previous studies, which supports the validity of the method. Finally, using support vector machines, we train an anticancer peptide prediction model on both GCGR-Centroid and GCGR-Variance features and achieve significantly higher prediction performance on three well-studied anticancer peptide datasets.
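The exponential effect described in the abstract can be illustrated with a minimal sketch. The abstract does not give the GCGR construction, so everything below is an assumption made for illustration only: each of the 20 amino acids is assigned a one-hot vertex in R^20, the contraction ratio is 1/2 as in classic CGR, and the names `cgr_points`, `centroid` and `variance` are hypothetical helpers, not the authors' GCGR-Centroid or GCGR-Variance definitions. Under these assumptions, each shared residue halves the gap between two trajectories, so a shared subsequence of length k shrinks the distance by a factor of 2^k.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIM = len(AMINO_ACIDS)


def vertex(residue: str) -> List[float]:
    """Assumed vertex assignment: a one-hot corner in R^20 (not from the paper)."""
    v = [0.0] * DIM
    v[AMINO_ACIDS.index(residue)] = 1.0
    return v


def cgr_points(seq: str) -> List[List[float]]:
    """Iterate x_i = (x_{i-1} + vertex(s_i)) / 2, starting from the simplex centroid."""
    x = [1.0 / DIM] * DIM
    points = []
    for residue in seq:
        v = vertex(residue)
        x = [(a + b) / 2.0 for a, b in zip(x, v)]
        points.append(x)
    return points


def distance(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def centroid(points: List[List[float]]) -> List[float]:
    """Coordinate-wise mean of the trajectory (illustrative centroid feature)."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(DIM)]


def variance(points: List[List[float]]) -> List[float]:
    """Coordinate-wise variance of the trajectory (illustrative variance feature)."""
    c = centroid(points)
    n = len(points)
    return [sum((p[k] - c[k]) ** 2 for p in points) / n for k in range(DIM)]


if __name__ == "__main__":
    shared = "KLMNP"               # identical subsequence, length 5
    p = cgr_points("AC" + shared)  # different prefixes, same suffix
    q = cgr_points("WY" + shared)
    d_before = distance(p[1], q[1])   # gap just before the shared part
    d_after = distance(p[-1], q[-1])  # gap after 5 shared residues
    print(d_before / d_after)         # ratio is 2**5, up to rounding error
```

Feature vectors such as `centroid(p) + variance(p)` have a fixed length regardless of sequence length, which is what allows them to be passed to a classifier such as a support vector machine.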
