Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm.

Affiliation

School of Information Science and Engineering, Yunnan University, Kunming, PR China.

Publication information

PLoS One. 2018 Apr 12;13(4):e0195636. doi: 10.1371/journal.pone.0195636. eCollection 2018.

Abstract

A wide variety of methods have been proposed to improve prediction accuracy in protein subnuclear localization. However, a prominent trend among these methods is to build a fusion representation by combining multiple feature representations, and the fusion process is time-consuming. In view of this, this paper proposes a method that combines a new single feature representation with a new algorithm to obtain a good recognition rate. Specifically, based on the position-specific scoring matrix (PSSM), we propose a new expression, the correlation position-specific scoring matrix (CoPSSM), as the protein feature representation. Building on the classic nonlinear dimension reduction algorithm, kernel linear discriminant analysis (KLDA), we add a new discriminant criterion and propose a dichotomous greedy genetic algorithm (DGGA) to intelligently select its kernel bandwidth parameter. Numerical experiments were conducted on two public datasets using the Jackknife test and a KNN classifier. The results show that the overall success rate (OSR) with the single representation CoPSSM is higher than that with many related representations. The OSR of the proposed method reaches 87.444% and 90.3361% on the two datasets, respectively, outperforming many current methods. To show the generalization ability of the proposed algorithm, two additional standard protein subcellular localization datasets were used in an extended experiment, and the prediction accuracy under both the Jackknife test and the independent test remains considerable.
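The evaluation protocol named in the abstract, a Jackknife (leave-one-out) test with a KNN classifier scored by overall success rate, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the Euclidean distance, the choice of k, and the toy feature vectors and labels are all assumptions, and the CoPSSM features and KLDA projection are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Rank training samples by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties are resolved in favour of the label encountered first, i.e. the nearer neighbour.
    return votes.most_common(1)[0][0]


def jackknife_osr(X, y, k=1):
    """Leave-one-out overall success rate: fraction of samples predicted correctly
    when each sample in turn is held out and classified using all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two well-separated clusters standing in for
# low-dimensional features of proteins from two subnuclear locations.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]

osr = jackknife_osr(X, y, k=1)
print(f"Jackknife OSR: {osr:.4f}")
```

In a setting like the one described, `X` would hold the reduced feature vectors rather than toy points, and the same loop would be rerun for each candidate kernel bandwidth to compare success rates.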

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5518/5896989/04949d1a8e5a/pone.0195636.g001.jpg
