Predicting subcellular localization of multi-label proteins by incorporating the sequence features into Chou's PseAAC.

Affiliations

Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.

Publication information

Genomics. 2019 Dec;111(6):1325-1332. doi: 10.1016/j.ygeno.2018.09.004. Epub 2018 Sep 7.

Abstract

The proliferation of genome projects has produced an exponential increase in the number of protein samples, making experimental classification of protein localization almost impossible. However, most existing applications are developed only for single-plex proteins and completely ignore proteins that reside at two or more locations in a cell. Few attempts have been made to target multi-label protein localization, and those have achieved unsatisfactory accuracy. This paper presents a novel approach in which a discrete feature extraction method is fused with physicochemical properties of amino acids using Chou's general form of pseudo amino acid composition (PseAAC). The technique is tested on two benchmark datasets, Gpos-mPLoc and Virus-mPLoc. The empirical results show that the proposed method yields better results with both examined classifiers, ML-KNN and Rank-SVM, and that the proposed model improves on all performance measures considered in the comparison.
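To make the PseAAC idea concrete, the following is a minimal sketch of the classic (type 1) pseudo amino acid composition, not the fused descriptor proposed in the paper, whose details the abstract does not give. It assumes a single physicochemical property (the Kyte-Doolittle hydropathy scale, swapped in here for illustration), and the function name `pseaac`, the tier count `lam`, and the weight `w` are hypothetical choices. The output is the 20 amino acid frequencies followed by `lam` sequence-order correlation terms, all normalized by a shared denominator.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (an assumed, illustrative property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


def pseaac(seq, lam=3, w=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional type 1 PseAAC vector for one sequence."""
    seq = seq.upper()
    if any(aa not in scale for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(scale)

    # Conventional amino acid composition: normalized occurrence frequencies.
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Shared denominator, so the whole vector sums to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pseaac("MKVLAAGIVGLLLAQ", lam=3)
    print(len(vector))
```

Vectors of this kind would then be passed to a multi-label classifier such as ML-KNN or Rank-SVM; the choice of `lam` and `w` is typically tuned per dataset.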
