Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, Himachal Pradesh 173 234, India.
J Biosci. 2020;45.
Predicting the subcellular localization of proteins on a proteome-wide scale is one of the major goals of large-scale genome and proteome sequencing projects, because localization helps define gene function, and computational modeling techniques make such prediction possible. Several methods based on multi-label classification systems have previously been developed for this purpose, and they reported high accuracy. However, when we validated these methods on our blind dataset of plant vacuole proteins, we observed poor performance, with accuracy ranging from ~1.3% to 48.5%. These results show that the previously developed methods are not very accurate for predicting plant vacuole proteins, and they emphasize the need for a more accurate and reliable algorithm. In this study, we developed various composition-based as well as position-specific scoring matrix (PSSM)-based models and achieved higher accuracy than previously developed methods. Our best model achieved ~63% accuracy on the blind dataset, which is far better than currently available tools. Furthermore, we have implemented our best models in a free GUI-based software package called 'VacPred', which is compatible with both Linux and Windows platforms. This software is freely available for download at www.deepaklab.com/vacpred.
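The abstract does not say which composition features the models use. As an illustration only, the sketch below computes the simplest common one, amino acid composition (AAC). AAC maps a protein of any length to a fixed 20-dimensional vector of residue fractions that a classifier can consume. The function name `amino_acid_composition` and the choice to skip non-standard residues are assumptions, not part of VacPred. PSSM-based features, which are derived from PSI-BLAST profiles, are not shown.

```python
# Hypothetical sketch of an amino acid composition (AAC) feature vector.
# Assumption: "composition" means the fraction of each of the 20 standard
# amino acids; this is not taken from the VacPred implementation.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard residues (e.g. X, B, Z) are ignored both in the counts and
    in the length used for normalisation.
    """
    counts = {aa: 0 for aa in STANDARD_AA}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector of fixed length.
        return [0.0] * len(STANDARD_AA)
    return [counts[aa] / total for aa in STANDARD_AA]


if __name__ == "__main__":
    features = amino_acid_composition("MKTAYIAKQR")
    # Print the non-zero fractions keyed by residue.
    print({aa: f for aa, f in zip(STANDARD_AA, features) if f > 0})
```

Because every protein yields a vector of the same length, these vectors can be stacked into a feature matrix for any standard classifier. Richer variants, such as dipeptide composition (400 features), follow the same pattern.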