IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1449-1458. doi: 10.1109/TCBB.2020.3037465. Epub 2022 Jun 3.
The chloroplast is one of the classic organelles of algal and plant cells. Identifying where chloroplast proteins are located within the chloroplast is an important and challenging step in deciphering their functions. Biological experiments to determine Protein Sub-Chloroplast Localization (PSCL) are time-consuming and costly. Over the last decade, a few computational methods have been developed to predict PSCL. Earlier works assumed a single location per protein, whereas recent works can predict multiple locations within the chloroplast. However, the performance of the state-of-the-art predictors remains poor. This article proposes a novel skip-gram technique to extract highly discriminating patterns from evolutionary profiles, and a multi-label deep neural network to predict PSCL. The proposed model is assessed on two publicly available datasets, Benchmark and Novel. Experimental results demonstrate that the proposed work significantly outperforms the state-of-the-art multi-label PSCL predictors. Compared to the best PSCL predictor from the literature, the multi-label prediction accuracy (i.e., Overall Actual Accuracy) of the proposed model improves by an absolute margin of at least 6.7 percent on the Benchmark dataset and at least 7.9 percent on the Novel dataset. Further, a statistical t-test confirms that the improvement is significant, and thus the proposed work is an effective computational model for multi-label PSCL prediction. The proposed prediction model is hosted on a web server and is available at https://nitkit-vgst727-nppsa.nitk.ac.in/deeplocpred/.
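The abstract reports its headline result as Overall Actual Accuracy but does not restate the formula. As the metric is commonly defined in the multi-label localization literature, a protein counts as correct only when its predicted set of locations exactly matches its true set. The following is a minimal sketch under that assumption, not the authors' evaluation code. The function name `overall_actual_accuracy` and the location labels in the example are hypothetical.

```python
from typing import Iterable, Sequence


def overall_actual_accuracy(
    true_labels: Sequence[Iterable[str]],
    predicted_labels: Sequence[Iterable[str]],
) -> float:
    """Fraction of proteins whose predicted location set equals the true set.

    Assumed definition (exact-match accuracy); a partially correct
    prediction, e.g. one of two true locations, scores zero.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have equal length")
    if not true_labels:
        raise ValueError("at least one protein is required")

    # Compare as sets so that label order within a protein does not matter.
    exact_matches = sum(
        1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p)
    )
    return exact_matches / len(true_labels)


# Hypothetical toy data: four proteins, two of them with two locations each.
y_true = [{"thylakoid"}, {"stroma", "envelope"}, {"envelope"}, {"thylakoid", "stroma"}]
y_pred = [{"thylakoid"}, {"stroma"}, {"envelope"}, {"stroma", "thylakoid"}]

oaa = overall_actual_accuracy(y_true, y_pred)
print(f"Overall Actual Accuracy: {oaa:.2f}")
```

In this toy data the second protein is predicted at only one of its two locations, so it counts as a miss. That strictness is why an absolute gain of a few percent on this metric is harder to achieve than the same gain on a per-label accuracy.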