School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
Bioinformatics. 2022 Aug 10;38(16):4019-4026. doi: 10.1093/bioinformatics/btac432.
Characterizing protein subcellular localization is an important and long-standing task in bioinformatics and computational biology because it provides valuable information for elucidating the cellular functions of proteins and for guiding drug design.
Here, we develop a novel bioimage-based computational approach, termed PScL-DDCFPred, to accurately predict protein subcellular localizations in human tissues. PScL-DDCFPred first extracts multiview image features, including global and local features, as base (pure) features; next, it applies a new integrative feature selection method, based on stepwise discriminant analysis and generalized discriminant analysis, to identify the optimal feature sets from the extracted pure features; finally, it builds a classifier based on a deep neural network (DNN) and a deep-cascade forest (DCF). Stringent 10-fold cross-validation tests on a new protein subcellular localization training dataset, constructed from the Human Protein Atlas databank, show that PScL-DDCFPred outperforms several existing state-of-the-art methods. Moreover, results on the independent test set further illustrate the generalization capability of PScL-DDCFPred and its superiority over existing predictors. In-depth analysis shows that the excellent performance of PScL-DDCFPred can be attributed to three critical factors: the effective combination of the DNN and DCF models, the complementarity of global and local features, and the use of the optimal feature sets selected by the integrative feature selection algorithm.
https://github.com/csbio-njust-edu/PScL-DDCFPred.
Supplementary data are available at Bioinformatics online.
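The 10-fold cross-validation protocol and the idea of combining two classifiers, both mentioned in the abstract, can be sketched as follows. This is a minimal illustration only, not the PScL-DDCFPred implementation: the abstract does not say whether the folds are stratified or how the DNN and DCF outputs are combined, so the stratified split and the weighted averaging used here are assumptions. The names `stratified_kfold`, `fuse`, `cross_validate` and `fit_predict` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread across k folds.

    Assumption: stratified splitting; the abstract only says "10-fold".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal the shuffled samples of this class round-robin into the folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def fuse(probs_a, probs_b, weight=0.5):
    """Weighted average of two per-class probability vectors.

    Assumption: a simple stand-in for combining two models' outputs.
    """
    return [weight * a + (1 - weight) * b for a, b in zip(probs_a, probs_b)]


def cross_validate(features, labels, fit_predict, k=10, seed=0):
    """Return overall accuracy of `fit_predict` under k-fold cross-validation.

    `fit_predict(train_x, train_y, test_x)` is a hypothetical callable that
    trains on the training fold and returns one predicted label per test item.
    """
    correct = 0
    for train, test in stratified_kfold(labels, k, seed):
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, test))
    return correct / len(labels)
```

In use, `fit_predict` would wrap whatever model is being evaluated, and `fuse` would be applied to the per-class probabilities of two trained models before taking the highest-scoring class.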