PScL-DDCFPred: an ensemble deep learning-based approach for characterizing multiclass subcellular localization of human proteins from bioimage data.

Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.

Publication information

Bioinformatics. 2022 Aug 10;38(16):4019-4026. doi: 10.1093/bioinformatics/btac432.

Abstract

MOTIVATION

Characterizing protein subcellular localization is an important, long-standing task in bioinformatics and computational biology; it provides valuable information for elucidating the cellular functions of proteins and for guiding drug design.

RESULTS

Here, we develop a novel bioimage-based computational approach, termed PScL-DDCFPred, to accurately predict protein subcellular localizations in human tissues. PScL-DDCFPred first extracts multiview image features, including global and local features, as base or pure features; next, it applies a new integrative feature selection method based on stepwise discriminant analysis and generalized discriminant analysis to identify the optimal feature sets from the extracted pure features; finally, it establishes a classifier based on a deep neural network (DNN) and a deep-cascade forest (DCF). Stringent 10-fold cross-validation tests on the new protein subcellular localization training dataset, constructed from the Human Protein Atlas database, show that PScL-DDCFPred outperforms several existing state-of-the-art methods. Moreover, results on the independent test set further demonstrate the generalization capability of PScL-DDCFPred and its superiority over existing predictors. In-depth analysis shows that the excellent performance of PScL-DDCFPred can be attributed to three critical factors: the effective combination of the DNN and DCF models, the complementarity of global and local features, and the use of the optimal feature sets selected by the integrative feature selection algorithm.
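
The 10-fold cross-validation protocol mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how folds were drawn, so stratified splitting is an assumption, and the nearest-centroid classifier is only a lightweight stand-in for the DNN and DCF models. The function names, the three class labels, and the synthetic two-dimensional features are all hypothetical and serve only to make the example self-contained.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread across the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of one class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


def fit_centroids(features, labels):
    """Stand-in classifier (NOT the paper's DNN/DCF model): per-class mean vectors."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids, x):
    """Return the label of the nearest centroid by squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def cross_validate(features, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return per-fold accuracy."""
    accuracies = []
    for train_idx, test_idx in stratified_k_fold(labels, k, seed):
        model = fit_centroids([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic toy data (hypothetical): 3 location classes, 20 well-separated samples each.
rng = random.Random(42)
toy_labels = [c for c in ("nucleus", "cytosol", "mitochondria") for _ in range(20)]
offsets = {"nucleus": 0.0, "cytosol": 10.0, "mitochondria": 20.0}
toy_features = [[offsets[y] + rng.uniform(-1, 1), offsets[y] + rng.uniform(-1, 1)] for y in toy_labels]

fold_accuracies = cross_validate(toy_features, toy_labels, k=10)
print(sum(fold_accuracies) / len(fold_accuracies))
```

Every sample is used for testing exactly once, and the mean of the per-fold accuracies is the figure usually reported. In a real pipeline, any feature selection step would need to be fitted inside each training fold so that the held-out fold does not leak into the selected feature set.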

AVAILABILITY AND IMPLEMENTATION

https://github.com/csbio-njust-edu/PScL-DDCFPred.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
