PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection.

Affiliations

Nanjing University of Science and Technology, China.

School of Computer Science and Engineering, Nanjing University of Science and Technology, China.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab278.

Abstract

Protein subcellular localization plays a crucial role in characterizing protein function and understanding various cellular processes. Accurate identification of protein subcellular location is therefore an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins, but most existing methods are limited in overall accuracy, time consumption and generalization power. To address these problems, we developed a novel computational approach based on Human Protein Atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted features and deep learned features (the latter by employing a pretrained deep learning model) from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To obtain a more informative feature subset, the support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was then applied to the integrated feature set. Finally, two classification models, namely a support vector machine with a radial basis function kernel (SVM-RBF) and a support vector machine with a linear kernel (SVM-LNR), were trained on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the best performance in a 10-fold cross-validation test on this dataset and outperformed existing predictors. Furthermore, we illustrated the generalization ability of the proposed method by conducting a stringent independent validation test.
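The pipeline described in the abstract (per-feature-set selection, a second selection pass on the integrated set, then two SVMs evaluated by 10-fold cross-validation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation, and several parts are assumptions: the data are synthetic rather than HPA images, a univariate F-test stands in for SDA, plain SVM-based recursive feature elimination is used without the correlation bias reduction step, the two SVMs are combined by soft voting (the abstract does not say how they are combined), and all feature counts are arbitrary.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for image descriptors (NOT HPA data): columns 0-29 play
# the role of a handcrafted feature set, columns 30-59 of a deep feature set.
X, y = make_classification(
    n_samples=200, n_features=60, n_informative=12,
    n_classes=3, random_state=0,
)


def build_model():
    # Layer 1: filter each raw feature set separately.
    # Assumption: a univariate F-test replaces SDA, which scikit-learn lacks.
    layer1 = ColumnTransformer([
        ("handcrafted", SelectKBest(f_classif, k=15), slice(0, 30)),
        ("deep", SelectKBest(f_classif, k=15), slice(30, 60)),
    ])
    # Layer 2: recursive feature elimination driven by linear-SVM weights on
    # the concatenated set. Plain RFE only; no correlation bias reduction.
    layer2 = RFE(SVC(kernel="linear"), n_features_to_select=10, step=2)
    # Assumption: the two SVMs are combined by averaging class probabilities.
    ensemble = VotingClassifier(
        estimators=[
            ("svm_rbf", SVC(kernel="rbf", probability=True, random_state=0)),
            ("svm_lnr", SVC(kernel="linear", probability=True, random_state=0)),
        ],
        voting="soft",
    )
    return Pipeline([
        ("scale", StandardScaler()),
        ("layer1", layer1),
        ("layer2", layer2),
        ("ensemble", ensemble),
    ])


# Selection is refit inside every fold, so held-out samples never influence it.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(build_model(), X, y, cv=cv)
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit once on all data to inspect how many features survive both layers.
model = build_model().fit(X, y)
```

Wrapping both selection layers and the classifiers in a single `Pipeline` keeps feature selection inside the cross-validation loop, which avoids the optimistic bias that arises when features are selected on the full dataset before splitting.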
