
Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.

Affiliations

College of Information Science and Engineering, Hunan University, Changsha 410082, China.

School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.

Publication information

Molecules. 2019 Mar 6;24(5):919. doi: 10.3390/molecules24050919.

DOI: 10.3390/molecules24050919
PMID: 30845684
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6429470/
Abstract

Predicting protein subcellular localization is critical for inferring protein function, gene regulation, and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representations, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), struggle to capture adequate information about the interactions between residues and the position distribution of each residue. Novel sequence representations are therefore still needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), which reflects local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use features with hundreds of dimensions. In practice, this feature representation is highly efficient for predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 on the CL317 and ZW225 datasets, respectively. To further evaluate the proposed encoding schemes, we introduce a multi-view feature-based method that combines the two features above with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization. This model achieves prediction accuracies of 0.927 and 0.871 on the CL317 and ZW225 datasets, respectively, better than other existing methods in jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations for predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations using evidence from published articles in authoritative journals and books.
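
The general idea of fusing several sequence views and scoring the result with a jackknife (leave-one-out) test can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it fuses plain amino acid composition with dipeptide composition (GCGR, NSI and PseAAC are not reproduced here), it swaps the support vector machine for a 1-nearest-neighbor stand-in so that it runs on the standard library alone, and all function names and the toy sequences and labels are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """View 1: relative frequency of each of the 20 standard residues."""
    n = len(seq)  # assumes a non-empty sequence
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """View 2: relative frequency of each of the 400 adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]


def fuse_views(seq):
    """Serial fusion: concatenate the per-view vectors into one vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def nearest_neighbor_label(query, train_vectors, train_labels):
    """Stand-in classifier (1-NN, Euclidean distance), not the paper's SVM."""
    distances = [
        sqrt(sum((q - t) ** 2 for q, t in zip(query, vec)))
        for vec in train_vectors
    ]
    return train_labels[distances.index(min(distances))]


def jackknife_accuracy(sequences, labels):
    """Leave-one-out: each sample is predicted from all remaining samples."""
    vectors = [fuse_views(s) for s in sequences]
    correct = 0
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(vec, rest_vectors, rest_labels) == label:
            correct += 1
    return correct / len(sequences)


# Toy, made-up sequences and labels, purely for illustration.
toy_sequences = ["AAAAKKAA", "AAKAAAKA", "WWYWWYYW", "YWWYYWWW"]
toy_labels = ["nucleus", "nucleus", "membrane", "membrane"]
toy_accuracy = jackknife_accuracy(toy_sequences, toy_labels)
```

Concatenation is the simplest way to combine views; in a real setting the views would usually be rescaled or weighted before fusion so that the 400-dimensional dipeptide block does not dominate the distance, and the stand-in classifier would be replaced by a tuned SVM.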

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34a8/6429470/317860f43243/molecules-24-00919-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34a8/6429470/fb543eb9c4bd/molecules-24-00919-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34a8/6429470/c94e289f3c39/molecules-24-00919-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34a8/6429470/e3db09917247/molecules-24-00919-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34a8/6429470/bfb8d70bec14/molecules-24-00919-g005.jpg

Similar articles

1
Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.
Molecules. 2019 Mar 6;24(5):919. doi: 10.3390/molecules24050919.
2
Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou's PseAAC.
J Theor Biol. 2018 Nov 14;457:163-169. doi: 10.1016/j.jtbi.2018.08.042. Epub 2018 Sep 1.
3
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC.
J Theor Biol. 2018 Jan 21;437:239-250. doi: 10.1016/j.jtbi.2017.10.030. Epub 2017 Oct 31.
4
Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition.
Protein Pept Lett. 2012 Apr;19(4):375-87. doi: 10.2174/092986612799789369.
5
CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition.
Comput Biol Chem. 2011 Aug 10;35(4):218-29. doi: 10.1016/j.compbiolchem.2011.05.003. Epub 2011 May 27.
6
Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising.
Oncotarget. 2017 Nov 21;8(64):107640-107665. doi: 10.18632/oncotarget.22585. eCollection 2017 Dec 8.
7
Accurate prediction of multi-label protein subcellular localization through multi-view feature learning with RBRL classifier.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab012.
8
Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.
9
The effect of three novel feature extraction methods on the prediction of the subcellular localization of multi-site virus proteins.
Bioengineered. 2018 Jan 1;9(1):196-202. doi: 10.1080/21655979.2017.1373536. Epub 2017 Nov 22.
10
EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC.
J Comput Aided Mol Des. 2013 Jan;27(1):91-103. doi: 10.1007/s10822-012-9628-0. Epub 2013 Jan 3.

Cited by

1
Protein subcellular localization prediction tools.
Comput Struct Biotechnol J. 2024 Apr 15;23:1796-1807. doi: 10.1016/j.csbj.2024.04.032. eCollection 2024 Dec.
2
A Review for Artificial Intelligence Based Protein Subcellular Localization.
Biomolecules. 2024 Mar 27;14(4):409. doi: 10.3390/biom14040409.
3
Caseins: Versatility of Their Micellar Organization in Relation to the Functional and Nutritional Properties of Milk.
Molecules. 2023 Feb 21;28(5):2023. doi: 10.3390/molecules28052023.
4
Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics.
Front Bioinform. 2022 May 19;2:910531. doi: 10.3389/fbinf.2022.910531. eCollection 2022.
5
Gm-PLoc: A Subcellular Localization Model of Multi-Label Protein Based on GAN and DeepFM.
Front Genet. 2022 Jun 15;13:912614. doi: 10.3389/fgene.2022.912614. eCollection 2022.
6
Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization.
Life (Basel). 2021 Mar 30;11(4):293. doi: 10.3390/life11040293.
7
Bird Eye View of Protein Subcellular Localization Prediction.
Life (Basel). 2020 Dec 14;10(12):347. doi: 10.3390/life10120347.
8
A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector.
Comput Struct Biotechnol J. 2020 Jul 15;18:1904-1913. doi: 10.1016/j.csbj.2020.07.004. eCollection 2020.
9
Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA.
BMC Bioinformatics. 2020 May 24;21(1):212. doi: 10.1186/s12859-020-3539-1.
10
HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection.
Comput Math Methods Med. 2020 Mar 28;2020:1384749. doi: 10.1155/2020/1384749. eCollection 2020.

References

1
Identification of essential yeast genes involved in polyamine resistance.
Gene. 2018 Nov 30;677:361-369. doi: 10.1016/j.gene.2018.08.066. Epub 2018 Aug 25.
2
HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source.
Proteomics. 2017 Sep;17(17-18). doi: 10.1002/pmic.201700262.
3
Translate to divide: control of the cell cycle by protein synthesis.
Microb Cell. 2015 Mar 20;2(4):94-104. doi: 10.15698/mic2015.04.198.
4
Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC.
J Theor Biol. 2015 Jan 7;364:284-94. doi: 10.1016/j.jtbi.2014.09.029. Epub 2014 Sep 28.
5
CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation.
PLoS One. 2014 Jun 9;9(6):e99368. doi: 10.1371/journal.pone.0099368. eCollection 2014.
6
HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.
PLoS One. 2014 Mar 19;9(3):e89545. doi: 10.1371/journal.pone.0089545. eCollection 2014.
7
Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.
IEEE Trans Nanobioscience. 2012 Dec;11(4):375-85. doi: 10.1109/TNB.2012.2208473. Epub 2012 Aug 3.
8
Predicting apoptosis protein subcellular location with PseAAC by incorporating tripeptide composition.
Protein Pept Lett. 2011 Nov;18(11):1086-92. doi: 10.2174/092986611797200931.
9
Gene ontology based transfer learning for protein subcellular localization.
BMC Bioinformatics. 2011 Feb 2;12:44. doi: 10.1186/1471-2105-12-44.
10
TESTLoc: protein subcellular localization prediction from EST data.
BMC Bioinformatics. 2010 Nov 15;11:563. doi: 10.1186/1471-2105-11-563.