Amino acid classification based spectrum kernel fusion for protein subnuclear localization.

Affiliation

Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, PR China.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S17. doi: 10.1186/1471-2105-11-S1-S17.

DOI: 10.1186/1471-2105-11-S1-S17
PMID: 20122188
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3009488/

Abstract

BACKGROUND

Predicting protein localization in subnuclear organelles is more challenging than general protein subcellular localization. To the best of our knowledge, only three computational models for protein subnuclear localization exist thus far. Two of them are based on protein primary sequence only. The first model assumed a homogeneous amino acid substitution pattern across all residue sites of a protein sequence and used BLOSUM62 to encode the k-mers of the sequence. An ensemble of SVMs based on different k-mers made the final prediction, achieving 50% overall accuracy. This simplifying assumption did not exploit the protein sequence profile and ignored the fact that amino acid substitution patterns are heterogeneous across sites. The second model derived the PsePSSM feature representation from the protein sequence by simply averaging the profile PSSM and combined it with the PseAA feature representation to construct a kNN ensemble classifier, Nuc-PLoc, achieving 67.4% overall accuracy. Both sequence-only models thus achieved relatively poor predictive performance. The third model required GO annotations to be available, which restricts its applicability.

METHODS

In this paper, we use only the amino acid information of the protein sequence, without any other information, to design a widely applicable model for protein subnuclear localization. We use the k-spectrum kernel to exploit the contextual information around an amino acid and the conserved motif information. Besides expanding the window size, we adopt various amino acid classification approaches to capture diverse aspects of amino acid physicochemical properties. Each amino acid classification generates a series of spectrum kernels based on different window sizes. Thus, (I) window expansion can capture more contextual information and cover size-varying motifs; (II) various amino acid classifications can exploit multi-aspect biological information from the protein sequence. Finally, we combine all the spectrum kernels by simple addition into a single kernel, called SpectrumKernel+, for protein subnuclear localization.
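
The kernel construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three-group `HYDROPATHY` classification, and the choice of window sizes are all hypothetical, and the abstract does not say whether the individual kernels are normalized before being added, so no normalization is applied here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_classification(groups):
    """Map each residue to the label of the group that contains it."""
    return {aa: label for label, members in groups.items() for aa in members}


# Identity classification: every amino acid is its own class.
IDENTITY = {aa: aa for aa in AMINO_ACIDS}

# Illustrative three-class grouping (an assumption, not the paper's scheme).
HYDROPATHY = make_classification({
    "h": "AVLIMFWC",  # hydrophobic
    "n": "GPSTYH",    # roughly neutral
    "p": "DEKRNQ",    # polar or charged
})


def spectrum_features(seq, k, classification):
    """Count every length-k window of the sequence after relabeling residues."""
    reduced = "".join(classification.get(aa, "x") for aa in seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def spectrum_kernel(seq_a, seq_b, k, classification):
    """Inner product of the two k-mer count vectors."""
    fa = spectrum_features(seq_a, k, classification)
    fb = spectrum_features(seq_b, k, classification)
    return sum(count * fb[kmer] for kmer, count in fa.items())


def spectrum_kernel_plus(seq_a, seq_b, classifications, window_sizes):
    """Unweighted sum of spectrum kernels over all classifications and window sizes."""
    return sum(
        spectrum_kernel(seq_a, seq_b, k, c)
        for c in classifications
        for k in window_sizes
    )


if __name__ == "__main__":
    # "AVK" and "LIR" share no residues, yet they share the same class pattern.
    print(spectrum_kernel("AVK", "LIR", 1, IDENTITY))
    print(spectrum_kernel("AVK", "LIR", 1, HYDROPATHY))
    print(spectrum_kernel_plus("AAK", "AAK", [IDENTITY, HYDROPATHY], (1, 2)))
```

Because each term is an inner product of count vectors, the sum is itself a valid kernel, so a Gram matrix built with `spectrum_kernel_plus` could in principle be passed to any kernel classifier that accepts precomputed kernels.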

RESULTS

We conduct the performance evaluation experiments on two benchmark datasets: Lei and Nuc-PLoc. Experimental results show that SpectrumKernel+ achieves a substantial performance improvement over the previous models. On the Nuc-PLoc dataset, its overall accuracy is 83.47%, against 67.4% for Nuc-PLoc; on the Lei dataset, its overall accuracy is 71.23%, against 50% for the Lei SVM Ensemble and 66.50% for the Lei GO SVM Ensemble.
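
The size of each improvement follows directly from the accuracies quoted above. The short sketch below only restates those figures and subtracts them; the dictionary layout and the `gains` helper are illustrative names, and pairing each figure with a dataset follows the natural reading of the sentence rather than a table in the source.

```python
# Overall accuracies (%) as quoted in the results above.
REPORTED = {
    "Nuc-PLoc dataset": {
        "SpectrumKernel+": 83.47,
        "Nuc-PLoc": 67.4,
    },
    "Lei dataset": {
        "SpectrumKernel+": 71.23,
        "Lei SVM Ensemble": 50.0,
        "Lei GO SVM Ensemble": 66.50,
    },
}


def gains(scores, method="SpectrumKernel+"):
    """Percentage-point gain of `method` over every other model in `scores`."""
    return {
        name: round(scores[method] - accuracy, 2)
        for name, accuracy in scores.items()
        if name != method
    }


if __name__ == "__main__":
    for dataset, scores in REPORTED.items():
        print(dataset, gains(scores))
```

These are absolute differences in percentage points, not relative improvements.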

CONCLUSION

SpectrumKernel+ can exploit the rich amino acid information of a protein sequence by embedding the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches into implicit size-varying motifs. The kernels derived from diverse amino acid classification approaches and different k-mer sizes are summed together for data integration. Experiments show that SpectrumKernel+ significantly outperforms the existing models for protein subnuclear localization.
