Fast subcellular localization by cascaded fusion of signal-based and homology-based methods.

Affiliation

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong.

Publication information

Proteome Sci. 2011 Oct 14;9 Suppl 1(Suppl 1):S8. doi: 10.1186/1477-5956-9-S1-S8.

DOI:10.1186/1477-5956-9-S1-S8

PMID:22166017

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3289086/

Abstract

BACKGROUND

The functions of proteins are closely related to their subcellular locations. In the post-genomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means.

RESULTS

This paper proposes mitigating the computational burden of alignment-based approaches to subcellular localization prediction by a cascaded fusion of cleavage site prediction and profile alignment. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor using the information in their N-terminal sorting signals. Then, the sequences are truncated at the cleavage site positions, and the shortened sequences are passed to PSI-BLAST for computing their profiles. Subcellular locations are subsequently predicted by a profile-to-profile alignment support-vector-machine (SVM) classifier. To further reduce the training and recognition time of the classifier, the SVM classifier is replaced by a new kernel method based on perturbational discriminant analysis (PDA).
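The truncation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_cleavage_site` is a hypothetical stand-in for a real cleavage site predictor (it only looks for a small-X-small residue motif near the N-terminus), the choice to keep the N-terminal segment up to the predicted site is an assumption not stated in the abstract, and `to_fasta` simply formats the shortened sequence so that it could be handed to a profile-building tool such as PSI-BLAST.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a trained cleavage site predictor.
# It only searches the N-terminal region for a small-X-small motif.
SMALL_RESIDUES = set("AGS")


def toy_cleavage_site(seq: str, search_limit: int = 40) -> Optional[int]:
    """Return the index of the first residue after the motif, or None."""
    window = seq[:search_limit]
    for i in range(len(window) - 2):
        if window[i] in SMALL_RESIDUES and window[i + 2] in SMALL_RESIDUES:
            return i + 3
    return None


def truncate(seq: str,
             predictor: Callable[[str], Optional[int]] = toy_cleavage_site) -> str:
    """Keep the segment before the predicted cleavage site (assumed choice).

    Falls back to the full-length sequence when no site is predicted.
    """
    site = predictor(seq)
    if site is None or site >= len(seq):
        return seq
    return seq[:site]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    full = "MKLLVVCLLFTSAWAQDEFGHIK"  # made-up sequence, not from the paper
    print(to_fasta("example", truncate(full)), end="")
```

In a real pipeline the predictor argument would be replaced by the output of a dedicated cleavage site prediction tool, and the shortened FASTA records would be the input to profile computation.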

CONCLUSIONS

Experimental results on a new dataset based on Swiss-Prot Release 57.5 show that the method can exploit the strengths of both signal-based and homology-based approaches and can attain an accuracy comparable to that achieved by using full-length sequences. Analysis of profile-alignment score matrices suggests that both profile creation time and profile alignment time can be reduced without significant reduction in subcellular localization accuracy. PDA was also found to have a shorter training time than the conventional SVM. We believe that the method will be important for biologists conducting large-scale protein annotation and for bioinformaticians performing preliminary investigations on new algorithms that involve pairwise alignments.
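One way to see why truncation shortens alignment time: a dynamic-programming pairwise alignment fills a table whose size is the product of the two sequence lengths, so an all-against-all score matrix costs roughly the sum of those products. The sketch below counts table cells before and after truncation. The lengths are made-up illustrative numbers, not values from the paper, and the cell count is only a rough proxy for running time.

```python
from itertools import combinations_with_replacement
from typing import Sequence


def alignment_cells(lengths: Sequence[int]) -> int:
    """Total DP table cells for all unordered pairs, self-pairs included."""
    return sum(a * b for a, b in combinations_with_replacement(lengths, 2))


if __name__ == "__main__":
    # Hypothetical lengths chosen for illustration only.
    full_lengths = [400, 350, 500]
    truncated_lengths = [30, 25, 40]

    full_cost = alignment_cells(full_lengths)
    truncated_cost = alignment_cells(truncated_lengths)
    print(f"full-length cells: {full_cost}")
    print(f"truncated cells:   {truncated_cost}")
    print(f"reduction factor:  {full_cost / truncated_cost:.1f}")
```

Because the cost is quadratic in length, shortening every sequence by a constant factor reduces the cell count by roughly the square of that factor.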
