Protein subcellular localization prediction based on compartment-specific features and structure conservation.

Author information

Su Emily Chia-Yu, Chiu Hua-Sheng, Lo Allan, Hwang Jenn-Kang, Sung Ting-Yi, Hsu Wen-Lian

Affiliation

Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan.

Publication information

BMC Bioinformatics. 2007 Sep 8;8:330. doi: 10.1186/1471-2105-8-330.

DOI:10.1186/1471-2105-8-330

PMID:17825110

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2040162/

Abstract

BACKGROUND

Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determining subcellular localization experimentally is time-consuming, so computational approaches are highly desirable. Extensive studies of localization prediction have produced several methods, including composition-based and homology-based methods. However, the performance of these methods may degrade significantly when no homologous sequences are detected. Moreover, methods that integrate various features can suffer from low coverage in high-throughput proteomic analyses because the information needed to characterize unknown proteins is lacking.

RESULTS

We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model with a structural homology approach. The SVM model comprises a number of binary classifiers that incorporate biological features derived from Gram-negative bacterial translocation pathways. In the structural homology approach, we use secondary structure alignment to compare structural similarity and assign the known localization of the top-ranked protein as the predicted localization of the query protein. The hybrid method achieves overall accuracies of 93.7% and 93.2% in ten-fold cross-validation on the benchmark data sets. On the evaluation data sets, the method attains a prediction accuracy of 84.0%, especially when tested on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets with sequence identity below 30%.
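
The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature encoding, the alignment tool, or how the two predictions are combined. The function names, the match/mismatch/gap scores, and the use of plain three-state secondary-structure strings (H, E, C) are assumptions made here for illustration only. The first function shows majority voting over pairwise (one-versus-one) classifiers; the other two show nearest-neighbor assignment by a simple global alignment score.

```python
from collections import Counter


def one_vs_one_vote(binary_classifiers, features):
    """Majority vote over pairwise classifiers (hypothetical helper).

    binary_classifiers maps a (class_a, class_b) pair to a callable that
    returns either class_a or class_b for the given features.
    """
    votes = Counter(clf(features) for clf in binary_classifiers.values())
    # Counter.most_common keeps first-seen order among tied classes.
    return votes.most_common(1)[0][0]


def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) score of two secondary-structure strings.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def top_ranked_localization(query_ss, annotated):
    """Return the localization of the best-scoring annotated protein.

    annotated is a list of (secondary_structure_string, localization) pairs.
    """
    best = max(annotated, key=lambda entry: alignment_score(query_ss, entry[0]))
    return best[1]
```

In practice the pairwise classifiers would be trained SVMs and the secondary-structure strings would come from a structure predictor; both are replaced here by plain callables and literal strings so that the voting and top-ranked assignment logic can be read in isolation.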

CONCLUSION

Our results demonstrate that biological features derived from Gram-negative bacterial translocation pathways yield a significant improvement in prediction performance. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, combining the SVM model with the structural homology approach further improves the overall accuracy, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.
