Computational identification of strain-, species- and genus-specific proteins.

Author information

Mazumder Raja, Natale Darren A, Murthy Sudhir, Thiagarajan Rathi, Wu Cathy H

Affiliation

Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057-1414, USA.

Publication information

BMC Bioinformatics. 2005 Nov 23;6:279. doi: 10.1186/1471-2105-6-279.

DOI: 10.1186/1471-2105-6-279

PMID: 16305751

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1310627/

Abstract

BACKGROUND

The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets.

DESCRIPTION

A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated from the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually) at http://pir.georgetown.edu/cupid.
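
The decision rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CUPID implementation: the `Hit` record, the `classify` function, and the `reverse_search` callback are hypothetical names, BLAST itself is not invoked (hits are passed in as plain records), and the reverse BLAST ranking is assumed to ignore the self-hit of the protein used as the reverse query. Only the 0.001 E-value cutoff and the three outcomes come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List

E_VALUE_CUTOFF = 1e-3  # "better than 0.001" is read here as strictly lower


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record; field names are assumptions)."""
    protein_id: str
    strain: str
    species: str
    genus: str
    e_value: float


def outside_hits(hits: List[Hit], query_taxon: str, level: str) -> List[Hit]:
    """Significant hits from organisms outside the query taxon, best first.

    `level` is one of "strain", "species" or "genus".
    """
    outside = [
        h for h in hits
        if h.e_value < E_VALUE_CUTOFF and getattr(h, level) != query_taxon
    ]
    return sorted(outside, key=lambda h: h.e_value)


def classify(
    query_id: str,
    query_taxon: str,
    level: str,
    forward_hits: List[Hit],
    reverse_search: Callable[[str], List[Hit]],
) -> str:
    """Return "unique", "not_unique" or "manual_review" for one protein."""
    outside = outside_hits(forward_hits, query_taxon, level)
    if not outside:
        # Forward BLAST retrieved nothing outside the taxon.
        return "unique"

    best = outside[0]
    # Reverse BLAST: search with the best outside hit as the query.
    # Assumption: the self-hit of that protein is dropped before ranking.
    reverse = sorted(
        (
            h for h in reverse_search(best.protein_id)
            if h.e_value < E_VALUE_CUTOFF and h.protein_id != best.protein_id
        ),
        key=lambda h: h.e_value,
    )
    reverse_ids = [h.protein_id for h in reverse]

    if query_id not in reverse_ids:
        # The initial query is not retrieved: still preliminarily unique.
        return "unique"
    if reverse_ids[0] == query_id:
        # Reciprocal best hit: a homolog exists outside the taxon.
        return "not_unique"
    # Retrieved, but not as the best hit: flag for manual inspection.
    return "manual_review"
```

In this sketch `reverse_search` would wrap whatever BLAST runner is in use. The PAM30 re-test for short sequences is not modeled, because the abstract does not state the criterion for selecting those sequences.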

CONCLUSION

CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.
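
One way to picture the "most closely related organism" step is the following sketch. It is an illustration under assumptions rather than the method used by CUPID: the `Hit` tuple and both function names are hypothetical, the closest relative of a protein is taken to be the organism of its best-scoring significant hit outside the query organism, and tallying those organisms across the proteome is an assumed way of summarizing the per-protein results.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record for illustration)."""
    protein_id: str
    organism: str
    e_value: float


def closest_relative(
    hits: List[Hit], query_organism: str, cutoff: float = 1e-3
) -> Optional[str]:
    """Organism of the best significant hit outside the query organism."""
    candidates = [
        h for h in hits
        if h.organism != query_organism and h.e_value < cutoff
    ]
    if not candidates:
        return None  # no significant hit outside the query organism
    return min(candidates, key=lambda h: h.e_value).organism


def tally_relatives(
    proteome_hits: Dict[str, List[Hit]], query_organism: str
) -> Counter:
    """Count how often each organism is the closest relative of a protein."""
    counts: Counter = Counter()
    for hits in proteome_hits.values():
        relative = closest_relative(hits, query_organism)
        if relative is not None:
            counts[relative] += 1
    return counts
```

Calling `tally_relatives(...).most_common(1)` would then name the organism that is most often the nearest match, under the assumptions above.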

Similar articles

Computational identification of strain-, species- and genus-specific proteins.

BMC Bioinformatics. 2005 Nov 23;6:279. doi: 10.1186/1471-2105-6-279.

Identification of Yersinia pestis and Escherichia coli strains by whole cell and outer membrane protein extracts with mass spectrometry-based proteomics.

J Proteome Res. 2010 Jul 2;9(7):3647-55. doi: 10.1021/pr100402y.

fastSCOP: a fast web server for recognizing protein structural domains and SCOP superfamilies.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W438-43. doi: 10.1093/nar/gkm288. Epub 2007 May 7.

Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST).

BMC Bioinformatics. 2005 Apr 8;6:93. doi: 10.1186/1471-2105-6-93.

Identification of potential drug targets by subtractive genome analysis of Bacillus anthracis A0248: An in silico approach.

Comput Biol Chem. 2014 Oct;52:66-72. doi: 10.1016/j.compbiolchem.2014.09.005. Epub 2014 Sep 18.

SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

BMC Bioinformatics. 2004 Oct 28;5:171. doi: 10.1186/1471-2105-5-171.

Sequence and hydropathy profile analysis of two classes of secondary transporters.

Mol Membr Biol. 2005 May-Jun;22(3):177-89. doi: 10.1080/09687860500063324.

Recent Hits Acquired by BLAST (ReHAB): a tool to identify new hits in sequence similarity searches.

BMC Bioinformatics. 2005 Feb 8;6:23. doi: 10.1186/1471-2105-6-23.

Tracembler--software for in-silico chromosome walking in unassembled genomes.

BMC Bioinformatics. 2007 May 9;8:151. doi: 10.1186/1471-2105-8-151.

Alkahest NuclearBLAST: a user-friendly BLAST management and analysis system.

BMC Bioinformatics. 2005 Jun 15;6:147. doi: 10.1186/1471-2105-6-147.

Cited by

Census-based rapid and accurate metagenome taxonomic profiling.

BMC Genomics. 2014 Oct 21;15(1):918. doi: 10.1186/1471-2164-15-918.

Evolutionary and experimental assessment of novel markers for detection of Xanthomonas euvesicatoria in plant samples.

PLoS One. 2012;7(5):e37836. doi: 10.1371/journal.pone.0037836. Epub 2012 May 24.

Toward an efficient method of identifying core genes for evolutionary and functional microbial phylogenies.

PLoS One. 2011;6(9):e24704. doi: 10.1371/journal.pone.0024704. Epub 2011 Sep 12.

Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

PLoS One. 2011 Apr 27;6(4):e18910. doi: 10.1371/journal.pone.0018910.

Systems integration of biodefense omics data for analysis of pathogen-host interactions and identification of potential targets.

PLoS One. 2009 Sep 25;4(9):e7162. doi: 10.1371/journal.pone.0007162.

Signature, a web server for taxonomic characterization of sequence samples using signature genes.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W470-4. doi: 10.1093/nar/gkn277. Epub 2008 May 17.

Luminex detection of fecal indicators in river samples, marine recreational water, and beach sand.

Mar Pollut Bull. 2007 May;54(5):521-36. doi: 10.1016/j.marpolbul.2006.12.018. Epub 2007 Mar 9.
