Computational identification of strain-, species- and genus-specific proteins.

Authors

Mazumder Raja, Natale Darren A, Murthy Sudhir, Thiagarajan Rathi, Wu Cathy H

Affiliation

Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057-1414, USA.

Publication information

BMC Bioinformatics. 2005 Nov 23;6:279. doi: 10.1186/1471-2105-6-279.

Abstract

BACKGROUND

The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets.

DESCRIPTION

A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated from the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually) at http://pir.georgetown.edu/cupid.
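The decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CUPID implementation: the `Hit` record, the `classify` function, the `reverse_blast` callable, and the three result labels are all hypothetical names introduced here. It also assumes that "the best hit" means the lowest e-value hit from outside the query taxon, that the reverse search results exclude the self-hit, and that a query retrieved as the top reverse hit is treated as having a homolog; the abstract does not state these details.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

E_VALUE_CUTOFF = 1e-3  # "e-value better than 0.001", as stated in the abstract


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record, not a real BLAST output format)."""
    protein_id: str
    taxon: str      # label at the level being tested: strain, species, or genus
    e_value: float


def significant(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only hits whose e-value is better (lower) than the cutoff."""
    return [h for h in hits if h.e_value < E_VALUE_CUTOFF]


def classify(
    query_id: str,
    query_taxon: str,
    forward_hits: Iterable[Hit],
    reverse_blast: Callable[[str], Iterable[Hit]],
) -> str:
    """Return 'unique', 'not_unique', or 'manual_review' for one query protein."""
    # Forward BLAST: look for significant hits encoded outside the query taxon.
    outside = [h for h in significant(forward_hits) if h.taxon != query_taxon]
    if not outside:
        return "unique"

    # Reverse BLAST: use the best outside hit as the new query (assumption).
    best = min(outside, key=lambda h: h.e_value)
    reverse_hits = significant(reverse_blast(best.protein_id))

    # The initial query is not retrieved: preliminarily unique.
    if all(h.protein_id != query_id for h in reverse_hits):
        return "unique"

    # Retrieved as the top reverse hit: treated here as having a homolog
    # (assumption). Retrieved but not top: flagged for manual inspection.
    top = min(reverse_hits, key=lambda h: h.e_value)
    return "not_unique" if top.protein_id == query_id else "manual_review"
```

In this sketch `reverse_blast` is passed in as a function so that the rule can be exercised with canned results; a real run would wrap an actual BLAST search, and short sequences would additionally be re-tested with the PAM30 matrix as the text describes.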

CONCLUSION

CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.

