SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

Affiliation

Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany.

Publication information

Nucleic Acids Res. 2010 Jan;38(Database issue):D223-6. doi: 10.1093/nar/gkp949. Epub 2009 Nov 11.

DOI: 10.1093/nar/gkp949

PMID: 19906725

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2808863/

Abstract

The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
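
The quadratic growth mentioned in the abstract can be made concrete with a short calculation. This is a minimal sketch, assuming an all-against-all comparison scores each unordered pair of sequences once, giving n(n-1)/2 pairs. The function name `pairwise_comparisons` is illustrative and not part of SIMAP, and the abstract does not say how SIMAP itself counts or stores comparisons.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs in an all-against-all comparison.

    Assumption: each pair is scored once, so the count is n * (n - 1) / 2.
    """
    return n * (n - 1) // 2


# Lower bound taken from the abstract: more than 23 million
# non-redundant sequences as of September 2009.
n_sequences = 23_000_000
pairs = pairwise_comparisons(n_sequences)
doubled = pairwise_comparisons(2 * n_sequences)

print(f"{n_sequences:,} sequences -> {pairs:,} pairs")
print(f"Doubling the sequences multiplies the pairs by {doubled / pairs:.2f}")
```

Under this assumption, doubling the number of sequences roughly quadruples the number of comparisons, which is why a pre-calculated matrix becomes more valuable as the sequence space grows.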

Similar articles

SIMAP--structuring the network of protein similarities.

Nucleic Acids Res. 2008 Jan;36(Database issue):D289-92. doi: 10.1093/nar/gkm963. Epub 2007 Nov 23.

SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage.

Nucleic Acids Res. 2014 Jan;42(Database issue):D279-84. doi: 10.1093/nar/gkt970. Epub 2013 Oct 27.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii42-6. doi: 10.1093/bioinformatics/bti1107.

Nucleic Acids Res. 2006 Jan 1;34(Database issue):D252-6. doi: 10.1093/nar/gkj106.

MIPS: analysis and annotation of genome information in 2007.

Nucleic Acids Res. 2008 Jan;36(Database issue):D196-201. doi: 10.1093/nar/gkm980. Epub 2007 Dec 23.

CyanoBase: the cyanobacteria genome database update 2010.

Nucleic Acids Res. 2010 Jan;38(Database issue):D379-81. doi: 10.1093/nar/gkp915. Epub 2009 Oct 30.

PROMPT: a protein mapping and comparison tool.

BMC Bioinformatics. 2006 Jul 4;7:331. doi: 10.1186/1471-2105-7-331.

MIPS: analysis and annotation of proteins from whole genomes in 2005.

Nucleic Acids Res. 2006 Jan 1;34(Database issue):D169-72. doi: 10.1093/nar/gkj148.

MannDB - a microbial database of automated protein sequence analyses and evidence integration for protein characterization.

BMC Bioinformatics. 2006 Oct 17;7:459. doi: 10.1186/1471-2105-7-459.

Cited by

Transcriptional Dysregulations of Seven Non-Differentially Expressed Genes as Biomarkers of Metastatic Colon Cancer.

Genes (Basel). 2023 May 24;14(6):1138. doi: 10.3390/genes14061138.

Sporisorium reilianum possesses a pool of effector proteins that modulate virulence on maize.

Mol Plant Pathol. 2019 Jan;20(1):124-136. doi: 10.1111/mpp.12744. Epub 2018 Oct 11.

Maximized Autotransporter-Mediated Expression (MATE) for Surface Display and Secretion of Recombinant Proteins in .

Food Technol Biotechnol. 2015 Sep;53(3):251-260. doi: 10.17113/ftb.53.03.15.3802.

iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure.

ISME J. 2017 Jan;11(1):7-14. doi: 10.1038/ismej.2016.89. Epub 2016 Jul 15.

Identifying problematic drugs based on the characteristics of their targets.

Front Pharmacol. 2015 Sep 1;6:186. doi: 10.3389/fphar.2015.00186. eCollection 2015.

ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling.

Nucleic Acids Res. 2015 Jul 1;43(W1):W160-8. doi: 10.1093/nar/gkv455. Epub 2015 May 8.

The Fusarium graminearum genome reveals more secondary metabolite gene clusters and hints of horizontal gene transfer.

PLoS One. 2014 Oct 15;9(10):e110311. doi: 10.1371/journal.pone.0110311. eCollection 2014.

eggNOG v4.0: nested orthology inference across 3686 organisms.

Nucleic Acids Res. 2014 Jan;42(Database issue):D231-9. doi: 10.1093/nar/gkt1253. Epub 2013 Dec 1.

SeqDepot: streamlined database of biological sequences and precomputed features.

Bioinformatics. 2014 Jan 15;30(2):295-7. doi: 10.1093/bioinformatics/btt658. Epub 2013 Nov 13.

Signature protein of the PVC superphylum.

Appl Environ Microbiol. 2014 Jan;80(2):440-5. doi: 10.1128/AEM.02655-13. Epub 2013 Nov 1.

References

Ensembl 2009.

Nucleic Acids Res. 2009 Jan;37(Database issue):D690-7. doi: 10.1093/nar/gkn828. Epub 2008 Nov 25.

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res. 2009 Jan;37(Database issue):D5-15. doi: 10.1093/nar/gkn741. Epub 2008 Oct 21.

PEDANT covers all complete RefSeq genomes.

Nucleic Acids Res. 2009 Jan;37(Database issue):D408-11. doi: 10.1093/nar/gkn749. Epub 2008 Oct 21.

InterPro: the integrative protein signature database.

Nucleic Acids Res. 2009 Jan;37(Database issue):D211-5. doi: 10.1093/nar/gkn785. Epub 2008 Oct 21.

High-throughput functional annotation and data mining with the Blast2GO suite.

Nucleic Acids Res. 2008 Jun;36(10):3420-35. doi: 10.1093/nar/gkn176. Epub 2008 Apr 29.

IMG/M: a data management and analysis system for metagenomes.

Nucleic Acids Res. 2008 Jan;36(Database issue):D534-8. doi: 10.1093/nar/gkm869. Epub 2007 Oct 11.

The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.

PLoS Biol. 2007 Mar;5(3):e77. doi: 10.1371/journal.pbio.0050077.

The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

PLoS Biol. 2007 Mar;5(3):e16. doi: 10.1371/journal.pbio.0050016.

MEGAN analysis of metagenomic data.

Genome Res. 2007 Mar;17(3):377-86. doi: 10.1101/gr.5969107. Epub 2007 Jan 25.

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Nucleic Acids Res. 2007 Jan;35(Database issue):D61-5. doi: 10.1093/nar/gkl842. Epub 2006 Nov 27.
