The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

Affiliation

Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

Publication information

BMC Bioinformatics. 2012 Jun 21;13:141. doi: 10.1186/1471-2105-13-141.

DOI:10.1186/1471-2105-13-141

PMID:22720753

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3410781/

Abstract

BACKGROUND

Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.

DESCRIPTION

We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.

CONCLUSIONS

The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
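
As an illustration of the idea in the abstract, the following minimal Python sketch collapses identical protein sequences from several sources into one non-redundant entry and then re-expresses a single set of similarity hits in different annotation namespaces. It is a sketch under stated assumptions, not the authors' implementation: keying sequences by an MD5 checksum is an assumption (the abstract does not state the keying scheme), and the function names, record layout, identifiers, and sequences are all hypothetical.

```python
import hashlib
from collections import defaultdict


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of a normalized protein sequence."""
    # Assumption: normalization is whitespace stripping plus upper-casing.
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse (source, identifier, sequence, annotation) records by checksum."""
    sequences = {}                   # checksum -> one copy of the sequence
    annotations = defaultdict(dict)  # checksum -> {source: (identifier, annotation)}
    for source, identifier, sequence, annotation in records:
        key = sequence_md5(sequence)
        sequences.setdefault(key, sequence.strip().upper())
        annotations[key][source] = (identifier, annotation)
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Re-express (query, checksum, score) hits in one annotation namespace."""
    translated = []
    for query, key, score in hits:
        entry = annotations.get(key, {}).get(namespace)
        if entry is not None:  # skip hits with no annotation in this namespace
            identifier, annotation = entry
            translated.append((query, identifier, annotation, score))
    return translated


# Hypothetical input: the first two records carry the same sequence.
records = [
    ("KEGG", "kegg:example1", "MKTAYIAKQR", "hypothetical kinase"),
    ("GenBank", "gb:EXAMPLE1", "mktayiakqr", "putative kinase"),
    ("GenBank", "gb:EXAMPLE2", "MSTNPKPQRK", "hypothetical protein"),
]
sequences, annotations = build_nr(records)

# One similarity computation against the non-redundant set, two views of it.
hits = [("read_1", sequence_md5("MKTAYIAKQR"), 52.0)]
kegg_view = translate_hits(hits, annotations, "KEGG")
genbank_view = translate_hits(hits, annotations, "GenBank")
```

Because each hit refers to a sequence checksum rather than to a source-specific identifier, the same hit list can be shared between groups and mapped into whichever namespace each group prefers.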

Similar articles

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

BMC Bioinformatics. 2012 Jun 21;13:141. doi: 10.1186/1471-2105-13-141.

SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

Nucleic Acids Res. 2010 Jan;38(Database issue):D223-6. doi: 10.1093/nar/gkp949. Epub 2009 Nov 11.

ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

BMC Bioinformatics. 2002 Dec 19;3:40. doi: 10.1186/1471-2105-3-40.

NCBI's Conserved Domain Database and Tools for Protein Domain Analysis.

Curr Protoc Bioinformatics. 2020 Mar;69(1):e90. doi: 10.1002/cpbi.90.

A new bioinformatics analysis tools framework at EMBL-EBI.

Nucleic Acids Res. 2010 Jul;38(Web Server issue):W695-9. doi: 10.1093/nar/gkq313. Epub 2010 May 3.

COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

PLoS One. 2015 Nov 11;10(11):e0142102. doi: 10.1371/journal.pone.0142102. eCollection 2015.

PSimScan: algorithm and utility for fast protein similarity search.

PLoS One. 2013;8(3):e58505. doi: 10.1371/journal.pone.0058505. Epub 2013 Mar 7.

UniRef: comprehensive and non-redundant UniProt reference clusters.

Bioinformatics. 2007 May 15;23(10):1282-8. doi: 10.1093/bioinformatics/btm098. Epub 2007 Mar 22.

Using SQL Databases for Sequence Similarity Searching and Analysis.

Curr Protoc Bioinformatics. 2017 Sep 13;59:9.4.1-9.4.22. doi: 10.1002/cpbi.32.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

Cited by

Metagenomic analysis reveals methanogenic and other archaeal genes in the digestive tract of invasive Japanese beetle larvae and associated soil.

Front Microbiol. 2025 Jul 25;16:1609893. doi: 10.3389/fmicb.2025.1609893. eCollection 2025.

Analysis of metagenomic data.

Nat Rev Methods Primers. 2025;5. doi: 10.1038/s43586-024-00376-6. Epub 2025 Jan 23.

Enhancing bloodstream infection diagnostics: a novel filtration and targeted next-generation sequencing approach for precise pathogen identification.

Front Microbiol. 2025 Mar 20;16:1538265. doi: 10.3389/fmicb.2025.1538265. eCollection 2025.

Nitrogen metabolism of the highly ureolytic bacterium Proteus penneri S99 isolated from the rumen.

BMC Microbiol. 2025 Feb 28;25(1):104. doi: 10.1186/s12866-025-03808-9.

Rhizosphere microbiome influence on tomato growth under low-nutrient settings.

FEMS Microbiol Ecol. 2025 Feb 20;101(3). doi: 10.1093/femsec/fiaf019.

Microbiome insights from a South African cultural and natural landmark cave using metagenomics next-generation sequencing.

Microbiol Resour Announc. 2025 Mar 11;14(3):e0118324. doi: 10.1128/mra.01183-24. Epub 2025 Feb 18.

Differential interactions of Rickettsia species with tick microbiota in Rh. sanguineus and Rh. turanicus.

Sci Rep. 2024 Sep 5;14(1):20674. doi: 10.1038/s41598-024-71539-4.

Metagenomic functional profiling: to sketch or not to sketch?

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii165-ii173. doi: 10.1093/bioinformatics/btae397.

High-throughput metagenomic assessment of Cango Cave microbiome-A South African limestone cave.

Data Brief. 2024 Apr 15;54:110381. doi: 10.1016/j.dib.2024.110381. eCollection 2024 Jun.

Deciphering the microbial communities of alkaline hot spring in Panamik, Ladakh, India using a high-throughput sequencing approach.

Braz J Microbiol. 2024 Jun;55(2):1465-1476. doi: 10.1007/s42770-024-01346-6. Epub 2024 Apr 25.
