The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

Affiliation

Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

Publication information

BMC Bioinformatics. 2012 Jun 21;13:141. doi: 10.1186/1471-2105-13-141.

Abstract

BACKGROUND

Computing sequence similarity results is becoming a limiting factor in metagenome analysis. Similarity search results encoded in an open, exchangeable format could reduce the need to recompute them for these data sets. Sharing similarity results requires a common reference.

DESCRIPTION

We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.
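The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each distinct protein sequence is keyed by an MD5 checksum (an assumption suggested by the resource's name; the abstract does not state it), and that each source database contributes an identifier and an annotation for that sequence. The function names, record layout, and sample identifiers are all hypothetical.

```python
import hashlib
from collections import defaultdict


def sequence_key(seq: str) -> str:
    """Return a checksum of the normalized sequence (assumed here to be MD5)."""
    normalized = "".join(seq.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nonredundant(records):
    """Collapse identical sequences from several sources into one entry each.

    records: iterable of (namespace, identifier, annotation, sequence).
    Returns (sequences, annotations), where sequences maps key -> sequence and
    annotations maps key -> namespace -> list of (identifier, annotation).
    """
    sequences = {}
    annotations = defaultdict(lambda: defaultdict(list))
    for namespace, identifier, annotation, seq in records:
        key = sequence_key(seq)
        # Store each distinct sequence once, however many sources list it.
        sequences.setdefault(key, "".join(seq.split()).upper())
        annotations[key][namespace].append((identifier, annotation))
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Re-express similarity hits (query, key, score) in one annotation namespace."""
    return [
        (query, identifier, annotation, score)
        for query, key, score in hits
        for identifier, annotation in annotations.get(key, {}).get(namespace, [])
    ]


# Made-up records: two sources list the same sequence under different IDs.
records = [
    ("KEGG", "K_EX1", "example kinase", "MKTAYIAKQR"),
    ("GenBank", "GB_EX1", "putative kinase", "mktayiakqr"),
    ("GenBank", "GB_EX2", "other protein", "MSSLLV"),
]
sequences, annotations = build_nonredundant(records)

# One similarity computation against the non-redundant set ...
hits = [("read_1", sequence_key("MKTAYIAKQR"), 98.5)]

# ... can be read out in several namespaces without searching again.
kegg_view = translate_hits(hits, annotations, "KEGG")
genbank_view = translate_hits(hits, annotations, "GenBank")
```

Because the key depends only on the sequence, a hit computed once can be shared between groups and looked up later in whichever namespace each group prefers.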

CONCLUSIONS

The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7706/3410781/2b430997cc2c/1471-2105-13-141-1.jpg
