A poor man's BLASTX--high-throughput metagenomic protein database search using PAUDA.

Affiliations

Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore 637551; Center for Bioinformatics, University of Tübingen, 72076 Tübingen, Germany; and Life Sciences Institute, National University of Singapore, Singapore 117456.

Publication information

Bioinformatics. 2014 Jan 1;30(1):38-9. doi: 10.1093/bioinformatics/btt254. Epub 2013 May 7.

DOI:10.1093/bioinformatics/btt254

PMID:23658416

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3866550/

Abstract

SUMMARY

In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ~10,000 times faster than BLASTX while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and which produces gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil, for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800,000 CPU hours. Both analyses lead to the same clustering of samples by functional profiles.
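As a rough consistency check, the CPU-hour and read-count figures quoted in the summary can be turned into a per-read cost for each tool. The sketch below is only arithmetic on those quoted numbers; it assumes the CPU hours of the two analyses are directly comparable (they may have been measured on different hardware), and the constant and function names are illustrative rather than taken from PAUDA.

```python
# Figures quoted in the summary. The PAUDA value is an upper bound ("<80 CPU hours").
PAUDA_CPU_HOURS = 80
PAUDA_READS = 246_000_000
BLASTX_CPU_HOURS = 800_000
BLASTX_READS = 176_000_000


def cpu_seconds_per_read(cpu_hours: float, reads: int) -> float:
    """Average CPU seconds spent per read."""
    return cpu_hours * 3600 / reads


def per_read_speedup() -> float:
    """BLASTX per-read cost divided by PAUDA per-read cost.

    Because PAUDA needed fewer than 80 CPU hours, this is a lower bound.
    """
    blastx = cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS)
    pauda = cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS)
    return blastx / pauda


if __name__ == "__main__":
    print(f"BLASTX: {cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS):.2f} CPU s/read")
    print(f"PAUDA:  {cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS):.5f} CPU s/read")
    print(f"Per-read speedup (lower bound): {per_read_speedup():,.0f}x")
```

Under these assumptions the per-read ratio comes out at roughly 14,000, which is in line with the order of magnitude stated in the summary.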

AVAILABILITY

PAUDA is freely available from: http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website.
