Large-scale sequence comparisons with sourmash.

Author information

Pierce N Tessa, Irber Luiz, Reiter Taylor, Brooks Phillip, Brown C Titus

Affiliation

Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA.

Publication information

F1000Res. 2019 Jul 4;8:1006. doi: 10.12688/f1000research.19675.1. eCollection 2019.

DOI:10.12688/f1000research.19675.1

PMID:31508216

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6720031/

Abstract

The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
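The MinHash idea behind these signatures can be illustrated with a minimal sketch. The code below is not sourmash's implementation or API; the function names, the BLAKE2b hash, and the default `k` and `num` values are assumptions chosen for illustration, and it omits steps a real tool would need, such as treating a k-mer and its reverse complement as the same. It keeps only the smallest k-mer hashes of each sequence and estimates Jaccard similarity from those small sets rather than from the full k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (the hash function is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, num=500):
    """Bottom-`num` MinHash sketch: the `num` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def jaccard_estimate(a, b, num=500):
    """Estimate Jaccard similarity from two bottom-`num` sketches.

    Take the `num` smallest hashes of the merged sketches and count
    how many of them occur in both inputs.
    """
    merged = sorted(set(a) | set(b))[:num]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

For example, `jaccard_estimate(sketch(genome_a), sketch(genome_b))` compares two sequences using at most 500 integers each, regardless of sequence length, which is why sketch-based comparison can be fast and low in memory. When a sequence has fewer than `num` distinct k-mers, the sketch holds all of them and the estimate equals the exact Jaccard similarity.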

