Mapping-free variant calling using haplotype reconstruction from k-mer frequencies.

Author affiliation

School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

Bioinformatics. 2018 May 15;34(10):1659-1665. doi: 10.1093/bioinformatics/btx753.

DOI:10.1093/bioinformatics/btx753

PMID:29186321

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5946877/

Abstract

MOTIVATION

The standard protocol for detecting variation in DNA is to map millions of short sequence reads to a known reference and find loci that differ. While this approach works well, it cannot be applied where the sample contains dense variants or is too distant from known references. De novo assembly or hybrid methods can recover genomic variation, but the cost of computation is often much higher. We developed a novel k-mer algorithm and software implementation, Kestrel, capable of characterizing densely packed SNPs and large indels without mapping, assembly or de Bruijn graphs.

RESULTS

When applied to mosaic penicillin binding protein (PBP) genes in Streptococcus pneumoniae, we found near perfect concordance with assembled contigs at a fraction of the CPU time. Multilocus sequence typing (MLST) with this approach was able to bypass de novo assemblies. Kestrel has a very low false-positive rate when applied to the whole genome, and while Kestrel identified many variants missed by other methods, limitations of a purely k-mer based approach affect overall sensitivity.

AVAILABILITY AND IMPLEMENTATION

Source code and documentation for a Java implementation of Kestrel can be found at https://github.com/paudano/kestrel. All test code for this publication is located at https://github.com/paudano/kescases.

CONTACT

paudano@gatech.edu or fredrik.vannberg@biology.gatech.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
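
The abstract does not spell out how k-mer frequencies reveal variants, so the following is only a minimal Python sketch of the general idea, not Kestrel's implementation (Kestrel itself is written in Java). It assumes that reference k-mers spanning a variant will be missing or rare among the read k-mers, so a run of low-count reference k-mers marks a candidate variant region. The function names, the `min_count` threshold, and the toy sequences are all hypothetical, and haplotype reconstruction over the flagged region is not shown.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_count_spans(reference, counts, k, min_count=1):
    """Find runs of consecutive reference k-mers seen fewer than min_count times.

    Returns (start, end) pairs in k-mer index coordinates, end exclusive.
    """
    spans = []
    start = None
    n_kmers = len(reference) - k + 1
    for i in range(n_kmers):
        low = counts[reference[i:i + k]] < min_count
        if low and start is None:
            start = i                    # a low-count run begins
        elif not low and start is not None:
            spans.append((start, i))     # counts recovered, close the run
            start = None
    if start is not None:
        spans.append((start, n_kmers))   # run extends to the end of the reference
    return spans


# Toy data (hypothetical): the sample differs from the reference by one
# substitution at position 10 (G -> T).
reference = "ACGTTGCAAGGCTTACCGAT"
sample = "ACGTTGCAAGTCTTACCGAT"

# Simulate three overlapping 12-base reads drawn from the sample.
reads = [sample[i:i + 12] for i in range(0, 9, 4)]

counts = count_kmers(reads, k=5)
spans = low_count_spans(reference, counts, k=5)
print(spans)  # [(6, 11)]
```

In this toy case the single substitution removes the five reference 5-mers that overlap position 10, which is why the flagged run is exactly k k-mers long. Real data would need a count threshold that tolerates sequencing errors and uneven coverage.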

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d729/5946877/4c882f81467d/btx753f1.jpg
