Nebula: ultra-efficient mapping-free structural variant genotyper.

Affiliations

Genome Center, UC Davis, Davis, California, 95616, USA.

UC Davis MIND Institute, Sacramento, California, 95817, USA.

Publication information

Nucleic Acids Res. 2021 May 7;49(8):e47. doi: 10.1093/nar/gkab025.

DOI: 10.1093/nar/gkab025

PMID: 33503255

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8096284/

Abstract

Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, genotyping these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in k-mer counts to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
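
The idea of predicting a genotype from k-mer counts, without mapping reads, can be illustrated with a toy sketch. This is not Nebula's implementation: the function names, the k value, the mean-count support score, and the 0.25/0.75 thresholds are all assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors, and repeated k-mers elsewhere in the genome.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mers across all reads without aligning them."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def mean_count(counts, signature):
    """Average read count over a set of signature k-mers (0.0 if empty)."""
    if not signature:
        return 0.0
    return sum(counts[km] for km in signature) / len(signature)


def genotype(ref_allele, alt_allele, reads, k=5, het_band=(0.25, 0.75)):
    """Call 0/0, 0/1 or 1/1 from k-mers unique to each allele.

    Hypothetical rule (an assumption, not the published model): compare
    the mean count of alt-only k-mers to the mean count of ref-only
    k-mers and threshold the resulting alt fraction.
    """
    ref_set = set(kmers(ref_allele, k))
    alt_set = set(kmers(alt_allele, k))
    ref_only = ref_set - alt_set   # k-mers that disappear with the variant
    alt_only = alt_set - ref_set   # k-mers created by the variant junction

    counts = count_kmers(reads, k)
    ref_support = mean_count(counts, ref_only)
    alt_support = mean_count(counts, alt_only)

    total = ref_support + alt_support
    if total == 0:
        return "./."               # no informative k-mers observed
    alt_fraction = alt_support / total
    low, high = het_band
    if alt_fraction < low:
        return "0/0"
    if alt_fraction > high:
        return "1/1"
    return "0/1"
```

For a deletion, `ref_allele` would be the flanks plus the deleted segment and `alt_allele` the two flanks joined, so the alt-only k-mers are those spanning the new junction. Because the only per-read work is counting fixed-length substrings, a scheme of this kind avoids the cost of alignment, which is the efficiency argument the abstract makes.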

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76ee/8096284/1bdadb1730ca/gkab025fig1.jpg
