Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
Bioinformatics. 2019 Sep 1;35(17):2907-2915. doi: 10.1093/bioinformatics/btz041.
MOTIVATION: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have a great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities.
RESULTS: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines.
AVAILABILITY AND IMPLEMENTATION: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
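To make the idea of collecting structural variant signatures from read alignments more concrete, the following is a minimal sketch and not SVIM's actual implementation. It assumes an alignment is given as a 0-based reference start position plus a SAM-style CIGAR string, and it reports only deletions and insertions that lie within a single alignment and exceed the 50 bp threshold from the definition above. The names `Signature` and `collect_signatures` are hypothetical and chosen for illustration only.

```python
import re
from typing import List, NamedTuple


class Signature(NamedTuple):
    """Hypothetical record for one candidate signature (illustration only)."""
    kind: str    # "DEL" or "INS"
    start: int   # 0-based reference position where the event begins
    length: int  # event size in bp


# One CIGAR operation: a count followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def collect_signatures(ref_start: int, cigar: str, min_size: int = 50) -> List[Signature]:
    """Scan a CIGAR string and return gaps strictly larger than min_size bp.

    Assumption: only gaps inside one alignment are considered; events
    that split a read into several alignments are out of scope here.
    """
    signatures = []
    ref_pos = ref_start
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D" and n > min_size:
            signatures.append(Signature("DEL", ref_pos, n))
        elif op == "I" and n > min_size:
            signatures.append(Signature("INS", ref_pos, n))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += n
    return signatures


if __name__ == "__main__":
    for sig in collect_signatures(1000, "500M120D300M10I200M75I50M"):
        print(sig)
```

In a fuller pipeline, signatures like these from many reads would then be grouped by type, position and size before variants are called; that grouping step is not shown here.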