A recurrence-based approach for validating structural variation using long-read sequencing technology.

Author information

Zhao Xuefang, Weber Alexandra M, Mills Ryan E

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Ave, Ann Arbor, MI 48109, USA.

Department of Human Genetics, University of Michigan, 1241 Catherine St, Ann Arbor, MI 48109, USA.

Publication information

Gigascience. 2017 Aug 1;6(8):1-9. doi: 10.1093/gigascience/gix061.

DOI:10.1093/gigascience/gix061

PMID:28873962

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5737365/

Abstract

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant, as the accurate identification of structural variation remains an outstanding and important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require large computational resources to assemble the sequences, as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depth. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations, and that it provides additional features covering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read-based validation approach for genomic SVs that requires relatively low read depth and computing resources, and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR software is available at: https://github.com/mills-lab/vapor.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bddd/5737365/dde4e1f430cd/gix061fig1.jpg

Similar articles

Duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls.

Gigascience. 2019 Apr 1;8(4). doi: 10.1093/gigascience/giz040.

Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data.

BMC Genomics. 2024 Oct 16;25(1):970. doi: 10.1186/s12864-024-10875-1.

SVsearcher: A more accurate structural variation detection method in long read data.

Comput Biol Med. 2023 May;158:106843. doi: 10.1016/j.compbiomed.2023.106843. Epub 2023 Mar 31.

Validation of Genomic Structural Variants Through Long Sequencing Technologies.

Methods Mol Biol. 2018;1833:187-192. doi: 10.1007/978-1-4939-8666-8_15.

GGTyper: genotyping complex structural variants using short-read sequencing data.

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii11-ii19. doi: 10.1093/bioinformatics/btae391.

SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae336.

SVJedi: genotyping structural variations with long reads.

Bioinformatics. 2020 Nov 1;36(17):4568-4575. doi: 10.1093/bioinformatics/btaa527.

SVvalidation: A long-read-based validation method for genomic structural variation.

PLoS One. 2024 Jan 5;19(1):e0291741. doi: 10.1371/journal.pone.0291741. eCollection 2024.

Discovery of tandem and interspersed segmental duplications using high-throughput sequencing.

Bioinformatics. 2019 Oct 15;35(20):3923-3930. doi: 10.1093/bioinformatics/btz237.

Cited by

A Murine Database of Structural Variants Enables the Genetic Architecture of a Spontaneous Murine Lymphoma to be Characterized.

bioRxiv. 2025 Jan 14:2025.01.09.632219. doi: 10.1101/2025.01.09.632219.

Adaptive functions of structural variants in human brain development.

Sci Adv. 2024 Apr 5;10(14):eadl4600. doi: 10.1126/sciadv.adl4600.

De novo and somatic structural variant discovery with SVision-pro.

Nat Biotechnol. 2025 Feb;43(2):181-185. doi: 10.1038/s41587-024-02190-7. Epub 2024 Mar 22.

SVvalidation: A long-read-based validation method for genomic structural variation.

PLoS One. 2024 Jan 5;19(1):e0291741. doi: 10.1371/journal.pone.0291741. eCollection 2024.

SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i270-i278. doi: 10.1093/bioinformatics/btad237.

A survey of algorithms for the detection of genomic structural variants from long-read sequencing data.

Nat Methods. 2023 Aug;20(8):1143-1158. doi: 10.1038/s41592-023-01932-w. Epub 2023 Jun 29.

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.

Cell. 2022 Sep 1;185(18):3426-3440.e19. doi: 10.1016/j.cell.2022.08.004.

TT-Mars: structural variants assessment based on haplotype-resolved assemblies.

Genome Biol. 2022 May 6;23(1):110. doi: 10.1186/s13059-022-02666-2.

Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data.

BMC Genomics. 2022 Apr 23;23(1):324. doi: 10.1186/s12864-022-08548-y.

Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain.

Nat Neurosci. 2022 Apr;25(4):504-514. doi: 10.1038/s41593-022-01031-7. Epub 2022 Mar 14.
