A robust benchmark for detection of germline large deletions and insertions.

Affiliations

Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.

National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA.

Publication information

Nat Biotechnol. 2020 Nov;38(11):1347-1355. doi: 10.1038/s41587-020-0538-8. Epub 2020 Jun 15.

DOI:10.1038/s41587-020-0538-8

PMID:32541955

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8454654/

Abstract

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
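The abstract describes using the benchmark to flag false negatives and false positives in a callset. A minimal sketch of that comparison is given below, under stated assumptions: the `SV` record, the `matches` rule (same type, start within `max_dist`, size ratio at least `min_size_ratio`), the threshold values, and the toy coordinates are all hypothetical illustrations, not the matching procedure or data used by the authors. Only calls inside the benchmark regions are scored, since extra calls there are the ones treated as putative false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start coordinate
    svtype: str   # "INS" or "DEL"
    size: int     # length in bp (assumed positive, e.g. >= 50)


def in_regions(sv, regions):
    """True if the SV start lies inside any (chrom, start, end) region."""
    return any(c == sv.chrom and s <= sv.pos <= e for c, s, e in regions)


def matches(a, b, max_dist=1000, min_size_ratio=0.7):
    """Hypothetical matching rule: same type, nearby start, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    return min(a.size, b.size) / max(a.size, b.size) >= min_size_ratio


def benchmark(truth, query, regions):
    """Count TP/FN/FP for query calls against truth calls within regions."""
    truth = [t for t in truth if in_regions(t, regions)]
    unmatched = [q for q in query if in_regions(q, regions)]
    tp = 0
    for t in truth:
        # Greedily pair each benchmark call with the first unmatched query call.
        hit = next((q for q in unmatched if matches(t, q)), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fn = len(truth) - tp      # benchmark calls the callset missed
    fp = len(unmatched)       # extra callset calls inside benchmark regions
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "FN": fn, "FP": fp, "recall": recall, "precision": precision}


# Toy data (invented for illustration only).
truth = [SV("1", 10_000, "DEL", 300), SV("1", 50_000, "INS", 120), SV("2", 7_000, "DEL", 80)]
query = [SV("1", 10_020, "DEL", 290), SV("1", 80_000, "INS", 200),
         SV("2", 7_000, "DEL", 80), SV("3", 500, "DEL", 60)]
regions = [("1", 1, 100_000), ("2", 1, 100_000)]

result = benchmark(truth, query, regions)
print(result)
```

In this toy run the call on chromosome 3 is ignored because it falls outside the benchmark regions, which mirrors the idea that calls can only be judged as false positives where the benchmark is considered complete.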
