svclassify: a method to establish benchmark structural variant calls.

Author information

Parikh Hemang, Mohiyuddin Marghoob, Lam Hugo Y K, Iyer Hariharan, Chen Desu, Pratt Mark, Bartha Gabor, Spies Noah, Losert Wolfgang, Zook Justin M, Salit Marc

Affiliations

Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA.

Dakota Consulting Inc., 1110 Bonifant Street, Suite 310, Silver Spring, MD, 20910, USA.

Publication information

BMC Genomics. 2016 Jan 16;17:64. doi: 10.1186/s12864-016-2366-2.

DOI:10.1186/s12864-016-2366-2

PMID:26772178

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4715349/

Abstract

BACKGROUND

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.

RESULTS

We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
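The one-class idea described above can be illustrated with a minimal sketch. It is not the svclassify implementation: the abstract does not specify the model, so the scoring rule here (a two-sided empirical tail fraction per annotation, combined by taking the most extreme annotation) is an assumption, and the annotation names `rel_coverage` and `discordant_frac` and all values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def tail_fraction(sorted_background, value):
    """Two-sided empirical tail: share of background values at least as extreme."""
    n = len(sorted_background)
    lower = bisect_right(sorted_background, value) / n        # share <= value
    upper = (n - bisect_left(sorted_background, value)) / n   # share >= value
    return min(1.0, 2.0 * min(lower, upper))


def abnormality_score(background, candidate):
    """Score in [0, 1]; values near 1 mean the candidate looks unlike the background.

    background: list of dicts (annotation name -> value) for random non-SV regions.
    candidate:  dict with the same annotation names for one candidate SV.
    """
    score = 0.0
    for name, value in candidate.items():
        column = sorted(region[name] for region in background)
        score = max(score, 1.0 - tail_fraction(column, value))
    return score


# Hypothetical training set: 100 random non-SV regions with made-up annotations.
background = [
    {"rel_coverage": 0.8 + 0.004 * i, "discordant_frac": 0.0005 * i}
    for i in range(100)
]

# A deletion-like candidate: low coverage and many discordant read pairs.
deletion_like = {"rel_coverage": 0.5, "discordant_frac": 0.4}
# A candidate that resembles the bulk of the genome.
reference_like = {"rel_coverage": 1.001, "discordant_frac": 0.0251}

score_deletion = abnormality_score(background, deletion_like)
score_reference = abnormality_score(background, reference_like)
```

Because only non-SV regions are used for training, the model never needs labeled true SVs; a candidate is flagged simply because its annotations fall outside what most of the genome looks like.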

CONCLUSIONS

We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
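The abstract does not state how concordance between call sets was measured. One common convention for deletions is to count a call as matched when it shares at least 50% reciprocal overlap with a call from the other set; the sketch below assumes that convention. The function names, the threshold, and the coordinates are all hypothetical.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) half-open intervals."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # The smaller of the two overlap fractions must pass the threshold.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def concordance(calls, reference_calls, min_ro=0.5):
    """Fraction of calls matched by at least one reference call."""
    if not calls:
        return 0.0
    matched = sum(
        1
        for call in calls
        if any(reciprocal_overlap(call, ref) >= min_ro for ref in reference_calls)
    )
    return matched / len(calls)


# Made-up deletion calls for illustration only.
high_score = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
other_caller = [("chr1", 1050, 2100), ("chr2", 310, 880), ("chr3", 1, 500)]

fraction = concordance(high_score, other_caller)
```

In this toy input two of the three high-scoring calls have a matching call in the second set. The quadratic comparison is adequate for a few thousand calls; larger sets would benefit from sorting or an interval index.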

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/307d/4715349/472b77872c0c/12864_2016_2366_Fig1_HTML.jpg
