Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods.

Authors

Mu John C, Tootoonchi Afshar Pegah, Mohiyuddin Marghoob, Chen Xi, Li Jian, Bani Asadi Narges, Gerstein Mark B, Wong Wing H, Lam Hugo Y K

Affiliations

Bina Technologies, Roche Sequencing, Redwood City, CA 94065, USA.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication information

Sci Rep. 2015 Sep 28;5:14493. doi: 10.1038/srep14493.

DOI:10.1038/srep14493

PMID:26412485

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4585973/

Abstract

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.
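As an illustration of how a gold set of this kind might be used to score an SV caller, the sketch below matches deletion calls against gold-set deletions and reports sensitivity and precision. The 50% reciprocal-overlap matching rule, the function names, and the toy coordinates are all assumptions made for this example; the abstract does not describe the matching criteria the authors used, and precision is reported here in place of the specificity mentioned in the abstract.

```python
from typing import List, Tuple

# A deletion as (chromosome, start, end) with half-open coordinates.
# This representation is an assumption for the example.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions between a and b."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def benchmark(calls: List[Interval], gold: List[Interval],
              min_ro: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, precision) under a reciprocal-overlap rule.

    min_ro = 0.5 is a hypothetical threshold, not taken from the paper.
    """
    # Gold-set events recovered by at least one call.
    gold_hit = sum(
        any(reciprocal_overlap(g, c) >= min_ro for c in calls) for g in gold
    )
    # Calls supported by at least one gold-set event.
    call_hit = sum(
        any(reciprocal_overlap(c, g) >= min_ro for g in gold) for c in calls
    )
    sensitivity = gold_hit / len(gold) if gold else 0.0
    precision = call_hit / len(calls) if calls else 0.0
    return sensitivity, precision


# Toy data, invented for illustration only.
gold = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 100, 400)]
calls = [("chr1", 1100, 2050), ("chr1", 8000, 8500)]

sensitivity, precision = benchmark(calls, gold)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f}")
```

The quadratic matching loop is adequate for a toy example; a real comparison over a whole genome would more likely sort the intervals or use an interval tree.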

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d03/4585973/7ccc2a812ae4/srep14493-f1.jpg

Similar articles

Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.

A Comparison of Structural Variant Calling from Short-Read and Nanopore-Based Whole-Genome Sequencing Using Optical Genome Mapping as a Benchmark.

Genes (Basel). 2024 Jul 16;15(7):925. doi: 10.3390/genes15070925.

svclassify: a method to establish benchmark structural variant calls.

BMC Genomics. 2016 Jan 16;17:64. doi: 10.1186/s12864-016-2366-2.

Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

Sci Data. 2018 Dec 18;5:180261. doi: 10.1038/sdata.2018.261.

A comprehensive benchmarking of WGS-based deletion structural variant callers.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac221.

VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing.

Nat Commun. 2024 Aug 13;15(1):6956. doi: 10.1038/s41467-024-51282-0.

SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines.

BMC Genomics. 2015 Mar 25;16(1):238. doi: 10.1186/s12864-015-1376-9.

VISTA: an integrated framework for structural variant discovery.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae462.

Robust Benchmark Structural Variant Calls of An Asian Using State-of-the-art Long-read Sequencing Technologies.

Genomics Proteomics Bioinformatics. 2022 Feb;20(1):192-204. doi: 10.1016/j.gpb.2020.10.006. Epub 2021 Mar 2.

Cited by

Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders.

Cell. 2024 Nov 14;187(23):6687-6706.e25. doi: 10.1016/j.cell.2024.09.014. Epub 2024 Sep 30.

Variant calling and benchmarking in an era of complete human genome sequences.

Nat Rev Genet. 2023 Jul;24(7):464-483. doi: 10.1038/s41576-023-00590-0. Epub 2023 Apr 14.

A random forest-based framework for genotyping and accuracy assessment of copy number variations.

NAR Genom Bioinform. 2020 Sep 22;2(3):lqaa071. doi: 10.1093/nargab/lqaa071. eCollection 2020 Sep.

Best practices for variant calling in clinical sequencing.

Genome Med. 2020 Oct 26;12(1):91. doi: 10.1186/s13073-020-00791-w.

A robust benchmark for detection of germline large deletions and insertions.

Nat Biotechnol. 2020 Nov;38(11):1347-1355. doi: 10.1038/s41587-020-0538-8. Epub 2020 Jun 15.

Next Generation Sequencing in Newborn Screening in the United Kingdom National Health Service.

Int J Neonatal Screen. 2019 Dec;5(4):40. doi: 10.3390/ijns5040040. Epub 2019 Nov 5.

Haplotypes spanning centromeric regions reveal persistence of large blocks of archaic DNA.

Elife. 2019 Jun 25;8:e42989. doi: 10.7554/eLife.42989.

An open resource for accurately benchmarking small variant and reference calls.

Nat Biotechnol. 2019 May;37(5):561-566. doi: 10.1038/s41587-019-0074-6. Epub 2019 Apr 1.

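Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

Sci Data. 2018 Dec 18;5:180261. doi: 10.1038/sdata.2018.261.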

