Towards a better understanding of the low recall of insertion variants with short-read based variant callers.

Authors

Delage Wesley J, Thevenon Julien, Lemaitre Claire

Affiliations

Univ Rennes, Inria, CNRS, IRISA, Rennes, F-35000, France.

Inserm U1209, CNRS UMR 5309, Univ. Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France & Genetics, Genomics and Reproduction Service, Centre Hospitalo-Universitaire Grenoble-Alpes, Grenoble, France.

Publication information

BMC Genomics. 2020 Nov 4;21(1):762. doi: 10.1186/s12864-020-07125-5.

DOI:10.1186/s12864-020-07125-5

PMID:33148192

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7640490/

Abstract

BACKGROUND

Since 2009, numerous tools have been developed to detect structural variants using short-read technologies. Insertions >50 bp are among the hardest types to discover and are drastically underrepresented in gold-standard variant callsets. The advent of long-read technologies has completely changed the situation. In 2019, two independent cross-technology studies published the most complete variant callsets with sequence-resolved insertions in human individuals. Among the reported insertions, only 17 to 28% could be discovered with short-read based tools.

RESULTS

In this work, we performed an in-depth analysis of these unprecedented insertion callsets to investigate the causes of such failures. We first established a precise classification of insertion variants according to four layers of characterization: the nature of the inserted sequence, its size, the genomic context of the insertion site, and the breakpoint junction complexity. Because these levels are intertwined, we then used simulations to characterize the impact of each complexity factor on the recall of several structural variant callers. We showed that most reported insertions exhibited characteristics that may interfere with their discovery: 63% were tandem repeat expansions, 38% contained homology larger than 10 bp within their breakpoint junctions, and 70% were located in simple repeats. Consequently, the recall of short-read based variant callers was significantly lower for such insertions (6% for tandem repeats vs 56% for mobile element insertions). Simulations showed that the factor with the greatest impact was the insertion type rather than the genomic context, and that the tested structural variant callers handled the various difficulties differently. The simulations also highlighted the lack of sequence resolution for most insertion calls.
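As an illustration of how recall can be broken down by insertion type, the following is a minimal Python sketch. It is not the authors' pipeline: the `Insertion` record, the type labels, the toy coordinates, and the 10 bp matching tolerance (`max_dist`) are all assumptions made for the example. A true insertion counts as recalled when a call on the same chromosome lies within the tolerance of its site; the inserted sequence itself is not compared.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Sequence


class Insertion(NamedTuple):
    """Hypothetical minimal record for an insertion site."""
    chrom: str
    pos: int        # insertion site on the reference
    ins_type: str   # illustrative label, e.g. "tandem_repeat"


def is_recalled(truth: Insertion, calls: Sequence[Insertion], max_dist: int) -> bool:
    """True if any call is on the same chromosome within max_dist bp of the true site."""
    return any(
        c.chrom == truth.chrom and abs(c.pos - truth.pos) <= max_dist
        for c in calls
    )


def recall_by_type(
    truth_set: Sequence[Insertion],
    calls: Sequence[Insertion],
    max_dist: int = 10,  # assumed tolerance, not taken from the source
) -> Dict[str, float]:
    """Fraction of true insertions of each type matched by at least one call."""
    found: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for t in truth_set:
        total[t.ins_type] += 1
        if is_recalled(t, calls, max_dist):
            found[t.ins_type] += 1
    return {k: found[k] / total[k] for k in total}


# Toy data, invented for illustration only.
truth = [
    Insertion("chr1", 1000, "tandem_repeat"),
    Insertion("chr1", 5000, "tandem_repeat"),
    Insertion("chr2", 300, "mobile_element"),
    Insertion("chr2", 9000, "mobile_element"),
]
calls = [
    Insertion("chr2", 305, "unknown"),
    Insertion("chr2", 8995, "unknown"),
    Insertion("chr1", 1200, "unknown"),  # 200 bp away: too far to match at 10 bp
]

recall = recall_by_type(truth, calls)
print(recall)
```

Stratifying in this way is what makes a gap such as the reported 6% vs 56% visible, since a single pooled recall figure would hide it. The tolerance matters: in the toy data, widening `max_dist` to 250 bp would count the distant chr1 call as a match.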

CONCLUSIONS

Our results explain the low recall by pointing out several difficulty factors among the observed insertion features and provide avenues for improving SV caller algorithms and their combinations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73cd/7640490/66ab9384bbca/12864_2020_7125_Fig1_HTML.jpg
