Identification of genomic indels and structural variations using split reads.

Affiliation

Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA.

Publication information

BMC Genomics. 2011 Jul 25;12:375. doi: 10.1186/1471-2164-12-375.

DOI:10.1186/1471-2164-12-375
PMID:21787423
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3161018/
Abstract

BACKGROUND

Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection.

RESULTS

We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs.
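The calibration idea described above can be illustrated with a small sketch. This is not the SRiC implementation: the class name, function names, size classes, and all counts below are hypothetical, and the correction formula (observed calls multiplied by PPV and divided by sensitivity) is one plausible way to combine the two quantities the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Confusion counts from simulated reads for one event class (hypothetical)."""
    true_positives: int   # simulated events that were called
    false_positives: int  # calls with no matching simulated event
    false_negatives: int  # simulated events that were missed

    @property
    def sensitivity(self) -> float:
        # Fraction of real events that the caller recovers.
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def ppv(self) -> float:
        # Fraction of calls that correspond to real events.
        return self.true_positives / (self.true_positives + self.false_positives)


def corrected_count(observed_calls: int, cal: Calibration) -> float:
    """Assumed correction: discount false calls (PPV), then scale up for misses."""
    return observed_calls * cal.ppv / cal.sensitivity


def corrected_total(observed: dict[str, int], cals: dict[str, Calibration]) -> float:
    """Sum the corrected estimates over event classes (e.g. by type and length)."""
    return sum(corrected_count(n, cals[name]) for name, n in observed.items())


# Made-up example values, chosen only to show the arithmetic.
calibrations = {
    "short_insertion": Calibration(true_positives=90, false_positives=10, false_negatives=30),
    "long_deletion": Calibration(true_positives=60, false_positives=20, false_negatives=60),
}
observed_calls = {"short_insertion": 1000, "long_deletion": 400}
estimate = corrected_total(observed_calls, calibrations)
```

Correcting each event class separately matters because, as the abstract notes, callers are biased differently for different classes (for example, deletions versus insertions), so a single global correction factor would carry those biases into the total.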

CONCLUSIONS

Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/9a80c5b0f504/1471-2164-12-375-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/ec7e38c9ad12/1471-2164-12-375-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/26e0fe79ea29/1471-2164-12-375-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/b2cd7bce3768/1471-2164-12-375-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/c606ace9687f/1471-2164-12-375-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/ea646f50a8f3/1471-2164-12-375-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58cd/3161018/b5fa1b28c164/1471-2164-12-375-7.jpg
