Statistical aspects of discerning indel-type structural variation via DNA sequence alignment.

Authors

Wendl Michael C, Wilson Richard K

Affiliation

The Genome Center and Department of Genetics, Washington University, St Louis, MO 63108, USA.

Publication

BMC Genomics. 2009 Aug 5;10:359. doi: 10.1186/1471-2164-10-359.

PMID: 19656394
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2748092/
Abstract

BACKGROUND

Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequencing reads. Quantitative aspects underlying this method remain poorly understood, despite its importance and conceptual simplicity. We report the statistical theory characterizing the length-discrepancy scheme for Gaussian libraries, including coverage-related effects that preceding models are unable to account for.
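
The length-discrepancy idea can be illustrated with a minimal sketch. It assumes a Gaussian library whose insert lengths have mean mu and standard deviation sigma. A pair whose aligned span on the reference is improbably long suggests a deletion in the sample, and one that is improbably short suggests an insertion. The function names and the two-sided z-test at level alpha are illustrative assumptions. This is a textbook single-pair test, not the authors' coverage-dependent theory.

```python
from statistics import NormalDist

# Hypothetical helpers: a single-pair z-test on aligned span length,
# assuming insert lengths ~ Normal(mu, sigma^2). Not the paper's model.


def classify_pair(observed_span: float, mu: float, sigma: float,
                  alpha: float = 0.01) -> str:
    """Label one read pair by how far its aligned span deviates from mu."""
    z = (observed_span - mu) / sigma
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    if z > crit:
        return "deletion"    # span on the reference is too long
    if z < -crit:
        return "insertion"   # span on the reference is too short
    return "concordant"


def single_pair_power(indel_size: float, sigma: float,
                      alpha: float = 0.01) -> float:
    """Approximate chance that one pair spanning an indel of this size is flagged.

    Shifts the span mean by indel_size and ignores the opposite (negligible) tail.
    """
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(crit - abs(indel_size) / sigma)


if __name__ == "__main__":
    mu, sigma = 300.0, 30.0  # illustrative library parameters, not from the paper
    for span in (220.0, 320.0, 400.0):
        print(span, classify_pair(span, mu, sigma))
    print(round(single_pair_power(100.0, sigma), 3))
```

Single-pair power depends on indel_size/sigma, so shrinking sigma sharpens detection directly. This matches the abstract's emphasis on library variance.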

RESULTS

Deletion and insertion statistics both depend heavily on physical coverage, but otherwise differ dramatically, refuting a commonly held doctrine of symmetry. Specifically, coverage restrictions render insertions much more difficult to capture. Increased read length has the counterintuitive effect of worsening insertion detection characteristics of short inserts. Variance in library insert length is also a critical factor here and should be minimized to the greatest degree possible. Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. We also consider the proposition of characterizing variation over the entire spectrum of variant sizes under constant risk of false-positive errors. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate. We show that a few modifications largely close this gap and we give a few examples of feasible spectrum-covering designs.
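
A deliberately simplified, Lander-Waterman-style count can make the deletion/insertion asymmetry and the read-length effect concrete. This model is an assumption here, not the paper's Gaussian-library derivation. Every insert is treated as exactly insert_len long, with a read of read_len at each end. A pair bridges a deletion only if the breakpoint falls between its two reads, which leaves roughly insert_len - 2*read_len usable start positions. To bridge an insertion of length s, the pair must also contain all s inserted bases, which leaves insert_len - 2*read_len - s positions. Bridging counts are then modeled as Poisson. All names are hypothetical.

```python
import math

# Hypothetical fixed-insert-length approximation (ignores insert-length variance,
# repeats, and mapping errors); names are illustrative, not from the paper.


def expected_bridging_pairs(n_pairs: float, genome_len: float, insert_len: float,
                            read_len: float, insertion_len: float = 0.0) -> float:
    """Mean number of pairs whose two reads straddle the event.

    insertion_len=0 models a deletion breakpoint; a positive value models an
    insertion that must lie entirely between the two reads.
    """
    window = insert_len - 2 * read_len - insertion_len
    return n_pairs * max(0.0, window) / genome_len


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    n, g, insert = 5e7, 3e9, 300.0  # illustrative design, not from the paper
    for read in (36.0, 100.0):
        lam_del = expected_bridging_pairs(n, g, insert, read)
        lam_ins = expected_bridging_pairs(n, g, insert, read, insertion_len=80.0)
        print(f"read {read:.0f} bp: deletion P(>=2) = {prob_at_least(2, lam_del):.3f}, "
              f"80 bp insertion P(>=2) = {prob_at_least(2, lam_ins):.3f}")
```

In this toy model, lengthening reads from 36 to 100 bp on a 300 bp library shrinks the 80 bp insertion window from 148 to 20 start positions. This mirrors the abstract's observation that longer reads can hurt insertion detection for short inserts. Any insertion longer than insert_len - 2*read_len cannot be bridged at all.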

CONCLUSION

The theory resolves several outstanding issues and furnishes a general methodology for designing future projects from the standpoint of a spectrum-wide constant risk.
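
A standard sample-size calculation can sketch the design question of holding false-positive risk constant across variant sizes. It rests on simplifying assumptions. If k independent bridging pairs are averaged, the mean length discrepancy has standard deviation sigma/sqrt(k). Detecting a size-d event with a one-sided test at level alpha and power 1 - beta then needs roughly k >= ((z_{1-alpha} + z_{1-beta}) * sigma / d)^2 pairs. The function name and defaults are hypothetical. The sketch shows why redundancy grows as (sigma/d)^2; it is not the paper's spectrum-wide criterion.

```python
import math
from statistics import NormalDist

# Hypothetical sample-size sketch: independent Gaussian discrepancies,
# one-sided test per variant size, no multiple-testing correction.


def required_redundancy(d: float, sigma: float, alpha: float = 0.01,
                        power: float = 0.9) -> int:
    """Minimum number of bridging pairs needed to detect a size-d indel."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / d) ** 2)


if __name__ == "__main__":
    for sigma in (30.0, 60.0):  # illustrative library standard deviations
        row = {d: required_redundancy(d, sigma) for d in (25, 50, 100, 200)}
        print(f"sigma={sigma:.0f}: {row}")
```

Doubling sigma roughly quadruples the required redundancy at every size, and the smallest events dominate the cost. A constant-risk design has to balance this trade-off.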

Similar articles

1
Statistical aspects of discerning indel-type structural variation via DNA sequence alignment.
BMC Genomics. 2009 Aug 5;10:359. doi: 10.1186/1471-2164-10-359.
2
General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?
BMC Bioinformatics. 2016 Aug 11;17:304. doi: 10.1186/s12859-016-1105-7.
3
ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly.
Genome Med. 2015 Dec 7;7:127. doi: 10.1186/s13073-015-0251-2.
4
Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS.
Bioinformatics. 2012 Mar 1;28(5):619-27. doi: 10.1093/bioinformatics/bts019. Epub 2012 Jan 11.
5
A general coverage theory for shotgun DNA sequencing.
J Comput Biol. 2006 Jul-Aug;13(6):1177-96. doi: 10.1089/cmb.2006.13.1177.
6
Robust and exact structural variation detection with paired-end and soft-clipped alignments: SoftSV compared with eight algorithms.
Brief Bioinform. 2016 Jan;17(1):51-62. doi: 10.1093/bib/bbv028. Epub 2015 May 20.
7
Amplicon Indel Hunter Is a Novel Bioinformatics Tool to Detect Large Somatic Insertion/Deletion Mutations in Amplicon-Based Next-Generation Sequencing Data.
J Mol Diagn. 2015 Nov;17(6):635-43. doi: 10.1016/j.jmoldx.2015.06.005. Epub 2015 Aug 28.
8
Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.
BMC Genomics. 2010 Jun 18;11:385. doi: 10.1186/1471-2164-11-385.
9
mInDel: a high-throughput and efficient pipeline for genome-wide InDel marker development.
BMC Genomics. 2016 Apr 14;17:290. doi: 10.1186/s12864-016-2614-5.
10
Enhanced whole exome sequencing by higher DNA insert lengths.
BMC Genomics. 2016 May 25;17:399. doi: 10.1186/s12864-016-2698-y.

Cited by

1
A Primer on Infectious Disease Bacterial Genomics.
Clin Microbiol Rev. 2016 Oct;29(4):881-913. doi: 10.1128/CMR.00001-16. Epub 2016 Sep 7.
2
A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing.
PLoS One. 2016 Nov 18;11(11):e0166721. doi: 10.1371/journal.pone.0166721. eCollection 2016.
3
Expanding the computational toolbox for mining cancer genomes.
Nat Rev Genet. 2014 Aug;15(8):556-70. doi: 10.1038/nrg3767. Epub 2014 Jul 8.
4
Advances for studying clonal evolution in cancer.
Cancer Lett. 2013 Nov 1;340(2):212-9. doi: 10.1016/j.canlet.2012.12.028. Epub 2013 Jan 23.
5
Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens' theorem.
J Math Biol. 2013 Nov;67(5):1141-61. doi: 10.1007/s00285-012-0586-x. Epub 2012 Sep 11.
6
Consistency-based detection of potential tumor-specific deletions in matched normal/tumor genomes.
BMC Bioinformatics. 2011 Oct 5;12 Suppl 9(Suppl 9):S21. doi: 10.1186/1471-2105-12-S9-S21.
7
Efficient study design for next generation sequencing.
Genet Epidemiol. 2011 May;35(4):269-77. doi: 10.1002/gepi.20575.
8
Challenges of sequencing human genomes.
Brief Bioinform. 2010 Sep;11(5):484-98. doi: 10.1093/bib/bbq016. Epub 2010 Jun 2.