Computational techniques for human genome resequencing using mated gapped reads.

Author information

Carnevali Paolo, Baccash Jonathan, Halpern Aaron L, Nazarenko Igor, Nilsen Geoffrey B, Pant Krishna P, Ebert Jessica C, Brownley Anushka, Morenzoni Matt, Karpinchyk Vitali, Martin Bruce, Ballinger Dennis G, Drmanac Radoje

Affiliation

Complete Genomics Inc., Mountain View, California 94043, USA.

Publication information

J Comput Biol. 2012 Mar;19(3):279-92. doi: 10.1089/cmb.2011.0201. Epub 2011 Dec 16.

DOI:10.1089/cmb.2011.0201

PMID:22175250

Abstract

Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of the unique characteristics of these mate-pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurately calling SNPs, short substitutions, and indels (<100 bp); the same methods also apply to evaluating hypothesized larger structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until it converges to an optimal diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process, reducing the chance of converging to a local optimum. Finally, a correlation-based filter is applied to reduce the false positive rate caused by repetitive regions in the reference genome.
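The scoring-and-optimization loop described in the abstract can be illustrated with a toy sketch. It scores a diploid candidate (h1, h2) by an unnormalized log posterior, log P(h1, h2) + sum over reads r of log[0.5 P(r | h1) + 0.5 P(r | h2)]. It then hill-climbs over single-base substitutions, insertions, and deletions until no edit improves the score. The model here is an assumption made for illustration and is much simpler than the paper's:

- Reads are contiguous, unmated strings. The paper's reads are mated and gapped.
- Each read starts uniformly at random on one of the two haplotypes.
- Each base is miscalled with probability `err`.
- A fraction `noise` of reads come from elsewhere in the genome.
- The prior charges log(`mu`) per edit away from the reference.

All function names and parameter values are hypothetical. This is not the authors' implementation.

```python
import math

BASES = "ACGT"


def read_likelihood(read, hap, err=0.01):
    """Toy P(read | haplotype): uniform start offset, independent per-base
    miscalls with probability err (spread evenly over the other 3 bases)."""
    n_starts = len(hap) - len(read) + 1
    if n_starts <= 0:
        return 0.0
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        for r, h in zip(read, hap[start:start + len(read)]):
            p *= (1.0 - err) if r == h else err / 3.0
        total += p
    return total / n_starts


def edit_distance(a, b):
    """Levenshtein distance; used only by the illustrative prior."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def log_posterior(h1, h2, reads, ref, err=0.01, mu=1e-3, noise=1e-3):
    """Unnormalized log P(h1, h2 | reads) = log prior + sum of log read likelihoods."""
    log_prior = (edit_distance(h1, ref) + edit_distance(h2, ref)) * math.log(mu)
    log_lik = 0.0
    for read in reads:
        # Diploid mixture: the read comes from either haplotype with probability 1/2.
        p = 0.5 * read_likelihood(read, h1, err) + 0.5 * read_likelihood(read, h2, err)
        # Small background term for reads originating elsewhere; also avoids log(0).
        log_lik += math.log((1.0 - noise) * p + noise * 0.25 ** len(read))
    return log_prior + log_lik


def neighbors(seq):
    """All sequences one substitution, deletion, or insertion away from seq."""
    for i in range(len(seq)):
        for b in BASES:
            if b != seq[i]:
                yield seq[:i] + b + seq[i + 1:]  # substitution
        yield seq[:i] + seq[i + 1:]  # deletion
    for i in range(len(seq) + 1):
        for b in BASES:
            yield seq[:i] + b + seq[i:]  # insertion


def optimize(ref, reads, max_iter=50, **model):
    """Steepest-ascent hill climbing over diploid sequences, starting at (ref, ref).
    Each iteration applies the single best-scoring one-base edit to one haplotype."""
    best = (ref, ref)
    best_score = log_posterior(ref, ref, reads, ref, **model)
    for _ in range(max_iter):
        h1, h2 = best
        candidates = [(n, h2) for n in neighbors(h1)] + [(h1, n) for n in neighbors(h2)]
        improved = False
        for c1, c2 in candidates:
            score = log_posterior(c1, c2, reads, ref, **model)
            if score > best_score + 1e-9:
                best, best_score, improved = (c1, c2), score, True
        if not improved:
            break  # converged: no single edit increases the posterior
    return best, best_score


if __name__ == "__main__":
    ref = "ACGTTGCAAGTC"
    alt = ref[:6] + "G" + ref[7:]  # homozygous C->G substitution at index 6
    reads = [alt[s:s + 6] for s in range(len(alt) - 5)] * 3
    (h1, h2), score = optimize(ref, reads)
    print(h1, h2, round(score, 2))
```

Even on this toy input, the gradual optimization depends on the prior. A single edit is accepted only if its likelihood gain outweighs the log(`mu`) penalty. The starting point also matters, because a pure hill climb can stall in a local optimum. This is one reason the abstract seeds the optimizer with a local assembly.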

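The abstract says the optimizer is seeded by a local de novo assembly that generalizes De Bruijn graph approaches. That generalization is not described here, so the sketch below shows only the plain De Bruijn construction it builds on:

1. Cut the reads into k-mers.
2. Turn each k-mer into an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
3. Take two anchor (k-1)-mers from the reference flanks. Each path between them spells a candidate local haplotype.

These candidate haplotypes could then serve as starting points for the optimization. The function names, the value of `k`, the anchors, and the `max_len` bound are illustrative assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def local_paths(graph, start, end, max_len):
    """Depth-first enumeration of sequences spelled by paths from the start
    anchor to the end anchor; max_len (in bases) bounds walks around cycles."""
    results = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end and len(seq) > len(start):
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, seq + nxt[-1]))  # extend by the new last base
    return sorted(results)


if __name__ == "__main__":
    ref = "ACGTTGCAAGTC"
    alt = "ACGTTGGAAGTC"  # same window with a C->G substitution at index 6
    reads = [s[i:i + 8] for s in (ref, alt) for i in range(len(s) - 7)]
    graph = build_de_bruijn(reads, k=5)
    for hap in local_paths(graph, start="ACGT", end="AGTC", max_len=20):
        print(hap)
```

In this toy window the graph branches once at the substitution and rejoins, so both the reference and the variant haplotype appear as candidate seeds. In real data, sequencing errors and repeats add spurious branches and cycles. A practical assembler therefore needs coverage thresholds and path limits beyond the simple `max_len` bound used here.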

Similar articles

Positional bias in variant calls against draft reference assemblies.

BMC Genomics. 2017 Mar 28;18(1):263. doi: 10.1186/s12864-017-3637-2.

Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers.

J Comput Biol. 2011 Nov;18(11):1625-34. doi: 10.1089/cmb.2011.0151. Epub 2011 Oct 14.

Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.

Am J Bot. 2012 Feb;99(2):186-92. doi: 10.3732/ajb.1100419. Epub 2012 Feb 1.

HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies.

Genome Res. 2017 May;27(5):793-800. doi: 10.1101/gr.214767.116. Epub 2017 Jan 19.

SLIQ: simple linear inequalities for efficient contig scaffolding.

J Comput Biol. 2012 Oct;19(10):1162-75. doi: 10.1089/cmb.2011.0263.

ComB: SNP calling and mapping analysis for color and nucleotide space platforms.

J Comput Biol. 2011 Jun;18(6):795-807. doi: 10.1089/cmb.2011.0027. Epub 2011 May 12.

Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches.

Sci Rep. 2017 Sep 8;7(1):10963. doi: 10.1038/s41598-017-10826-9.

Resolving Conflicting Predictions from Multimapping Reads.

J Comput Biol. 2016 Mar;23(3):203-17. doi: 10.1089/cmb.2015.0164. Epub 2016 Jan 8.

Consensus generation and variant detection by Celera Assembler.

Bioinformatics. 2008 Apr 15;24(8):1035-40. doi: 10.1093/bioinformatics/btn074. Epub 2008 Mar 4.

Cited by

Enhancer RNAs contribute to genome reprogramming driven by a GATA3 noncoding variant in leukaemia.

Sci Rep. 2025 Aug 9;15(1):29153. doi: 10.1038/s41598-025-10262-0.

Analysis of 1386 epileptogenic brain lesions reveals association with DYRK1A and EGFR.

Nat Commun. 2024 Nov 30;15(1):10429. doi: 10.1038/s41467-024-54911-w.

Diverse tumorigenic consequences of human papillomavirus integration in primary oropharyngeal cancers.

Genome Res. 2022 Jan;32(1):55-70. doi: 10.1101/gr.275911.121. Epub 2021 Dec 13.

Whole-Genome Sequencing of Common Salivary Gland Carcinomas: Subtype-Restricted and Shared Genetic Alterations.

Clin Cancer Res. 2021 Jul 15;27(14):3960-3969. doi: 10.1158/1078-0432.CCR-20-4071. Epub 2021 May 19.

Genomic and transcriptomic landscape of conjunctival melanoma.

PLoS Genet. 2020 Dec 31;16(12):e1009201. doi: 10.1371/journal.pgen.1009201. eCollection 2020 Dec.

Genome-wide mutational signatures revealed distinct developmental paths for human B cell lymphomas.

J Exp Med. 2021 Feb 1;218(2). doi: 10.1084/jem.20200573.

A novel nicastrin mutation in a three-generation Dutch family with hidradenitis suppurativa: a search for functional significance.

J Eur Acad Dermatol Venereol. 2020 Oct;34(10):2353-2361. doi: 10.1111/jdv.16310. Epub 2020 Mar 12.

Genome Sequencing Explores Complexity of Chromosomal Abnormalities in Recurrent Miscarriage.

Am J Hum Genet. 2019 Dec 5;105(6):1102-1111. doi: 10.1016/j.ajhg.2019.10.003. Epub 2019 Oct 31.

Development of coupling controlled polymerizations by adapter-ligation in mate-pair sequencing for detection of various genomic variants in one single assay.

DNA Res. 2019 Aug 1;26(4):313-325. doi: 10.1093/dnares/dsz011.

The landscape of somatic mutation in sporadic Chinese colorectal cancer.

Oncotarget. 2018 Jun 8;9(44):27412-27422. doi: 10.18632/oncotarget.25287.
