On the design and analysis of next-generation sequencing genotyping for a cohort with haplotype-informative reads.

Authors

Zhi Degui, Liu Nianjun, Zhang Kui

Affiliation

Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, United States.

Publication information

Methods. 2015 Jun;79-80:41-6. doi: 10.1016/j.ymeth.2015.01.016. Epub 2015 Jan 30.

DOI:10.1016/j.ymeth.2015.01.016

PMID:25644447

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4437872/

Abstract

Next-generation sequencing (NGS) technologies, which can provide base-pair resolution genetic information for all types of genetic variations, are increasingly used in genetics research. However, due to the complex nature of NGS technologies and analytics and their relatively high cost, investigators face practical challenges for both design and analysis. These challenges are further complicated by recent methodological developments that make it possible to use haplotype information in sequencing reads. In light of these developments, we conducted comprehensive simulations to evaluate the effects of sequencing coverage, insert size of paired-end reads, and sample size on genotype calling and haplotype phasing in NGS studies. In contrast to previous studies that typically use idealized scenarios to tease out the effects of individual design and analytic decisions, we used a complete analytical pipeline from read mapping and variant detection to genotype calling and haplotype phasing so that we can assess the joint effects of multiple decisions and thus make more realistic recommendations to investigators. Consistent with previous studies, we found that the use of haplotype information in reads can improve the accuracy of genotype calling and haplotype phasing, and we also found that a mixture of short and long insert sizes of paired-end reads may offer even greater accuracy. However, this benefit is only clear in high coverage sequencing where variant detection is close to perfect. Finally, we observed that LD-based refinement methods do not always outperform single site based methods for genotype calling. Therefore, we should choose analytical methods that are appropriate to the sequencing coverage and sample size in order to use haplotype information in sequencing reads.

Similar articles

On the design and analysis of next-generation sequencing genotyping for a cohort with haplotype-informative reads.

Methods. 2015 Jun;79-80:41-6. doi: 10.1016/j.ymeth.2015.01.016. Epub 2015 Jan 30.

Joint haplotype phasing and genotype calling of multiple individuals using haplotype informative reads.

Bioinformatics. 2013 Oct 1;29(19):2427-34. doi: 10.1093/bioinformatics/btt418. Epub 2013 Aug 13.

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.

Genotype calling from next-generation sequencing data using haplotype information of reads.

Bioinformatics. 2012 Apr 1;28(7):938-46. doi: 10.1093/bioinformatics/bts047. Epub 2012 Jan 27.

Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold.

Bioinformatics. 2013 Jan 1;29(1):84-91. doi: 10.1093/bioinformatics/bts632. Epub 2012 Oct 23.

A Long-Read Sequencing Approach for Direct Haplotype Phasing in Clinical Settings.

Int J Mol Sci. 2020 Dec 1;21(23):9177. doi: 10.3390/ijms21239177.

A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing.

Bioinformatics. 2013 Apr 1;29(7):878-85. doi: 10.1093/bioinformatics/btt065. Epub 2013 Feb 13.

Haplotype estimation using sequencing reads.

Am J Hum Genet. 2013 Oct 3;93(4):687-96. doi: 10.1016/j.ajhg.2013.09.002.

A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads.

Bioinformatics. 2013 Nov 15;29(22):2835-43. doi: 10.1093/bioinformatics/btt503. Epub 2013 Sep 3.

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa320.

References cited in this article

Sequencing depth and coverage: key considerations in genomic analyses.

Nat Rev Genet. 2014 Feb;15(2):121-32. doi: 10.1038/nrg3642.

Assessing the effect of sequencing depth and sample size in population genetics inferences.

PLoS One. 2013 Nov 18;8(11):e79667. doi: 10.1371/journal.pone.0079667. eCollection 2013.

Haplotype estimation using sequencing reads.

Am J Hum Genet. 2013 Oct 3;93(4):687-96. doi: 10.1016/j.ajhg.2013.09.002.

Joint haplotype phasing and genotype calling of multiple individuals using haplotype informative reads.

Bioinformatics. 2013 Oct 1;29(19):2427-34. doi: 10.1093/bioinformatics/btt418. Epub 2013 Aug 13.

Rare variant detection using family-based sequencing analysis.

Proc Natl Acad Sci U S A. 2013 Mar 5;110(10):3985-90. doi: 10.1073/pnas.1222158110. Epub 2013 Feb 20.

Population genomics based on low coverage sequencing: how low should we go?

Mol Ecol. 2013 Jun;22(11):3028-35. doi: 10.1111/mec.12105. Epub 2012 Nov 22.

An integrated map of genetic variation from 1,092 human genomes.

Nature. 2012 Nov 1;491(7422):56-65. doi: 10.1038/nature11632.

Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold.

Bioinformatics. 2013 Jan 1;29(1):84-91. doi: 10.1093/bioinformatics/bts632. Epub 2012 Oct 23.

Analysis and optimal design for association studies using next-generation sequencing with case-control pools.

Genet Epidemiol. 2012 Dec;36(8):870-81. doi: 10.1002/gepi.21681. Epub 2012 Sep 12.

A high-coverage genome sequence from an archaic Denisovan individual.

Science. 2012 Oct 12;338(6104):222-6. doi: 10.1126/science.1224344. Epub 2012 Aug 30.
