Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold.

Author information

Department of Statistics, University of Oxford, Oxford OX1 3TG, UK.

Publication information

Bioinformatics. 2013 Jan 1;29(1):84-91. doi: 10.1093/bioinformatics/bts632. Epub 2012 Oct 23.

DOI:10.1093/bioinformatics/bts632

PMID:23093610

Abstract

MOTIVATION

Given the current costs of next-generation sequencing, large studies carry out low-coverage sequencing followed by application of methods that leverage linkage disequilibrium to infer genotypes. We propose a novel method that assumes study samples are sequenced at low coverage and genotyped on a genome-wide microarray, as in the 1000 Genomes Project (1KGP). We assume polymorphic sites have been detected from the sequencing data and that genotype likelihoods are available at these sites. We also assume that the microarray genotypes have been phased to construct a haplotype scaffold. We then phase each polymorphic site using an MCMC algorithm that iteratively updates the unobserved alleles based on the genotype likelihoods at that site and local haplotype information. We use a multivariate normal model to capture both allele frequency and linkage disequilibrium information around each site. When sequencing data are available from trios, Mendelian transmission constraints are easily accommodated into the updates. The method is highly parallelizable, as it analyses one position at a time.
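
The site-by-site update described above can be illustrated with a small sketch. This is not the MVNcall implementation: the function names, the ridge term, and the step that clips the conditional normal mean into a probability are all assumptions made for illustration. The sketch fits a multivariate normal to the target-site alleles and the flanking scaffold alleles of all other haplotypes, uses its conditional mean as a prior for each of an individual's two alleles, and combines that prior with the individual's genotype likelihoods in a Gibbs-style update.

```python
import numpy as np

def conditional_allele_prob(a_others, H_others, h, ridge=0.1, eps=1e-3):
    """Conditional MVN mean of the target allele given scaffold haplotype h.

    Assumption: the mean is clipped to (eps, 1 - eps) and used as P(allele = 1).
    """
    mu_t = a_others.mean()                      # allele frequency at the target site
    mu_s = H_others.mean(axis=0)                # allele frequencies at scaffold sites
    Hc = H_others - mu_s
    ac = a_others - mu_t
    n = len(a_others)
    # Scaffold covariance (LD among scaffold sites), regularised so it is invertible.
    sigma_ss = Hc.T @ Hc / n + ridge * np.eye(H_others.shape[1])
    # Covariance between the target site and each scaffold site.
    sigma_ts = ac @ Hc / n
    mean = mu_t + sigma_ts @ np.linalg.solve(sigma_ss, h - mu_s)
    return float(np.clip(mean, eps, 1 - eps))

def phase_site(H, GL, n_iter=50, seed=0):
    """Sample phased alleles at one site.

    H  : (2N, K) scaffold haplotypes (0/1) at K flanking sites, rows 2i and 2i+1
         belong to individual i.
    GL : (N, 3) genotype likelihoods for genotypes 0, 1, 2 at the target site.
    Returns an (N, 2, 2) array of posterior frequencies over ordered allele pairs.
    """
    rng = np.random.default_rng(seed)
    H = np.asarray(H, dtype=float)
    GL = np.asarray(GL, dtype=float)
    n_ind = GL.shape[0]
    # Initialise from the most likely genotype, with arbitrary phase for hets.
    g0 = GL.argmax(axis=1)
    a = np.zeros(2 * n_ind)
    a[0::2] = g0 >= 1
    a[1::2] = g0 == 2
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    counts = np.zeros((n_ind, 2, 2))
    burn = n_iter // 2
    for it in range(n_iter):
        for i in range(n_ind):
            keep = np.ones(2 * n_ind, dtype=bool)
            keep[2 * i:2 * i + 2] = False       # leave out individual i's haplotypes
            p1 = conditional_allele_prob(a[keep], H[keep], H[2 * i])
            p2 = conditional_allele_prob(a[keep], H[keep], H[2 * i + 1])
            # Posterior weight = genotype likelihood x prior for each allele.
            w = np.array([
                GL[i, x + y] * (p1 if x else 1 - p1) * (p2 if y else 1 - p2)
                for x, y in pairs
            ])
            k = rng.choice(4, p=w / w.sum())
            a[2 * i], a[2 * i + 1] = pairs[k]
            if it >= burn:                      # keep samples after burn-in only
                counts[i, pairs[k][0], pairs[k][1]] += 1
    return counts / counts.sum(axis=(1, 2), keepdims=True)
```

Because each call touches only one site and its flanking scaffold, independent sites can be processed in separate jobs, which is the sense in which the approach parallelizes. Trio constraints are not modelled in this sketch.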

RESULTS

We illustrate the performance of the method compared with other methods using data from Phase 1 of the 1KGP in terms of genotype accuracy, phasing accuracy and downstream imputation performance. We show that the haplotype panel we infer in African samples, which was based on a trio-phased scaffold, increases downstream imputation accuracy for rare variants (R² increases by >0.05 for minor allele frequency <1%), and this will translate into a boost in power to detect associations. These results highlight the value of incorporating microarray genotypes when calling variants from next-generation sequence data.
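
The abstract does not define the imputation accuracy measure. A common convention, assumed here, is the squared Pearson correlation between imputed allele dosages and true genotypes at a site. The helper below is a hypothetical illustration of that convention, not code from MVNcall.

```python
import numpy as np

def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages.

    Returns NaN when either vector has no variation, since the correlation
    is undefined in that case.
    """
    g = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    if g.std() == 0 or d.std() == 0:
        return float("nan")
    return float(np.corrcoef(g, d)[0, 1] ** 2)
```

Under this convention, the measure would typically be averaged over sites within a minor allele frequency bin, such as the <1% bin mentioned above.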

AVAILABILITY

The method (called MVNcall) is implemented in a C++ program and is available from http://www.stats.ox.ac.uk/~marchini/#software.
