Inferring haplotypes of copy number variations from high-throughput data with uncertainty.

Publication information

G3 (Bethesda). 2011 Jun;1(1):35-42. doi: 10.1534/g3.111.000174. Epub 2011 Jun 1.

DOI:10.1534/g3.111.000174

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3276117/

Abstract

Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
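The probabilistic idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic single-locus expectation-maximization (EM) estimate of haplotype frequencies, written under several assumptions that do not come from the paper. Each haplotype is assumed to carry 0, 1, or 2 copies, and each individual's signal is assumed to be summarized as a likelihood vector over total copy numbers 0 to 4 instead of a single hard call. The function name `em_haplotype_freqs` and all numeric inputs are hypothetical.

```python
from itertools import product


def em_haplotype_freqs(likelihoods, max_hap_copies=2, n_iter=200, tol=1e-10):
    """Estimate haplotype copy-number frequencies at one locus by EM.

    likelihoods[i][c] is the assumed probability of individual i's signal
    given a total (diplotype) copy number c, for c = 0 .. 2 * max_hap_copies.
    Returns a list freqs where freqs[k] is the estimated frequency of
    haplotypes carrying k copies.
    """
    alleles = range(max_hap_copies + 1)
    n_alleles = max_hap_copies + 1
    freqs = [1.0 / n_alleles] * n_alleles  # uniform starting point

    for _ in range(n_iter):
        counts = [0.0] * n_alleles
        for lik in likelihoods:
            # E-step: weight every ordered diplotype (a, b) by its prior
            # under random mating times the likelihood of the signal.
            weights = {
                (a, b): freqs[a] * freqs[b] * lik[a + b]
                for a, b in product(alleles, repeat=2)
            }
            total = sum(weights.values())
            if total == 0.0:
                raise ValueError("signal has zero likelihood under current model")
            for (a, b), w in weights.items():
                counts[a] += w / total
                counts[b] += w / total

        # M-step: expected haplotype counts divided by 2N chromosomes.
        new_freqs = [c / (2.0 * len(likelihoods)) for c in counts]
        delta = max(abs(x - y) for x, y in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Hypothetical noisy input: each row spreads probability over copy numbers 0..4.
example = [
    [0.05, 0.80, 0.15, 0.00, 0.00],
    [0.00, 0.10, 0.85, 0.05, 0.00],
    [0.00, 0.00, 0.30, 0.65, 0.05],
]
estimated = em_haplotype_freqs(example)
```

A deterministic approach would correspond to replacing each row with a one-hot vector at the most likely copy number before running the same procedure, which discards the uncertainty that the sketch keeps.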
