Incorporating genotyping uncertainty in haplotype inference for single-nucleotide polymorphisms.

Authors

Kang Hosung, Qin Zhaohui S, Niu Tianhua, Liu Jun S

Affiliation

Department of Statistics, Harvard University, Cambridge, MA 02138, USA.

Publication

Am J Hum Genet. 2004 Mar;74(3):495-510. doi: 10.1086/382284. Epub 2004 Feb 13.

DOI:10.1086/382284

PMID:14966673

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1182263/

Abstract

The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called "GenoSpectrum") as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.
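The idea of phasing from probabilistic rather than hard-called genotypes can be illustrated with a minimal sketch. It is not the authors' GS-EM implementation: it assumes only two biallelic SNPs, Hardy-Weinberg equilibrium, and that each per-locus genotype probability triple can be used directly as a likelihood weight. The function name `em_haplotype_freqs` and the input layout are hypothetical, chosen for illustration.

```python
from itertools import product

# All haplotypes over two biallelic SNPs, alleles coded 0/1 (illustrative assumption).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(geno_probs, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies from probabilistic genotypes.

    geno_probs: one entry per individual; each entry holds, for each of the
    two SNPs, a triple [P(g=0), P(g=1), P(g=2)], where g counts copies of
    allele 1. Hard calls are the special case of one-hot triples.
    """
    n_hap = len(HAPLOTYPES)
    freqs = [1.0 / n_hap] * n_hap
    pairs = list(product(range(n_hap), repeat=2))  # ordered haplotype pairs
    for _ in range(n_iter):
        counts = [0.0] * n_hap
        for probs in geno_probs:
            # E-step: weight of each ordered pair under the current frequencies,
            # multiplied by the probability of the genotype that pair implies.
            weights = []
            for i, j in pairs:
                w = freqs[i] * freqs[j]
                for locus in range(2):
                    dosage = HAPLOTYPES[i][locus] + HAPLOTYPES[j][locus]
                    w *= probs[locus][dosage]
                weights.append(w)
            total = sum(weights)
            if total == 0.0:
                continue  # individual incompatible with current estimates
            for (i, j), w in zip(pairs, weights):
                counts[i] += w / total
                counts[j] += w / total
        n_counted = sum(counts)
        if n_counted == 0.0:
            break
        # M-step: expected haplotype counts renormalized to frequencies.
        new_freqs = [c / n_counted for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return dict(zip(HAPLOTYPES, freqs))


if __name__ == "__main__":
    # Two confident homozygotes plus one individual with an uncertain call at SNP 2.
    sample = [
        [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
        [[0.0, 1.0, 0.0], [0.1, 0.8, 0.1]],
    ]
    for hap, freq in em_haplotype_freqs(sample).items():
        print(hap, round(freq, 4))
```

With one-hot triples the sketch reduces to the usual EM for unphased hard-called genotypes; softer triples let an uncertain call contribute to several genotype configurations in proportion to its probability.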

Similar articles

Haplotype inference for population data with genotyping errors.

Biom J. 2009 Aug;51(4):644-58. doi: 10.1002/bimj.200800215.

Incorporating genotyping uncertainty in haplotype frequency estimation in pedigree studies.

Hum Hered. 2007;64(3):172-81. doi: 10.1159/000102990. Epub 2007 May 25.

Algorithms for inferring haplotypes.

Genet Epidemiol. 2004 Dec;27(4):334-47. doi: 10.1002/gepi.20024.

Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data.

Am J Hum Genet. 2000 Oct;67(4):947-59. doi: 10.1086/303069. Epub 2000 Aug 22.

Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.

BMC Bioinformatics. 2015 Jul 16;16:223. doi: 10.1186/s12859-015-0651-8.

Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies.

Genome Res. 2004 May;14(5):908-16. doi: 10.1101/gr.1837404. Epub 2004 Apr 12.

The impact of missing and erroneous genotypes on tagging SNP selection and power of subsequent association tests.

Hum Hered. 2006;61(1):31-44. doi: 10.1159/000092141. Epub 2006 Mar 23.

Inference of missing SNPs and information quantity measurements for haplotype blocks.

Bioinformatics. 2005 May 1;21(9):2001-7. doi: 10.1093/bioinformatics/bti261. Epub 2005 Feb 4.

Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data.

Am J Hum Genet. 2004 May;74(5):945-53. doi: 10.1086/420773. Epub 2004 Apr 7.

Cited by

Haplotype phasing of CYP2D6: an allelic ratio method using Agena MassARRAY data.

Transl Psychiatry. 2024 Feb 12;14(1):91. doi: 10.1038/s41398-024-02809-y.

A Continuous Statistical Phasing Framework for the Analysis of Forensic Mitochondrial DNA Mixtures.

Genes (Basel). 2021 Jan 20;12(2):128. doi: 10.3390/genes12020128.

Reducing bias of allele frequency estimates by modeling SNP genotype data with informative missingness.

Front Genet. 2012 Jun 18;3:107. doi: 10.3389/fgene.2012.00107. eCollection 2012.

Inferring haplotypes of copy number variations from high-throughput data with uncertainty.

G3 (Bethesda). 2011 Jun;1(1):35-42. doi: 10.1534/g3.111.000174. Epub 2011 Jun 1.

Accurate and flexible power calculations on the spot: Applications to genomic research.

Stat Interface. 2011;4(3):353-358. doi: 10.4310/sii.2011.v4.n3.a9.

Haplotype estimation from fuzzy genotypes using penalized likelihood.

PLoS One. 2011;6(9):e24219. doi: 10.1371/journal.pone.0024219. Epub 2011 Sep 8.

Haplotype phasing: existing methods and new developments.

Nat Rev Genet. 2011 Sep 16;12(10):703-14. doi: 10.1038/nrg3054.

Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies.

Am J Hum Genet. 2009 Dec;85(6):847-61. doi: 10.1016/j.ajhg.2009.11.004.

Genotype determination for polymorphisms in linkage disequilibrium.

BMC Bioinformatics. 2009 Feb 20;10:63. doi: 10.1186/1471-2105-10-63.

Linkage disequilibrium-based quality control for large-scale genetic studies.

PLoS Genet. 2008 Aug 1;4(8):e1000147. doi: 10.1371/journal.pgen.1000147.

References

The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models.

Genetics. 1964 Jan;49(1):49-67. doi: 10.1093/genetics/49.1.49.

A prospective study of XRCC1 haplotypes and their interaction with plasma carotenoids on breast cancer risk.

Cancer Res. 2003 Dec 1;63(23):8536-41.

Assessing optimal neural network architecture for identifying disease-associated multi-marker genotypes using a permutation test, and application to calpain 10 polymorphisms associated with diabetes.

Ann Hum Genet. 2003 Jul;67(Pt 4):348-56. doi: 10.1046/j.1469-1809.2003.00030.x.

Haplotypes at the OPRM1 locus are associated with susceptibility to substance dependence in European-Americans.

Am J Med Genet B Neuropsychiatr Genet. 2003 Jul 1;120B(1):97-108. doi: 10.1002/ajmg.b.20034.

FP-TDI SNP scoring by manual and statistical procedures: a study of error rates and types.

Biotechniques. 2003 Mar;34(3):610-6, 618-20, 622 passim. doi: 10.2144/03343dd04.

Angiotensinogen gene haplotype and hypertension: interaction with ACE gene I allele.

Hypertension. 2003 Jan;41(1):9-15. doi: 10.1161/01.hyp.0000045080.28739.12.

Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms.

Am J Hum Genet. 2002 Nov;71(5):1242-7. doi: 10.1086/344207.

SNP genotyping on a genome-wide amplified DOP-PCR template.

Nucleic Acids Res. 2002 Nov 15;30(22):e125. doi: 10.1093/nar/gnf125.

Haplotype inference in random population samples.

Am J Hum Genet. 2002 Nov;71(5):1129-37. doi: 10.1086/344347. Epub 2002 Oct 17.

The impact of genotyping error on haplotype reconstruction and frequency estimation.

Eur J Hum Genet. 2002 Oct;10(10):616-22. doi: 10.1038/sj.ejhg.5200855.
