Bayesian Gaussian Mixture Models for High-Density Genotyping Arrays.

Authors

Sabatti Chiara, Lange Kenneth

Affiliation

Departments of Human Genetics and Statistics, University of California, Los Angeles, CA 90095.

Publication

J Am Stat Assoc. 2008 Mar 1;103(481):89-100. doi: 10.1198/016214507000000338.

Abstract

Affymetrix's SNP (single-nucleotide polymorphism) genotyping chips have increased the scope and decreased the cost of gene-mapping studies. Because each SNP is queried by multiple DNA probes, the chips present interesting challenges in genotype calling. Traditional clustering methods distinguish the three genotypes of an SNP fairly well given a large enough sample of unrelated individuals or a training sample of known genotypes. This article describes our attempt to improve genotype calling by constructing Gaussian mixture models with empirically derived priors. The priors stabilize parameter estimation and borrow information collectively gathered on tens of thousands of SNPs. When data from related family members are available, our models capture the correlations in signals between relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We integrate the genotype-calling model with pedigree analysis and examine a sequence of symmetry hypotheses involving the correlated probe signals. The symmetry hypotheses raise novel mathematical issues of parameterization. Using the Bayesian information criterion, we select the best combination of symmetry assumptions. Compared to Affymetrix's software, our model leads to a reduction in no-calls with little sacrifice in overall calling accuracy.
