Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays.

Author information

Di Xiaojun, Matsuzaki Hajime, Webster Teresa A, Hubbell Earl, Liu Guoying, Dong Shoulian, Bartell Dan, Huang Jing, Chiles Richard, Yang Geoffrey, Shen Mei-mei, Kulp David, Kennedy Giulia C, Mei Rui, Jones Keith W, Cawley Simon

Affiliation

Affymetrix, Inc., Santa Clara, CA 95051, USA.

Publication information

Bioinformatics. 2005 May 1;21(9):1958-63. doi: 10.1093/bioinformatics/bti275. Epub 2005 Jan 18.

Abstract

MOTIVATION

A high density of single nucleotide polymorphism (SNP) coverage across the genome is desirable, and often essential, for population genetics studies. Region-specific or chromosome-specific linkage studies also benefit from having as many high-quality SNPs as possible. The availability of millions of SNPs from both Perlegen and the public domain, together with the development of an efficient microarray-based assay for genotyping SNPs, has raised a number of analytical challenges. Effective methods for selecting optimal subsets of SNPs spanning the genome, and for accurately calling genotypes from probe hybridization patterns, have enabled the development of a new microarray-based system that robustly genotypes over 100,000 SNPs per sample.

RESULTS

We introduce a new dynamic model-based algorithm (DM) for screening over 3 million SNPs and genotyping over 100,000 SNPs. The model assumes four possible underlying states for each probe quartet: Null, A, AB and B. We calculate a probe-level log likelihood for each model and then select among the four competing models using a SNP-level statistical aggregation across multiple probe quartets, which yields a high-quality genotype call together with a quality measure for that call. We assess performance using HapMap reference genotypes, informative Mendelian inheritance relationships in families, and consistency between DM and another genotype classification method (MPAM). At a call rate of 95.91%, concordance with HapMap Project reference genotypes is 99.81% based on over 1.5 million genotypes; the Mendelian error rate is 0.018% based on 10 trios; and consistency between DM and MPAM is 99.90% at a comparable call rate of 97.18%. We also develop methods for SNP selection and optimal probe selection.
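
The following minimal sketch in Python illustrates the four-model comparison described above. It is not the DM implementation. The following are all assumptions made for illustration:

- the quartet layout (PM_A, MM_A, PM_B, MM_B)
- the fixed-variance Gaussian on log2 intensities
- the AIC-style penalty
- summing quartet log likelihoods as the SNP-level aggregation
- the names `call_snp`, `SIGMA`, `PENALTY` and `min_margin`

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical quartet layout: (PM_A, MM_A, PM_B, MM_B) raw, positive intensities.
Quartet = Tuple[float, float, float, float]

# Indices of probes assumed to carry hybridization signal under each model.
FOREGROUND = {
    "Null": (),
    "A": (0,),
    "AB": (0, 2),
    "B": (2,),
}

SIGMA = 0.5    # assumed noise SD on the log2 scale (illustrative value)
PENALTY = 1.0  # assumed cost per fitted mean, AIC-style (illustrative value)


def _gauss_ll(values: Sequence[float], mean: float) -> float:
    """Log likelihood of values under Normal(mean, SIGMA**2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * SIGMA ** 2) - (v - mean) ** 2 / (2 * SIGMA ** 2)
        for v in values
    )


def quartet_log_likelihood(quartet: Quartet, model: str) -> float:
    """Penalized probe-level log likelihood of one quartet under one model."""
    x = [math.log2(v) for v in quartet]
    fg = [x[i] for i in FOREGROUND[model]]
    bg = [x[i] for i in range(4) if i not in FOREGROUND[model]]
    if not fg:
        # Null: every probe is background, so one shared mean is fitted.
        return _gauss_ll(x, sum(x) / 4) - PENALTY
    mu_fg, mu_bg = sum(fg) / len(fg), sum(bg) / len(bg)
    if mu_fg < mu_bg:
        # Constrained MLE: signal probes may not be dimmer than background,
        # so the fit collapses onto a single shared mean.
        mu_fg = mu_bg = sum(x) / 4
    return _gauss_ll(fg, mu_fg) + _gauss_ll(bg, mu_bg) - 2 * PENALTY


def call_snp(quartets: List[Quartet], min_margin: float = 3.0) -> Tuple[str, float]:
    """Sum quartet log likelihoods per model (quartets treated as independent)
    and return (call, margin between best and runner-up model)."""
    totals: Dict[str, float] = {
        m: sum(quartet_log_likelihood(q, m) for q in quartets) for m in FOREGROUND
    }
    ranked = sorted(totals, key=totals.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    margin = totals[best] - totals[runner_up]
    if best == "Null" or margin < min_margin:
        return "NoCall", margin
    return best, margin


# Usage: three quartets where only the PM_A probe is bright.
print(call_snp([(4000, 500, 450, 400), (3800, 520, 430, 410), (4200, 480, 470, 390)]))
```

In this sketch, the log-likelihood gap between the best and runner-up models stands in for the call's quality measure. SNPs are left uncalled when their best model is Null or when their gap falls below `min_margin`. Raising `min_margin` therefore lowers the call rate in exchange for more confident calls.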

AVAILABILITY

The DM algorithm is available in Affymetrix's Genotyping Tools software package and in Affymetrix's GDAS software package. See http://www.affymetrix.com for further information. 10 K and 100 K mapping array data are available on the Affymetrix website.
