Hua Jianping, Craig David W, Brun Marcel, Webster Jennifer, Zismann Victoria, Tembe Waibhav, Joshipura Keta, Huentelman Matthew J, Dougherty Edward R, Stephan Dietrich A
Computational Biology Division Phoenix, 445 N 5th Street, Phoenix, AZ, USA.
Bioinformatics. 2007 Jan 1;23(1):57-63. doi: 10.1093/bioinformatics/btl536. Epub 2006 Oct 24.
The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities provides for hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may lead to inflation of false associations between markers and phenotype.
We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. The algorithm choice allows for highly accurate genotyping for most SNPs. We also introduce a quality control metric for each assayed SNP, so that poorly behaving SNPs can be filtered using a metric that correlates with genotype class separation in the calling algorithm. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotype calls, without inflation of false positives due to systematically miscalled SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false association signals and false negative signals, allowing for rapid identification of disease susceptibility loci for complex traits.
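The general idea of EM-based calling with training-derived starting parameters can be illustrated with a minimal sketch. This is not the SNiPer-HD implementation: it assumes each sample is summarized by a single allele-contrast value per SNP, models the three genotype classes (AA, AB, BB) as a one-dimensional Gaussian mixture, and uses a hypothetical separation score (smallest gap between adjacent class means divided by the sum of their standard deviations) as a stand-in for the quality control metric. The function names, the one-dimensional signal, and the score definition are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(values, weights, means, sds):
    """E-step: posterior probability of each class for each sample."""
    resp = []
    for x in values:
        p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(p) or 1e-300  # guard against division by zero
        resp.append([pj / total for pj in p])
    return resp


def em_genotype_calls(values, init_means, init_sds, n_iter=50, min_sd=1e-3):
    """Fit a 1-D Gaussian mixture by EM, starting from training-derived
    class means and SDs (assumed inputs), and call the most probable class."""
    k = len(init_means)
    means, sds = list(init_means), list(init_sds)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(values, weights, means, sds)
        # M-step: re-estimate weight, mean, and SD of each class.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            if nj < 1e-8:
                continue  # empty class: keep its training-derived mean and SD
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    # Final E-step so the calls reflect the fitted parameters.
    resp = responsibilities(values, weights, means, sds)
    calls = [max(range(k), key=lambda j: r[j]) for r in resp]
    return calls, means, sds


def separation_score(means, sds):
    """Hypothetical QC score: smallest gap between adjacent class means,
    scaled by the sum of the two classes' SDs. Larger is better separated."""
    order = sorted(range(len(means)), key=lambda j: means[j])
    return min(
        (means[b] - means[a]) / (sds[a] + sds[b])
        for a, b in zip(order, order[1:])
    )


# Illustrative, made-up contrast values for one SNP across 15 samples.
contrast = [-1.05, -0.95, -1.0, -0.9, -1.1,
            0.05, -0.05, 0.0, 0.1, -0.1,
            1.0, 0.95, 1.05, 0.9, 1.1]
calls, means, sds = em_genotype_calls(contrast, [-0.8, 0.0, 0.8], [0.2, 0.2, 0.2])
labels = ["AA", "AB", "BB"]
print([labels[c] for c in calls], round(separation_score(means, sds), 2))
```

In this sketch a SNP whose score falls below a chosen threshold would be flagged for filtering; the threshold itself would have to be calibrated on real data.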
SNiPer-HD is available at TGen's website: http://www.tgen.org/neurogenomics/data.