Wan Lin, Sun Kelian, Ding Qi, Cui Yuehua, Li Ming, Wen Yalu, Elston Robert C, Qian Minping, Fu Wenjiang J
School of Mathematical Sciences, Peking University, Beijing 100871 China.
Nucleic Acids Res. 2009 Sep;37(17):e117. doi: 10.1093/nar/gkp559. Epub 2009 Jul 7.
Affymetrix single-nucleotide polymorphism (SNP) arrays have been widely used for SNP genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in both tasks, most studies have paid little attention to modeling the hybridization of probes to off-target allele sequences, which can greatly affect accuracy. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We model sequence binding through binding free energy, and from it binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the ACs at a given SNP to be estimated through statistical regression. Furthermore, using cell-line data with known true copy numbers, we demonstrate that the PICR model achieves reasonable accuracy in copy number estimation at a single SNP locus, based on the ratio of each sample's estimated AC to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even array platforms.
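The abstract does not state the PICR equations, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: each probe intensity is taken to be a baseline plus the two allelic concentrations weighted by that probe's binding affinity to allele A and to allele B (the second term covering mismatch hybridization), and the ACs are recovered by ordinary least squares. The linear form, the function names, the example affinities, and the diploid reference (2 copies) are all hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_allelic_concentrations(intensities, affinity_a, affinity_b):
    """Fit I_k = baseline + aff_A[k] * N_A + aff_B[k] * N_B by least squares.

    Assumed (hypothetical) linear model; affinities are supplied by the
    caller, e.g. derived from binding free energies.
    Returns (baseline, N_A, N_B).
    """
    intensities = np.asarray(intensities, dtype=float)
    design = np.column_stack([
        np.ones_like(intensities),            # baseline column
        np.asarray(affinity_a, dtype=float),  # affinity to allele A
        np.asarray(affinity_b, dtype=float),  # affinity to allele B
    ])
    coef, *_ = np.linalg.lstsq(design, intensities, rcond=None)
    baseline, n_a, n_b = coef
    return baseline, n_a, n_b


def copy_number_from_ratio(sample_ac, reference_ac, reference_copies=2.0):
    """Scale the sample/reference AC ratio by an assumed reference copy number."""
    return reference_copies * sample_ac / reference_ac


# Synthetic probe set: probes designed for A still bind B weakly, and vice versa.
aff_a = np.array([1.00, 0.90, 0.30, 0.20, 0.80, 0.25])
aff_b = np.array([0.20, 0.30, 1.00, 0.90, 0.25, 0.80])

# Noise-free intensities for a reference (AB) and a sample with one extra A copy.
reference_intensity = 50.0 + aff_a * 100.0 + aff_b * 100.0
sample_intensity = 50.0 + aff_a * 200.0 + aff_b * 100.0

_, ref_a, ref_b = estimate_allelic_concentrations(reference_intensity, aff_a, aff_b)
_, smp_a, smp_b = estimate_allelic_concentrations(sample_intensity, aff_a, aff_b)

total_copies = copy_number_from_ratio(smp_a + smp_b, ref_a + ref_b)
```

In this sketch, ignoring the cross-hybridization column (`aff_b` for A-targeted probes) would bias the fitted concentrations, which is the kind of effect the abstract attributes to HWMMN. Real data would also need noise handling and affinity estimates, neither of which is addressed here.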