Molla Michael, Shavlik Jude, Richmond Todd, Smith Steven
University of Wisconsin-Madison, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:69-79. doi: 10.1109/csb.2004.1332419.
Current methods for interpreting oligonucleotide-based SNP-detection microarrays (SNP chips) are statistical; they require extensive parameter tuning as well as extremely high-resolution images of the chip being processed. We present a method based on a simple data-classification technique, nearest neighbors, that on haploid organisms produces results comparable to the published results of the leading statistical methods while requiring very little parameter tuning. Furthermore, it can interpret SNP chips scanned with the lower-resolution scanners more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment in which, applying the algorithm independently to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments.
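For readers unfamiliar with nearest-neighbor classification, the following is a minimal generic sketch of the technique, not the authors' implementation. The abstract does not state which features, distance metric, or value of k the method uses, so the two-element feature vectors, the Euclidean distance, `k=3`, the function name `knn_predict`, and the labels `"conforming"` and `"candidate_snp"` are all illustrative assumptions, and the numbers are made up.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    Generic k-nearest-neighbors; the Euclidean metric and k=3 are
    assumptions for illustration, not settings taken from the paper.
    """
    # Distance from the query to every labeled training example.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest examples and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (made-up values, not data from the paper).
train_points = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.40, 0.50), (0.45, 0.45), (0.50, 0.40),
]
train_labels = [
    "conforming", "conforming", "conforming",
    "candidate_snp", "candidate_snp", "candidate_snp",
]

print(knn_predict(train_points, train_labels, (0.88, 0.12)))
print(knn_predict(train_points, train_labels, (0.42, 0.48)))
```

Apart from the choice of k and the distance function, a classifier of this kind has essentially nothing to tune, which is consistent with the abstract's claim that the approach needs very little parameter tuning.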