Lovmar Lovisa, Ahlford Annika, Jonsson Mats, Syvänen Ann-Christine
Molecular Medicine, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
BMC Genomics. 2005 Mar 10;6:35. doi: 10.1186/1471-2164-6-35.
High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay the three clusters that define the genotypes are well separated and the distances between the data points within a cluster are short. "Silhouettes" is a graphical aid for the interpretation and validation of data clusters that provides a measure of how well a data point was classified when it was assigned to a cluster. Thus "Silhouettes" can potentially be used as a quality measure for SNP genotyping results and for objective comparison of the performance of SNP assays under different conditions.
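For orientation, the abstract does not spell out the formula, but in the standard definition of Silhouettes (Rousseeuw, 1987) the value for a data point i can be written as below. The symbols are introduced here for illustration: a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the smallest mean distance from point i to the points of any other cluster.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

A value near 1 means the point lies much closer to its own genotype cluster than to the nearest neighbouring cluster; a value near 0 means it lies between two clusters; a negative value means it is on average closer to another cluster than to the one it was assigned to.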
We created a program (ClusterA) for calculating "Silhouette scores", and applied it to assess the quality of SNP genotype clusters obtained by single nucleotide primer extension ("minisequencing") in the Tag-microarray format. A Silhouette score condenses the quality of the genotype assignment for each SNP assay into a single numeric value, which ranges from 1.0, when the genotype assignment is unequivocal, down to -1.0, when the genotype assignment is arbitrary. In the present study we applied Silhouette scores to compare the performance of four DNA polymerases in our minisequencing system by analyzing 26 SNPs in both DNA polarities in 16 DNA samples. We found that Silhouettes provide a relevant measure of the quality of SNP assays under different reaction conditions, illustrated here by the four DNA polymerases. According to our results, the genotypes can be unequivocally assigned without manual inspection when the Silhouette score for a SNP assay is > 0.65. All four DNA polymerases performed satisfactorily in our Tag-array minisequencing system.
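The following is a minimal Python sketch of how a per-assay Silhouette score of this kind could be computed; it is not the ClusterA program. It assumes Euclidean distance, assumes that the assay score is the mean of the per-point silhouette values, and uses made-up signal coordinates and hypothetical function names. Only the 0.65 cut-off comes from the text above.

```python
import math

# Cut-off reported in the abstract for accepting genotypes without manual inspection.
THRESHOLD = 0.65


def silhouette_values(points, labels):
    """Return the silhouette value s(i) of every point (standard definition)."""
    n = len(points)
    values = []
    for i in range(n):
        # Accumulate distance sums and member counts per cluster, excluding point i.
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])  # Euclidean distance (assumption)
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in sums if c != own]
        if own not in sums or not others:
            # Singleton cluster, or no other cluster present: use 0 by convention.
            values.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean distance within the point's own cluster
        b = min(others)              # mean distance to the nearest other cluster
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def silhouette_score(points, labels):
    """Mean silhouette over all points of one SNP assay (assumed aggregation)."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Hypothetical (allele 1 signal, allele 2 signal) pairs for six samples.
points = [(1.0, 0.1), (0.9, 0.1), (0.5, 0.5), (0.55, 0.5), (0.1, 1.0), (0.1, 0.9)]
good_labels = ["AA", "AA", "AB", "AB", "BB", "BB"]   # labels follow the clusters
mixed_labels = ["AA", "AB", "AB", "BB", "BB", "AA"]  # labels scattered across clusters

good_score = silhouette_score(points, good_labels)
mixed_score = silhouette_score(points, mixed_labels)
print(f"well separated: {good_score:.2f}, accept without inspection: {good_score > THRESHOLD}")
print(f"mixed labels:   {mixed_score:.2f}, accept without inspection: {mixed_score > THRESHOLD}")
```

With tight, well-separated clusters the score lies well above the cut-off, whereas scattering the same labels across clusters pulls it down, which is the behaviour the score is meant to capture.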
Silhouette scores are a convenient way to evaluate the quality of SNP genotype assignment, and they provide an objective, numeric measure for comparing the performance of SNP assays. The program we created for calculating Silhouette scores is freely available and can be used to assess the quality of results from any genotyping system in which the genotypes are assigned by cluster analysis using scatter plots.