Plagnol Vincent, Cooper Jason D, Todd John A, Clayton David G
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom.
PLoS Genet. 2007 May 18;3(5):e74. doi: 10.1371/journal.pgen.0030074. Epub 2007 Apr 5.
In a previous paper, we showed that when DNA samples for cases and controls are prepared in different laboratories prior to high-throughput genotyping, scoring inaccuracies can lead to differential misclassification and, consequently, to increased false-positive rates. Different DNA sourcing is often unavoidable in large-scale disease association studies of multiple case and control sets. Here, we describe methodological improvements to minimise such biases. These fall into two categories: improvements to the basic clustering methods for identifying genotypes from fluorescence intensities, and the use of "fuzzy" calls in association tests to make appropriate allowance for call uncertainty. We find that the main improvement is a modification of the calling algorithm that links the clustering of cases and controls while allowing for different DNA sourcing. We also find that, in the presence of different DNA sourcing, biases associated with missing data can increase the false-positive rate. Therefore, we propose the use of "fuzzy" calls to deal with uncertain genotypes that would otherwise be labeled as missing.
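The idea of a "fuzzy" call can be illustrated with a minimal sketch. The abstract does not give the authors' test statistic, so the code below is an assumption: each sample carries posterior probabilities for the three genotypes (as a clustering step might produce), the uncertain genotype is replaced by its expected allele count rather than being set to missing, and a simple 1-degree-of-freedom score (trend) test is applied. The function names `expected_dosage` and `fuzzy_trend_test` are hypothetical.

```python
import math


def expected_dosage(probs):
    """Expected count of B alleles from posteriors (P(AA), P(AB), P(BB)).

    The probabilities are renormalised, so unnormalised weights are accepted.
    """
    p_aa, p_ab, p_bb = probs
    total = p_aa + p_ab + p_bb
    return (p_ab + 2.0 * p_bb) / total


def fuzzy_trend_test(posteriors, is_case):
    """Score test for association using expected dosages (illustrative only).

    posteriors: list of (P(AA), P(AB), P(BB)) tuples, one per sample.
    is_case:    list of booleans, True for cases and False for controls.
    Returns (chi2, p_value) for a 1-degree-of-freedom test.
    """
    n = len(posteriors)
    x = [expected_dosage(p) for p in posteriors]
    y = [1.0 if c else 0.0 for c in is_case]
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Score: covariance-like sum between case status and expected dosage.
    score = sum((yi - y_bar) * xi for xi, yi in zip(x, y))
    # Approximate null variance of the score (assumed form, not the paper's).
    variance = y_bar * (1.0 - y_bar) * sum((xi - x_bar) ** 2 for xi in x)
    if variance == 0.0:
        return 0.0, 1.0  # no variation in dosage or phenotype: no evidence

    chi2 = score ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data: the third sample is ambiguous and would otherwise be "missing".
    posteriors = [(0.0, 0.1, 0.9), (0.0, 0.0, 1.0), (0.3, 0.5, 0.2), (1.0, 0.0, 0.0)]
    is_case = [True, True, False, False]
    print(fuzzy_trend_test(posteriors, is_case))
```

In this sketch the ambiguous sample still contributes to the test, weighted by its posterior probabilities, instead of being dropped; dropping it is the step that can introduce bias when the missingness differs between cases and controls.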