Wang Xujing, Hessner Martin J, Wu Yan, Pati Nirupma, Ghosh Soumitra
Max McGee National Research Center for Juvenile Diabetes, Department of Pediatrics, Medical College and Children's Hospital of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.
Bioinformatics. 2003 Jul 22;19(11):1341-7. doi: 10.1093/bioinformatics/btg154.
Data preprocessing, including proper normalization and adequate quality control, is crucial before complex data mining in studies using cDNA microarray technology. We have developed a simple procedure that integrates data filtering and normalization with quantitative quality control of microarray experiments. We have previously shown that data variability in a microarray experiment is well captured by a quality score, q(com), defined for every spot, and that the ratio distribution depends on q(com). Building on this, our data-filtering scheme lets the investigator set the filtering stringency according to the desired data variability, and our normalization procedure corrects q(com)-dependent dye biases in both the location and the spread of the ratio distribution. In addition, we propose a statistical model for determining the false positive rate based on the design and quality of a microarray experiment. The model predicts that a replicate concordance rate of at least 0.5 is needed to be certain of true positives. Our work demonstrates the importance and advantages of a quantitative quality control scheme for microarrays.
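The abstract does not specify how spots are binned by q(com) or which estimators are used for the location and spread of the ratio distribution. The sketch below is therefore only an illustration under stated assumptions, not the authors' procedure. It assumes log ratios and q(com) scores are given per spot, filters spots with a single quality threshold, sorts the remaining spots into quantile bins of q(com), centers each bin on its median, and rescales each bin's median absolute deviation (MAD) to that of the highest-quality bin. The function names `filter_spots`, `mad`, and `normalize_by_quality` and the parameters `q_min` and `n_bins` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def filter_spots(log_ratios: Sequence[float], q: Sequence[float],
                 q_min: float) -> Tuple[List[float], List[float]]:
    """Keep spots whose quality score is at least q_min (hypothetical threshold).

    Raising q_min trades spot count for lower data variability.
    """
    kept = [(m, s) for m, s in zip(log_ratios, q) if s >= q_min]
    return [m for m, _ in kept], [s for _, s in kept]


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation from the median (a robust spread estimate)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def normalize_by_quality(log_ratios: Sequence[float], q: Sequence[float],
                         n_bins: int = 4) -> List[float]:
    """Correct quality-dependent location and spread of the log ratios.

    Assumption: spots are split into n_bins equal-count bins ordered by q;
    each bin is centered on its median and its MAD is rescaled to match
    the MAD of the highest-quality bin.
    """
    order = sorted(range(len(q)), key=lambda i: q[i])
    n = len(order)
    bins = [order[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]
    bins = [b for b in bins if b]  # drop empty bins when n < n_bins
    # Reference spread: the last bin holds the highest q scores.
    ref_spread = mad([log_ratios[i] for i in bins[-1]])
    out = [0.0] * len(log_ratios)
    for b in bins:
        vals = [log_ratios[i] for i in b]
        center = statistics.median(vals)
        spread = mad(vals)
        scale = ref_spread / spread if spread > 0 else 1.0
        for i in b:
            out[i] = (log_ratios[i] - center) * scale
    return out
```

For example, with two bins, a low-quality bin whose log ratios are shifted to a median of 1.6 and spread four times wider than the high-quality bin is mapped onto the same zero-centered, equally spread distribution as the high-quality bin. Robust estimators (median, MAD) are used here so that a few true differentially expressed genes do not distort the correction; a mean and standard deviation would be a simpler but less outlier-resistant choice.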