Novikov Eugene, Barillot Emmanuel
Service Bioinformatique, Institut Curie, 26 Rue d'Ulm, 75248 Paris Cedex 05, France.
BMC Bioinformatics. 2005 Dec 9;6:293. doi: 10.1186/1471-2105-6-293.
Although DNA microarray technologies are very powerful for the simultaneous quantitative characterization of thousands of genes, the quality of the experimental data obtained is often far from ideal. A measured microarray image is a regular array of spots, and the intensity of light at each spot is proportional to the DNA copy number or to the expression level of the gene whose DNA clone is spotted. Spot quality control is an essential part of microarray image analysis and must be carried out at the level of individual spot identification. The problem is difficult to formalize because of the diversity of instrumental and biological factors that can influence the result.
For each spot we estimate the ratio of measured fluorescence intensities, which reveals differential gene expression or a change in DNA copy number between the test and control samples. We also define a set of quality characteristics and a model for combining these characteristics into an overall spot quality value. We have developed a training procedure to evaluate the contribution of each individual characteristic to the overall quality. This procedure uses information available from replicated spots, located in the same array or across a set of replicated arrays. It is assumed that unspoiled replicated spots must have very close ratios, whereas poor spots yield greater diversity in the obtained ratio estimates.
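The replicate assumption can be illustrated with a minimal Python sketch. It is not the authors' training procedure: the function name `replicate_deviation`, the use of log2 ratios, and the choice of the replicate median as reference are all assumptions made here for illustration. It only shows how the disagreement among replicated spots could be measured, which is the kind of signal a training procedure could relate to quality characteristics.

```python
import math
from statistics import median


def replicate_deviation(ratios_by_clone):
    """Return, per clone, the absolute deviation of each replicate's
    log2 ratio from the median log2 ratio of that clone's replicates.

    Hypothetical helper: small deviations suggest consistent spots,
    large deviations suggest at least one poor spot.
    """
    deviations = {}
    for clone, ratios in ratios_by_clone.items():
        logs = [math.log2(r) for r in ratios]
        centre = median(logs)
        deviations[clone] = [abs(v - centre) for v in logs]
    return deviations


# Made-up ratios: the third replicate of cloneA disagrees with the others.
example = {"cloneA": [1.0, 1.0, 4.0], "cloneB": [2.0, 2.0, 2.0]}
dev = replicate_deviation(example)
```

In this toy input only the third replicate of `cloneA` stands out, while all replicates of `cloneB` agree.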
The developed procedure provides an automatic tool to quantify spot quality and to identify the different types of spot deficiency that occur in DNA microarray technology. The quality value assigned to each spot can be used either to eliminate the spot or to weight the contribution of its ratio estimate in follow-up analysis procedures.
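Both uses of the quality values, elimination and weighting, can be sketched in a few lines of Python. This is an illustrative assumption rather than the published method: the function `combine_replicates`, the threshold parameter `min_quality`, and the quality-weighted mean in log2 space are hypothetical choices.

```python
import math


def combine_replicates(ratios, qualities, min_quality=0.0):
    """Combine replicate ratios into one estimate.

    Spots whose quality does not exceed `min_quality` are eliminated;
    the remaining log2 ratios are averaged with quality values as
    weights. Returns None when no spot survives the filter.
    """
    kept = [(math.log2(r), q)
            for r, q in zip(ratios, qualities)
            if q > min_quality]
    total = sum(q for _, q in kept)
    if total == 0:
        return None
    mean_log = sum(v * q for v, q in kept) / total
    return 2.0 ** mean_log


# A zero-quality spot is dropped and does not distort the estimate.
filtered = combine_replicates([2.0, 2.0, 8.0], [1.0, 1.0, 0.0])
# Equal qualities give the geometric mean of the ratios.
weighted = combine_replicates([4.0, 1.0], [0.5, 0.5])
```

Averaging in log space treats a two-fold increase and a two-fold decrease symmetrically, which is why the sketch does not average raw ratios.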