Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany.
BMC Bioinformatics. 2012 May 30;13:114. doi: 10.1186/1471-2105-13-114.
A recent large-scale analysis of Gene Expression Omnibus (GEO) data found frequent evidence of spatial defects in a substantial fraction of the Affymetrix microarrays in GEO. Nevertheless, in contrast to quality assessment, artefact detection is not widely used in standard gene expression analysis pipelines. Furthermore, although approaches have been proposed to detect diverse types of spatial noise on arrays, correcting these artefacts is mostly left to the summarization methods, or the affected arrays are discarded completely.
We show that state-of-the-art robust summarization procedures are vulnerable to artefacts on arrays and cannot adequately correct for them. To address this problem, we present a simple approach that detects artefacts with high recall and precision, which we further improve by taking the spatial layout of arrays into account. Finally, we propose two correction methods for these artefacts that either substitute the values of defective probes using probeset information or filter out corrupted probes. We show that our approach identifies and corrects defective probe measurements appropriately and outperforms existing tools.
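The abstract does not give the detection statistic, the spatial rule, or the correction formulas. As a rough illustration only, the NumPy sketch below assumes log2 intensities from several arrays of the same design, stored as an array of shape (n_arrays, rows, cols), plus a matching grid of probeset IDs. It flags probes whose residual, after removing the per-probe median across arrays and the probeset's median shift on the array, is a robust (median/MAD) outlier. It then applies a hypothetical neighbourhood rule reflecting the assumption that artefacts form contiguous regions, and finally either substitutes or filters the flagged probes. All function names and parameters (`z_cut`, `keep_min`, `add_min`) are hypothetical and are not the API of the authors' R package.

```python
import numpy as np


def probe_residuals(log_int, probeset, a):
    """Residuals of array `a` after removing probe and probeset effects (sketch).

    log_int  : (n_arrays, n_rows, n_cols) log2 intensities (assumed layout).
    probeset : (n_rows, n_cols) integer probeset IDs; -1 = probe in no probeset.
    """
    # Probe affinity: per-probe median over all arrays of the experiment.
    r = log_int[a] - np.median(log_int, axis=0)
    resid = np.zeros_like(r, dtype=float)
    for ps in np.unique(probeset[probeset >= 0]):
        mask = probeset == ps
        # Probeset effect: the probeset's overall shift on array `a`.
        resid[mask] = r[mask] - np.median(r[mask])
    return resid


def detect_defects(resid, probeset, z_cut=3.0):
    """Flag probes whose residual is an outlier by a robust median/MAD z-score."""
    valid = probeset >= 0
    med = np.median(resid[valid])
    mad = 1.4826 * np.median(np.abs(resid[valid] - med))
    z = (resid - med) / max(mad, 1e-12)
    return valid & (np.abs(z) > z_cut)


def neighbour_count(flags):
    """Number of flagged probes among the 8 spatial neighbours of each probe."""
    rows, cols = flags.shape
    f = np.pad(flags.astype(int), 1)  # zero border: off-chip cells count as unflagged
    total = np.zeros((rows, cols), dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            total += f[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total


def spatial_refine(flags, probeset, keep_min=1, add_min=5):
    """Hypothetical spatial rule assuming artefacts are contiguous regions.

    Drops flagged probes with fewer than `keep_min` flagged neighbours and adds
    unflagged probes that have at least `add_min` flagged neighbours.
    """
    n = neighbour_count(flags)
    valid = probeset >= 0
    keep = flags & (n >= keep_min)
    fill = valid & ~flags & (n >= add_min)
    return keep | fill


def correct_array(log_int, a, flags, probeset, mode="substitute"):
    """Return corrected log2 intensities for array `a` (sketch).

    mode="filter"     : set defective probes to NaN (summarization must skip NaN).
    mode="substitute" : replace each defective probe by its cross-array median
                        plus the median residual of the intact probes of its probeset.
    """
    out = log_int[a].astype(float).copy()
    if mode == "filter":
        out[flags] = np.nan
        return out
    probe_med = np.median(log_int, axis=0)
    r = out - probe_med
    for ps in np.unique(probeset[flags & (probeset >= 0)]):
        mask = probeset == ps
        good, bad = mask & ~flags, mask & flags
        if good.any():
            out[bad] = probe_med[bad] + np.median(r[good])
        else:
            # No intact probes left in this probeset: nothing to borrow from.
            out[bad] = np.nan
    return out
```

In the substitute mode, each replacement value borrows the probe's typical affinity from the other arrays and the probeset's shift from the intact probes on the same array, so downstream summarization sees plausible values instead of the artefact. The filter mode instead leaves the gaps to a NaN-aware summarization step.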
While summarization is insufficient to correct for defective probes, this problem can be addressed in a straightforward way by the methods we present for identification and correction of defective probes. As these methods output CEL files with corrected probe values that serve as input to standard normalization and summarization procedures, they can be easily integrated into existing microarray analysis pipelines as an additional pre-processing step. An R package is freely available from http://www.bio.ifi.lmu.de/artefact-correction.