Department of Ecology and Evolution, University of Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
Bioinformatics. 2014 May 15;30(10):1392-9. doi: 10.1093/bioinformatics/btu027. Epub 2014 Jan 21.
Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, so an efficient quality-assessment method is needed to ensure the reliability of these data.
The lack of a good benchmark has hampered the evaluation of existing quality-control methods. In this study, we propose a new, independent quality metric based on the evolutionary conservation of expression profiles. Using 11 large organ-specific datasets, we show that IQRray, a new quality metric we developed, has the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms other methods at identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, methods designed to detect outliers within a single experiment, such as Normalized Unscaled Standard Error (NUSE) and Relative Log Expression (RLE), performed poorly, because they cannot detect datasets containing only low-quality arrays and because their scores cannot be compared directly between experiments.
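To illustrate why a within-experiment metric such as Relative Log Expression (RLE) cannot flag a dataset made up only of low-quality arrays, the minimal Python sketch below (standard library only) computes per-array RLE interquartile ranges for two simulated datasets. It is not the IQRray method or the authors' code. The data, the gene and array counts, and the helper names rle_matrix and rle_iqr_per_array are hypothetical, and a uniformly halved dynamic range is used only as a crude stand-in for poor array quality. RLE is computed in the usual way: each gene's log2 expression minus that gene's median across the arrays of the same dataset.

```python
import math
import random
import statistics


def rle_matrix(log_expr):
    """Relative Log Expression for a genes x arrays matrix of log2 values.

    RLE[g][i] = log_expr[g][i] - median_j(log_expr[g][j]); each gene is
    compared only with its own median *within this dataset*.
    """
    rle = []
    for gene_values in log_expr:
        med = statistics.median(gene_values)
        rle.append([v - med for v in gene_values])
    return rle


def rle_iqr_per_array(log_expr):
    """Per-array IQR of RLE values; an unusually wide IQR flags an outlier array."""
    rle = rle_matrix(log_expr)
    n_arrays = len(log_expr[0])
    scores = []
    for i in range(n_arrays):
        column = [row[i] for row in rle]
        q1, _, q3 = statistics.quantiles(column, n=4)  # quartile cut points
        scores.append(q3 - q1)
    return scores


# Hypothetical simulated data: 2000 genes x 5 arrays, log2 scale.
rng = random.Random(0)
n_genes, n_arrays = 2000, 5
true_level = [rng.uniform(4.0, 14.0) for _ in range(n_genes)]
good = [[level + rng.gauss(0.0, 0.3) for _ in range(n_arrays)] for level in true_level]

# Every array in the second dataset shares the same defect: its dynamic range
# is halved (an assumed, simplified stand-in for uniformly poor quality).
compressed = [[0.5 * v + 4.5 for v in row] for row in good]

good_scores = rle_iqr_per_array(good)
compressed_scores = rle_iqr_per_array(compressed)
print("intact dataset RLE IQRs:    ", [round(s, 3) for s in good_scores])
print("compressed dataset RLE IQRs:", [round(s, 3) for s in compressed_scores])
```

Every array in the compressed dataset is compared only with the median of that same dataset. As a result, its RLE IQRs are half as large as those of the intact dataset (up to floating-point rounding). By this criterion the degraded arrays look better, and no array stands out as an outlier in either dataset.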
The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R.
Supplementary data are available at Bioinformatics online.