Benito Monica, Parker Joel, Du Quan, Wu Junyuan, Xiang Dong, Perou Charles M, Marron J S
Department of Statistics and Econometrics, University of Carlos III, Madrid, Spain.
Bioinformatics. 2004 Jan 1;20(1):105-14. doi: 10.1093/bioinformatics/btg385.
Systematic differences due to experimental features of microarray experiments are present in most large microarray data sets. Many experimental features can cause biases, including different sources of RNA, different production lots of microarrays, or different microarray platforms. These systematic effects present a substantial hurdle to the analysis of microarray data.
We present a new method for identifying and adjusting for systematic biases in microarray data sets. Our approach is based on modern statistical discrimination methods and is shown to be very effective in removing systematic biases present in a previously published breast tumor cDNA microarray data set. The new method, Distance Weighted Discrimination (DWD), is shown to be better than Support Vector Machines and Singular Value Decomposition for the adjustment of systematic microarray effects. In addition, it is shown to be of general use as a tool for discriminating systematic problems present in microarray data sets, including the merging of two breast tumor data sets completed on different microarray platforms.
Matlab software to perform DWD can be retrieved from https://genome.unc.edu/pubsup/dwd/
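The adjustment idea described above (find a direction that separates two batches, then remove each batch's offset along that direction) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the DWD direction itself comes from an optimization that is not reproduced here, so the sketch substitutes the normalized difference of batch means as a stand-in direction. All function names (`unit_mean_difference`, `shift_along`, `adjust_batches`) are hypothetical, and samples are assumed to be rows of expression values with genes as columns.

```python
import math


def mean_vector(samples):
    """Per-gene mean over a list of samples (each sample is a list of floats)."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit_mean_difference(batch_a, batch_b):
    """Stand-in separating direction: the normalized difference of batch means.

    Assumption: a real DWD direction would be supplied by a DWD solver instead.
    """
    diff = [a - b for a, b in zip(mean_vector(batch_a), mean_vector(batch_b))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0:
        raise ValueError("batch means coincide; no direction to adjust along")
    return [d / norm for d in diff]


def shift_along(samples, direction):
    """Subtract the batch's mean projection onto `direction` from every sample."""
    mean_proj = sum(dot(s, direction) for s in samples) / len(samples)
    return [[x - mean_proj * d for x, d in zip(s, direction)] for s in samples]


def adjust_batches(batch_a, batch_b, direction=None):
    """Center both batches along one shared unit direction.

    Pass a precomputed unit vector (for example, from a DWD solver) as
    `direction`; otherwise the mean-difference stand-in is used.
    """
    if direction is None:
        direction = unit_mean_difference(batch_a, batch_b)
    return shift_along(batch_a, direction), shift_along(batch_b, direction)
```

After the call, both batches have zero mean projection onto the chosen direction, while variation orthogonal to that direction is left untouched. Whether this removes a batch effect in practice depends on how well the direction captures it, which is where a discrimination method such as DWD would come in.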