Rao Sreevidya Sadananda Sadasiva, Shepherd Lori A, Bruno Andrew E, Liu Song, Miecznikowski Jeffrey C
Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA.
Adv Bioinformatics. 2013;2013:790567. doi: 10.1155/2013/790567. Epub 2013 Oct 9.
Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled assessment of the precision and comparability of microarrays and of various microarray analysis methods. However, to our knowledge, no study to date has reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures for Affymetrix microarrays. Results. We evaluated several state-of-the-art imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data, imputed the missing values with each imputation method, and averaged the results over 1000 simulations. The results for 5% and 10% deletion are similar. Among the imputation methods, the local least squares method with k = 4 is the most accurate under the error measures considered, and the k-nearest neighbor method with k = 1 has the highest error across the error measures considered. Conclusions. For imputing missing values in Affymetrix microarray datasets preprocessed with the MAS 5.0 scheme, the local least squares method with k = 4 has the best overall performance and the k-nearest neighbor method with k = 1 has the worst. These results hold for both 5% and 10% missing values.
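The delete-impute-score loop described in the Results can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a genes-by-arrays matrix of expression values, uses a simple k-nearest neighbor imputer (the local least squares method is not implemented here), and scores with a normalized root mean squared error, which is one common choice. The abstract does not name its error measures, so `nrmse` and all other function names below are assumptions.

```python
import numpy as np


def delete_at_random(data, fraction, rng):
    """Return a copy of `data` with `fraction` of its entries set to NaN, plus the mask."""
    n_missing = int(round(fraction * data.size))
    flat_mask = np.zeros(data.size, dtype=bool)
    flat_mask[rng.choice(data.size, size=n_missing, replace=False)] = True
    mask = flat_mask.reshape(data.shape)
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def knn_impute(corrupted, k):
    """Fill each missing entry with the mean of that column over the k nearest rows."""
    filled = corrupted.copy()
    missing = np.isnan(corrupted)
    for i in np.where(missing.any(axis=1))[0]:
        # Distance to every row, using only columns observed in both rows.
        shared = ~missing & ~missing[i]
        diff = np.where(shared, corrupted - corrupted[i], 0.0)
        counts = shared.sum(axis=1)
        dist = np.sqrt((diff ** 2).sum(axis=1) / np.maximum(counts, 1))
        dist[counts == 0] = np.inf  # no shared columns: not comparable
        dist[i] = np.inf            # a row is not its own neighbor
        order = np.argsort(dist)
        for j in np.where(missing[i])[0]:
            # Nearest rows that actually observed column j.
            donors = [r for r in order
                      if np.isfinite(dist[r]) and not missing[r, j]][:k]
            if donors:
                filled[i, j] = corrupted[donors, j].mean()
            else:
                # Fallback (assumption): column mean when no donor exists.
                filled[i, j] = np.nanmean(corrupted[:, j])
    return filled


def nrmse(truth, filled, mask):
    """RMSE over the deleted entries, divided by the std of their true values."""
    err = filled[mask] - truth[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])


def run_simulation(truth, fraction, k, n_sims, seed):
    """Average the NRMSE over `n_sims` independent random deletions."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_sims):
        corrupted, mask = delete_at_random(truth, fraction, rng)
        scores.append(nrmse(truth, knn_impute(corrupted, k), mask))
    return float(np.mean(scores))
```

Calling `run_simulation(truth, 0.05, k=1, n_sims=1000, seed=0)` and again with `fraction=0.10` would mirror the two deletion rates and the 1000 repetitions in the study, where `truth` is a complete expression matrix. Comparing methods then amounts to swapping `knn_impute` for another imputer with the same signature.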