School of Information and Communication Technology, Gold Coast Campus, Griffith University, QLD 4222, Australia.
Brief Bioinform. 2011 Sep;12(5):498-513. doi: 10.1093/bib/bbq080. Epub 2010 Dec 14.
Microarray gene expression data generally contain missing values, which arise for a variety of experimental reasons. Because missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute them. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techniques and on how they use local or global information from within the data, or domain knowledge, during imputation. In addition, we describe how imputation results can be validated and the different ways to assess the performance of imputation algorithms, and we discuss some possible directions for future research. We hope that this review will give readers a good understanding of current developments in this field and inspire them to devise the next generation of imputation algorithms.