Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
Genomics. 2011 May;97(5):257-64. doi: 10.1016/j.ygeno.2011.03.001. Epub 2011 Mar 21.
Epistatic miniarray profiling (E-MAP) is a powerful tool for analyzing gene functions and their biological relevance. However, E-MAP data suffer from a large proportion of missing values, which often leads to misleading and biased analysis results. Effective missing value estimation methods for E-MAP are therefore urgently needed. Although several independent algorithms can be applied to this task, their performance varies significantly across datasets, indicating that each algorithm has its own advantages and disadvantages. In this paper, we propose EMDI, a novel ensemble approach based on high-level diversity that imputes missing values by combining two global and four local base estimators. Experimental results on five E-MAP datasets show that EMDI outperforms every individual base algorithm, demonstrating that an appropriate combination provides complementarity among different methods. Comparisons among several fusion strategies also show that the proposed high-level diversity scheme is superior to the others. EMDI is freely available at www.csbio.sjtu.edu.cn/bioinf/EMDI/.
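The general idea of combining global and local estimators can be illustrated with a minimal sketch. This is not the EMDI algorithm: the abstract does not specify the six base estimators or the high-level diversity fusion scheme, so the example assumes one row-mean estimator standing in for a global method, one nearest-neighbour estimator standing in for a local method, and plain averaging as the fusion step. All function names and the toy score matrix are hypothetical.

```python
import numpy as np

def impute_row_mean(x):
    """Global-style stand-in: fill each gap with the mean of its row's observed values."""
    filled = x.copy()
    row_means = np.nanmean(x, axis=1)  # NaN for a row with no observed values
    rows, cols = np.where(np.isnan(x))
    filled[rows, cols] = row_means[rows]
    return filled

def impute_knn(x, k=2):
    """Local-style stand-in: fill each gap from the k most similar rows."""
    filled = x.copy()
    fallback = impute_row_mean(x)
    for i, j in zip(*np.where(np.isnan(x))):
        candidates = []
        for r in range(x.shape[0]):
            # A neighbour must be a different row with an observed value in column j.
            if r == i or np.isnan(x[r, j]):
                continue
            shared = ~np.isnan(x[i]) & ~np.isnan(x[r])
            if shared.any():
                # Root-mean-square distance over the columns both rows observe.
                dist = np.sqrt(np.mean((x[i, shared] - x[r, shared]) ** 2))
                candidates.append((dist, x[r, j]))
        if candidates:
            candidates.sort(key=lambda c: c[0])
            filled[i, j] = np.mean([value for _, value in candidates[:k]])
        else:
            filled[i, j] = fallback[i, j]  # no usable neighbour: use the row mean
    return filled

def impute_ensemble(x, estimators):
    """Assumed fusion rule: average the base predictions at missing positions only."""
    mask = np.isnan(x)
    stacked = np.stack([estimate(x) for estimate in estimators])
    result = x.copy()
    result[mask] = stacked.mean(axis=0)[mask]
    return result

# Toy interaction-score matrix with one missing entry (hypothetical values).
scores = np.array([
    [1.0, 2.0, 3.0],
    [1.0, np.nan, 3.0],
    [4.0, 6.0, 5.0],
])
completed = impute_ensemble(scores, [impute_row_mean, impute_knn])
print(completed)
```

On this toy matrix the two base estimators disagree about the missing entry (2.0 from the row mean, 4.0 from the two nearest rows), and the averaged estimate of 3.0 lies between them. A weighted or diversity-aware fusion rule, as the abstract describes, would replace the plain mean in `impute_ensemble`.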