Chen Yuan, Zeng Ying, Luo Feng, Yuan Zheming
Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China.
Orient Science & Technology College of Hunan Agricultural University, Changsha, China.
PLoS One. 2016 Jun 22;11(6):e0157567. doi: 10.1371/journal.pone.0157567. eCollection 2016.
The maximal information coefficient (MIC) captures dependences between paired variables, including both functional and non-functional relationships. In this paper, we develop a new method, ChiMIC, to calculate MIC values. The ChiMIC algorithm uses the chi-square test to terminate grid optimization and thereby removes the maximal grid size restriction of the original ApproxMaxMI algorithm. Computational experiments show that ChiMIC maintains the same MIC values for noiseless functional relationships but gives much smaller MIC values for independent variables. For noisy functional relationships, ChiMIC reaches the optimal partition much faster. Furthermore, MCN values based on MIC calculated by ChiMIC better capture the complexity of functional relationships, and the statistical power of MIC calculated by ChiMIC is higher than that of MIC calculated by ApproxMaxMI. Moreover, the computational cost of ChiMIC is much lower than that of ApproxMaxMI. We apply the MIC values to feature selection and obtain better classification accuracy using features selected by the MIC values from ChiMIC.
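The chi-square stopping idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, as one plausible reading, that a candidate grid line splits one column into two sub-columns and that the split is kept only if the resulting rows-by-2 contingency table departs significantly from independence. The function names, the 0.05 significance level, and the use of the Wilson-Hilferty approximation for the critical value are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_critical(df, alpha=0.05):
    """Approximate upper-alpha critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * sqrt(h)) ** 3


def split_is_significant(left_counts, right_counts, alpha=0.05):
    """Hypothetical stopping check: is a candidate grid line worth keeping?

    left_counts / right_counts hold the per-row point counts on each side
    of the candidate line. Returns False when the test cannot be applied.
    """
    # Rows that are empty on both sides carry no information for the test.
    rows = [(l, r) for l, r in zip(left_counts, right_counts) if l + r > 0]
    if len(rows) < 2:
        return False
    if sum(l for l, _ in rows) == 0 or sum(r for _, r in rows) == 0:
        return False
    df = len(rows) - 1  # (rows - 1) * (columns - 1) with 2 columns
    return chi_square_statistic(rows) > chi_square_critical(df, alpha)
```

Under these assumptions, `split_is_significant([20, 0, 0], [0, 0, 20])` accepts the split because the two sides fall into different rows, while `split_is_significant([10, 10], [10, 10])` rejects it because both sides have the same row distribution. A grid search that stops adding lines once no candidate passes such a test would not need a fixed cap on grid size, which is the property the abstract attributes to ChiMIC.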