Yang Dan, Liu Hanming
Technol Health Care. 2019;27(S1):249-262. doi: 10.3233/THC-199024.
The main obstacle in microarray technology is how to mine valuable information from expression profiles and study gene function.
The maximal information coefficient (MIC) is a novel non-parametric statistic that has been successfully applied to genome-wide association studies and to differential gene and miRNA expression analysis. However, the data used in these applications were real data rather than gold-standard data.
Therefore, this study tests the feasibility of MIC for identifying differentially expressed genes using simulated data.
Our experiments indicate that MIC always performs better than Limma and at almost the same level as SAM, ROTS, or DESeq2. However, the number of cases with AUC < 0.5 is significantly smaller for MIC than for these three methods, and MIC does not exhibit the abnormal phenomenon in which AUC increases as noise increases.
Compared with existing methods, our experiments show that MIC not only ranks in the first tier for identifying differentially expressed genes and for noise immunity, but also shows better robustness and stronger adaptability to data and environment.
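As a rough illustration of how a detection method can be scored against simulated data with known ground truth, the following minimal Python sketch generates two-group expression values, ranks genes by a score, and computes the AUC. It is not the authors' pipeline: the simulation design is an assumption, the absolute Welch t statistic is used as a stand-in score rather than MIC, and the names `simulate`, `welch_t`, and `auc` are hypothetical.

```python
import random
import statistics


def simulate(n_genes=200, n_de=20, n_per_group=10, effect=2.0, noise_sd=1.0, seed=0):
    """Simulate two-group expression data (assumed design, not the paper's).

    The first n_de genes are truly differentially expressed: their treated
    group mean is shifted by `effect`. Returns (data, labels).
    """
    rng = random.Random(seed)
    data, labels = [], []
    for g in range(n_genes):
        is_de = g < n_de
        shift = effect if is_de else 0.0
        control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        treated = [rng.gauss(shift, noise_sd) for _ in range(n_per_group)]
        data.append((control, treated))
        labels.append(1 if is_de else 0)
    return data, labels


def welch_t(a, b):
    """Absolute Welch t statistic, used here only as a stand-in score."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se


def auc(scores, labels):
    """AUC as the probability that a true positive outscores a true negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Repeat the evaluation at increasing noise levels.
    for noise_sd in (0.5, 1.0, 2.0, 4.0):
        data, labels = simulate(noise_sd=noise_sd)
        scores = [welch_t(control, treated) for control, treated in data]
        print(f"noise_sd={noise_sd}: AUC={auc(scores, labels):.3f}")
```

Under this formulation, an AUC below 0.5 means the method ranks true differentially expressed genes worse than random ordering would, which is why the count of such cases serves as a robustness indicator.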