School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China.
PLoS One. 2019 Jul 17;14(7):e0219551. doi: 10.1371/journal.pone.0219551. eCollection 2019.
Assumptions about the probability density distribution of the data strongly influence the design of a new statistical method. Based on an analysis of a group of real gene expression profiles, this study reveals that the primary density distributions of the real profiles are the normal/log-normal and t distributions, which account for 80% and 19% of the profiles, respectively. Following these distributions, we generated a series of simulated data sets to assess a novel statistical method, the maximal information coefficient (MIC), more comprehensively. The results show that MIC is not only in the top tier for overall performance in identifying differentially expressed genes, but also exhibits better adaptability and excellent noise immunity compared with existing methods.
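The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract gives only the 80% and 19% proportions, so the equal split between normal and log-normal, the t degrees of freedom, the additive shift used to mark a gene as differentially expressed, and all function and parameter names below are assumptions made for illustration. The remaining 1% of profiles is not described in the abstract and is left out.

```python
import math
import random


def sample_t(rng, df):
    """Draw one Student-t variate as Z / sqrt(V / df), with V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    # chi-square(df) is a Gamma distribution with shape df/2 and scale 2.
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def simulate_gene(rng, family, n_per_class, shift, df=5):
    """Simulate one gene over two classes; class 1 is shifted by `shift`."""
    draw = {
        "normal": lambda: rng.gauss(0.0, 1.0),
        "lognormal": lambda: rng.lognormvariate(0.0, 1.0),
        "t": lambda: sample_t(rng, df),
    }[family]
    values, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            values.append(draw() + shift * label)
            labels.append(label)
    return values, labels


def simulate_profile(n_genes, n_per_class, shift, de_fraction, seed=0):
    """Simulate a profile whose genes follow the reported distribution mix.

    Assumed weights: 40 normal + 40 log-normal (the reported 80%, split
    equally as an assumption) and 19 t; random.choices normalizes them.
    """
    rng = random.Random(seed)
    families = ["normal", "lognormal", "t"]
    weights = [40, 40, 19]
    n_de = round(n_genes * de_fraction)  # the first n_de genes are made DE
    profile = []
    for i in range(n_genes):
        family = rng.choices(families, weights=weights, k=1)[0]
        is_de = i < n_de
        values, labels = simulate_gene(
            rng, family, n_per_class, shift if is_de else 0.0
        )
        profile.append(
            {"family": family, "is_de": is_de, "values": values, "labels": labels}
        )
    return profile


if __name__ == "__main__":
    profile = simulate_profile(
        n_genes=1000, n_per_class=20, shift=1.5, de_fraction=0.1, seed=42
    )
    counts = {}
    for gene in profile:
        counts[gene["family"]] = counts.get(gene["family"], 0) + 1
    print(counts)
```

Each simulated gene carries a known `is_de` label, so any scoring method (MIC or an existing alternative) can be applied to `values` against `labels` and its ranking compared with the ground truth.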