An improved algorithm for the maximal information coefficient and its application.

Authors

Cao Dan, Chen Yuan, Chen Jin, Zhang Hongyan, Yuan Zheming

Affiliations

Hunan Engineering and Technology Research Centre for Agricultural Big Data Analysis and Decision-making, Hunan Agricultural University, Changsha 410000, People's Republic of China.

Orient Science and Technology College of Hunan Agricultural University, Changsha 410000, Hunan, People's Republic of China.

Publication information

R Soc Open Sci. 2021 Feb 10;8(2):201424. doi: 10.1098/rsos.201424.

Abstract

The maximal information coefficient (MIC) captures both linear and nonlinear correlations between variable pairs. In this paper, we propose the BackMIC algorithm for MIC estimation. BackMIC adds a search-back step on the equipartitioned axis to obtain a better grid partition than the original implementation, ApproxMaxMI. Like the ChiMIC algorithm, it terminates the grid search with a χ²-test rather than with the maximum number of bins B(n, α). Results on simulated data show that BackMIC preserves the generality of MIC and yields more reasonable grid partitions and MIC values for both independent and dependent variable pairs, at comparable running times. It is also robust to the choice of α in B(n, α). MIC values computed by BackMIC show improved statistical power and equitability. We used (1 - MIC) as the distance measure in the K-means algorithm to cluster cancer and normal samples. Results on four cancer datasets show that MIC values computed by BackMIC give better clustering, which indicates that the between-sample correlations it measures are more credible than those measured by other algorithms.
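To make the quantities in the abstract concrete, here is a minimal sketch of a naive MIC estimate in Python. It is neither ApproxMaxMI nor BackMIC: it only tries equal-frequency grids on both axes, with no optimized partition, no search-back step, and no χ²-test stopping rule. The function names are hypothetical, and the bin budget is assumed to take the conventional form n^α with α = 0.6.

```python
import numpy as np


def equal_frequency_bins(values, n_bins):
    """Label each value with one of n_bins bins holding roughly equal counts.

    Ties are broken by input order, so tied values may fall in different bins.
    """
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)


def mutual_information_bits(x_bins, y_bins, nx, ny):
    """Mutual information, in bits, of the joint histogram of two labelings."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1.0)  # count points per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)    # marginal over x bins, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal over y bins, shape (1, ny)
    mask = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))


def naive_mic(x, y, alpha=0.6):
    """Largest normalized MI over equal-frequency grids with nx * ny <= n ** alpha.

    Assumption: the bin budget is n ** alpha, the usual MIC convention.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    max_cells = len(x) ** alpha
    best = 0.0
    nx = 2
    while nx * 2 <= max_cells:
        ny = 2
        while nx * ny <= max_cells:
            mi = mutual_information_bits(
                equal_frequency_bins(x, nx), equal_frequency_bins(y, ny), nx, ny
            )
            # Normalize by log2(min(nx, ny)) so the score lies in [0, 1].
            best = max(best, mi / np.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best


def mic_distance(a, b, alpha=0.6):
    """Distance between two samples as (1 - MIC), as described in the abstract."""
    return 1.0 - naive_mic(a, b, alpha)
```

With this sketch, a noiseless monotonic relationship scores close to 1 and independent noise scores much lower, so `mic_distance` is near 0 for strongly dependent sample profiles. How that distance is wired into K-means is not specified in the abstract and is left out here.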
