Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, USA.
BMC Bioinformatics. 2012 Dec 9;13:328. doi: 10.1186/1471-2105-13-328.
Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression-model-based association measures. Further, it is important to assess which transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes).
We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets, which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations in which the two measures disagree. We also compare correlation-based and MI-based approaches for defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI-based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information, which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables.
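To make the robust correlation and the adjacency transformation mentioned above concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the commonly used definitions of the biweight midcorrelation (Tukey biweight weights scaled by 9 times the median absolute deviation) and of the unsigned topological overlap measure. The function names `bicor`, `bicor_adjacency`, and `topological_overlap`, the soft-thresholding power `beta=6`, and the simulated data are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def bicor(x, y):
    """Biweight midcorrelation between two 1-D samples of equal length."""

    def _robust_normalize(v):
        v = np.asarray(v, dtype=float)
        centered = v - np.median(v)
        mad = np.median(np.abs(centered))  # raw median absolute deviation
        if mad == 0:
            raise ValueError("MAD is zero; bicor is undefined for this sample")
        u = centered / (9.0 * mad)
        # Tukey biweight: observations with |u| >= 1 receive weight zero.
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        weighted = centered * w
        return weighted / np.sqrt(np.sum(weighted**2))

    return float(np.sum(_robust_normalize(x) * _robust_normalize(y)))


def bicor_adjacency(data, beta=6):
    """Unsigned adjacency |bicor|**beta; columns of `data` are genes.

    `beta` is an assumed soft-thresholding power, not a value from the text.
    """
    data = np.asarray(data, dtype=float)
    n_genes = data.shape[1]
    adj = np.eye(n_genes)
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            adj[i, j] = adj[j, i] = abs(bicor(data[:, i], data[:, j])) ** beta
    return adj


def topological_overlap(adj):
    """Unsigned topological overlap of a symmetric adjacency in [0, 1]."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)            # ignore self-connections
    shared = a @ a                      # connection strength via shared neighbors
    k = a.sum(axis=1)                   # connectivity of each node
    k_min = np.minimum.outer(k, k)
    tom = (shared + a) / (k_min + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


# Illustrative usage on simulated data (50 samples, 5 hypothetical genes).
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 5))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=50)  # gene 1 tracks gene 0
tom = topological_overlap(bicor_adjacency(expr))
print(np.round(tom, 3))
```

In a module-detection workflow, a dissimilarity such as `1 - tom` would typically be passed to hierarchical clustering; that step is omitted here.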
The biweight midcorrelation outperforms MI in terms of elucidating pairwise gene relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in the case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
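The idea of using polynomial regression to capture non-linear relationships can be illustrated with a small sketch. It is an assumption-laden example rather than the authors' procedure: the association between two variables is taken to be the R² of a polynomial fit, and the two regression directions are symmetrized by taking the larger R². The function names `poly_r2` and `poly_association`, the polynomial degree of 3, and the symmetrization rule are all hypothetical choices made for illustration.

```python
import numpy as np


def poly_r2(x, y, degree=3):
    """R^2 of a least-squares polynomial fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def poly_association(x, y, degree=3):
    """Symmetric association: larger R^2 of the two regression directions.

    Taking the maximum is an assumed symmetrization rule.
    """
    return max(poly_r2(x, y, degree), poly_r2(y, x, degree))


# A purely quadratic relationship: linear correlation misses it,
# while the polynomial fit captures it.
x = np.linspace(-1.0, 1.0, 50)
y = x**2
pearson = np.corrcoef(x, y)[0, 1]
assoc = poly_association(x, y)
print(f"Pearson r = {pearson:.3f}, polynomial association = {assoc:.3f}")
```

For this symmetric quadratic example the Pearson correlation is essentially zero, while the polynomial association is close to one, which is the kind of non-linear dependence that the regression-based measures are meant to detect.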