Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America.
Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
PLoS One. 2023 Apr 26;18(4):e0284904. doi: 10.1371/journal.pone.0284904. eCollection 2023.
Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data. Unfortunately, MI typically receives little to no attention in introductory statistics courses and is more difficult to estimate from data than correlation. In this article, we motivate the use of MI in the analysis of epidemiologic data, while providing a general introduction to estimation and interpretation. We illustrate its utility through a retrospective study relating intraoperative heart rate (HR) and mean arterial pressure (MAP). We: (i) show that postoperative mortality is associated with decreased MI between HR and MAP and (ii) improve existing postoperative mortality risk assessment by including MI and additional hemodynamic statistics.
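Properties (i), (ii) and (iv) can be made concrete with a small sketch. It assumes discrete (or already binned) data and uses the simple plug-in estimator, which replaces the unknown probabilities in I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))] with empirical frequencies. This is not necessarily the estimator used in the study. The function names `plugin_mutual_information` and `pearson_correlation` are hypothetical, and the toy data are illustrative, not taken from the study.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in MI estimate, in nats, from paired discrete observations.

    Hypothetical helper: empirical frequencies stand in for p(x, y),
    p(x) and p(y). Labels may be numbers or categories (property iv).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def pearson_correlation(xs, ys):
    """Sample Pearson correlation, for comparison with MI."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


# Nonlinear, fully deterministic dependence: y = x^2 on a symmetric grid.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

print(f"Pearson r = {pearson_correlation(xs, ys):.3f}")          # r = 0.000
print(f"MI = {plugin_mutual_information(xs, ys):.3f} nats")      # MI ≈ 1.055 nats

# Categorical labels work the same way (property iv).
print(plugin_mutual_information(["a", "a", "b", "b"], ["u", "u", "v", "v"]))
```

Correlation misses the U-shaped relationship entirely, while MI equals the entropy of Y because Y is a function of X. Note that property (ii) concerns the population MI. On finite samples the plug-in estimate is biased upward and is usually slightly positive even for independent variables, which is one reason MI is harder to estimate than correlation.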