Assecondi Sara, Ostwald Dirk, Bagshaw Andrew P
School of Psychology, University of Birmingham, Birmingham, B15 2TT, U.K.
Neural Comput. 2015 Feb;27(2):281-305. doi: 10.1162/NECO_a_00695. Epub 2014 Dec 16.
Most studies involving simultaneous electroencephalographic (EEG) and functional magnetic resonance imaging (fMRI) data rely on the first-order, affine-linear correlation of EEG and fMRI features within the framework of the general linear model. An alternative is the use of information-based measures such as mutual information and entropy, which can also detect higher-order correlations present in the data. Estimates of information-theoretic quantities can be influenced by several parameters, such as the sample size, the amount of correlation between variables, and the discretization (or binning) strategy of choice. While these issues have been investigated for invasive neurophysiological data and a number of bias-correction estimates have been developed, there has been no attempt to systematically examine the accuracy of information estimates for the multivariate distributions arising in the context of EEG-fMRI recordings. This is especially important given the differences between electrophysiological and EEG-fMRI recordings. In this study, we drew random samples from simulated bivariate and trivariate distributions, mimicking the statistical properties of EEG-fMRI data. We compared the estimated information shared by the simulated random variables with its numerical value and found that the interaction between the binning strategy and the estimation method influences the accuracy of the estimate. Conditional on the simulation assumptions, we found that the equipopulated binning strategy yields the best and most consistent results across distributions and bias correction methods. We also found that, among the bias correction techniques, the asymptotically debiased (TPMC), the jackknife debiased (JD), and the best upper bound (BUB) approaches give similar results, and those results are consistent across distributions.
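The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' code, and it assumes a bivariate Gaussian as the simulated distribution, for which the mutual information is known in closed form as -0.5 * log2(1 - rho^2) bits. The function names (`equipopulated_bins`, `plugin_mutual_information`, `simulate`) and the default values for the correlation, sample size, and number of bins are hypothetical choices made for illustration. The estimator shown is the uncorrected plug-in estimate, so none of the TPMC, jackknife, or BUB corrections is applied.

```python
import numpy as np


def equipopulated_bins(x, n_bins):
    """Assign each sample a bin label so that bins hold (nearly) equal counts."""
    ranks = np.argsort(np.argsort(x))      # rank of each sample, 0 .. n-1
    return (ranks * n_bins) // len(x)      # integer labels 0 .. n_bins-1


def plugin_mutual_information(bx, by, n_bins):
    """Uncorrected plug-in mutual information (in bits) from two label arrays."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1.0)        # 2-D histogram of label pairs
    joint /= joint.sum()                   # empirical joint probabilities
    px = joint.sum(axis=1, keepdims=True)  # marginal of the first variable
    py = joint.sum(axis=0, keepdims=True)  # marginal of the second variable
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def simulate(rho=0.6, n_samples=5000, n_bins=8, seed=0):
    """Return (plug-in estimate, analytic value) for a bivariate Gaussian."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples).T
    estimate = plugin_mutual_information(
        equipopulated_bins(x, n_bins), equipopulated_bins(y, n_bins), n_bins
    )
    analytic = -0.5 * np.log2(1.0 - rho ** 2)  # exact MI of a bivariate Gaussian
    return estimate, analytic


if __name__ == "__main__":
    est, ana = simulate()
    print(f"plug-in estimate: {est:.4f} bits, analytic value: {ana:.4f} bits")
```

Varying `n_samples`, `n_bins`, and `rho` in this sketch shows how the gap between the estimate and the analytic value depends on sample size, binning, and correlation, which are the factors the study examines.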