Craddock R Cameron, Taylor Renee, Broderick Gordon, Whistler Toni, Klimas Nancy, Unger Elizabeth R
Centers for Disease Control and Prevention, Viral Exanthems and Herpesvirus Branch, Atlanta, GA 30333, USA.
Pharmacogenomics. 2006 Apr;7(3):421-8. doi: 10.2217/14622416.7.3.421.
The entropy correlation coefficient (ECC) is a useful tool for measuring statistical dependence between variables. We employed this tool to search for pairs of correlated variables in the chronic fatigue syndrome (CFS) Computational Challenge dataset. Highly related variables are candidates for data reduction, and novel relationships could lead to hypotheses regarding the pathogenesis of CFS.
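The abstract does not give the formula for the ECC. A common definition, assumed here rather than taken from the paper, normalizes the mutual information of two discrete variables X and Y by the sum of their marginal Shannon entropies. Under that definition the ECC runs from 0 (X and Y independent) to 1 (each variable fully determines the other):

```latex
H(X) = -\sum_{x} p(x)\,\log p(x), \qquad
H(X,Y) = -\sum_{x,y} p(x,y)\,\log p(x,y)

I(X;Y) = H(X) + H(Y) - H(X,Y)

\mathrm{ECC}(X,Y) = \frac{2\,I(X;Y)}{H(X) + H(Y)} = 2 - \frac{2\,H(X,Y)}{H(X) + H(Y)}
```

Because this definition operates on discrete distributions, continuous measurements must first be grouped into categories before the ECC can be applied.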
Data for 130 female participants in the Wichita (KS, USA) clinical study [1] were coded as numerical values. Metric data were grouped using Gaussian mixture models, with the number of groups chosen by the Bayesian information criterion. Pairwise correlations between all variables were computed using the ECC. Significance was estimated from 1000 iterations of a permutation test, and a threshold of 0.01 was used to identify significantly correlated variables.
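The pairwise permutation test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes every variable has already been reduced to discrete labels, so the Gaussian-mixture grouping step is not shown. It also assumes the entropy-based definition ECC = 2I(X;Y)/(H(X)+H(Y)) and computes a one-sided p-value with an add-one correction. The function names `entropy`, `ecc`, `permutation_p_value` and `significant_pairs` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def ecc(x, y):
    """Entropy correlation coefficient, assumed here to be 2*I(X;Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy of the paired labels
    if hx + hy == 0:
        return 0.0  # both variables constant: dependence is undefined, report 0
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def permutation_p_value(x, y, n_iter=1000, seed=0):
    """One-sided p-value: how often a random shuffle of y reaches the observed ECC."""
    rng = random.Random(seed)
    observed = ecc(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(y_perm)  # breaks any x-y pairing but keeps y's marginal counts
        # Small tolerance so floating-point noise does not decide ties.
        if ecc(x, y_perm) >= observed - 1e-12:
            hits += 1
    # Add-one correction (an assumption): the p-value can never be exactly 0.
    return (hits + 1) / (n_iter + 1)


def significant_pairs(data, alpha=0.01, n_iter=1000):
    """Test every pair of variables in `data` (name -> list of discrete labels)."""
    names = sorted(data)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            p = permutation_p_value(data[a], data[b], n_iter)
            if p < alpha:
                found.append((a, b, ecc(data[a], data[b]), p))
    return found
```

For example, with `x = [0] * 20 + [1] * 20`, calling `significant_pairs({"a": x, "b": list(x), "c": [0, 1] * 20})` flags only the pair `("a", "b")`, because `c` is exactly independent of the other two variables. Shuffling only one variable of each pair preserves both marginal distributions, so the null distribution reflects chance agreement between categories of the observed sizes.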
The five dimensions of the Multidimensional Fatigue Inventory (MFI) were all highly correlated with each other. Seven Short Form-36 (SF-36) measures, four CFS case-defining symptoms and the Zung Self-Rating Depression Scale all correlated with every MFI dimension. No physiological variable correlated with more than one MFI dimension. The MFI, SF-36, Centers for Disease Control and Prevention (CDC) symptom inventory, Zung Self-Rating Depression Scale and three Cambridge Neuropsychological Test Automated Battery (CANTAB) measures were highly correlated with CFS disease status.
Correlations between the five MFI dimensions are expected, since they are measured with the same instrument. The relationship between the MFI and the Zung depression index has been reported previously. The MFI, SF-36 and CDC symptom inventory are used to classify CFS, so it is not surprising that they correlate with disease status. Only one of the three CANTAB measures that correlated with disease status had been identified previously, indicating that the ECC identifies relationships not found with other statistical tools.
The ECC is a useful tool for measuring statistical dependence between variables in clinical and laboratory datasets. Further study is needed to better understand how ECC values should be interpreted for clinical data.