Ferrari Alberto
Department of Brain and Behavioural Sciences, University of Pavia.
Multivariate Behav Res. 2017 Mar-Apr;52(2):259-270. doi: 10.1080/00273171.2017.1279957. Epub 2017 Feb 16.
Shannon entropy is increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g., languages, amino acid sequences, DNA methylation patterns, and animal vocalizations. Yet the distributional properties of information entropy as a random variable have seldom been studied, so researchers mainly rely on linear models or simulation-based analytical approaches to assess differences in information content when entropy is measured repeatedly under different experimental conditions. Here, a method to perform inference on entropy under such conditions is proposed. Building on results from the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model that deals efficiently with mean entropy estimation is formulated. A simulation study shows that the model outperforms linear modeling in a wide range of scenarios and has promising statistical properties. As a practical example, the method is applied to a data set from a real experiment on animal communication.
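The quantities named in the abstract can be illustrated with a short sketch. It is not the regression model proposed in the paper, only the standard building blocks such a model would rest on: the plug-in Shannon entropy of observed symbol counts, the expected entropy under a symmetric Dirichlet distribution with concentration `alpha` over `k` symbols, and the symmetric Dirichlet-multinomial log-likelihood of a count vector. All function names and the example sequence are hypothetical, entropies are in nats, and `digamma` is a hand-rolled approximation so that the example needs only the standard library.

```python
import math
from collections import Counter


def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy of symbol counts, in nats."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def prior_mean_entropy(alpha, k):
    """Expected entropy of p ~ symmetric Dirichlet(alpha) over k symbols."""
    return digamma(k * alpha + 1.0) - digamma(alpha + 1.0)


def posterior_mean_entropy(counts, alpha):
    """Expected entropy after observing counts, under a symmetric Dirichlet prior."""
    a = sum(counts) + len(counts) * alpha
    weighted = sum((c + alpha) / a * digamma(c + alpha + 1.0) for c in counts)
    return digamma(a + 1.0) - weighted


def dirichlet_multinomial_loglik(counts, alpha):
    """Log-probability of a count vector under a symmetric Dirichlet-multinomial."""
    k, n = len(counts), sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_ratio = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    log_terms = sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in counts)
    return log_coef + log_ratio + log_terms


if __name__ == "__main__":
    # Hypothetical sequence of vocalization symbols, for illustration only.
    sequence = "ABABCABBAC"
    counts = list(Counter(sequence).values())
    print("plug-in entropy:", plugin_entropy(counts))
    print("posterior mean entropy (alpha=1):", posterior_mean_entropy(counts, 1.0))
    print("log-likelihood (alpha=1):", dirichlet_multinomial_loglik(counts, 1.0))
```

Under a symmetric Dirichlet, the expected entropy increases monotonically with `alpha`, from 0 as `alpha` approaches 0 to `log(k)` as `alpha` grows large. A single concentration parameter therefore indexes mean entropy, which is one plausible reason a symmetric Dirichlet-multinomial is a natural basis for regression on entropy.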