Galas David J, Nykter Matti, Carter Gregory W, Price Nathan D, Shmulevich Ilya
Institute for Systems Biology, Seattle, WA, USA; Battelle Memorial Institute, Columbus, OH, USA.
Institute for Systems Biology, Seattle, WA, USA; Institute of Signal Processing, Tampere University of Technology, Tampere, Finland.
IEEE Trans Inf Theory. 2010 Feb;56(2):667-677. doi: 10.1109/TIT.2009.2037046. Epub 2010 Feb 25.
It is not obvious what fraction of all the potential information residing in the molecules and structures of living systems is significant or meaningful to the system. Sets of random sequences or identically repeated sequences, for example, would be expected to contribute little or no useful information to a cell. This issue of quantitation of information is important since the ebb and flow of biologically significant information is essential to our quantitative understanding of biological function and evolution. Motivated specifically by these problems of biological information, we propose here a class of measures to quantify the contextual nature of the information in sets of objects, based on Kolmogorov's intrinsic complexity. Such measures discount both random and redundant information and are inherent in that they do not require a defined state space to quantify the information. The maximization of this new measure, which can be formulated in terms of the universal information distance, appears to have several useful and interesting properties, some of which we illustrate with examples.
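The idea of a measure that discounts both random and redundant information can be illustrated with a small sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: Kolmogorov complexity is approximated by zlib-compressed length, the universal information distance is approximated by the normalized compression distance (NCD), and the hypothetical `context_score` weights each pair by d(1 - d), a term chosen only because it vanishes both when two objects are identical (d near 0) and when they are unrelated (d near 1). It should not be read as the authors' definition.

```python
import random
import zlib
from typing import Dict, List, Sequence


def c(data: bytes) -> int:
    """Compressed length in bytes: a computable stand-in for K(x) (assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a practical proxy for the
    universal information distance (which is itself uncomputable)."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


def context_score(objects: Sequence[bytes]) -> float:
    """Hypothetical set-level score: average over ordered pairs of
    c(x_i) * d_ij * (1 - d_ij). Illustrative form, not from the abstract."""
    n = len(objects)
    total = 0.0
    for i, x in enumerate(objects):
        for j, y in enumerate(objects):
            if i != j:
                # Real compressors can push NCD slightly outside [0, 1]; clamp it.
                d = min(max(ncd(x, y), 0.0), 1.0)
                total += c(x) * d * (1.0 - d)
    return total / (n * (n - 1))


def rand_bytes(rng: random.Random, n: int) -> bytes:
    """n pseudo-random bytes from a seeded generator (reproducible)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def make_sets(seed: int = 0, size: int = 4, length: int = 2000) -> Dict[str, List[bytes]]:
    """Build three toy sets: identical copies, unrelated random strings,
    and strings that share half their content with one another."""
    rng = random.Random(seed)
    base = rand_bytes(rng, length)
    half = length // 2
    return {
        "redundant": [base] * size,
        "random": [rand_bytes(rng, length) for _ in range(size)],
        "related": [base[:half] + rand_bytes(rng, length - half) for _ in range(size)],
    }


if __name__ == "__main__":
    for name, objs in make_sets().items():
        print(f"{name:10s} score = {context_score(objs):.1f}")
```

Under these assumptions, the set of partially shared strings should score well above both the set of identical copies and the set of unrelated random strings. Because zlib is an imperfect compressor, the redundant set does not reach exactly zero, so only the relative ordering is meaningful here.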