Tal Omri, Tran Tat Dat, Portegies Jacobus
Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
J Theor Biol. 2017 Apr 21;419:159-183. doi: 10.1016/j.jtbi.2017.02.010. Epub 2017 Feb 12.
We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes and of population entropy rate and cross-entropy rate. We analyze typical genotypes from three perspectives: a set perspective on the interplay between the typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers are surprisingly resilient to noise arising from small population samples, and we highlight the potential for further links between inference and communication.
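The notions of population entropy rate, cross-entropy rate, typical genotypes and a typical-set classifier can be sketched in Python. This is a minimal illustration under assumptions not stated in the abstract: every locus is a biallelic SNP encoded as the count (0, 1, 2) of a reference allele, genotypes follow Hardy-Weinberg proportions, loci are independent (each locus is one symbol drawn from its own distribution, which is what makes the source nonstationary), allele frequencies lie strictly between 0 and 1, and logarithms are base 2. All function names are hypothetical, and `classify` uses one plausible typical-set rule (choose the population whose entropy rate is closest to the genotype's per-locus log-loss under that population); it is not necessarily the authors' classifier.

```python
import math
import random


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of an
    allele with frequency p (assumed strictly between 0 and 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly in (0, 1)")
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def entropy_rate(freqs):
    """Population entropy rate: mean per-locus genotype entropy in bits."""
    total = 0.0
    for p in freqs:
        total -= sum(g * math.log2(g) for g in genotype_probs(p))
    return total / len(freqs)


def cross_entropy_rate(true_freqs, model_freqs):
    """Mean per-locus cross entropy (bits) of genotypes drawn from
    true_freqs when scored under the model population model_freqs."""
    total = 0.0
    for pt, pm in zip(true_freqs, model_freqs):
        for gt, gm in zip(genotype_probs(pt), genotype_probs(pm)):
            total -= gt * math.log2(gm)
    return total / len(true_freqs)


def sample_log_rate(genotype, freqs):
    """-(1/n) log2 P(x): per-locus log-loss of one genotype under a population."""
    total = 0.0
    for x, p in zip(genotype, freqs):
        total -= math.log2(genotype_probs(p)[x])
    return total / len(freqs)


def is_typical(genotype, freqs, eps):
    """AEP-style typicality: log-loss within eps bits of the entropy rate."""
    return abs(sample_log_rate(genotype, freqs) - entropy_rate(freqs)) <= eps


def classify(genotype, populations):
    """Hypothetical typical-set rule: pick the population (name -> freqs)
    whose entropy rate is closest to the genotype's log-loss under it."""
    return min(
        populations,
        key=lambda name: abs(sample_log_rate(genotype, populations[name])
                             - entropy_rate(populations[name])),
    )


def sample_genotype(freqs, rng):
    """Draw one genotype (list of 0/1/2 counts) from independent loci."""
    return [rng.choices((0, 1, 2), weights=genotype_probs(p))[0] for p in freqs]


if __name__ == "__main__":
    rng = random.Random(0)
    pop_a = [rng.uniform(0.05, 0.95) for _ in range(5000)]
    pop_b = [1.0 - p for p in pop_a]  # mirrored frequencies: same entropy rate
    pops = {"A": pop_a, "B": pop_b}
    x = sample_genotype(pop_a, rng)
    print(f"H(A) = {entropy_rate(pop_a):.3f}, H(B) = {entropy_rate(pop_b):.3f} bits/locus")
    print(f"H(A, B) = {cross_entropy_rate(pop_a, pop_b):.3f} bits/locus")
    print("typical for A:", is_typical(x, pop_a, eps=0.1))
    print("classified as:", classify(x, pops))
```

Mirroring the allele frequencies (p to 1 - p) leaves the entropy rate unchanged, so the two toy populations cannot be told apart by entropy alone. The classifier still separates them: a genotype from A scored under B has a log-loss near the cross-entropy rate, which exceeds B's entropy rate by the per-locus relative entropy, while its log-loss under A concentrates near A's entropy rate.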