Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
Sci Rep. 2020 Apr 2;10(1):5805. doi: 10.1038/s41598-020-62362-8.
Classic concepts of genetic (gene) diversity (heterozygosity), such as Nei & Li's nucleotide diversity, were defined in a population context. Although variation is usually measured at the population level, the basic carriers of variation are individuals. Hence, measuring the variation of an individual, such as its SNPs, against a reference genome, which has previously been overlooked, is worthwhile in its own right. Indeed, a similar practice is traditional in community ecology, where the basic unit of diversity measurement is the individual community sample. We propose using Hill numbers based on Renyi's entropy to define individual-level genetic diversity and similarity, and we demonstrate the definitions with SNP (single nucleotide polymorphism) datasets from the 1000 Genomes Project. Hill numbers, derived from Renyi's entropy (of which Shannon's entropy is a special case), have found wide application, including in measuring quantum entanglement and ecological diversity. The individual-level SNP diversity demonstrated here not only complements existing population-level concepts of genetic diversity, but also offers building blocks for comparative genetic analysis at higher levels. The concept of "individual" covers, but is not limited to, an individual chromosome, a region of a chromosome, gene cluster(s), or a whole genome. Similarly, SNPs can be replaced by structural variants or other mutation types such as indels.
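As a rough illustration of the Hill numbers referred to above, the following minimal Python sketch computes the standard Hill number of order q, (sum_i p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of Shannon's entropy. The abstract does not specify how an individual's SNPs are tallied, so the input here is an assumption: a list of SNP counts per genomic unit (for example, per gene or per chromosome window) for one individual. The function name `hill_number` and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def hill_number(counts, q):
    """Hill number of order q for a list of non-negative counts.

    Assumption: each entry is the number of SNPs found in one genomic
    unit of a single individual; the abstract does not fix this choice.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative abundances; zero counts contribute nothing and are dropped.
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # Limit as q -> 1: exponential of Shannon's entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical SNP counts across five genomic units of one individual.
snp_counts = [12, 7, 7, 3, 1]
for q in (0, 1, 2):
    print(f"q={q}: {hill_number(snp_counts, q):.3f}")
```

At q = 0 the measure reduces to the number of units that carry at least one SNP, and larger q gives progressively more weight to the units with the most SNPs; when all units carry equal counts, every order returns the number of units.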