The fractal dimension as a measure for characterizing genetic variation of the human genome.

Author information

Lee Chang-Yong

Affiliation

The Department of Industrial and Systems Engineering, Kongju National University, Cheonan, 31080, South Korea.

Publication information

Comput Biol Chem. 2020 Aug;87:107278. doi: 10.1016/j.compbiolchem.2020.107278. Epub 2020 Jun 6.

Abstract

Motivated by the highly clustered distribution of single nucleotide polymorphisms (SNPs) across the human genome, we propose a set of chromosome-wise fractal dimensions as a measure that identifies an individual in terms of human polymorphism. The fractal dimension quantifies the degree to which SNPs are clustered and parsimoniously represents the genetic variation in a chromosome. In this sense, the proposed scheme projects the SNP genotype data into a new space that is simpler and lower in dimension. As an illustrative example, we estimate the chromosome-wise fractal dimensions of SNPs extracted from the HapMap Phase III data set. To assess the validity of the proposed measure, we apply principal component analysis (PCA) to the set of estimated fractal dimensions and demonstrate that the set approximately describes the population structure of 11 global populations. We also use multidimensional scaling to relate the PCA-based genetic distances to the geographical distances between global populations. This shows that, like the SNP genotype data, the fractal dimensions also reflect genetic distance in the population structure. In addition, we use the proposed measure as a signature for classifying global populations by developing a support vector machine model. The model with selected features predicts the global population with a balanced accuracy of about 77%. These results support the view that the fractal dimension is an efficient way to describe the genetic variation of global populations.
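To make the idea of a per-chromosome fractal dimension concrete, the following is a minimal sketch of one common estimator, box counting, applied to a list of SNP positions along a chromosome. The abstract does not state which estimator the author uses, so the choice of box counting is an assumption, and the function name `box_counting_dimension` and its parameters `length`, `n_scales`, and `base` are hypothetical.

```python
import numpy as np

def box_counting_dimension(positions, length, n_scales=10, base=2):
    """Estimate the box-counting dimension of points on [0, length).

    Assumption: this is a generic estimator chosen for illustration,
    not necessarily the procedure used in the paper.
    """
    positions = np.asarray(positions, dtype=float)
    # Box sizes shrink geometrically: length / base**k for k = 1..n_scales.
    sizes = length / float(base) ** np.arange(1, n_scales + 1)
    # For each box size, count how many boxes contain at least one point.
    counts = np.array([
        np.unique(np.floor(positions / s).astype(np.int64)).size
        for s in sizes
    ])
    # The dimension is the slope of log N(s) against log(1/s).
    slope, _intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

if __name__ == "__main__":
    # Evenly spread positions fill the line; tightly clustered positions do not.
    even = np.arange(2 ** 16)
    clustered = np.full(100, 5.0)
    print(box_counting_dimension(even, length=2 ** 16))
    print(box_counting_dimension(clustered, length=2 ** 16))
```

Evenly spread positions yield an estimate near 1, while more strongly clustered positions yield a smaller value. Computing one such value per chromosome would give a short feature vector per individual, the kind of low-dimensional representation that the abstract passes to PCA and the support vector machine.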
