Borri Alessandro, Cerasa Antonio, Tonin Paolo, Citrigno Luigi, Porcaro Camillo
CNR-IASI Biomathematics Laboratory (BioMatLab), Rome, Italy.
Centre of Excellence for Research DEWS, University of L'Aquila, L'Aquila, Italy.
Int J Neural Syst. 2022 Jun;32(6):2250028. doi: 10.1142/S0129065722500289. Epub 2022 May 12.
Over recent decades, the rapid development of next-generation sequencing has revolutionized gene discovery. These technologies have accelerated the mapping of single nucleotide polymorphisms (SNPs) across the human genome, revealing complex heterogeneity among individuals worldwide. Fractal dimension (FD) measures the degree of geometric irregularity, quantifying how "complex" a self-similar phenomenon is. We compared two FD algorithms, box-counting dimension (BCD) and Higuchi's fractal dimension (HFD), for characterizing genome-wide patterns of SNPs extracted from the HapMap data set, which includes data from 1184 healthy subjects from 11 populations. In addition, we used cluster and classification analyses to relate genetic distances within chromosomes, based on FD similarities, to the geographical distances among the 11 global populations. HFD outperformed BCD in both analyses. In the grand-average cluster analysis, assessed by the cophenetic correlation coefficient (where values closer to 1 indicate a more accurate clustering solution), HFD scored 0.981 and BCD 0.956. In classification of the 11 HapMap populations, HFD achieved 79.0% accuracy, 61.7% sensitivity, and 96.4% specificity, compared with 69.1% accuracy, 43.2% sensitivity, and 94.9% specificity for BCD. These results support HFD as a reliable measure for representing individual variation within all chromosomes and for categorizing individuals and global populations.
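For readers unfamiliar with HFD, the following is a minimal sketch of the standard Higuchi procedure applied to a generic numeric sequence. It is not the authors' implementation: the abstract does not state how SNP data were encoded as a series or which maximum lag was used, so the function name `higuchi_fd`, the default `kmax=8`, and the plain-list input are all assumptions made for illustration.

```python
import math


def higuchi_fd(x, kmax=8):
    """Estimate Higuchi's fractal dimension of a 1-D numeric sequence.

    Illustrative sketch only; `kmax` (largest lag) is an assumed default.
    """
    n = len(x)
    if kmax < 2 or n < 2 * kmax:
        raise ValueError("need kmax >= 2 and at least 2 * kmax samples")

    log_inv_k = []
    log_len = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # starting offset (0-based)
            n_steps = (n - 1 - m) // k  # number of increments of lag k
            dist = sum(
                abs(x[m + i * k] - x[m + (i - 1) * k])
                for i in range(1, n_steps + 1)
            )
            # Higuchi's normalisation for the varying number of increments
            norm = (n - 1) / (n_steps * k)
            lengths.append(dist * norm / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len == 0:
            raise ValueError("curve length is zero (constant sequence)")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))

    # Least-squares slope of log L(k) against log(1/k) is the FD estimate.
    mx = sum(log_inv_k) / kmax
    my = sum(log_len) / kmax
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den
```

As a sanity check on the sketch, a straight line such as `higuchi_fd([float(i) for i in range(200)])` yields an estimate of approximately 1, while uncorrelated noise yields an estimate close to 2, matching the intuition that FD grows with irregularity.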