Xue Fuzhong, Wang Jiezhen, Hu Ping, Ma Daoxin, Liu Jing, Li Guifu, Zhang Li, Wu Min, Sun Guoqing, Hou Haifeng
Department of Epidemiology and Biostatistics, School of Public Health, Shandong University China, No. 44 Wen-hua-xi-lu Road, Jinan City, Shandong 250012, People's Republic of China.
Hum Biol. 2005 Oct;77(5):577-617. doi: 10.1353/hub.2006.0008.
There are two purposes in displaying spatial genetic structure. One is to provide, in a contour map, a visual representation of the variation of the genetic variable. The other is to reflect spatial genetic structure through the patterns or gradients, with genetic boundaries, that appear in the map. Nevertheless, most conventional interpolation methods, such as Cavalli-Sforza's method in gene geography, inverse distance-weighted methods, and the Kriging technique, serve only the first purpose because the thresholds marked on their maps are arbitrary. In this paper we present an application of the contour area multifractal model (CAMM) to human population genetics. The method enables the analysis of the geographic distribution of a genetic marker and provides insight into the spatial and geometric properties of the patterns obtained. Furthermore, the CAMM may overcome some of the limitations of other interpolation techniques because no arbitrary thresholds are necessary in the computation of genetic boundaries. The CAMM is built by establishing power law relationships between the area A(≥ρ), that is, the area of the contour map in which the genetic variable is greater than or equal to ρ, and the value ρ itself, after plotting these values on a log-log graph. A series of straight-line segments can be fitted to the points on the log-log graph, each representing a power law relationship between the area A(≥ρ) and the cutoff value ρ of the genetic variable in a particular range. The points at which these straight-line segments meet yield a group of cutoff values, which can be identified as the genetic boundaries that classify the map of the genetic variable into discrete genetic zones. These genetic zones usually correspond to spatial genetic structure on the landscape. To illustrate the value of the CAMM approach, we analyze the spatial genetic structures of three loci (ABO, HLA-A, and TPOX) in China using the CAMM.
The synthetic principal component (SPC) contour map for each of the three loci is created from the combined data of Han and minority groups. These contour maps all present an obvious geographic diversity, which gradually increases from north to south, and show that the genetic differences among populations in different districts of the same nationality are greater than those among different nationalities of the same district. It is surprising to find that both the value of ρ and the fractal dimension α have a clear north-to-south gradient for each locus, and that the same clear boundary between southern and northern Asians is still seen in each contour map in the zone of the Yangtze River, although substantial population migrations have occurred because of war or famine in the last 2,000 to 3,000 years. A clear genetic boundary between Europeans and Asians, with a small value of α, is still seen in northwestern China in each contour map, although the genetic gradient caused by gene flow between Europeans and Asians has tended to expand from northwestern China. The three contour maps yield another interesting result: The values of α north of the Yangtze River are generally smaller than those south of the Yangtze River. This indicates that the genetic differences among the populations north of the Yangtze River are generally smaller than those among populations south of the Yangtze River.
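The contour-area procedure described in the abstract (measure the area A(≥ρ) for a range of cutoffs ρ, plot the pairs on a log-log graph, fit straight-line segments, and read the cutoffs off the breaks) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes a regular grid of strictly positive interpolated values, only two segments, and a break placed midway (in log space) between the two neighboring points. The function names `contour_areas` and `fit_two_segments` are hypothetical. The reported exponent is taken as the negative of the fitted log-log slope, which may or may not match the paper's convention for α.

```python
import numpy as np


def contour_areas(grid, thresholds, cell_area=1.0):
    """Area A(>=rho) for each cutoff rho, counted as grid cells at or above rho."""
    grid = np.asarray(grid, dtype=float)
    return np.array(
        [cell_area * np.count_nonzero(grid >= t) for t in thresholds]
    )


def fit_two_segments(rho, area, min_points=3):
    """Fit two straight lines to (log rho, log area) by exhaustive split search.

    Returns (cutoff, exponents): the candidate boundary value of rho and the
    negative slope of each segment. Assumes rho is sorted ascending and that
    rho and area are strictly positive.
    """
    x = np.log(np.asarray(rho, dtype=float))
    y = np.log(np.asarray(area, dtype=float))
    n = len(x)
    if n < 2 * min_points:
        raise ValueError("not enough points for two segments")

    best = None
    for k in range(min_points, n - min_points + 1):
        sse = 0.0
        slopes = []
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            slope, intercept = np.polyfit(xs, ys, 1)
            sse += float(np.sum((ys - (slope * xs + intercept)) ** 2))
            slopes.append(float(slope))
        if best is None or sse < best[0]:
            best = (sse, k, slopes)

    _, k, slopes = best
    # Assumption: place the break halfway (in log space) between the last
    # point of the first segment and the first point of the second.
    cutoff = float(np.exp(0.5 * (x[k - 1] + x[k])))
    return cutoff, [-s for s in slopes]


if __name__ == "__main__":
    # Synthetic example only: areas follow rho^-0.5 below 10 and rho^-2 above.
    rho = np.logspace(0, 2, 20)
    area = np.where(rho < 10, rho ** -0.5, 10 ** -0.5 * (rho / 10) ** -2.0)
    cutoff, exponents = fit_two_segments(rho, area)
    print(cutoff, exponents)
```

With real map data, `contour_areas` would supply the `area` values for a chosen set of cutoffs, and more than two segments would generally be needed. Repeating the split search within each segment is one possible extension, though nothing in the abstract specifies how the segments were fitted.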