Indrayan A, Kumar R
Division of Biostatistics and Medical Informatics, Delhi University College of Medical Sciences, Dilshad Garden, India.
Int J Epidemiol. 1996 Feb;25(1):181-9. doi: 10.1093/ije/25.1.181.
The potential of maps in the study of regional variation and similarity in health, and in understanding the underlying processes, is being increasingly realized. It has therefore become important to draw health maps with more care and to minimize their subjective elements. Conventional choropleth maps based on quantitative data are mostly arbitrary with regard to the number of categories and the cutoff points. This can lead to substantially different pictures of the same data set.
We suggest the use of cluster methods to discover 'natural' groups of data points, which are to a large extent suggested by the data themselves. These methods can determine not only the cutoff points but also the number of categories required to depict the variability in the data. They extend naturally to the multivariate set-up and can thus provide a strategy for constructing integrated maps based on the simultaneous consideration of several variables. Since different cluster methods can yield different groupings, we propose a simple method to identify cutoffs common to a majority of the methods.
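The majority-cutoff idea can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the paper: the three linkage rules (single, complete, average), the fixed number of categories `k`, the midpoint convention for placing a cutoff between two adjacent groups, and the sample values. For one variable, these linkage rules always merge neighbouring groups of the sorted data, so the sketch only compares adjacent groups.

```python
from collections import Counter


def linkage_distance(a, b, method):
    """Distance between adjacent sorted clusters a and b (every value in a <= every value in b)."""
    if method == "single":      # closest pair of points
        return b[0] - a[-1]
    if method == "complete":    # farthest pair of points
        return b[-1] - a[0]
    if method == "average":     # mean pairwise distance, which equals the difference of means here
        return sum(b) / len(b) - sum(a) / len(a)
    raise ValueError(f"unknown method: {method}")


def cluster_cutoffs(values, k, method):
    """Agglomerate sorted 1-D values into k groups and return the k - 1 cutoffs between them."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # Merge the pair of neighbouring clusters that is closest under the chosen linkage.
        i = min(
            range(len(clusters) - 1),
            key=lambda j: linkage_distance(clusters[j], clusters[j + 1], method),
        )
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    # Assumed convention: a cutoff is the midpoint of the gap between two adjacent groups.
    return [(a[-1] + b[0]) / 2 for a, b in zip(clusters, clusters[1:])]


def majority_cutoffs(values, k, methods=("single", "complete", "average")):
    """Keep only the cutoffs proposed by more than half of the methods."""
    votes = Counter()
    for method in methods:
        votes.update(cluster_cutoffs(values, k, method))
    return sorted(c for c, n in votes.items() if n > len(methods) / 2)


# Hypothetical regional rates, invented for this example.
rates = [10, 12, 13, 30, 32, 33, 35, 70, 72, 75]
print(majority_cutoffs(rates, k=3))  # -> [21.5, 52.5]
```

In this sketch `k` is supplied by the caller, whereas the text above notes that cluster methods can also suggest the number of categories; choosing `k` from the data would need an additional stopping rule that is not shown here.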
The details of the methods are illustrated with two real data sets. One comprises indicators of mortality before one year of age in India, and the other comprises years of life lost due to premature mortality in different countries. The maps obtained are compared with conventional maps.
The cutoff points identified by a majority of cluster methods deserve attention as a way of obtaining natural groups for choroplethic depiction. Maps based on such cutoffs seem to hold promise for increasing the accuracy of perception and cognition of regional variation.