Bansal Ajay K, Sharma Shrideo
Division of Biostatistics and Medical Informatics, Delhi University College of Medical Science, Delhi, India.
Med Sci Monit. 2003 Apr;9(4):PH1-6.
BACKGROUND: Cluster analysis assigns a set of observations to clusters whose members share similar characteristics, as measured by a set of classifying variables. Few studies have addressed the clustering of longitudinal datasets or the mathematical and statistical modeling of the partitioning of time trends. We present a model for clustering the infant mortality rate trends of 14 major states of India from 1972 to 1998.
MATERIAL/METHODS: The trend for each state is represented as an nth-degree polynomial fitted by curvilinear regression. Two quantities are then computed: (1) the total difference in the rate of change from time t1 (1972) to tn (1998) for each state, obtained by summing the differences in velocity between adjacent time points; and (2) the Euclidean distance of the trend from the base, calculated objectively by dividing the trend into the optimum number of divisions. The sum of (1) and (2) gives the dissimilarity coefficient, which is then used to cluster the trends.
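The computation described above can be sketched in a few lines of Python. This is only one possible reading of the abstract, not the authors' implementation. The function names (`polyfit`, `trend_score`, `dissimilarity`), the use of absolute differences, the default base of zero, the equally spaced division points, and the rule that the dissimilarity between two states is the absolute difference of their scores are all assumptions made for illustration. The abstract does not say how the "optimum number of divisions" is chosen, so `divisions` is simply a parameter here.

```python
from math import sqrt

def polyfit(ts, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients c[0..degree] such that y ~ sum(c[k] * t**k)."""
    n = degree + 1
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef

def value(coef, t):
    """Fitted trend value at time t."""
    return sum(c * t ** k for k, c in enumerate(coef))

def velocity(coef, t):
    """First derivative of the fitted trend at time t."""
    return sum(k * c * t ** (k - 1) for k, c in enumerate(coef) if k > 0)

def trend_score(coef, t_start, t_end, divisions, base=0.0):
    """Sum of components (1) and (2) for one trend (assumed reading)."""
    ts = [t_start + i for i in range(int(t_end - t_start) + 1)]
    # (1) summed absolute change in velocity between adjacent time points
    vel = [velocity(coef, t) for t in ts]
    rate_change = sum(abs(v1 - v0) for v0, v1 in zip(vel, vel[1:]))
    # (2) Euclidean distance from the base at equally spaced division points
    step = (t_end - t_start) / divisions
    pts = [t_start + i * step for i in range(divisions + 1)]
    dist = sqrt(sum((value(coef, t) - base) ** 2 for t in pts))
    return rate_change + dist

def dissimilarity(score_a, score_b):
    """Assumed pairwise dissimilarity: absolute difference of two scores."""
    return abs(score_a - score_b)
```

In this sketch time would be measured in years since 1972 (t = 0 to 26), which keeps the normal equations reasonably well conditioned for low polynomial degrees. Any standard least-squares routine could replace the hand-written `polyfit`.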
RESULTS: All three hierarchical methods applied (complete linkage, average linkage between groups, and Ward's method, run in SPSS 10.0) suggested the same number and type of clusters. Cluster I contains one state, Cluster II four states, Cluster III eight states, and Cluster IV one state.
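The agreement check across linkage methods can be illustrated with a small agglomerative clustering sketch in Python. It is not a reproduction of the SPSS 10.0 procedures used in the study. It assumes that each trend has been reduced to a single numeric score, and the scores in the usage note are invented purely to demonstrate the comparison. The function names `cluster_distance` and `agglomerate` are hypothetical.

```python
def cluster_distance(a, b, method):
    """Distance between two clusters of one-dimensional scores."""
    if method == "complete":
        # Largest pairwise distance between members
        return max(abs(x - y) for x in a for y in b)
    if method == "average":
        # Mean pairwise distance between members (between-groups average)
        return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))
    if method == "ward":
        # Increase in within-cluster sum of squares caused by merging
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2
    raise ValueError(f"unknown method: {method}")

def agglomerate(scores, k, method):
    """Merge the two closest clusters repeatedly until k clusters remain.
    Returns the clusters as sorted lists of indices into `scores`."""
    clusters = [[i] for i in range(len(scores))]

    def dist(pair):
        i, j = pair
        return cluster_distance(
            [scores[m] for m in clusters[i]],
            [scores[m] for m in clusters[j]],
            method,
        )

    while len(clusters) > k:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=dist)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

For example, with the made-up scores `[1.0, 1.2, 0.9, 5.0, 5.3, 9.9]` and `k = 3`, calling `agglomerate` with `"complete"`, `"average"`, and `"ward"` yields the same partition, which is the kind of agreement the results describe. With real data the three methods need not agree.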
CONCLUSIONS: Clustering states by their mortality trends in this way can give planners greater confidence when devising strategies for the control of infant mortality and for resource allocation at the national level.