Perafita Xavier, Saez Marc
Observatori-Organisme Autònom de Salut Pública de la Diputació de Girona (Dipsalut), 17003 Girona, Spain.
Research Group on Statistics, Econometrics and Health (GRECS), University of Girona, 17003 Girona, Spain.
Int J Environ Res Public Health. 2022 Mar 12;19(6):3359. doi: 10.3390/ijerph19063359.
In this paper, we report a preliminary study, carried out before creating an e-cohort, to inform the design of its sample. The e-cohort had to represent the province of Girona effectively so that the province could be studied along the axes of inequality.
The territory under study was divided into municipalities, taking the different axes of inequality into account. We compared 14 clustering algorithms, using 3 data sets of municipal information, to identify the most consistent grouping. Before clustering, a variable selection step discarded the variables that were not useful. The comparison followed two axes: results and graphical representation.
The intra-cluster results were also analyzed to assess the coherence of the grouping. Finally, we studied the probability of a municipality belonging to a given cluster, such as the one containing the county capital.
This clustering can be the basis for working with a sample that is significant and representative of the territory.
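As an illustration only, the idea of looking for the most consistent grouping across several algorithms can be sketched with a pairwise agreement measure. The abstract does not state which consistency criterion was used, so the Rand index below is an assumption, and the algorithm names and label vectors are hypothetical values for six imaginary municipalities, not data from the study.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs that two partitions treat the same way.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same items")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        raise ValueError("need at least two items to compare partitions")
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def most_consistent(partitions):
    """Return (name, scores): the partition with the highest mean Rand
    index against all the others, plus every partition's mean score."""
    scores = {}
    for name, labels in partitions.items():
        others = [rand_index(labels, other)
                  for other_name, other in partitions.items()
                  if other_name != name]
        scores[name] = sum(others) / len(others)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cluster labels for six municipalities (assumed, not real data).
partitions = {
    "kmeans":       [0, 0, 1, 1, 2, 2],
    "hierarchical": [1, 1, 0, 0, 2, 2],  # same grouping, different label names
    "density":      [0, 0, 0, 1, 1, 1],
}

best, scores = most_consistent(partitions)
print(best, scores)
```

The Rand index ignores how clusters are named, which matters here because different algorithms number their clusters arbitrarily. In this toy input the first two partitions describe the same grouping, so they score higher than the third.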