Forina M, Casolino C, Lanteri S
Dipartimento di Chimica e Tecnologie Farmaceutiche ed Alimentari, Università di Genova, Via Brigata Salerno (s/n), I-16147 Genova, Italy.
Ann Chim. 2003 Jan-Feb;93(1-2):55-68.
Agglomerative clustering methods and the tests usually applied to assess the significance of clusters are critically evaluated. Many clustering techniques can provide erroneous information about the existence of clusters. The single linkage technique is suggested for identifying natural, well-separated clusters. The existing statistical tests of cluster significance are not satisfactory. A new statistical test, based on the distribution of the distances between objects and their first nearest neighbors, is presented. The performance of the test is compared with that of the Sneath test and of the variance-ratio test on some artificial and real data sets.
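
The two quantities named in the abstract, first-nearest-neighbor distances and single linkage merge heights, can be illustrated with a minimal sketch. This is not the authors' test: the abstract does not state the test statistic, so the code only computes the raw ingredients. The function names, the Euclidean metric, and the toy data are assumptions made for illustration. The sketch relies on the standard fact that single linkage merge heights are the edge lengths of the minimum spanning tree.

```python
import math

def euclidean(a, b):
    # Assumed metric; the abstract does not say which distance is used.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_distances(points):
    """Distance from each object to its first nearest neighbor."""
    return [
        min(euclidean(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def single_linkage_heights(points):
    """Sorted single linkage merge heights, obtained as the edge
    lengths of the minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    # best[i] = shortest distance from point i to the growing tree
    best = [euclidean(points[0], p) for p in points]
    in_tree = [False] * n
    in_tree[0] = True
    heights = []
    for _ in range(n - 1):
        # Attach the point outside the tree that is closest to it
        k = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        heights.append(best[k])
        in_tree[k] = True
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], euclidean(points[k], points[i]))
    return sorted(heights)

# Hypothetical toy data: two tight, well-separated groups of three objects
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

nn = nearest_neighbor_distances(points)
heights = single_linkage_heights(points)

print("nearest-neighbor distances:", nn)
print("single linkage merge heights:", heights)
```

In this toy example every nearest-neighbor distance is short, while the last single linkage merge is much longer than the others. A test of the kind the abstract describes would need a reference distribution to decide whether such a gap is significant; that step is not reproduced here.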