McLachlan G J
Department of Mathematics, University of Queensland, Australia.
Stat Methods Med Res. 1992;1(1):27-48. doi: 10.1177/096228029200100103.
In this paper we review methods of cluster analysis in the context of classifying patients on the basis of clinical and/or laboratory-type observations. Both hierarchical and non-hierarchical methods of clustering are considered, although the emphasis is on the latter, with particular attention devoted to the mixture likelihood-based approach. To divide a given data set into g clusters, this approach fits a mixture model of g components by the method of maximum likelihood. It thus provides a sound statistical basis for clustering. The important but difficult question of how many clusters there are in the data can be addressed within the framework of standard statistical theory, although theoretical and computational difficulties still remain. Two case studies, involving the cluster analysis of haemophilia and diabetes data, respectively, are reported to demonstrate the mixture likelihood-based approach to clustering.
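The mixture likelihood-based approach described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes univariate data, normal components, and the EM algorithm as the way to maximize the likelihood, none of which the abstract specifies. The function names (`fit_mixture`, `e_step`, `normal_pdf`), the quantile-based starting values, the variance floor, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return posterior membership probabilities and the mixture log-likelihood."""
    post, loglik = [], 0.0
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        loglik += math.log(total)
        post.append([d / total for d in dens])
    return post, loglik


def fit_mixture(data, g, n_iter=200):
    """Fit a g-component univariate normal mixture by EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Assumed starting values: evenly spaced sample quantiles as means,
    # the overall variance for every component, and equal mixing proportions.
    means = [ordered[int((k + 0.5) * n / g)] for k in range(g)]
    variances = [pooled_var] * g
    weights = [1.0 / g] * g
    for _ in range(n_iter):
        post, _ = e_step(data, weights, means, variances)
        # M-step: weighted updates of proportions, means, and variances.
        for k in range(g):
            nk = sum(p[k] for p in post)
            weights[k] = nk / n
            means[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, data)) / nk
            variances[k] = max(var, 1e-6)  # floor guards against a collapsing component
    # Final E-step so that labels and log-likelihood match the fitted parameters.
    post, loglik = e_step(data, weights, means, variances)
    labels = [max(range(g), key=lambda k: p[k]) for p in post]
    return weights, means, variances, labels, loglik


# Simulated data (not from the paper): two groups of 150 observations each.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(150)]

weights, means, variances, labels, loglik_2 = fit_mixture(data, g=2)
_, _, _, _, loglik_1 = fit_mixture(data, g=1)
```

Each observation is assigned to the component with the highest posterior probability, which yields the g clusters. Comparing `loglik_1` with `loglik_2` is only a starting point for the question of how many clusters there are; as the abstract notes, theoretical and computational difficulties remain in turning that comparison into a formal test.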