Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, United States.
Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD 21205, United States.
Am J Epidemiol. 2024 Aug 5;193(8):1146-1154. doi: 10.1093/aje/kwae031.
Multimorbidity, defined as having 2 or more chronic conditions, is a growing public health concern, but research in this area is complicated by the fact that multimorbidity is a highly heterogeneous outcome: individuals in a sample may have differing numbers and varied combinations of conditions. Clustering methods, such as unsupervised machine learning algorithms, may allow us to tease out distinct multimorbidity phenotypes. However, many clustering methods exist, and choosing among them is challenging because we do not know the true underlying clusters. Here, we demonstrate the use of 3 individual algorithms (partitioning around medoids, hierarchical clustering, and probabilistic clustering) and a clustering ensemble approach (which pools results from different clustering approaches) to identify multimorbidity clusters in the AIDS Linked to the Intravenous Experience cohort study. We show how the clusters can be compared based on cluster quality, interpretability, and predictive ability. In practice, it is critical to compare the clustering results from multiple algorithms and to choose the approach that performs best in the domain(s) that align with how the clusters will be used in future analyses.
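The abstract names the algorithms but not their settings. As a minimal sketch, assuming each person is coded as binary condition indicators and compared with Jaccard distance (an assumption, not necessarily the authors' choice), the pure-Python code below runs partitioning around medoids (PAM) and average-linkage hierarchical clustering on hypothetical toy data, pools the two partitions in a simple co-association ensemble (one common ensemble scheme; the paper's may differ), and scores each partition by mean silhouette width as one measure of cluster quality. Probabilistic clustering (e.g. latent class analysis) is omitted for brevity, and all data and function names here are illustrative.

```python
from itertools import combinations

# Hypothetical toy data: 8 people x 5 chronic conditions (1 = present).
# Invented for illustration; NOT data from the cohort study described above.
DATA = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 1],
]


def jaccard_distance(a, b):
    """1 - shared conditions / conditions in either person (0 if both have none)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def pam(dist, k):
    """Partitioning around medoids: greedy BUILD, then SWAP while total cost drops."""
    n = len(dist)

    def cost(medoids):
        # Sum of each person's distance to their nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    medoids = []
    for _ in range(k):  # BUILD: add the candidate that gives the lowest cost
        medoids.append(min((c for c in range(n) if c not in medoids),
                           key=lambda c: cost(medoids + [c])))
    current, improved = cost(medoids), True
    while improved:  # SWAP: replace a medoid with a non-medoid if cost falls
        improved = False
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:mi] + [c] + medoids[mi + 1:]
                if cost(trial) < current - 1e-12:
                    medoids, current, improved = trial, cost(trial), True
    return [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]


def average_linkage(dist, k):
    """Agglomerative clustering: merge the pair with smallest mean distance until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def mean_silhouette(dist, labels):
    """Mean silhouette width (requires at least 2 clusters); singletons score 0."""
    n, scores = len(dist), []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def co_association(labelings):
    """Fraction of partitions in which each pair of people share a cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


K = 2
dist = [[jaccard_distance(a, b) for b in DATA] for a in DATA]
results = {"PAM": pam(dist, K), "hierarchical": average_linkage(dist, K)}
# Ensemble: re-cluster on 1 - co-association (how often two people are split apart).
co = co_association(list(results.values()))
results["ensemble"] = average_linkage([[1 - v for v in row] for row in co], K)
for name, labels in results.items():
    print(f"{name:12s} labels={labels} silhouette={mean_silhouette(dist, labels):.3f}")
```

On this deliberately well-separated toy data all three approaches recover the same two groups, so silhouette alone cannot separate them; with real cohort data the partitions typically disagree, which is where comparing quality, interpretability, and predictive ability matters. In practice one would more likely use established implementations (for example, R's `cluster` package provides `pam` and `silhouette`); the hand-rolled versions here only keep the example dependency-free.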