Hendricks Renee, Khasawneh Mohammad
Department of Systems Science and Industrial Engineering, Watson College of Engineering and Applied Science, Binghamton University, New York, NY 13902, USA.
Brain Sci. 2021 Sep 29;11(10):1290. doi: 10.3390/brainsci11101290.
Parkinson's disease (PD) is a chronic disease. No treatment stops its progression, and it presents symptoms in multiple areas. One way to understand the PD population is to cluster patients by demographic and clinical similarities. Previous PD cluster studies included scores from clinical surveys, which provide a numerical but ordinal, non-linear value. In addition, these studies did not include categorical variables, because the clustering methods they used cannot handle categorical variables. We found that the numerical values of patient age and disease duration were similar among past cluster results, pointing to the need to exclude these values. This paper proposes a novel and automatic discovery method that clusters PD patients by incorporating categorical variables. No estimate of the number of clusters is required as input, whereas previous clustering methods require the end user to guess one before the method can start. We demonstrate the new clustering technique on a patient dataset from the Parkinson's Progression Markers Initiative (PPMI) website; the results show that the method accurately separates the patients. In addition, the method provides an explainable process and an easy way to interpret clusters and describe patient subtypes.