Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK.
Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK.
BMC Med Inform Decis Mak. 2019 Apr 18;19(1):86. doi: 10.1186/s12911-019-0805-0.
Chronic obstructive pulmonary disease (COPD) is a highly heterogeneous disease composed of different phenotypes with different aetiological and prognostic profiles, and current classification systems do not fully capture this heterogeneity. In this study, we sought to discover, describe and validate COPD subtypes using cluster analysis of data derived from electronic health records.
We applied two unsupervised learning algorithms (k-means and hierarchical clustering) to 30,961 current and former smokers diagnosed with COPD, using linked national structured electronic health records in England available through the CALIBER resource. We used 15 clinical features, including risk factors and comorbidities, and performed dimensionality reduction using multiple correspondence analysis. We compared the associations of cluster membership with COPD exacerbations and with respiratory and cardiovascular death; 10,736 deaths were recorded over 146,466 person-years of follow-up. We also implemented and tested a process for assigning unseen patients to clusters using a decision tree classifier.
We identified and characterized five COPD patient clusters that differed in demographics, comorbidities, and risk of death and exacerbations. Four of the clusters were associated with 1) anxiety/depression; 2) severe airflow obstruction and frailty; 3) cardiovascular disease and diabetes; and 4) obesity/atopy. The fifth cluster was associated with low prevalence of most comorbid conditions.
COPD patients can be sub-classified into groups with differing risk factors, comorbidities, and prognosis, based on data included in their primary care records. The identified clusters confirm findings of previous clustering studies and draw attention to anxiety and depression as important drivers of the disease in young, female patients.
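The analysis steps described above (multiple correspondence analysis for dimensionality reduction, k-means and hierarchical clustering, then a decision tree to assign unseen patients) can be illustrated with a minimal sketch. This is not the authors' code: the data are randomly generated binary features, and the sample size, number of retained dimensions, Ward linkage, tree depth, and all function and variable names are assumptions chosen for illustration only. The sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def mca_row_coordinates(indicator, n_components):
    """Row principal coordinates from correspondence analysis of an indicator matrix."""
    P = indicator / indicator.sum()      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (P - r c^T) scaled by sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


# Synthetic stand-in for 15 binary clinical features (NOT real patient data).
rng = np.random.default_rng(0)
prevalence = rng.uniform(0.1, 0.6, size=15)
features = (rng.random((600, 15)) < prevalence).astype(int)

# Indicator matrix: one "present" and one "absent" column per feature.
indicator = np.hstack([features, 1 - features]).astype(float)

# Dimensionality reduction (5 retained dimensions is an assumption).
coords = mca_row_coordinates(indicator, n_components=5)

# Two unsupervised algorithms on the reduced coordinates, five clusters each.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
hier_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(coords)

# Decision tree that maps the raw features of unseen patients to a k-means cluster.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
assigned = tree.predict(X_test)
agreement = (assigned == y_test).mean()  # share of held-out patients assigned to their k-means cluster
print(f"held-out agreement with k-means labels: {agreement:.2f}")
```

Training the tree on the original features rather than on the reduced coordinates means a new patient can be assigned to a cluster from recorded characteristics alone, without re-running the dimensionality reduction. On random data like this the clusters carry no clinical meaning; the sketch only shows how the pieces fit together.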