Castaldi Peter J, Benet Marta, Petersen Hans, Rafaels Nicholas, Finigan James, Paoletti Matteo, Boezen H Marike, Vonk Judith M, Bowler Russell, Pistolesi Massimo, Puhan Milo A, Anto Josep, Wauters Els, Lambrechts Diether, Janssens Wim, Bigazzi Francesca, Camiciottoli Gianna, Cho Michael H, Hersh Craig P, Barnes Kathleen, Rennard Stephen, Boorgula Meher Preethi, Dy Jennifer, Hansel Nadia N, Crapo James D, Tesfaigzi Yohannes, Agusti Alvar, Silverman Edwin K, Garcia-Aymerich Judith
Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.
Division of General Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA.
Thorax. 2017 Nov;72(11):998-1006. doi: 10.1136/thoraxjnl-2016-209846. Epub 2017 Jun 21.
COPD is a heterogeneous disease, but there is little consensus on specific definitions for COPD subtypes. Unsupervised clustering offers the promise of 'unbiased' data-driven assessment of COPD heterogeneity. Multiple groups have identified COPD subtypes using cluster analysis, but there has been no systematic assessment of the reproducibility of these subtypes.
We performed clustering analyses across 10 cohorts in North America and Europe in order to assess the reproducibility of (1) correlation patterns of key COPD-related clinical characteristics and (2) clustering results.
We studied 17 146 individuals with COPD using identical methods and common COPD-related characteristics across cohorts (FEV1, FEV1/FVC, FVC, body mass index, Modified Medical Research Council score, asthma and cardiovascular comorbid disease). Correlation patterns between these clinical characteristics were assessed by principal components analysis (PCA). Cluster analysis was performed using k-medoids and hierarchical clustering, and concordance of clustering solutions was quantified with normalised mutual information (NMI), a metric that ranges from 0 to 1 with higher values indicating greater concordance.
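As an illustration of the concordance metric described above, the following is a minimal sketch of how NMI between two cluster assignments of the same individuals could be computed. It assumes the arithmetic-mean normalisation (mutual information divided by the average of the two entropies); the abstract does not state which normalisation the study used, and the function names `entropy` and `normalised_mutual_information` are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalised_mutual_information(labels_a, labels_b):
    """NMI between two clusterings of the same individuals.

    Assumption: normalise by the arithmetic mean of the two entropies.
    Other normalisations (geometric mean, maximum) also exist.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same individuals")
    n = len(labels_a)
    if n == 0:
        raise ValueError("at least one individual is required")

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the contingency counts of the two labelings.
    mi = sum(
        (n_ij / n) * log(n_ij * n / (counts_a[i] * counts_b[j]))
        for (i, j), n_ij in joint.items()
    )

    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    if denom == 0:
        # Both clusterings place everyone in a single cluster.
        return 1.0
    # Clamp to [0, 1] to absorb floating-point rounding.
    return max(0.0, min(1.0, mi / denom))


# Same partition under different label names: full concordance.
same = normalised_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
# Statistically independent partitions: no concordance.
independent = normalised_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI depends only on how individuals are grouped and not on the label names, it suits comparisons of clustering solutions whose cluster numbering is arbitrary.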
The reproducibility of COPD clustering subtypes across studies was modest (median NMI range 0.17-0.43). For methods that excluded individuals who did not clearly belong to any cluster, agreement was better but still suboptimal (median NMI range 0.32-0.60). Continuous representations of COPD clinical characteristics derived from PCA were much more consistent across studies.
Identical clustering analyses across multiple COPD cohorts showed modest reproducibility. COPD heterogeneity is better characterised by continuous disease traits coexisting in varying degrees within the same individual, rather than by mutually exclusive COPD subtypes.