Channing Laboratory, Brigham & Women's Hospital, Boston, MA, USA.
Respir Res. 2010 Mar 16;11(1):30. doi: 10.1186/1465-9921-11-30.
Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity.
From 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort, we selected 31 phenotypic variables and 12 SNPs in five candidate genes. We used factor analysis to select a subset of the phenotypic variables and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster.
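The abstract identifies the clustering method as k-means but gives no implementation details. The sketch below is a minimal pure-Python k-means (Lloyd's algorithm) under two assumptions that the source does not state. First, variables are z-scored so that measures on very different scales (FEV1 percent predicted vs. airway wall thickness) contribute comparably to Euclidean distance. Second, a deterministic farthest-first seeding is used for reproducibility. The function names are hypothetical, and this is not presented as the authors' software.

```python
import math


def standardize(rows):
    """Z-score each column (variable) to mean 0, SD 1. Assumes no constant column."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k):
    """Hypothetical deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all seeds chosen so far."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        nxt = max(points, key=lambda pt: min(sq_dist(pt, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each subject joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centroids[c])) for pt in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the study's setting, each row would hold one subject's four clustering variables and `k` would be 4. Because k-means solutions depend on initialization, a real analysis would typically compare several starting configurations rather than rely on a single deterministic seed.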
We identified six factors accounting for 75% of the shared variability among the initial phenotypic variables. From these factors we selected four phenotypic variables for cluster analysis: 1) post-bronchodilator FEV1 percent predicted, 2) percent bronchodilator responsiveness, and quantitative CT measurements of 3) apical emphysema and 4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: 1) emphysema predominant; 2) bronchodilator responsive, with higher FEV1; 3) discordant, with lower FEV1 despite less severe emphysema and lower airway wall thickness; and 4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant) was associated with TGFB1 SNP rs1800470.
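This section reports the association between cluster-1 membership and TGFB1 SNP rs1800470 but does not state which test was used. As an illustration only, one common way to test such an association is Fisher's exact test on a 2×2 table of cluster-1 membership (yes/no) against carrier status for the minor allele. The carrier (dominant) coding, the function name, and any counts passed to it are hypothetical and are not values from NETT.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the hypothetical 2x2 table

                    carrier   non-carrier
        cluster 1      a          b
        other          c          d

    Returns the summed hypergeometric probability of every table with the
    same margins that is no more likely than the observed table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell == x) given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance treats floating-point near-ties as ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

If SciPy is available, `scipy.stats.fisher_exact` performs the same two-sided test. With 12 SNPs examined, some correction for multiple testing would normally be considered when interpreting any single p-value.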
Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables, even in a highly selected group of subjects with severe emphysema, and may be useful for genetic association studies.