Ross James C, Castaldi Peter J, Cho Michael H, Chen Junxiang, Chang Yale, Dy Jennifer G, Silverman Edwin K, Washko George R, San Jose Estepar Raul
IEEE Trans Med Imaging. 2017 Jan;36(1):343-354. doi: 10.1109/TMI.2016.2608782.
We introduce a novel Bayesian nonparametric model that uses the concept of disease trajectories for disease subtype identification. Although our model is general, we demonstrate that by treating fractions of tissue patterns derived from medical images as compositional data, our model can be applied to study distinct progression trends between population subgroups. Specifically, we apply our algorithm to quantitative emphysema measurements obtained from chest CT scans in the COPDGene Study and show several distinct progression patterns. As emphysema is one of the major components of chronic obstructive pulmonary disease (COPD), the third leading cause of death in the United States [1], an improved definition of emphysema and COPD subtypes is of great interest. We investigate several models with our algorithm and show that one with age, pack years (a measure of cigarette exposure), and smoking status as predictors gives the best compromise between estimated predictive performance and model complexity. This model identified nine subtypes, which showed significant associations with seven single nucleotide polymorphisms (SNPs) known to be associated with COPD. Additionally, this model gives better predictive accuracy than multiple, multivariate ordinary least squares regression, as demonstrated in a five-fold cross-validation analysis. We view our subtyping algorithm as a contribution that can be applied to bridge the gap between CT-level assessment of tissue composition and population-level analysis of compositional trends that vary between disease subtypes.
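The abstract treats per-scan tissue-pattern fractions as compositional data, that is, non-negative parts that sum to one. It does not say how the model handles that constraint internally, so the following is only a minimal sketch of one common approach from compositional data analysis, the centered log-ratio (CLR) transform, and is not the authors' method. The function names, the `eps` offset, and the example fractions are hypothetical and chosen purely for illustration.

```python
import numpy as np


def closure(x):
    """Rescale each row so that its parts sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x, eps=1e-6):
    """Centered log-ratio transform of each row.

    eps is an assumed small offset that keeps zero fractions out of log().
    """
    comp = closure(np.asarray(x, dtype=float) + eps)
    log_comp = np.log(comp)
    return log_comp - log_comp.mean(axis=-1, keepdims=True)


def clr_inverse(z):
    """Map CLR coordinates back to fractions that sum to one."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical fractions of three tissue patterns for two scans (illustrative only).
fractions = np.array([
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
])

z = clr(fractions)          # unconstrained coordinates; each row sums to zero
recovered = clr_inverse(z)  # back on the simplex; each row sums to one
```

Working in log-ratio coordinates lets standard regression tools operate on unconstrained values, and the inverse transform guarantees that predictions are valid fractions again.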