Division of Biostatistics and Epidemiology, Department of Medicine, Medical University of South Carolina, Charleston, SC 29425, USA.
Stat Med. 2012 Jun 15;31(13):1342-60. doi: 10.1002/sim.4448. Epub 2012 Apr 11.
High-grade gliomas are the most common primary brain tumors in adults and are typically diagnosed using histopathology. However, these diagnostic categories are highly heterogeneous and do not always correlate well with survival. In an attempt to refine these diagnoses, we make several immunohistochemical measurements of YKL-40, a gene previously shown to be differentially expressed between diagnostic groups. We propose two latent class models for classification and variable selection in the presence of high-dimensional binary data, fit using Bayesian Markov chain Monte Carlo techniques. Penalization and model selection are incorporated in this setting via prior distributions on the unknown parameters. The methods provide valid parameter estimates under conditions in which standard supervised latent class models do not, and outperform two-stage approaches to variable selection and parameter estimation in a variety of settings. We study the properties of these methods in simulations and apply them to the glioma study, for which identifiable three-class parameter estimates cannot be obtained without penalization. With penalization, the resulting latent classes correlate well with clinical tumor grade and offer additional information on survival prognosis that is not captured by clinical diagnosis alone. The inclusion of YKL-40 features also increases the precision of survival estimates. Fitting models with and without YKL-40 highlights a subgroup of patients who have a glioblastoma (GBM) diagnosis but appear to have a better prognosis than the typical GBM patient.
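To make the general idea concrete, the following is a minimal sketch of a Gibbs sampler for a plain, unsupervised latent class model on binary data. It is not either of the two models proposed in the paper, whose details the abstract does not give. The function name `gibbs_latent_class`, the Dirichlet(1, ..., 1) prior on the class weights, and the Beta(a, b) prior on the item probabilities are all assumptions chosen for illustration. The Beta prior stands in for the abstract's "penalization via prior distributions": larger `a` and `b` shrink each item probability toward a/(a+b).

```python
import math
import random


def gibbs_latent_class(data, n_classes, n_iter=200, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for a simple latent class model on 0/1 data.

    data: list of rows, each a list of 0/1 values of equal length.
    a, b: Beta(a, b) prior on every item probability (illustrative choice).
    Returns the last sampled labels, class weights, and item probabilities.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    eps = 1e-12  # keeps probabilities strictly inside (0, 1) before taking logs
    z = [rng.randrange(n_classes) for _ in range(n)]  # random initial labels

    for _ in range(n_iter):
        counts = [z.count(k) for k in range(n_classes)]

        # Class weights given labels: Dirichlet(1 + counts), drawn by
        # normalizing independent gamma variates.
        g = [rng.gammavariate(1.0 + c, 1.0) for c in counts]
        total = sum(g)
        pi = [x / total for x in g]

        # Item probabilities given labels: Beta(a + ones, b + zeros).
        theta = []
        for k in range(n_classes):
            row = []
            for j in range(p):
                ones = sum(data[i][j] for i in range(n) if z[i] == k)
                t = rng.betavariate(a + ones, b + counts[k] - ones)
                row.append(min(max(t, eps), 1.0 - eps))
            theta.append(row)

        # Labels given weights and item probabilities, computed on the
        # log scale for numerical stability.
        for i in range(n):
            logw = []
            for k in range(n_classes):
                lw = math.log(max(pi[k], eps))
                for j in range(p):
                    t = theta[k][j]
                    lw += math.log(t) if data[i][j] else math.log(1.0 - t)
                logw.append(lw)
            m = max(logw)
            weights = [math.exp(v - m) for v in logw]
            z[i] = rng.choices(range(n_classes), weights=weights)[0]

    return z, pi, theta
```

A call such as `gibbs_latent_class(rows, n_classes=3, a=2.0, b=2.0)` would fit three classes with mild shrinkage. In practice one would keep draws from many iterations after burn-in and address label switching before summarizing. This sketch returns only the final draw to stay short.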