Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, St Mary's Campus, Norfolk Place, London, UK.
Biostatistics. 2010 Jul;11(3):484-98. doi: 10.1093/biostatistics/kxq013. Epub 2010 Mar 29.
Standard regression analyses often run into problems when used to make inferences beyond main effects in data sets that contain dozens of potentially correlated variables. This situation arises, for example, in epidemiology, where surveys or study questionnaires with a large number of questions yield a potentially unwieldy set of interrelated data from which it is difficult to tease out the effects of multiple covariates. We propose a method that addresses these problems for categorical covariates by using, as its basic unit of inference, a profile formed from a sequence of covariate values. These covariate profiles are clustered into groups and linked to a relevant outcome via a regression model. The Bayesian clustering aspect of the proposed modeling framework has a number of advantages over traditional clustering approaches: it allows the number of groups to vary, it uncovers subgroups and examines their association with an outcome of interest, and it fits the model as a unit, so that an individual's outcome can potentially influence cluster membership. The method is demonstrated with an analysis of survey data obtained from the National Survey of Children's Health. The approach has been implemented using the standard Bayesian modeling software, WinBUGS, with code provided in the supplementary material available at Biostatistics online. Further, we have developed a number of postprocessing tools that aid interpretation of partitions of the data.
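To illustrate how fitting the model as a unit lets an individual's outcome influence cluster membership, the following is a minimal Python sketch, not the authors' WinBUGS code. It assumes a fixed number of clusters (the method itself lets the number vary), a binary outcome with a cluster-specific log-odds and no additional fixed effects, and known parameter values. The names `membership_probs`, `weights`, `phi`, and `theta`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def expit(t):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def membership_probs(profile, outcome, weights, phi, theta):
    """Cluster membership probabilities for one individual.

    profile : list of category indices, one per categorical covariate
    outcome : binary outcome, 0 or 1
    weights : assumed mixture weights, one per cluster (sum to 1)
    phi     : phi[c][j][k] = P(covariate j takes category k | cluster c)
    theta   : theta[c] = assumed log-odds of the outcome in cluster c
    """
    unnormalised = []
    for c, w in enumerate(weights):
        # Likelihood of the covariate profile under cluster c,
        # treating covariates as independent within a cluster.
        like_x = 1.0
        for j, k in enumerate(profile):
            like_x *= phi[c][j][k]
        # Likelihood of the binary outcome under cluster c.
        p = expit(theta[c])
        like_y = p if outcome == 1 else 1.0 - p
        unnormalised.append(w * like_x * like_y)
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical setup: two clusters, two binary covariates.
WEIGHTS = [0.5, 0.5]
PHI = [
    [[0.8, 0.2], [0.7, 0.3]],  # cluster 0
    [[0.3, 0.7], [0.4, 0.6]],  # cluster 1
]
THETA = [-1.0, 1.0]  # cluster 0 is low risk, cluster 1 is high risk
PROFILE = [0, 1]

if __name__ == "__main__":
    for y in (0, 1):
        probs = membership_probs(PROFILE, y, WEIGHTS, PHI, THETA)
        print(f"outcome={y}: {[round(p, 3) for p in probs]}")
```

With these illustrative values, the same covariate profile is allocated mostly to the low-risk cluster when the outcome is 0 and mostly to the high-risk cluster when the outcome is 1, which is the sense in which the outcome informs the clustering. In a full Bayesian fit the weights, `phi`, and `theta` would themselves be sampled rather than fixed.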