Biostatistics Unit, Medical Research Council, Cambridge, UK.
J Epidemiol Community Health. 2012 Jan;66(1):89-94. doi: 10.1136/jech.2009.103408. Epub 2011 Aug 28.
Finite mixture models posit the existence of a latent categorical variable and can be used for probabilistic classification. The authors illustrate the use of mixture models for dietary pattern analysis. An advantage of this approach is that it takes classification uncertainty into account.
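To make "probabilistic classification" concrete: given fitted class weights, means and variances, a mixture model assigns each participant a posterior probability of belonging to each latent class rather than a single hard label. The sketch below assumes, purely for illustration, normally distributed food consumptions that are independent within a class (a diagonal-covariance normal mixture); the function names `posterior_probs` and `classification_entropy` and the toy parameters are hypothetical, not taken from the paper. Entropy is used here as one simple summary of classification uncertainty.

```python
import math


def log_normal_pdf(x, mean, var):
    # Log density of a univariate normal distribution
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def posterior_probs(x, weights, means, variances):
    # P(class k | x) for one participant's vector of food consumptions x,
    # assuming (hypothetically) normal foods, independent given the class
    log_joint = [
        math.log(w) + sum(log_normal_pdf(xi, m, v) for xi, m, v in zip(x, mu, var))
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [math.exp(l - top) for l in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def classification_entropy(probs):
    # 0 when membership is certain; log(K) when all K classes are equally likely
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical two-class model over two foods
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

clear_case = posterior_probs([0.2, 0.1], weights, means, variances)
ambiguous = posterior_probs([2.0, 2.0], weights, means, variances)
print(clear_case, classification_entropy(clear_case))
print(ambiguous, classification_entropy(ambiguous))
```

A participant near one class mean is classified almost with certainty (entropy near 0), whereas one exactly halfway between the two classes gets probabilities of 0.5 each and the maximum entropy, log 2.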
Participants were a random sample of women from the European Prospective Investigation into Cancer. Food consumption was measured using dietary questionnaires. Mixture models identified latent classes in food consumption data, which were interpreted as dietary patterns.
Among the various assumptions examined, models allowing the variance of food consumption to differ both between foods within a class and between classes fitted better than alternatives assuming constant variance (an assumption also made by the K-means method of cluster analysis). An eight-class model fitted best, and five patterns validated well in a second random sample. Patterns with lower classification uncertainty tended to validate better. One pattern showed low consumption of foods despite being associated with a moderate body mass index.
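The variance comparison can be illustrated with a small expectation-maximisation (EM) fit. The sketch below is an assumption-laden illustration, not the authors' procedure: it fits a normal mixture with diagonal covariance either with variances free to differ by food and by class, or with one common variance for every food and class (the constant-variance assumption the abstract attributes to K-means), and compares the two with the Bayesian information criterion (BIC), where lower is better. The function `fit_diag_gmm`, its parameters and the simulated data are all hypothetical.

```python
import math
import random


def class_log_joint(x, weight, mean, var):
    # log(weight) + sum over foods of the normal log density; foods are assumed
    # conditionally independent given the class (diagonal covariance)
    return math.log(weight) + sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def e_step(data, weights, means, variances):
    # Posterior class probabilities for every participant, plus the log-likelihood
    resp, loglik = [], 0.0
    for x in data:
        lj = [class_log_joint(x, w, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(lj)
        unnorm = [math.exp(l - top) for l in lj]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
        loglik += top + math.log(total)
    return resp, loglik


def fit_diag_gmm(data, k, shared_variance=False, n_iter=200, seed=0, floor=1e-6):
    # shared_variance=True: one variance for all foods in all classes;
    # shared_variance=False: variances differ by food and by class
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(row) for row in rng.sample(data, k)]
    grand = [sum(x[j] for x in data) / n for j in range(d)]
    start = sum((x[j] - grand[j]) ** 2 for x in data for j in range(d)) / (n * d)
    variances = [[start] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp, _ = e_step(data, weights, means, variances)
        nk = [max(sum(r[c] for r in resp), 1e-12) for c in range(k)]
        weights = [nc / n for nc in nk]
        means = [[sum(r[c] * x[j] for r, x in zip(resp, data)) / nk[c]
                  for j in range(d)] for c in range(k)]
        sq = [[sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, data))
               for j in range(d)] for c in range(k)]
        if shared_variance:
            common = max(sum(map(sum, sq)) / (n * d), floor)
            variances = [[common] * d for _ in range(k)]
        else:
            variances = [[max(sq[c][j] / nk[c], floor) for j in range(d)]
                         for c in range(k)]
    _, loglik = e_step(data, weights, means, variances)
    n_params = (k - 1) + k * d + (1 if shared_variance else k * d)
    bic = -2 * loglik + n_params * math.log(n)  # lower BIC: better fit for the complexity
    return {"weights": weights, "means": means, "variances": variances,
            "loglik": loglik, "bic": bic}


# Simulated (hypothetical) data: two classes whose foods have very different spreads
rng = random.Random(1)
data = ([[rng.gauss(0, 0.1), rng.gauss(0, 3)] for _ in range(200)]
        + [[rng.gauss(5, 3), rng.gauss(5, 0.1)] for _ in range(200)])
free = fit_diag_gmm(data, 2)
equal = fit_diag_gmm(data, 2, shared_variance=True)
print(round(free["bic"], 1), round(equal["bic"], 1))
```

When, as in this simulation, the spread of each food really does differ between classes, the model with free variances attains a much lower BIC despite its extra parameters. Choosing the number of classes can be sketched the same way, by refitting over a range of `k` and comparing BIC.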
Mixture modelling for dietary pattern analysis has advantages over both factor and cluster analysis. Unlike these methods, it makes it easy to estimate pattern prevalence, to describe patterns and to use patterns to predict disease while taking classification uncertainty into account. Owing to substantial measurement error in food consumption, any analysis will usually find some patterns that cannot be well validated. While knowledge of classification uncertainty may aid pattern evaluation, any method will identify patterns better when food consumption is measured with less error. Mixture models may also be useful for identifying individuals who under-report food consumption.
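Once each participant has posterior pattern probabilities, pattern prevalence and uncertainty-aware summaries follow directly. The sketch below shows one simple approach under stated assumptions: prevalence is estimated as the mean posterior probability, and an outcome rate per pattern is computed by counting each participant in every pattern in proportion to their membership probability, contrasted with hard (modal) assignment. This posterior weighting is an illustrative choice, not necessarily the disease model used in the paper; all function names and data are hypothetical.

```python
def pattern_prevalence(posteriors):
    # Estimated prevalence of each pattern: the mean posterior membership probability
    n, k = len(posteriors), len(posteriors[0])
    return [sum(p[j] for p in posteriors) / n for j in range(k)]


def modal_assignment(posteriors):
    # Hard classification: the most probable pattern (the first one on ties)
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def weighted_outcome_rate(posteriors, outcomes):
    # Outcome rate per pattern, with each participant counted in every pattern
    # in proportion to their posterior membership probability
    k = len(posteriors[0])
    rates = []
    for j in range(k):
        w = [p[j] for p in posteriors]
        rates.append(sum(wi * yi for wi, yi in zip(w, outcomes)) / sum(w))
    return rates


# Hypothetical posteriors for four participants over two patterns, and a 0/1 outcome
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
outcomes = [1, 0, 0, 1]
print(pattern_prevalence(posteriors))
print(modal_assignment(posteriors))
print(weighted_outcome_rate(posteriors, outcomes))
```

Here modal assignment puts three participants in pattern 0 (outcome rate 2/3) and one in pattern 1 (rate 0), while the posterior-weighted rates are about 0.64 and 0.33, because uncertain members such as the participant at 0.5/0.5 contribute to both patterns.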