Cintron Dakota W, Loken Eric, McCoach D Betsy
University of California San Francisco.
University of Connecticut.
Multivariate Behav Res. 2023 Jul-Aug;58(4):675-686. doi: 10.1080/00273171.2022.2082913. Epub 2022 Jun 14.
Mixture models can be used for explanation or for individual prediction and classification. In practice, researchers are often tempted to make class membership manifest by assigning each case to its class of maximum posterior probability and then using this "observed" class membership directly, or as a variable in follow-up analyses to predict distal outcomes. This study revisits the issue of correct class assignment in latent profile analysis by providing an example where the number of classes is known (three classes), sampling variability is eliminated, and precise estimates of classification indices are provided. This pseudo-population study design assumes the data-generating mechanism is known and provides a "best-case" scenario for evaluating correct class assignment. We use a variety of classification indices and graphical displays to show that correct classification may be poor despite relatively high entropy and overall correct class assignment metrics (e.g., percent correct). Our study serves as a reminder of the risks associated with trying to make latent class memberships manifest.
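The gap between overall and per-class classification accuracy can be illustrated with a minimal sketch. It assumes a three-class latent profile model with two normally distributed indicators and a common standard deviation. The class weights, means, sample size, and the function name `simulate_and_classify` are illustrative assumptions, not values or code from the study. Posterior probabilities are computed from the true generating parameters, mirroring the "best-case" setup in which the data-generating mechanism is known, and relative entropy is computed with the commonly used normalized formula.

```python
import numpy as np

def simulate_and_classify(n=20000, seed=0):
    """Simulate a known 3-class profile model and evaluate modal assignment.

    All parameter values below are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    weights = np.array([0.6, 0.3, 0.1])                      # assumed class proportions
    means = np.array([[0.0, 0.0], [1.5, 1.5], [3.0, 3.0]])   # assumed class means
    sd = 1.0                                                 # assumed common SD
    k = len(weights)

    # Draw true class labels, then two indicators per case.
    true_class = rng.choice(k, size=n, p=weights)
    x = means[true_class] + sd * rng.standard_normal((n, 2))

    # Log of (weight * normal density) per case and class. The normalizing
    # constant is identical across classes (shared SD), so it is omitted.
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_joint = np.log(weights) - 0.5 * sq_dist / sd**2
    log_joint -= log_joint.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)                  # posterior probabilities

    # Modal assignment: class of maximum posterior probability.
    modal = post.argmax(axis=1)

    # Relative entropy: 1 - sum(-p * ln p) / (n * ln K); values near 1
    # indicate that posteriors are concentrated on single classes.
    total_entropy = -(post * np.log(np.clip(post, 1e-300, None))).sum()
    rel_entropy = 1.0 - total_entropy / (n * np.log(k))

    # Overall percent correct versus correct assignment within each true class.
    overall = (modal == true_class).mean()
    per_class = np.array(
        [(modal[true_class == c] == c).mean() for c in range(k)]
    )
    return post, true_class, rel_entropy, overall, per_class

post, true_class, rel_entropy, overall, per_class = simulate_and_classify()
print("relative entropy:", round(float(rel_entropy), 3))
print("overall proportion correct:", round(float(overall), 3))
print("proportion correct by true class:", np.round(per_class, 3))
```

Because the overall proportion correct is a weighted average of the per-class values, at least one class is always classified no better than the overall figure suggests. Comparing the printed per-class values against the overall value and the relative entropy shows how summary indices can mask poor assignment for particular classes.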