Gueorguieva Ralitza, Buta Eugenia, Morean Meghan, Krishnan-Sarin Suchitra
Department of Biostatistics, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Public Health, New Haven, Connecticut, USA.
Department of Psychiatry, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Medicine, New Haven, Connecticut, USA.
Stat Med. 2020 Dec 30;39(30):4574-4592. doi: 10.1002/sim.8739. Epub 2020 Sep 9.
Ordinal data (eg, "low," "medium," "high"; graded responses on a Likert scale) with an additional "don't know" category are frequently encountered in the medical, social, and behavioral science literature. Handling a "don't know" option presents unique challenges because it often "destroys" the ordinal nature of the data. Commonly, nominal models are employed; these ignore the partial ordering and have a complicated interpretation, especially when outcomes are measured repeatedly. We propose two-part models that easily accommodate longitudinal partially ordered (semiordinal) data. The most easily interpretable formulation consists of a random effect logistic submodel for "don't know" vs all other categories combined, and a random effect ordinal submodel for the ordered categories. Correlated random effects account for statistical dependence within individuals. An extension allowing for nonproportional odds for the predictor effects in the ordinal submodel is also considered. Maximum likelihood estimation is performed using adaptive Gaussian quadrature in SAS PROC NLMIXED. A simulation study evaluates the performance of the estimation algorithm in terms of bias and efficiency, and compares joint vs separate models of the two parts, as well as proportional vs nonproportional model formulations. The methods are motivated and illustrated on a dataset from a study of adolescents' perceptions of the nicotine strength of JUUL e-cigarettes. Using the proposed approach, we show that adolescents perceive 5% nicotine content as relatively low, a misconception more pronounced among past-month nonusers than among past-month users of JUUL e-cigarettes.
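To make the two-part structure concrete, here is a minimal Python sketch of one subject's marginal likelihood. It is not the authors' SAS PROC NLMIXED code, and several choices are assumptions made only for illustration. The helper names `conditional_probs` and `subject_likelihood` and the parameter names `a0`, `alpha`, `beta`, `thresholds`, and `Sigma` are hypothetical. Responses are coded 0 = "don't know" and 1..K for the ordered categories. The ordinal part uses a proportional-odds cumulative logit with the sign convention P(Y <= k) = expit(theta_k - x'beta - b2). The correlated random intercepts (b1, b2) ~ N(0, Sigma) are integrated out with ordinary (non-adaptive) Gauss-Hermite quadrature instead of the adaptive quadrature used in the paper. The nonproportional extension is not shown.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def expit(u):
    """Logistic function 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))


def conditional_probs(x, a0, alpha, beta, thresholds, b1, b2):
    """Hypothetical helper: response probabilities for one occasion given (b1, b2).

    Index 0 is "don't know"; indices 1..K are the ordered categories.
    Sign conventions here are illustrative assumptions.
    """
    # Part 1: random-intercept logistic model for "don't know" vs any answer.
    p_dk = expit(a0 + x @ alpha + b1)
    # Part 2: random-intercept cumulative logit (proportional odds) model,
    # P(Y <= k | answered) = expit(theta_k - x'beta - b2), k = 1..K-1.
    cum = expit(np.asarray(thresholds) - (x @ beta + b2))
    cum = np.concatenate(([0.0], cum, [1.0]))
    p_ord = np.diff(cum)  # P(Y = k | answered), k = 1..K
    return np.concatenate(([p_dk], (1.0 - p_dk) * p_ord))


def subject_likelihood(X, y, a0, alpha, beta, thresholds, Sigma, n_nodes=15):
    """Hypothetical helper: marginal likelihood of one subject's repeated responses.

    X: (n_obs, p) covariates; y: (n_obs,) integer codes, 0 = "don't know".
    (b1, b2) ~ N(0, Sigma) is integrated out with non-adaptive
    Gauss-Hermite quadrature on a product grid (an assumption, not NLMIXED).
    """
    z, w = hermgauss(n_nodes)
    L = np.linalg.cholesky(Sigma)
    total = 0.0
    for za, wa in zip(z, w):
        for zc, wc in zip(z, w):
            # Map standard nodes to correlated effects: b = sqrt(2) * L @ z.
            b1, b2 = np.sqrt(2.0) * (L @ np.array([za, zc]))
            lik = 1.0
            for x_j, y_j in zip(X, y):  # occasions independent given (b1, b2)
                lik *= conditional_probs(x_j, a0, alpha, beta, thresholds, b1, b2)[y_j]
            total += wa * wc * lik
    return total / np.pi  # Gauss-Hermite normalizing constant in two dimensions


# Illustrative (made-up) parameter values, not estimates from the paper.
a0, alpha, beta = -1.0, np.array([0.5]), np.array([-0.4])
thresholds = [-1.0, 0.5]                    # K = 3 ordered categories
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])  # correlated random intercepts
X = np.array([[0.0], [1.0]])                # two occasions, one covariate
y = np.array([0, 2])                        # "don't know", then category 2
print(subject_likelihood(X, y, a0, alpha, beta, thresholds, Sigma))
```

Because the "don't know" probability multiplies the ordinal probabilities by (1 - p_dk), each occasion's K + 1 probabilities sum to one. The shared covariance `Sigma` is what links the two submodels. Setting its off-diagonal entry to zero corresponds to fitting the two parts separately.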