Dwivedi Alok Kumar, Dwivedi Sada Nand, Deo Suryanarayana, Shukla Rakesh, Kopras Elizabeth
Center for Biostatistical Services, Department of Environmental Health, College of Medicine, University of Cincinnati, Cincinnati, USA.
Health (Irvine Calif). 2010 Jul;2(7):641-651. doi: 10.4236/health.2010.27098.
Clinicians need to predict the number of involved nodes in breast cancer patients in order to ascertain severity and prognosis and to design subsequent treatment. The distribution of involved nodes often displays over-dispersion, that is, larger variability than expected. Until now, the negative binomial model has been used to describe this distribution, on the assumption that over-dispersion is due only to unobserved heterogeneity. However, the distribution of involved nodes also contains a large proportion of excess zeros (negative nodes), which can themselves lead to over-dispersion. In this situation, alternative models may better account for over-dispersion due to excess zeros. This study examines data from 1152 patients who underwent axillary dissections in a tertiary hospital in India between January 1993 and January 2005. We fit and compare various count models to test their ability to predict the number of involved nodes. We also argue for using zero inflated models in populations where all the excess zeros come from individuals who are at some risk of the outcome of interest. The negative binomial regression model fit the data better than the Poisson, zero hurdle Poisson, and zero inflated Poisson regression models. However, the zero hurdle and zero inflated negative binomial regression models predicted the number of involved nodes much more accurately than the negative binomial model. This suggests that the number of involved nodes displays excess variability not only due to unobserved heterogeneity but also due to excess negative nodes in the data set. In this analysis, only skin changes and primary site were associated with negative nodes, whereas parity, skin changes, primary site, and size of tumor were associated with a greater number of involved nodes. When the two models perform nearly equally well, the zero inflated negative binomial model should be preferred over the hurdle model for describing the nodal frequency because it provides an estimate of negative nodes that are at "high-risk" of nodal involvement.
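The contrast between the negative binomial, zero inflated, and hurdle models can be illustrated numerically. The sketch below is not the authors' analysis. It assumes the common NB2 parameterization (variance = mu + alpha * mu^2) and uses arbitrary illustrative parameter values (`MU`, `ALPHA`, `PI`), not estimates from the study, and all function names are hypothetical. It shows that adding a structural-zero component raises the variance-to-mean ratio beyond that of the negative binomial alone, and that a zero inflated model splits the zeros into a structural part and a part generated by the count process, whereas a hurdle model treats all zeros as one group.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial (NB2) pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha              # shape parameter
    p = r / (r + mu)             # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_pmf(k, mu, alpha, pi):
    """Zero inflated NB: a structural zero with probability pi, else an NB count."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base


def hurdle_nb_pmf(k, mu, alpha, p_zero):
    """Hurdle NB: all zeros from one binary part, positives from a zero-truncated NB."""
    if k == 0:
        return p_zero
    truncated = nb_pmf(k, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))
    return (1.0 - p_zero) * truncated


def mean_and_variance(pmf, k_max=2000):
    """Moments by direct summation; k_max is assumed large enough for the tail."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    second = sum(k * k * pmf(k) for k in range(k_max + 1))
    return mean, second - mean ** 2


# Arbitrary illustrative values (assumptions, not fitted to any data).
MU, ALPHA, PI = 4.0, 1.0, 0.3

nb_mean, nb_var = mean_and_variance(lambda k: nb_pmf(k, MU, ALPHA))
zi_mean, zi_var = mean_and_variance(lambda k: zinb_pmf(k, MU, ALPHA, PI))

# Under the zero inflated model, zeros come from two sources.
p_zero_total = zinb_pmf(0, MU, ALPHA, PI)
p_zero_from_counts = (1.0 - PI) * nb_pmf(0, MU, ALPHA)

print(f"NB   variance/mean: {nb_var / nb_mean:.2f}")
print(f"ZINB variance/mean: {zi_var / zi_mean:.2f}")
print(f"ZINB P(0) total: {p_zero_total:.2f}, "
      f"of which from the count process: {p_zero_from_counts:.2f}")
```

A hurdle model with `p_zero` set to the same total zero probability reproduces the same zero mass but offers no such decomposition, which is the property the abstract cites as the reason to prefer the zero inflated negative binomial model when fit is comparable.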