Xu Tao, Zhu Guang-Jin, Han Shao-Mei
Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
Department of Physiopathology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
Chin Med Sci J. 2017 Dec 30;32(4):218-225. doi: 10.24920/J1001-9294.2017.054.
Objective: Sub-health status has steadily gained attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data allows a more complete and accurate analysis of findings in the sub-healthy population. This study compares the goodness of fit of count outcome models to identify the optimal model for sub-health studies.

Methods: The study sample was drawn from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersion, and the Vuong test was used to assess excess zero counts. The goodness of fit of the regression models was assessed using predictive probability curves and likelihood ratio test statistics.

Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, with a standard deviation of 3.72. The over-dispersion test statistic O was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) when comparing the ZINB model with the ZIP model; and the Vuong test statistic Z was 45.487. These results indicated over-dispersion and excess zero counts in the sub-health data. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion (AIC) value (335 112) and the smallest Bayesian information criterion (BIC) value (335 455), indicating the best goodness of fit. For most counts, the probabilities predicted by the ZINB model matched the observed frequencies most closely.
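The reported information criteria can be cross-checked against the reported log likelihood using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The following is a minimal sketch, not the authors' SAS procedure. The abstract does not state the number of estimated parameters k, so the code infers it from the reported AIC (an assumption), and it uses the log likelihood as rounded in the abstract.

```python
import math

# Values as reported in the abstract (the log likelihood is rounded there).
log_lik = -167519
aic_reported = 335112
bic_reported = 335455
n = 78307  # number of respondents


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Assumption: k is not given in the abstract, so it is back-calculated
# from the reported AIC and the rounded log likelihood.
k_inferred = round((aic_reported + 2 * log_lik) / 2)

print("inferred k:", k_inferred)
print("AIC:", aic(log_lik, k_inferred))
print("BIC:", round(bic(log_lik, k_inferred, n)))
```

Under these assumptions the inferred parameter count is 37, and the BIC recomputed from it rounds to the reported 335 455, so the three reported statistics are mutually consistent.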
The logit part of the ZINB model showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial part showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health.

Conclusions: All goodness-of-fit tests and the predictive probability curves led to the same finding: the ZINB model was the optimal model for exploring the factors that influence sub-health symptoms.
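The two-part structure of a ZINB model can be sketched as a probability mass function: a point mass at zero mixed with a negative binomial count. This is an illustrative sketch only, assuming the common NB2 parameterization (variance = mu + alpha * mu^2). The values of `mu` and `pi_zero` are hypothetical; only `alpha = 0.618` is taken from the abstract. In a fitted model, `mu` and `pi_zero` would depend on covariates through log and logit links, which are omitted here.

```python
import math


def nb_pmf(y, mu, alpha):
    """NB2 pmf with mean mu and dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi_zero):
    """ZINB pmf: an excess zero with probability pi_zero, otherwise an NB2 count."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base


# mu and pi_zero are hypothetical; alpha is the estimate reported in the abstract.
mu, alpha, pi_zero = 4.0, 0.618, 0.25

probs = [zinb_pmf(y, mu, alpha, pi_zero) for y in range(200)]
total = sum(probs)                               # should be very close to 1
mean = sum(y * p for y, p in enumerate(probs))   # should be close to (1 - pi_zero) * mu

print("P(Y = 0):", probs[0])
print("sum of probabilities:", total)
print("mean count:", mean)
```

The zero-inflation term raises P(Y = 0) above what the negative binomial alone would give, which is how the model accommodates a large share of respondents with no symptoms.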