Division of Global HIV and Tuberculosis, U.S. Centers for Disease Control and Prevention, Nairobi, Kenya.
PLoS One. 2019 Sep 3;14(9):e0214262. doi: 10.1371/journal.pone.0214262. eCollection 2019.
Interest in reproducible research is growing in the research community. Automating the production of manuscript tables from statistical software can help increase the reproducibility of findings. Logistic regression is used to study disease prevalence and associated factors in epidemiological studies and can be easily performed using widely available software including SAS, SUDAAN, Stata or R. However, output from these packages must be processed further to make it readily presentable. A number of procedures have been developed to organize regression output, though many of them are limited by inflexibility, complexity, a lack of validation checks for input parameters, and an inability to incorporate survey design.
We developed a SAS macro, %svy_logistic_regression, for fitting simple and multiple logistic regression models. The macro also creates high-quality, publication-ready tables from survey or non-survey data, with the aim of increasing the transparency of data analyses. It substantially reduces the turn-around time for conducting analyses and preparing output tables while also addressing the limitations of existing procedures. In addition, the macro allows user-specified actions for handling missing data as well as the use of replication-based variance estimation methods.
We demonstrate the use of the macro in the analysis of the 2013-2014 National Health and Nutrition Examination Survey (NHANES), a complex survey designed to assess the health and nutritional status of adults and children in the United States. The output presented here is directly from the macro and is consistent with how regression results are often presented in the epidemiological and biomedical literature, with unadjusted and adjusted model results presented side by side.
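As a rough illustration of the side-by-side layout described above, the following Python sketch converts logit coefficients and standard errors into odds ratios with Wald 95% confidence intervals and prints unadjusted and adjusted results in adjacent columns. It is not the macro's SAS code. The function names, the variable label, and the coefficients are all hypothetical, and the normal-approximation interval is an assumption (survey software may instead use a t-distribution based on design degrees of freedom).

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval (assumption)


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) from a logit coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def format_or(beta, se):
    """Format an odds ratio and its Wald interval as 'OR (lower-upper)'."""
    or_, lower, upper = odds_ratio_ci(beta, se)
    return f"{or_:.2f} ({lower:.2f}-{upper:.2f})"


def side_by_side(rows):
    """Build a plain-text table.

    rows: list of (label, (beta_unadj, se_unadj), (beta_adj, se_adj)).
    """
    lines = [f"{'Variable':<20}{'Unadjusted OR (95% CI)':<26}{'Adjusted OR (95% CI)'}"]
    for label, unadj, adj in rows:
        lines.append(f"{label:<20}{format_or(*unadj):<26}{format_or(*adj)}")
    return "\n".join(lines)


# Made-up coefficients for illustration only; these are not NHANES results.
example_rows = [("Exposure A", (0.0, 0.1), (0.5, 0.2))]
table = side_by_side(example_rows)
print(table)
```

Keeping the conversion (`odds_ratio_ci`) separate from the formatting (`format_or`, `side_by_side`) means the same numbers can feed a different table layout without recomputation.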
The SAS code presented in this macro is comprehensive and easy to follow, manipulate, and extend to other areas of interest. Statisticians can also incorporate it quickly for immediate use. It is an especially valuable tool for generating high-quality, easy-to-review tables that can be incorporated directly into a publication.