Turner Elizabeth L, Dobson Joanna E, Pocock Stuart J
Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK.
Epidemiol Perspect Innov. 2010 Oct 15;7:9. doi: 10.1186/1742-5573-7-9.
Reports of observational epidemiological studies often categorise (group) continuous risk factor (exposure) variables. However, there has been little systematic assessment of how categorisation is practised or reported in the literature, and no extended guidelines for the practice have been identified. We therefore assessed the nature of such practice in the epidemiological literature.

Two months (December 2007 and January 2008) of five epidemiological and five general medical journals were reviewed. All articles that examined the relationship between continuous risk factors and health outcomes were surveyed using a standard proforma, with the focus on the primary risk factor. Using the survey results, we provide illustrative examples and, drawing also on ideas from the broader literature and on experience, offer guidelines for good practice.
Of the 254 articles reviewed, 58 were included in our survey. Categorisation occurred in 50 (86%) of them. Of those, 42% also analysed the variable continuously and 24% considered alternative groupings. Most (78%) used 3 to 5 groups. No articles relied solely on dichotomisation, although it did feature prominently in 3 articles. The choice of group boundaries varied: 34% used quantiles, 18% equally spaced categories, 12% external criteria, 34% other approaches and 2% did not describe the approach used. Categorical risk estimates were most commonly (66%) presented as pairwise comparisons to a reference group, usually the highest or lowest (79%). Categorical analyses were mostly reported in tables; only 20% used figures.
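To make the two most commonly reported boundary choices concrete, the following minimal sketch contrasts quantile-based and equally spaced cut points. It is an illustration only, not a method taken from the surveyed articles: the `exposure` values are hypothetical, and the function names (`quantile_cutpoints`, `equal_width_cutpoints`, `assign_groups`) are assumptions introduced here.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cutpoints(values, n_groups):
    """Interior cut points giving n_groups groups of roughly equal size."""
    # method="inclusive" treats the data as containing their own min and max.
    return quantiles(values, n=n_groups, method="inclusive")


def equal_width_cutpoints(values, n_groups):
    """Interior cut points giving n_groups intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    return [lo + i * width for i in range(1, n_groups)]


def assign_groups(values, cutpoints):
    """Group index per value (0 = lowest); a value equal to a cut point
    is placed in the higher group."""
    return [bisect_right(cutpoints, v) for v in values]


# Hypothetical right-skewed exposure measurements (not from the survey).
exposure = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 40]

q_cuts = quantile_cutpoints(exposure, 4)      # quartile boundaries
w_cuts = equal_width_cutpoints(exposure, 4)   # equally spaced boundaries

q_groups = assign_groups(exposure, q_cuts)
w_groups = assign_groups(exposure, w_cuts)

# Number of observations falling in each of the four groups.
q_sizes = [q_groups.count(g) for g in range(4)]
w_sizes = [w_groups.count(g) for g in range(4)]
print("quantile group sizes:   ", q_sizes)
print("equal-width group sizes:", w_sizes)
```

With skewed data such as these, quantile boundaries give balanced group sizes, whereas equally spaced boundaries concentrate most observations in the lowest group and can leave others sparse or empty. This is one reason the choice of boundaries is worth pre-defining and reporting.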
Categorical analyses of continuous risk factors are common. Accordingly, we provide recommendations for good practice. Key issues include pre-defining an appropriate choice of groupings and analysis strategies, presenting grouped findings clearly in tables and figures, and drawing valid conclusions from categorical analyses while avoiding injudicious use of multiple alternative analyses.