Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
BMC Med Res Methodol. 2014 Jan 25;14:15. doi: 10.1186/1471-2288-14-15.
The Canadian Community Health Survey (CCHS) is a cross-sectional survey that has collected information on health determinants, health status and health system utilization in Canada since 2001. Several hundred articles have been written using CCHS data. Previous analyses of the statistical methods used in the literature have focused on a particular journal or set of journals, with the aim of gauging the statistical literacy needed to understand the published research. In this study, we describe the statistical methods referenced in the published literature that uses the CCHS dataset(s).
A descriptive study was undertaken of references associated with the CCHS that were indexed in Medline, Embase, Web of Knowledge and Scopus. These references were imported into a Java application built on the searchable Apache Lucene text database and screened against pre-defined inclusion and exclusion criteria. Full-text PDF articles that met the inclusion criteria were then used to identify the descriptive, elementary and regression statistical methods referenced in them. The statistical methods were identified through an automated keyword search of the full-text articles, run within the Java application.
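The automated keyword screen described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the study's application: it does not use Apache Lucene, the class and method names are hypothetical, and the keyword lists are invented examples rather than the search terms the authors used. Only the three categories (descriptive, elementary, regression) come from the abstract.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative sketch only: names and keyword lists are assumptions.
class MethodKeywordScreen {

    // Category -> example keywords (hypothetical, not the study's actual terms).
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("descriptive", List.of("mean", "proportion", "prevalence"));
        KEYWORDS.put("elementary", List.of("t-test", "chi-square"));
        KEYWORDS.put("regression", List.of("regression"));
    }

    // Returns the categories for which at least one keyword occurs in the text.
    static Set<String> categoriesIn(String fullText) {
        String text = fullText.toLowerCase(Locale.ROOT);
        Set<String> found = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
            for (String keyword : entry.getValue()) {
                // Whole-word match with an optional plural "s", so that
                // "mean" matches "means" but not "meaning".
                Pattern p = Pattern.compile(
                        "(?<![a-z])" + Pattern.quote(keyword) + "s?(?![a-z])");
                if (p.matcher(text).find()) {
                    found.add(entry.getKey());
                    break; // one hit is enough for this category
                }
            }
        }
        return found;
    }

    // Percentage of articles whose text references the given category.
    static double percentReferencing(List<String> articles, String category) {
        if (articles.isEmpty()) {
            return 0.0;
        }
        long hits = articles.stream()
                .filter(a -> categoriesIn(a).contains(category))
                .count();
        return 100.0 * hits / articles.size();
    }
}
```

A screen of this kind detects that a method is mentioned, not that it was applied, which matches the abstract's wording of methods being "referenced". In this sketch, calling `MethodKeywordScreen.percentReferencing(texts, "regression")` on a list of article texts gives the share of articles that mention the keyword.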
We identified 4811 references from the 4 bibliographical databases for possible inclusion. After exclusions, 663 references were used for the analysis. Descriptive statistics such as means or proportions were presented in the great majority of the articles (97.7%). Elementary-level statistics such as t-tests were referenced far less often (29.7%). Regression methods were frequently referenced: 79.8% of articles referred to regression in general, and logistic regression was the most frequently referenced regression method, appearing in 67.1% of the articles.
Our study shows that a diverse set of analysis methods is referenced in the CCHS literature; however, the literature relies heavily on only a subset of the available statistical tools. This information can be used to identify gaps in the statistical methods that could be applied in future analyses of public health surveys, to inform training and educational programs, and to identify the level of statistical literacy needed to understand the published literature.