Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy.

Affiliations

Department of Pediatrics, Alberta Research Center for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11402 University Avenue, Edmonton, Alberta, Canada.

Publication information

J Clin Epidemiol. 2011 Aug;64(8):861-71. doi: 10.1016/j.jclinepi.2011.01.010. Epub 2011 Apr 30.

Abstract

OBJECTIVES

To develop and test a study design classification tool.

STUDY DESIGN

We contacted relevant organizations and individuals to identify tools used to classify study designs and ranked these using predefined criteria. The highest ranked tool was a design algorithm developed, but no longer advocated, by the Cochrane Non-Randomized Studies Methods Group; this was modified to include additional study designs and decision points. We developed a reference classification for 30 studies; 6 testers applied the tool to these studies. Interrater reliability (Fleiss' κ) and accuracy against the reference classification were assessed. The tool was further revised and retested.
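Interrater reliability here is Fleiss' kappa, which generalizes Cohen's kappa to more than two raters. A minimal sketch of the statistic follows; the data are hypothetical (the study itself used 6 testers and 30 studies), and the function name is illustrative, not from the paper.

```python
# Sketch of Fleiss' kappa with hypothetical counts (not the study's data).
from typing import List

def fleiss_kappa(ratings: List[List[int]]) -> float:
    """ratings[i][j] = number of raters who placed subject i in category j.
    Every row must sum to the same number of raters n."""
    N = len(ratings)        # number of subjects (e.g. studies classified)
    n = sum(ratings[0])     # raters per subject
    k = len(ratings[0])     # number of categories (e.g. study designs)

    # Per-subject observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N

    # Expected chance agreement from the marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)

# 3 subjects, 6 raters, 2 categories (e.g. experimental vs. not)
counts = [[6, 0], [3, 3], [5, 1]]
print(round(fleiss_kappa(counts), 3))  # → 0.1
```

On the conventional Landis-Koch scale used in the abstract, κ of 0.21-0.40 is "fair" agreement and 0.41-0.60 is "moderate", which is how κ=0.26 and κ=0.45 are characterized below.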

RESULTS

Initial reliability was fair among the testers (κ=0.26) and the reference standard raters (κ=0.33). Testing after revisions showed improved reliability (κ=0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study design was experimental (5 of 15 studies) and whether there was a comparison of any kind (4 of 15 studies). Agreement was higher among testers who had completed graduate-level training than among those who had not.

CONCLUSION

The moderate reliability and low accuracy may be because of lack of clarity and comprehensiveness of the tool, inadequate reporting of the studies, and variability in tester characteristics. The results may not be generalizable to all published studies, as the test studies were selected because they had posed challenges for previous reviewers with respect to their design classification. Application of such a tool should be accompanied by training, pilot testing, and context-specific decision rules.
