Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PT, UK.
Department of Developmental Disability Neuropsychiatry, School of Psychiatry, University of New South Wales, Sydney, Australia.
BMC Med Res Methodol. 2020 May 27;20(1):132. doi: 10.1186/s12874-020-00994-0.
Propensity scores are widely used to deal with confounding bias in medical research. An incorrectly specified propensity score model may lead to residual confounding bias; therefore it is essential to use diagnostics to assess propensity scores in a propensity score analysis. The current use of propensity score diagnostics in the medical literature is unknown. The objectives of this study are to (1) assess the use of propensity score diagnostics in medical studies published in high-ranking journals, and (2) assess whether the use of propensity score diagnostics differs between studies (a) in different research areas and (b) using different propensity score methods.
A PubMed search identified studies published in high-impact journals between 1 January 2014 and 31 December 2016 that used propensity scores to answer an applied medical question. From each study we extracted information on how propensity scores were assessed and which propensity score method was used. Research area was defined using the journal categories from the Journal Citation Reports.
A total of 894 papers were included in the review. Of these, 187 (20.9%) failed to report whether the propensity score had been assessed. Commonly reported diagnostics were p-values from hypothesis tests (36.6%) and the standardised mean difference (34.6%). Statistical tests provided marginally stronger evidence for a difference in diagnostic use between studies in different research areas (p = 0.033) than between studies using different propensity score methods (p = 0.061).
The use of diagnostics in the propensity score medical literature is far from optimal, with different diagnostics preferred in different areas of medicine. The propensity score literature may improve with focused efforts to change practice in areas where suboptimal practice is most common.
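To make the standardised mean difference mentioned in the results concrete, the following is a minimal sketch of how it might be computed for one continuous covariate. It is not taken from the study. The function name, the toy data, and the choice of pooled standard deviation (the square root of the average of the two group variances) are assumptions made for illustration; other definitions exist, for example for binary covariates or weighted samples.

```python
from math import sqrt
from statistics import mean, variance


def standardised_mean_difference(treated, control):
    """Standardised mean difference for one continuous covariate.

    Assumed definition (illustrative): difference in group means divided
    by the square root of the average of the two sample variances.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("Both groups have zero variance; SMD is undefined.")
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate values (e.g. a severity score) in each group.
treated_values = [1, 2, 3, 4, 5]
control_values = [2, 3, 4, 5, 6]

smd = standardised_mean_difference(treated_values, control_values)
print(f"SMD = {smd:.3f}")
```

In practice the calculation would be repeated for every covariate, before and after matching or weighting. An absolute value below roughly 0.1 is often treated as a rule of thumb for acceptable balance, although that threshold is a convention rather than something reported in this abstract. Unlike a p-value from a hypothesis test, this definition of the standardised mean difference does not shrink or grow simply because the sample size changes.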