Ariel Linden
Linden Consulting Group, LLC, Ann Arbor, MI, USA; Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI, USA.
J Eval Clin Pract. 2015 Apr;21(2):242-7. doi: 10.1111/jep.12297. Epub 2014 Dec 26.
RATIONALE, AIMS AND OBJECTIVES: An essential requirement for ensuring the validity of outcomes in matching studies is that study groups are comparable on observed pre-intervention characteristics. Investigators typically use numerical diagnostics, such as t-tests, to assess this comparability (referred to as 'balance'). However, such diagnostics test equality along only one dimension (e.g. the mean, in the case of t-tests) and therefore do not adequately capture imbalances that may exist elsewhere in the distribution. Furthermore, these tests are generally sensitive to sample size, raising the concern that a loss of statistical power may be mistaken for an improvement in covariate balance. In this paper, we illustrate the shortcomings of numerical diagnostics and show how visual displays provide a more complete representation of the data, allowing balance to be assessed more robustly.
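Both limitations can be seen in a simplified form of the two-sample t statistic. Assuming, for illustration only, two groups of equal size n that share a common standard deviation s, the statistic depends on the data only through the difference in means, and for a fixed standardized difference d it grows with the square root of n:

```latex
t = \frac{\bar{x}_1 - \bar{x}_0}{s\sqrt{2/n}} = d\sqrt{\frac{n}{2}},
\qquad d = \frac{\bar{x}_1 - \bar{x}_0}{s}

% Equal means give t = 0 whatever the spread of either group.
% Same imbalance, different sample sizes (d = 0.1):
n = 50   \Rightarrow t = 0.1\sqrt{25}   = 0.5
n = 2000 \Rightarrow t = 0.1\sqrt{1000} \approx 3.16
```

The same covariate imbalance therefore looks "balanced" in a small study and "significantly imbalanced" in a large one.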
METHOD: We generate artificial datasets specifically designed to demonstrate how widely used equality tests capture only a single dimension of the data and are sensitive to sample size. We then plot the covariate distributions using several graphical displays.
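The abstract does not say how the artificial datasets were built. The following is a minimal, hypothetical sketch of one such design: a covariate whose two groups have identical means by construction, while the control group is three times as spread out. Welch's unequal-variance t statistic stands in for "t-tests", and the helper names `welch_t` and `variance_ratio` are illustrative, not taken from the paper.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def variance_ratio(x, y):
    """Ratio of sample variances; 1.0 indicates equal spread."""
    return statistics.variance(x) / statistics.variance(y)


# Hypothetical designed covariate: both groups are symmetric around 0,
# so their means are identical, but the control group is 3x as spread out.
treated = [-1.0, 1.0] * 50   # n = 100, mean 0
control = [-3.0, 3.0] * 50   # n = 100, mean 0

print(f"difference in means: {statistics.mean(treated) - statistics.mean(control):.3f}")
print(f"Welch t statistic:   {welch_t(treated, control):.3f}")
print(f"variance ratio:      {variance_ratio(treated, control):.3f}")
```

The t statistic is exactly zero, which a mean-based diagnostic would report as perfect balance, while the variance ratio of about 0.11 exposes the difference in spread.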
RESULTS: As expected, tests showing perfect covariate balance in means failed to reflect imbalances at higher moments (variances). However, these discrepancies were easily detected upon inspection of the graphical displays. Additionally, smaller sample sizes produced the appearance of covariate balance when the apparent balance was in fact a result of lower statistical power.
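The sample-size effect can be reproduced with a hypothetical covariate whose imbalance is held fixed while n grows. In this sketch the treated group is simply the control pattern shifted up by 0.2 units, replicated to different sizes. The standardized difference uses the pooled SD sqrt((var_x + var_y) / 2), a common convention that is assumed here rather than taken from the paper.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


def standardized_difference(x, y):
    """Difference in means divided by the pooled SD sqrt((var_x + var_y) / 2)."""
    pooled_sd = math.sqrt((statistics.variance(x) + statistics.variance(y)) / 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd


# Hypothetical covariate: the same 10-value pattern in both groups,
# with the treated group shifted up by 0.2 units.
base = [float(v) for v in range(10)]          # values 0..9, mean 4.5
treated_base = [v + 0.2 for v in base]

for reps in (2, 20, 200):                      # n per group = 20, 200, 2000
    treated = treated_base * reps
    control = base * reps
    print(f"n={len(treated):5d}  "
          f"std diff={standardized_difference(treated, control):.3f}  "
          f"t={welch_t(treated, control):.2f}")
```

The standardized difference stays near 0.07 at every size, but the t statistic rises from about 0.2 at n = 20 to about 2.2 at n = 2000. A test applied to the smaller sample would suggest balance only because it has less power.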
CONCLUSIONS: Given the limitations of numerical diagnostics, we advocate using graphical displays for assessing covariate balance and encourage investigators to provide such graphs when reporting balance statistics in their matching studies.
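One widely used graphical display for comparing two covariate distributions is the empirical quantile-quantile (QQ) plot. The abstract does not say whether the paper uses it. The sketch below stays within the standard library and only computes the points such a plot would draw, using `statistics.quantiles`. The data are hypothetical: the groups have equal means, and the control group is a threefold stretch of the treated group.

```python
import statistics


def qq_pairs(x, y, n=10):
    """Pair the empirical n-quantiles (cut points) of two samples.

    These are the points an empirical QQ plot would draw; if the two
    distributions match, every pair lies on the 45-degree line qy == qx.
    """
    qx = statistics.quantiles(x, n=n)
    qy = statistics.quantiles(y, n=n)
    return list(zip(qx, qy))


# Hypothetical covariate: both groups are centred on 0 (equal means),
# but the control group is the treated group stretched threefold.
treated = [k / 10 for k in range(-10, 11)]   # 21 values from -1.0 to 1.0
control = [3 * v for v in treated]

print(f"means: {statistics.mean(treated):.2f} vs {statistics.mean(control):.2f}")
for qt, qc in qq_pairs(treated, control):
    # Distance from the 45-degree line reveals the mismatch in spread.
    print(f"treated {qt:6.2f}  control {qc:6.2f}  off-diagonal by {qc - qt:6.2f}")
```

Plotted, these points fall on a line of slope 3 rather than the 45-degree line. The eye picks up this variance imbalance immediately, even though the group means are identical.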