Gaebler Johann D, Goel Sharad
Department of Statistics, Harvard University, Cambridge, MA 02138.
Kennedy School of Government, Harvard University, Cambridge, MA 02138.
Proc Natl Acad Sci U S A. 2025 Mar 11;122(10):e2416348122. doi: 10.1073/pnas.2416348122. Epub 2025 Mar 4.
In observational studies of discrimination, the most common statistical approaches consider either the rate at which decisions are made (benchmark tests) or the success rate of those decisions (outcome tests). Both tests, however, have well-known statistical limitations, sometimes suggesting discrimination even when there is none. Despite the fallibility of the benchmark and outcome tests individually, here we prove a surprisingly strong statistical guarantee: Under a common nonparametric assumption, at least one of the two tests must be correct; consequently, when both tests agree, they are guaranteed to yield correct conclusions. We present empirical evidence that the underlying assumption holds approximately in several important domains, including lending, education, and criminal justice, and that our hybrid test is robust to the moderate violations of the assumption that we observe in practice. Applying this approach to 2.8 million police stops across California, we find evidence of widespread racial discrimination.
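The agreement rule described above (draw a conclusion only when the benchmark and outcome tests point the same way) can be sketched in Python. This is a minimal illustration under simplifying assumptions of our own and is not the authors' implementation. The names `GroupStats` and `hybrid_test` are hypothetical. Rates are compared as raw group-level proportions, and the sketch performs no covariate adjustment or significance testing. A higher decision rate (e.g., search rate) for a group is read as the benchmark test signaling discrimination against that group. A lower success rate (e.g., contraband hit rate) is read as the outcome test signaling the same.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Hypothetical per-group counts, e.g. for police stops."""
    eligible: int    # individuals who could receive a decision (e.g., stops)
    decisions: int   # positive decisions made (e.g., searches)
    successes: int   # successful decisions (e.g., searches finding contraband)

    @property
    def decision_rate(self) -> float:
        # Benchmark-test quantity: share of eligible individuals acted on.
        return self.decisions / self.eligible if self.eligible else 0.0

    @property
    def success_rate(self) -> float:
        # Outcome-test quantity: share of decisions that succeeded.
        return self.successes / self.decisions if self.decisions else 0.0


def hybrid_test(a: GroupStats, b: GroupStats) -> str:
    """Compare group `a` with group `b`, reporting a finding only when both tests agree.

    Returns "against_a", "against_b", or "inconclusive".
    Simplification (assumption): point estimates only, no uncertainty.
    """
    # Benchmark test: the group acted on more often appears disfavored.
    bench_a = a.decision_rate > b.decision_rate
    bench_b = a.decision_rate < b.decision_rate
    # Outcome test: a lower success rate suggests a lower decision threshold.
    outcome_a = a.success_rate < b.success_rate
    outcome_b = a.success_rate > b.success_rate

    if bench_a and outcome_a:
        return "against_a"
    if bench_b and outcome_b:
        return "against_b"
    # The two tests disagree (or show no difference), so no conclusion is drawn.
    return "inconclusive"


if __name__ == "__main__":
    # Made-up illustrative counts, not data from the study.
    group_a = GroupStats(eligible=1000, decisions=100, successes=20)  # 10% searched, 20% hits
    group_b = GroupStats(eligible=1000, decisions=50, successes=20)   # 5% searched, 40% hits
    print(hybrid_test(group_a, group_b))  # against_a
```

In this made-up example, group A is searched at twice the rate of group B, and its searches succeed half as often. Both tests therefore point toward discrimination against A. If the two tests disagreed, the sketch would return "inconclusive" rather than trust either test alone. A real analysis would add confidence intervals or hypothesis tests on both rate differences before reaching any conclusion.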