Dimmery Drew, Munger Kevin
Hertie School, Berlin, Germany.
Computational Social Science European University Institute.
Obs Stud. 2025 Apr 11;11(1):17-26. doi: 10.1353/obs.2025.a956838. eCollection 2025.
We provide a critical response to Aronow et al. (2021), which argued that randomized controlled trials (RCTs) are "enough," while nonparametric identification in observational studies is not. We first investigate what is meant by "enough," arguing that this is fundamentally a sociological claim about the relationship between statistical work and relevant institutional processes (here, academic peer review), rather than something that can be decided from within the logic of statistics. For a more complete conception of "enough," we outline all that would need to be known: not just knowledge of propensity scores, but knowledge of many other spatial and temporal characteristics of the social world. Even granting the logic of the critique in Aronow et al. (2021), its practical importance depends on the contexts under study. We argue that we should not be satisfied by appeals to intuition or experience about the complexity of "naturally occurring" propensity score functions. Instead, we call for more empirical metascience to begin to characterize this complexity. We apply this logic to the case of recommender systems as a demonstration of the weakness of allowing statisticians' intuitions to serve in place of metascientific data. This may be, as Aronow et al. (2021) claim, one of the "few free lunches in statistics," but like many of the free lunches consumed by statisticians, it is only available to those working at a handful of large tech firms. Rather than implicitly deciding what is "enough" based on the statistical applications the social world has determined to be most profitable, we argue that practicing statisticians should explicitly engage with questions like "for what?" and "for whom?" in order to adequately answer the question of "enough?"