Estimating wage disparities using foundation models.

Authors

Vafa Keyon, Athey Susan, Blei David M

Affiliations

Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138.

Graduate School of Business, Stanford University, Stanford, CA 94305.

Publication information

Proc Natl Acad Sci U S A. 2025 Jun 3;122(22):e2427298122. doi: 10.1073/pnas.2427298122. Epub 2025 May 30.

Abstract

The rise of foundation models marks a paradigm shift in machine learning: instead of training specialized models from scratch, foundation models are trained on massive datasets before being adjusted or fine-tuned to make predictions on smaller datasets. Initially developed for text, foundation models have also excelled at making predictions about social science data. However, while many estimation problems in the social sciences use prediction as an intermediate step, they ultimately require different criteria for success. In this paper, we develop methods for fine-tuning foundation models to perform these estimation problems. We first characterize an omitted variable bias that can arise when a foundation model is fine-tuned in the standard way: to minimize predictive error. We then provide a set of conditions for fine-tuning under which estimates derived from a foundation model are [Formula: see text]-consistent. Based on this theory, we develop fine-tuning algorithms that empirically mitigate this omitted variable bias. To demonstrate our ideas, we study gender wage gap estimation. Classical methods for estimating the adjusted wage gap employ simple predictive models of wages, which can induce omitted variable bias because they condition on coarse summaries of career history. Instead, we use a custom-built foundation model, capturing a richer representation of career history. Using data from the Panel Study of Income Dynamics, we find that career history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of career history that are omitted by standard models but are important for explaining the gap.
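The omitted variable bias described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method and does not use Panel Study of Income Dynamics data; every variable name, coefficient, and distribution in it is a hypothetical assumption chosen for illustration. Wages are simulated with no direct gender effect, so the true adjusted gap is zero. A regression that conditions only on a coarse summary of career history (years of experience) reports a spurious gap, while one that also conditions on an omitted detail (share of part-time work) does not.

```python
import numpy as np


def adjusted_gap(y, female, controls):
    """Return the OLS coefficient on `female` after linear adjustment for `controls`."""
    X = np.column_stack([np.ones_like(y), female, controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all numbers are assumptions).
female = rng.integers(0, 2, size=n).astype(float)
years_exp = rng.uniform(0, 30, size=n)  # coarse summary of career history
# Finer career detail that differs by gender and is absent from the coarse summary.
part_time = np.clip(0.1 + 0.3 * female + rng.normal(0, 0.1, size=n), 0, 1)

# Log wages depend on career history only; there is no direct gender effect.
log_wage = 2.0 + 0.03 * years_exp - 0.5 * part_time + rng.normal(0, 0.3, size=n)

# Conditioning on the coarse summary alone attributes the part-time effect to gender.
gap_coarse = adjusted_gap(log_wage, female, years_exp)

# Conditioning on the richer history removes the spurious gap.
gap_rich = adjusted_gap(log_wage, female, np.column_stack([years_exp, part_time]))

print(f"adjusted gap, coarse controls: {gap_coarse:.3f}")
print(f"adjusted gap, rich controls:   {gap_rich:.3f}")
```

In this toy setting the richer control set is a single extra column. The paper's setting replaces it with a learned representation of the full career history, which is where the fine-tuning conditions discussed in the abstract come in.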

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1014/12146757/ae3db07a5e04/pnas.2427298122fig01.jpg
