Somin Wadhwa, Silvio Amir, Byron C. Wallace
Northeastern University.
Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:15566-15589. doi: 10.18653/v1/2023.acl-long.868.
Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 Large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by conducting human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) few-shot prompting with GPT-3 achieves near-SOTA performance, i.e., roughly equivalent to existing models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.
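The "linearization" the abstract describes can be illustrated with a minimal sketch. The exact target format (delimiters, ordering) below is an assumption for illustration, not the paper's actual scheme:

```python
# Hedged sketch: linearizing relation triples into a single target string,
# as in generative (seq2seq) relation extraction. The "(s | r | o)" format
# and the example triples are illustrative assumptions.

def linearize(triples):
    """Turn (subject, relation, object) triples into one target string."""
    return " ; ".join(f"({s} | {r} | {o})" for s, r, o in triples)

triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "causes", "stomach irritation"),
]
target = linearize(triples)
print(target)
# → (aspirin | treats | headache) ; (aspirin | causes | stomach irritation)
```

A seq2seq model (e.g., Flan-T5) would then be trained to generate such a target string conditioned on the source sentence, rather than tagging spans and classifying pairs.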