
Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success).

Authors

Shaib Chantal, Li Millicent L, Joseph Sebastian, Marshall Iain J, Li Junyi Jessy, Wallace Byron C

Affiliations

Northeastern University.

The University of Texas at Austin.

Publication

Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:1387-1407. doi: 10.18653/v1/2023.acl-short.119.

Abstract

Large language models, particularly GPT-3, are able to produce high-quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generated by GPT-3, given zero supervision. We consider both single- and multi-document settings. In the former, GPT-3 is tasked with generating regular and plain-language summaries of articles describing randomized controlled trials; in the latter, we assess the degree to which GPT-3 is able to synthesize evidence reported across a collection of articles. We design an annotation scheme for evaluating model outputs, with an emphasis on assessing the factual accuracy of generated summaries. We find that while GPT-3 is able to summarize and simplify single biomedical articles faithfully, it struggles to provide accurate aggregations of findings over multiple documents. We release all data and annotations used in this work.
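The abstract describes a zero-shot, single-document setting in which GPT-3 is prompted to produce both a regular and a plain-language summary of an article describing a randomized controlled trial. The sketch below illustrates what such a setup can look like; the prompt wording, the model name ("text-davinci-003"), and the decoding parameters are illustrative assumptions rather than the authors' actual configuration, and it assumes the legacy OpenAI Completions API (openai < 1.0).

```python
# Hypothetical zero-shot summarization sketch, not the paper's exact prompts or settings.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

ARTICLE = "..."  # text of a randomized controlled trial report to be summarized

# Two prompt variants: a regular summary and a plain-language summary for lay readers.
PROMPTS = {
    "regular": (
        "Summarize the following randomized controlled trial report:\n\n"
        f"{ARTICLE}\n\nSummary:"
    ),
    "plain_language": (
        "Summarize the following randomized controlled trial report in plain language "
        "that a reader without medical training could understand:\n\n"
        f"{ARTICLE}\n\nPlain-language summary:"
    ),
}

for kind, prompt in PROMPTS.items():
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3 model; the paper's exact model may differ
        prompt=prompt,
        max_tokens=300,
        temperature=0.0,  # deterministic decoding, chosen here for reproducibility
    )
    print(kind, response.choices[0].text.strip(), sep="\n")
```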

