
Learning to explain is a good biomedical few-shot learner.

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae589.

Abstract

MOTIVATION

Significant progress has been achieved in biomedical text mining using deep learning methods, which rely heavily on large amounts of high-quality data annotated by human experts. In reality, however, obtaining high-quality annotated data is extremely challenging due to data scarcity (e.g. rare or new diseases), data privacy and security concerns, and the high cost of annotation. Additionally, nearly all existing research focuses on predicting labels without providing corresponding explanations. Therefore, in this paper we investigate a more realistic scenario, biomedical few-shot learning, and explore the impact of interpretability on it.

RESULTS

We present LetEx (Learning to Explain), a novel multi-task generative approach that leverages reasoning explanations from large language models (LLMs) to enhance the inductive reasoning ability of few-shot learning. Our approach includes (1) collecting high-quality explanations by devising a complete LLM-based workflow that combines chain-of-thought (CoT) prompting with self-training strategies, and (2) converting various biomedical NLP tasks into a unified text-to-text generation task, in which the collected explanations serve as additional supervision between text-label pairs during multi-task training. Experiments are conducted under three few-shot settings across six biomedical benchmark datasets. The results show that learning to explain improves performance on diverse biomedical NLP tasks in low-resource scenarios, significantly outperforming strong baseline models by up to 6.41%. Notably, the proposed method enables the 220M-parameter LetEx to surpass LLMs in reasoning-explanation ability.

AVAILABILITY AND IMPLEMENTATION

Our source code and data are available at https://github.com/cpmss521/LetEx.
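The multi-task conversion described in the abstract can be sketched in a few lines: each annotated example yields one text-to-text pair for label prediction and, when an LLM-collected explanation is available, a second pair for explanation generation. This is a hypothetical illustration of the general technique; the function name, field names, and task prefixes are assumptions, not the paper's exact format (see the repository above for the actual implementation).

```python
def build_multitask_pairs(sample, task="NER"):
    """Convert one annotated sample into (input, target) text-to-text pairs.

    Hypothetical sketch of multi-task supervision: the prediction
    sub-task maps text to its label, while the explanation sub-task
    uses a collected LLM rationale as additional supervision.
    """
    text = sample["text"]
    pairs = [
        # Prediction sub-task: the model generates the label.
        (f"[{task}] predict: {text}", sample["label"]),
    ]
    if sample.get("explanation"):
        # Explanation sub-task: the rationale links text and label.
        pairs.append((f"[{task}] explain: {text}", sample["explanation"]))
    return pairs


sample = {
    "text": "Mutations in BRCA1 increase breast cancer risk.",
    "label": "BRCA1: Gene",
    "explanation": "BRCA1 is a tumor-suppressor gene, so it is tagged as a Gene entity.",
}
for inp, tgt in build_multitask_pairs(sample):
    print(inp, "->", tgt)
```

Both pairs can then be fed to a single encoder-decoder model (e.g. a 220M-parameter T5-style model), so that one set of weights learns to predict and to explain jointly.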

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6d9/11483110/b11ea1317f31/btae589f7.jpg
