Artificial intelligence-based data extraction for next generation risk assessment: Is fine-tuning of a large language model worth the effort?

Author affiliations

Department of Pesticides Safety, German Federal Institute for Risk Assessment, Max-Dohrn-Straße 8-10, Berlin 10589, Germany.

Division of Toxicology, Wageningen University & Research, Stippeneng 4, Wageningen 6708 WE, the Netherlands; Wageningen Food Safety Research, Wageningen University & Research, Akkermaalsbos 2, Wageningen 6708 WB, the Netherlands.

Publication information

Toxicology. 2024 Nov;508:153933. doi: 10.1016/j.tox.2024.153933. Epub 2024 Aug 23.

Abstract

To underpin scientific evaluations of chemical risks, agencies such as the European Food Safety Authority (EFSA) rely heavily on the outcome of systematic reviews, which currently require extensive manual effort. One specific challenge is the meaningful use of the vast amounts of valuable data from new approach methodologies (NAMs), which are mostly reported in an unstructured way in the scientific literature. The EFSA-initiated project 'AI4NAMS' explored the potential of large language models (LLMs) for this purpose. Models from the GPT (Generative Pre-trained Transformer) family were used for searching, extracting, and integrating data from scientific publications for NAM-based risk assessment. A case study on bisphenol A (BPA), a substance of very high concern due to its adverse effects on human health, focused on the structured extraction of information on test systems measuring biological activities of BPA. A GPT-3 model (Curie base model) was fine-tuned for extraction tasks, and its performance was compared with that of a ready-to-use model (text-davinci-002). To update the findings from the AI4NAMS project and to check for technical progress, the fine-tuning exercise was repeated, with a newer ready-to-use model (text-davinci-003) serving as the comparison. In both cases, the fine-tuned Curie model outperformed the ready-to-use model. Performance also improved clearly from text-davinci-002 to the newer text-davinci-003. Our findings demonstrate how fine-tuning and swift general technical development improve model performance, and they contribute to the growing number of investigations on the use of AI in scientific and regulatory tasks.
