Fine-tuning large language models for interdisciplinary environmental challenges.

Authors

Zhang Yuanxin, Lin Sijie, Xiong Yaxin, Li Nan, Zhong Lijin, Ding Longzhen, Hu Qing

Affiliations

State Key Laboratory of Soil Pollution Control and Safety, Southern University of Science and Technology, Shenzhen, 518055, China.

School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.

Publication information

Environ Sci Ecotechnol. 2025 Jul 28;27:100608. doi: 10.1016/j.ese.2025.100608. eCollection 2025 Sep.

DOI: 10.1016/j.ese.2025.100608
PMID: 40799362
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12340390/

Abstract

Large language models (LLMs) are revolutionizing specialized fields by enabling advanced reasoning and data synthesis. Environmental science, however, poses unique hurdles due to its interdisciplinary scope, specialized jargon, and heterogeneous data from climate dynamics to ecosystem management. Despite progress in subdomains like hydrology and climate modeling, no integrated framework exists to generate high-quality, domain-specific training data or evaluate LLM performance across the discipline. Here we introduce a unified pipeline to address this gap. It comprises EnvInstruct, a multi-agent system for prompt generation; ChatEnv, a balanced 100-million-token instruction dataset spanning five core themes (climate change, ecosystems, water resources, soil management, and renewable energy); and EnvBench, a 4998-item benchmark assessing analysis, reasoning, calculation, and description tasks. Applying this pipeline, we fine-tune an 8-billion-parameter model, EnvGPT, which achieves 92.06 ± 1.85% accuracy on the independent EnviroExam benchmark, surpassing the parameter-matched LLaMA-3.1-8B baseline by ~8 percentage points and rivaling the closed-source GPT-4o-mini and the 9-fold larger Qwen2.5-72B. On EnvBench, EnvGPT earns top LLM-assigned scores for relevance (4.87 ± 0.11), factuality (4.70 ± 0.15), completeness (4.38 ± 0.19), and style (4.85 ± 0.10), outperforming baselines in every category. This study reveals how targeted supervised fine-tuning on curated domain data can propel compact LLMs to state-of-the-art levels, bridging gaps in environmental applications. By openly releasing EnvGPT, ChatEnv, and EnvBench, our work establishes a reproducible foundation for accelerating LLM adoption in environmental research, policy, and practice, with potential extensions to multimodal and real-time tools.
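
The abstract reports each EnvBench dimension as a "mean ± spread" figure and describes Qwen2.5-72B as 9-fold larger than the 8-billion-parameter EnvGPT. The Python sketch below illustrates how such summaries could be computed. It is not the authors' evaluation code: the helper names (`summarize`, `size_ratio`) and the toy scores are hypothetical, and both the 1-to-5 scoring scale and the reading of "±" as a sample standard deviation are assumptions, since the abstract does not specify them.

```python
from statistics import mean, stdev

# The four dimensions named in the abstract for the LLM-assigned EnvBench scores.
DIMENSIONS = ("relevance", "factuality", "completeness", "style")


def summarize(scores_by_dim):
    """Return {dimension: (mean, sample standard deviation)}.

    Assumption: "±" denotes a sample standard deviation; the abstract
    does not say which spread statistic was actually used.
    """
    return {
        dim: (mean(scores_by_dim[dim]), stdev(scores_by_dim[dim]))
        for dim in DIMENSIONS
    }


def size_ratio(large_params_b, small_params_b):
    """Ratio of two model sizes, both given in billions of parameters."""
    return large_params_b / small_params_b


# Hypothetical judge scores on an assumed 1-5 scale (NOT data from the paper).
toy_scores = {
    "relevance": [5, 5, 4, 5],
    "factuality": [5, 4, 5, 4],
    "completeness": [4, 4, 5, 4],
    "style": [5, 5, 5, 4],
}

summary = summarize(toy_scores)
for dim, (avg, spread) in summary.items():
    print(f"{dim}: {avg:.2f} ± {spread:.2f}")

# 72B versus 8B parameters gives the "9-fold larger" figure in the abstract.
print(f"size ratio: {size_ratio(72, 8):.0f}x")
```

With real judge outputs, the same `summarize` call would be run once per model so that the per-dimension means can be compared across baselines.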

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/6c763a9e9ac5/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/e848662f5446/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/c30f36b33d2d/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/fabf9fd3c7f0/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/9b81d81c23b2/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/1ee7f141e9a8/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/5aeea3b2cc57/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/27e151a5acdf/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac0/12340390/bbb38a10ecee/gr7.jpg