
Information extraction from historical well records using a large language model.

Authors

Ma Zhiwei, Santos Javier E, Lackey Greg, Viswanathan Hari, O'Malley Daniel

Affiliations

Earth & Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA.

Geological and Environmental Systems Directorate, National Energy Technology Laboratory, Pittsburgh, PA, 15236, USA.

Publication information

Sci Rep. 2024 Dec 30;14(1):31702. doi: 10.1038/s41598-024-81846-5.

DOI:10.1038/s41598-024-81846-5
PMID:39738349
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11685759/
Abstract

To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively characterizing these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information including well location and depth from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. The developed workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis when applied to clean, PDF-based reports. However, it struggles with unstructured image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual human digitization, because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively.
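The abstract describes a workflow in which document text is passed to an LLM with a prompt and the reply is checked against known values. A minimal sketch of that idea follows, assuming the record has already been converted to text. The prompt wording, the `DEPTH:` answer format, and every function name here are hypothetical illustrations, not the authors' implementation, and `fake_llm` is a regex stand-in for a real model such as Llama 2.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical prompt templates; the paper's actual prompts are not given in the abstract.
BASIC_PROMPT = "What is the total depth of the well?\n\n{text}"
DETAILED_PROMPT = (
    "You are reading a historical oil and gas well record.\n"
    "Report the total depth of the well in feet as a single number.\n"
    "Answer in the form 'DEPTH: <number>'. "
    "If it is not stated, answer 'DEPTH: unknown'.\n\n"
    "Record:\n{text}"
)


def parse_depth(answer: str) -> Optional[float]:
    """Pull the number out of a 'DEPTH: <number>' reply; None if no number is given."""
    match = re.search(r"DEPTH:\s*([\d,]+(?:\.\d+)?)", answer)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def extract_depth(
    text: str,
    llm: Callable[[str], str],
    template: str = DETAILED_PROMPT,
) -> Optional[float]:
    """Fill the prompt template with the record text, query the model, parse the reply."""
    return parse_depth(llm(template.format(text=text)))


def accuracy(records: List[Tuple[str, Optional[float]]], llm: Callable[[str], str]) -> float:
    """Fraction of (text, true_depth) records whose extracted depth matches the truth."""
    correct = sum(1 for text, truth in records if extract_depth(text, llm) == truth)
    return correct / len(records)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): finds 'Total depth: 1,250' style phrases."""
    record = prompt.split("Record:\n", 1)[-1]  # ignore the instruction part of the prompt
    found = re.search(r"[Tt]otal depth\s*:?\s*([\d,]+)", record)
    return f"DEPTH: {found.group(1)}" if found else "DEPTH: unknown"
```

Any callable that maps a prompt string to a reply string can be passed as `llm`, so a real model client can replace `fake_llm` without changing the parsing or scoring code. Fixing the answer format in the prompt is one way to make replies easy to parse; whether it matches the prompting used in the paper is not stated in the abstract.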


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/b032e53fd032/41598_2024_81846_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/381f85e6b152/41598_2024_81846_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/861445f4056a/41598_2024_81846_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/02b517e68e4f/41598_2024_81846_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/ef826ae7c4d6/41598_2024_81846_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1531/11685759/fe1bfd362842/41598_2024_81846_Fig6_HTML.jpg

Similar articles

1
Information extraction from historical well records using a large language model.
Sci Rep. 2024 Dec 30;14(1):31702. doi: 10.1038/s41598-024-81846-5.
2
LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models.
medRxiv. 2024 Sep 3:2024.09.02.24312917. doi: 10.1101/2024.09.02.24312917.
3
Documented Orphaned Oil and Gas Wells Across the United States.
Environ Sci Technol. 2022 Oct 18;56(20):14228-14236. doi: 10.1021/acs.est.2c03268. Epub 2022 Sep 26.
4
Role of Model Size and Prompting Strategies in Extracting Labels from Free-Text Radiology Reports with Open-Source Large Language Models.
J Imaging Inform Med. 2025 May 5. doi: 10.1007/s10278-025-01505-7.
5
Scalable information extraction from free text electronic health records using large language models.
BMC Med Res Methodol. 2025 Jan 28;25(1):23. doi: 10.1186/s12874-025-02470-z.
6
Large language models for data extraction from unstructured and semi-structured electronic health records: a multiple model performance evaluation.
BMJ Health Care Inform. 2025 Jan 19;32(1):e101139. doi: 10.1136/bmjhci-2024-101139.
7
Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.
JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.
8
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9
A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
10
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study.
JMIR Med Inform. 2024 Apr 8;12:e55318. doi: 10.2196/55318.

Cited by

1
Large Language Models can extract morphological data from taxonomic descriptions, but their stochastic nature makes automation challenging: a test on Australian Asteraceae.
PhytoKeys. 2025 Aug 19;261:189-210. doi: 10.3897/phytokeys.261.158396. eCollection 2025.
