Ma Zhiwei, Santos Javier E, Lackey Greg, Viswanathan Hari, O'Malley Daniel
Earth & Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA.
Geological and Environmental Systems Directorate, National Energy Technology Laboratory, Pittsburgh, PA, 15236, USA.
Sci Rep. 2024 Dec 30;14(1):31702. doi: 10.1038/s41598-024-81846-5.
To reduce the environmental risks and impacts of orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Given the large number of wells, manually reading and digitizing information from historical documents is not feasible. Here, we propose a new computational approach for characterizing these wells rapidly and cost-effectively. Specifically, we leverage the capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. We present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. When applied to clean, PDF-based reports, the workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis. However, it struggles with unstructured image-based well records, where accuracy drops to 70%. The workflow offers significant benefits over manual digitization because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively.
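The abstract describes a two-stage workflow (text conversion followed by LLM analysis) and notes that more detailed prompts improve extraction. The following is a minimal sketch of the LLM-analysis stage only, not the authors' implementation. The field names (`latitude`, `longitude`, `depth_ft`), the prompt wording, the JSON reply format, and every function name are hypothetical choices made for illustration. The model is passed in as a plain callable, so no particular Llama 2 interface is assumed.

```python
import json
import re
from typing import Callable, Dict, List, Optional

# Hypothetical field names and units; the paper's abstract only says
# "well location and depth".
FIELDS = ("latitude", "longitude", "depth_ft")

WellInfo = Dict[str, Optional[float]]


def build_prompt(document_text: str, detailed: bool = True) -> str:
    """Assemble an extraction prompt; the detailed variant names each field and the reply format."""
    if detailed:
        instructions = (
            "You are reading a historical oil and gas well record. "
            "Extract the well location and total depth. "
            'Reply with only a JSON object with the keys "latitude" and '
            '"longitude" (decimal degrees) and "depth_ft" (feet). '
            "Use null for any value the record does not state."
        )
    else:
        instructions = "Extract the well location and depth."
    return f"{instructions}\n\nRecord:\n{document_text}\n\nAnswer:"


def parse_response(response: str) -> WellInfo:
    """Pull the first flat JSON object out of the reply; missing or invalid values become None."""
    result: WellInfo = {name: None for name in FIELDS}
    match = re.search(r"\{.*?\}", response, flags=re.DOTALL)
    if match is None:
        return result
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return result
    for name in FIELDS:
        value = payload.get(name)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            result[name] = float(value)
    return result


def extract_well_info(document_text: str, llm: Callable[[str], str]) -> WellInfo:
    """Run one document through the model; `llm` maps a prompt string to a reply string."""
    return parse_response(llm(build_prompt(document_text)))


def _same(a: Optional[float], b: Optional[float], tol: float) -> bool:
    """Two values agree if both are missing or both are present and within tol."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def accuracy(predictions: List[WellInfo], references: List[WellInfo], tol: float = 1e-6) -> float:
    """Fraction of documents for which every field matches the reference."""
    if not references:
        return 0.0
    correct = sum(
        all(_same(pred[name], ref[name], tol) for name in FIELDS)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Stand-in for a real model call, returning a canned reply.
    def fake_llm(prompt: str) -> str:
        return 'Here you go: {"latitude": 40.5, "longitude": -79.9, "depth_ft": 2150}'

    print(extract_well_info("Well record text would go here.", fake_llm))
```

Passing the model as a callable keeps the sketch independent of how the text was obtained (direct PDF text extraction or OCR of scanned images) and of which model size is used, so the same scoring function could compare prompt variants or models on a labeled set.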