Comprehensive input models and machine learning methods to improve permeability prediction.

Authors

Davari Mohammad Ali, Kadkhodaie Ali

Affiliations

Department of Petroleum Engineering, Imam Khomeini International University (IKIU), Qazvin, Iran.

Earth Sciences Department, Faculty of Natural Science, University of Tabriz, Tabriz, Iran.

Publication information

Sci Rep. 2024 Sep 27;14(1):22087. doi: 10.1038/s41598-024-73846-2.

DOI:10.1038/s41598-024-73846-2

PMID:39333687

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11437116/

Abstract

This study investigates the use of machine learning techniques and the proper selection of input data to estimate permeability in geosciences, using six types of input logs: gamma ray (GR), resistivity (RT), effective porosity (PHIE), density (RHO), sonic (DT), and compensated neutron porosity (NPHI). A total of 57 models were constructed from combinations of these logs, and each was tested with five machine learning methods: Extreme Learning Machine (ELM), Random Forest (RF), Gradient Boosting (GB), K-Nearest Neighbor (KNN), and Multilayer Perceptron (MLP). This yielded 285 unique permeability predictions (57 input models × 5 methods). RF had the highest correlation coefficient (0.925) but an average error of 0.196, indicating a trade-off between precision and correlation. ELM had the lowest average error (0.083), with a correlation coefficient of 0.871. Testing on a blind well showed that GB and RF were highly effective in predicting permeability, with R² values of 0.92 and 0.90, respectively, even in previously untested settings. The findings emphasize the need to use appropriate machine learning algorithms and input data to improve model accuracy and reliability.
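The counts in the abstract can be checked with a short enumeration. The sketch below is an illustration, not the authors' code. It assumes that an "input model" is any subset of the six logs containing at least two logs, which is the reading that reproduces the stated total of 57; the abstract itself does not say how the combinations were chosen. The names `LOGS`, `METHODS`, and `input_models` are hypothetical.

```python
from itertools import combinations

# Log mnemonics and method abbreviations as listed in the abstract.
LOGS = ["GR", "RT", "PHIE", "RHO", "DT", "NPHI"]
METHODS = ["ELM", "RF", "GB", "KNN", "MLP"]


def input_models(logs, min_size=2):
    """Return every subset of `logs` with at least `min_size` members.

    Assumption: single-log models are excluded (min_size=2). This is
    inferred from the reported count of 57, not stated in the abstract.
    """
    return [
        combo
        for size in range(min_size, len(logs) + 1)
        for combo in combinations(logs, size)
    ]


models = input_models(LOGS)

# Pair every input model with every learning method.
runs = [(combo, method) for combo in models for method in METHODS]

print(len(models))  # 57 = 15 + 20 + 15 + 6 + 1
print(len(runs))    # 285 = 57 * 5
```

Under this assumption the 57 input models crossed with the five methods give the 285 predictions reported. Setting `min_size=1` would instead give 63 models, which does not match the abstract.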
