Ma Famin, Altalbawy Farag M A, Patel Pinank, Manjunatha R, Kalia Rishiv, Formanova Shoira, Naveen P Raja, Joshi Kamal Kant, Sinha Aashna, Kandahari Abdolali Yarahmadi, Al-Rubaye Taqi Mohammed Khattab, Alam Mohammad Mahtab
Shangluo University, Shangluo, 726000, Shaanxi, China.
Department of Chemistry, University College of Duba, University of Tabuk, Tabuk, Saudi Arabia.
Sci Rep. 2025 Jul 30;15(1):27765. doi: 10.1038/s41598-025-12129-w.
Optimizing oil production from wells that use gas lift systems is a critical challenge because operational and reservoir parameters interact in complex ways. This study aimed to develop robust predictive models for estimating oil production rates from a dataset collected in oil fields in south-eastern Iraq. The dataset comprises 169 rigorously validated samples and includes the following features: basic sediment and water (BS&W) content, choke size, pressures, gas injection characteristics, gas lift valve depth, oil density, and temperature. Input and output variables were normalized and split into training and test sets to support a fair and reliable evaluation. Eight machine learning models (Decision Tree, AdaBoost, Random Forest, Ensemble Learning, CNN, SVR, MLP-ANN, and Lasso Regression) were trained and evaluated using 5-fold cross-validation and three statistical metrics (R², MSE, and AARE%). The Random Forest model performed best, achieving an R² of 0.867 on the test set together with the lowest test errors (MSE: 18502; AARE: 8.76%), while the other models were prone to overfitting or underfitting. Sensitivity analysis and SHAP interpretability methods showed that BS&W content, choke size, and upstream pressure had the greatest influence on oil output. These findings underscore the importance of both statistical rigor and model interpretability in oil production forecasting and provide actionable insights for optimizing gas lift operations in oil wells.
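The evaluation protocol named in the abstract (R², MSE, AARE%, and 5-fold cross-validation) can be illustrated with a minimal Python sketch. The abstract does not state the formulas used, so the definitions below are the conventional ones and are assumptions: in particular, AARE% is taken to be the mean of |predicted − measured| / |measured|, expressed as a percentage. The function names, the contiguous fold layout, and the sample values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aare_percent(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute relative error in percent (assumed definition).

    Requires every measured value to be non-zero.
    """
    total = sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index pairs over range(n_samples).

    Folds are contiguous and differ in size by at most one sample.
    Shuffle the data beforehand if its order is not random.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


# Illustrative values only (not data from the study).
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
scores = {
    "R2": r2(measured, predicted),
    "MSE": mse(measured, predicted),
    "AARE%": aare_percent(measured, predicted),
}

# With 169 samples, as in the abstract, five folds cannot all be equal in size.
folds_169 = k_fold_indices(169, k=5)
```

Because MSE is expressed in squared units of the production rate while AARE% is scale-free, reporting both alongside R², as the abstract does, gives complementary views of the same predictions.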