Zhou Yancong, Chen Wenyue, Sun Xiaochen, Yang Dandan
School of Information Engineering, Tianjin University of Commerce, Tianjin, China.
College of Management and Economics, Tianjin University, Tianjin, China.
PLoS One. 2023 Oct 11;18(10):e0292466. doi: 10.1371/journal.pone.0292466. eCollection 2023.
Analyzing customer characteristics and issuing early warnings of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and can substantially reduce operating costs. Using Python, preprocessing operations including data cleaning, oversampling, and data standardization were applied to a data set of 900,000 records of telecom customers' personal characteristics and historical behavior. Appropriate model parameters were selected to build a Back Propagation Neural Network (BPNN). Two classic ensemble learning models, Random Forest (RF) and Adaboost, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the first four models perform better in terms of recall, precision, F1 score, and other indicators. On positive samples, the recall rates of BPNN, RF, Adaboost, and the RF-Adaboost dual-ensemble model are 79%, 90%, 89%, and 93%, respectively; the precision rates are 97%, 99%, 98%, and 99%; and the F1 scores are 87%, 95%, 94%, and 96%. The RF-Adaboost dual-ensemble model performs best, with the three indicators 10%, 1%, and 6% higher than the reference. The churn predictions provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and to reduce customer churn.
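The three reported indicators are linked: the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch, not taken from the paper, that recomputes F1 from the rounded precision and recall figures quoted in the abstract. The function name `f1_score` and the dictionary layout are illustrative assumptions. Because the inputs are already rounded to whole percentages, the recomputed values can differ from the reported F1 scores by about one percentage point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) on positive (churn) samples, as quoted in the abstract.
# These are rounded figures; the unrounded values are not available here.
reported = {
    "BPNN": (0.97, 0.79, 0.87),
    "RF": (0.99, 0.90, 0.95),
    "Adaboost": (0.98, 0.89, 0.94),
    "RF-Adaboost": (0.99, 0.93, 0.96),
}

# Recompute F1 for each model from its rounded precision and recall.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name:12s} reported F1={f1_reported:.2f}  recomputed F1={recomputed[name]:.3f}")
```

The recomputed values preserve the reported ranking, with the RF-Adaboost dual-ensemble model highest and BPNN lowest. The small gaps for RF and Adaboost are consistent with rounding of the underlying precision and recall values.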