School of Social and Behavioral Sciences, Nanjing University, Nanjing, 210023, China.
Department of Sociology and Anthropology, National University of Singapore, Singapore, 117573, Singapore.
Sci Rep. 2023 Apr 4;13(1):5533. doi: 10.1038/s41598-023-31846-8.
Traditional social survey methods struggle to estimate the incidence rate of intimate partner violence (IPV) accurately because victims are often reluctant to disclose their experiences, which leads to underestimation. To address this issue, we applied machine learning algorithms to predict the incidence rate of IPV in China using data from the Third Wave Survey on the Social Status of Women in China (TWSSSCW 2010). Specifically, we compared five methods for handling imbalanced samples and six machine learning algorithms, and selected the random under-sampling ensemble method and the random forest algorithm to impute the missing data. Analysis of the complete, imputed data showed incidence rates of physical violence, verbal violence, and cold violence of 7.10%, 13.74%, and 21.35%, respectively, which were higher than the rates in the original dataset (4.05%, 11.21%, and 17.95%, respectively). Analyses using different training sets further confirmed the robustness of these findings. Overall, this study demonstrates that better tools are needed to estimate the incidence rates of IPV accurately. It also serves as a guide for future research that imputes missing data using machine learning.
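The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a random under-sampling ensemble used for imputation. It makes several assumptions: a nearest-centroid classifier stands in for the random forest so that the example needs no third-party libraries, all function names are hypothetical, and the toy data are made up for illustration and are not survey data. Each ensemble member is trained on every minority-class row plus an equally sized random draw of majority-class rows. Missing answers are then filled in by majority vote, and the incidence rate is recomputed on the completed data.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_undersampled_ensemble(X, y, n_members=25, seed=0):
    """Train one nearest-centroid model per balanced subsample.

    Assumes label 1 (reported IPV) is the minority class. The
    nearest-centroid model is a stand-in for a random forest.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("expected a non-empty minority class labelled 1")
    members = []
    for _ in range(n_members):
        # Random under-sampling: keep all positives, draw as many negatives.
        neg_sample = rng.sample(neg, len(pos))
        members.append((centroid(neg_sample), centroid(pos)))
    return members


def predict(members, x):
    """Majority vote across ensemble members (1 = reported IPV)."""
    votes = sum(sq_dist(x, c1) < sq_dist(x, c0) for c0, c1 in members)
    return 1 if 2 * votes > len(members) else 0


def imputed_rate(X, y, members):
    """Incidence rate after replacing missing answers (None) with predictions."""
    filled = [
        label if label is not None else predict(members, x)
        for x, label in zip(X, y)
    ]
    return sum(filled) / len(filled)


# Made-up toy data: one feature, eight answered rows, two missing answers.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [2.0], [2.2], [2.1], [0.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, None, None]

labelled = [(x, label) for x, label in zip(X, y) if label is not None]
members = fit_undersampled_ensemble(
    [x for x, _ in labelled], [label for _, label in labelled]
)
observed_rate = sum(label for _, label in labelled) / len(labelled)
rate = imputed_rate(X, y, members)
```

In this toy setup the rate among answered rows is 2/8, while the rate after imputation is 3/10, because one of the two non-respondents resembles the positive cases. This mirrors, in miniature, how an imputed estimate can exceed the rate observed among respondents.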