Wu Chao, Wang Guolong, Hu Simon, Liu Yue, Mi Hong, Zhou Ye, Guo Yi-Ke, Song Tongtong
School of Public Affairs, Zhejiang University, Hangzhou, Zhejiang, China.
School of Civil and Environmental Engineering, ZJU-UIUC Institute, Zhejiang University, Haining, China.
PLoS One. 2020 Nov 20;15(11):e0242483. doi: 10.1371/journal.pone.0242483. eCollection 2020.
Traditional correlation analysis and regression models have been used in social science research for decades. Advances in machine learning algorithms now make it possible to apply machine learning techniques to social science research and social issues, where they may outperform standard regression methods in some cases. This article therefore proposes a methodological workflow for machine-learning-based data analysis that could be applied widely to social issues. Specifically, the workflow takes a data-driven approach, from feature selection through model building, to uncover the natural mechanisms behind social issues. The advantage of data-driven feature selection is that the workflow can be built without relying heavily on prior knowledge and theory from social science. The advantage of machine learning in modelling is that it can uncover non-linear and complex relationships behind social issues. The main purpose of the workflow is to identify the fields most relevant to the target and to provide appropriate predictions. Interpreting the results, however, still requires theory and knowledge from social science. In this paper, we trained the workflow using left-behind children as the case social issue, and we report all steps and full results.
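The abstract's point that data-driven feature selection can surface non-linear relationships that correlation analysis misses can be illustrated with a minimal sketch. This is not the authors' workflow: the abstract does not name its algorithms, so the binned correlation ratio used here, the synthetic data, the 0.05 threshold, and all function and feature names (`make_data`, `select_features`, `u_shaped`, and so on) are assumptions made purely for illustration.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Linear (Pearson) correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_ratio(xs, ys, bins=10):
    """Share of the variance in ys explained by equal-frequency bins of xs.

    Unlike Pearson's r, this score also responds to non-monotonic
    relationships, because it only compares bin means of the target.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = len(xs) // bins
    grand_mean = fmean(ys)
    total = sum((y - grand_mean) ** 2 for y in ys)
    between = 0.0
    for b in range(bins):
        # The last bin absorbs any remainder so every row is used once.
        stop = (b + 1) * size if b < bins - 1 else len(xs)
        group = [ys[i] for i in order[b * size:stop]]
        between += len(group) * (fmean(group) - grand_mean) ** 2
    return between / total


def make_data(n=2000, seed=0):
    """Hypothetical synthetic data: one linear effect, one U-shaped, two noise."""
    rng = random.Random(seed)
    names = ("linear", "u_shaped", "noise_a", "noise_b")
    features = {name: [rng.uniform(-1, 1) for _ in range(n)] for name in names}
    target = [
        2 * a + 3 * b ** 2 + rng.gauss(0, 0.5)
        for a, b in zip(features["linear"], features["u_shaped"])
    ]
    return features, target


def select_features(features, target, threshold=0.05):
    """Keep features whose correlation ratio reaches the (assumed) threshold."""
    scores = {name: correlation_ratio(col, target) for name, col in features.items()}
    return sorted(name for name, score in scores.items() if score >= threshold)


features, target = make_data()
for name, column in features.items():
    print(f"{name:9s} r={pearson(column, target):+.3f} "
          f"eta2={correlation_ratio(column, target):.3f}")
print("selected:", select_features(features, target))
```

In this toy setup the U-shaped feature has a Pearson correlation close to zero, so a purely linear screen would discard it, while the binned score retains it. Deciding why such a feature matters is still a question for domain theory, as the abstract notes.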