Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America.
Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador.
PLoS One. 2020 Nov 24;15(11):e0242453. doi: 10.1371/journal.pone.0242453. eCollection 2020.
There is considerable interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and to build computational models of custom experiments. Moreover, experiments and modeling are often performed in a cycle, in which iterative experimental refinement and data modeling uncover insights and generate or refute hypotheses about social behaviors. The current practice is for social analysts to develop tailor-made computer programs and analytical scripts for each experiment and model, which often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework that takes a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system that automates many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The design and implementation of the proposed pipeline framework rest on formal models, formal algorithms, and theoretical models. We propose a formal data model such that, if an experiment can be described in terms of this model, our pipeline software can be used to analyze its data efficiently. The merits of the proposed pipeline framework are illustrated by several case studies of networked social science experiments.