Jason B. Colditz, Kar-Hai Chu, Chandler R. Larkin, and Brian A. Primack are with the Center for Research on Media, Technology, and Health, University of Pittsburgh School of Medicine, Pittsburgh, PA. Sherry L. Emery is with NORC, University of Chicago, Chicago, IL. A. Everette James is with the Health Policy Institute, University of Pittsburgh Graduate School of Public Health, Pittsburgh. Joel Welling is with the Pittsburgh Supercomputing Center, Pittsburgh.
Am J Public Health. 2018 Aug;108(8):1009-1014. doi: 10.2105/AJPH.2018.304497. Epub 2018 Jun 21.
There is growing interest in conducting public health research using data from social media. In particular, Twitter "infoveillance" has demonstrated utility across health contexts. However, rigorous and reproducible methodologies for using Twitter data in public health are not yet well articulated, particularly for content analysis, which is a widely used approach. In 2014, we gathered an interdisciplinary team of health science researchers, computer scientists, and methodologists to begin implementing an open-source framework for real-time infoveillance of Twitter health messages (RITHM). Through this process, we documented common challenges and novel solutions to inform future work in real-time Twitter data collection and subsequent human coding. The RITHM framework allows researchers and practitioners to use well-planned and reproducible processes for retrieving, storing, filtering, subsampling, and formatting data on health topics of interest. Further considerations for human coding of Twitter data include coder selection and training, data representation, codebook development and refinement, and monitoring of coding accuracy and productivity. We illustrate these methodological considerations through practical examples from formative work related to hookah tobacco smoking, and we reference essential methods literature related to understanding and using Twitter data.
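The filtering, subsampling, and formatting steps named in the abstract can be illustrated with a minimal Python sketch. This is not code from the RITHM framework: the keyword list, the function names (`filter_tweets`, `subsample`, `to_coding_csv`), the one-JSON-object-per-line input, and the `id_str`/`text` field names are all assumptions made for illustration. It keeps tweets that match hookah-related keywords, draws a seeded random subsample so the draw can be repeated, and writes a CSV with an empty column for human coders.

```python
import csv
import io
import json
import random

# Hypothetical keyword list for the hookah example; not taken from RITHM.
KEYWORDS = ("hookah", "shisha", "waterpipe")


def matches_topic(tweet, keywords=KEYWORDS):
    """Return True if the tweet text contains any keyword (case-insensitive)."""
    text = (tweet.get("text") or "").lower()
    return any(keyword in text for keyword in keywords)


def filter_tweets(lines, keywords=KEYWORDS):
    """Parse one JSON object per line and keep only on-topic tweets."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of aborting the run
        if isinstance(tweet, dict) and matches_topic(tweet, keywords):
            kept.append(tweet)
    return kept


def subsample(tweets, n, seed=0):
    """Draw a reproducible simple random sample of at most n tweets."""
    if n >= len(tweets):
        return list(tweets)
    return random.Random(seed).sample(tweets, n)


def to_coding_csv(tweets):
    """Format tweets as CSV text with a blank 'code' column for human coders."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tweet_id", "text", "code"])
    for tweet in tweets:
        writer.writerow([tweet.get("id_str", ""), tweet.get("text", ""), ""])
    return buffer.getvalue()
```

Fixing the seed is the design choice that matters for reproducibility here: two analysts who run `subsample` on the same filtered set with the same seed get the same tweets to code. A real deployment would likely need richer matching rules than substring search, which this sketch does not attempt.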