Department of Health Management and Policy, McCourt School of Public Policy, Georgetown University, Washington, DC, United States.
Department of Computer Science, Massive Data Institute, Georgetown University, Washington, DC, United States.
J Med Internet Res. 2023 Jun 13;25:e45187. doi: 10.2196/45187.
Gun violence research is characterized by a dearth of data available for measuring key constructs. Social media data may offer an opportunity to substantially narrow that gap, but developing methods for deriving firearms-related constructs from social media data and understanding the measurement properties of such constructs are critical precursors to their broader use.
This study aimed to develop a machine learning model of individual-level firearm ownership from social media data and assess the criterion validity of a state-level construct of ownership.
We used survey responses to questions on firearm ownership linked with Twitter data to construct different machine learning models of firearm ownership. We externally validated these models using a set of firearm-related tweets hand-curated from the Twitter Streaming application programming interface and created state-level ownership estimates using a sample of users collected from the Twitter Decahose application programming interface. We assessed the criterion validity of state-level estimates by comparing their geographic variance to benchmark measures from the RAND State-Level Firearm Ownership Database.
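The aggregation and validation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `(state, predicted_owner)` input format, and the `min_users` parameter are assumptions introduced here, and the state-level estimate is assumed to be the simple share of labeled users predicted to be owners.

```python
from collections import defaultdict
from math import sqrt


def state_ownership_shares(user_predictions, min_users=100):
    """Share of users predicted to be owners per state (hypothetical helper).

    user_predictions: iterable of (state, predicted_owner) pairs, where
    predicted_owner is 0 or 1. States with fewer than min_users labeled
    users are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [predicted owners, total users]
    for state, is_owner in user_predictions:
        counts[state][0] += int(is_owner)
        counts[state][1] += 1
    return {
        state: owners / total
        for state, (owners, total) in counts.items()
        if total >= min_users
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given a dictionary of benchmark values keyed by state (for example, values taken from the RAND database), the two coefficients would be computed over the states present in both dictionaries, passing the social media shares and the benchmark values in the same state order.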
We found that the logistic regression classifier for gun ownership performed best, with an accuracy of 0.7 and an F-score of 0.69. We also found a strong positive correlation between Twitter-based estimates of gun ownership and benchmark ownership estimates. For states meeting the threshold of at least 100 labeled Twitter users, the Pearson and Spearman correlation coefficients were 0.63 (P<.001) and 0.64 (P<.001), respectively.
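For readers less familiar with the two classifier metrics reported here, the sketch below shows how accuracy and an F-score can be computed from binary labels. It assumes the F-score is the F1 score for the positive (owner) class; the abstract does not state which variant or averaging scheme was used, and the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 for the positive class) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1
```

Accuracy counts all correct predictions equally, whereas F1 is the harmonic mean of precision and recall, so it ignores true negatives and is more sensitive to how well the positive class is recovered.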
Our success in developing a machine learning model of firearm ownership at the individual level with limited training data, as well as a state-level construct that achieves a high level of criterion validity, underscores the potential of social media data for advancing gun violence research. The ownership construct is an important precursor for understanding the representativeness of and variability in outcomes that have been the focus of social media analyses in gun violence research to date, such as attitudes, opinions, policy stances, sentiments, and perspectives on gun violence and gun policy. The high criterion validity we achieved for state-level gun ownership suggests that social media data may be a useful complement to traditional sources of information on gun ownership, such as survey and administrative data. They may be especially useful for identifying early signals of changes in geographic patterns of gun ownership, given that social media data are available immediately, generated continuously, and responsive to events. These results also support the possibility of deriving other computational, social media-based constructs, which could lend additional insight into firearm behaviors that are currently not well understood. More work is needed to develop other firearms-related constructs and to assess their measurement properties.