Tekumalla Ramya, Banda Juan M
Georgia State University, Atlanta, GA 30303, USA.
Genomics Inform. 2020 Jun;18(2):e16. doi: 10.5808/GI.2020.18.2.e16. Epub 2020 Jun 15.
The use of social media data for research has grown dramatically within the biomedical community. In PubMed alone, nearly 2,500 publication entries since 2014 deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share the code or data needed to replicate their studies. With minimal exceptions, the few that do leave it to the researcher to figure out how to fetch the data, how best to format it, and how to create automatic and manual annotations on the acquired data. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools that aims to encapsulate the cumbersome details of acquiring, preprocessing, annotating, and standardizing social media data. The purpose of our toolkit is to let researchers focus on answering research questions rather than on the technical aspects of using social media data. By using a standard toolkit, researchers will be able to acquire, use, and release data in a consistent way that is transparent to everybody using the toolkit, thereby simplifying research reproducibility and accessibility in the social media domain.