Schultheiß Sebastian, Lewandowski Dirk, von Mach Sonja, Yagci Nurce
Department of Information, Hamburg University of Applied Sciences, Hamburg, Germany.
Department of Computer Science and Applied Cognitive Science, University Duisburg-Essen, Duisburg, Germany.
PeerJ Comput Sci. 2023 Jun 7;9:e1421. doi: 10.7717/peerj-cs.1421. eCollection 2023.
Search engine queries are the starting point for studies in different fields, such as health or political science. These studies usually aim to make statements about social phenomena. However, the queries used in these studies are often created unsystematically and do not correspond to actual user behavior, so the evidential value of the studies must be questioned. We address this problem by developing an approach (query sampler) to sample queries from commercial search engines, using keyword research tools designed to support search engine marketing. This allows us to generate large numbers of queries related to a given topic and to derive how often each keyword is searched for, that is, its query volume. We empirically test our approach with queries from two published studies, and the results show that both the number of queries and the total search volume could be considerably expanded. Our approach has a wide range of applications for studies that seek to draw conclusions about social phenomena using search engine queries. It can be applied flexibly to different topics and is relatively straightforward to implement, as we provide the code for querying the Google Ads API. Its limitations are that it still needs to be tested with a broader range of topics and thoroughly checked for problems with topic drift and for the role of close variants provided by keyword research tools.
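The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' published code and makes no call to the Google Ads API. It assumes that a keyword research tool has already returned keyword ideas with an average monthly search volume. The names KeywordIdea, normalize, and expand_queries, as well as the sample queries and volumes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordIdea:
    """One keyword idea as a keyword research tool might return it (hypothetical shape)."""
    text: str                  # the query string
    avg_monthly_searches: int  # the reported query volume


def normalize(query: str) -> str:
    """Lowercase the query and collapse repeated whitespace."""
    return " ".join(query.lower().split())


def expand_queries(seed_queries, ideas):
    """Merge seed queries with keyword ideas and summarize the expansion.

    Duplicate ideas are collapsed after normalization; the highest reported
    volume is kept. Seeds without a reported volume are counted with volume 0.
    """
    volumes = {}
    for idea in ideas:
        key = normalize(idea.text)
        volumes[key] = max(volumes.get(key, 0), idea.avg_monthly_searches)

    seeds = {normalize(q) for q in seed_queries}
    seed_volume = sum(volumes.get(q, 0) for q in seeds)
    for q in seeds:
        volumes.setdefault(q, 0)

    return {
        "seed_count": len(seeds),
        "seed_volume": seed_volume,
        "expanded_count": len(volumes),
        "expanded_volume": sum(volumes.values()),
        # Sorted by descending volume, then alphabetically for stable output.
        "queries": sorted(volumes.items(), key=lambda kv: (-kv[1], kv[0])),
    }


if __name__ == "__main__":
    # Hypothetical seed queries and keyword ideas, not data from the study.
    seeds = ["flu symptoms", "flu vaccine"]
    ideas = [
        KeywordIdea("flu symptoms", 1000),
        KeywordIdea("Flu  Vaccine", 500),
        KeywordIdea("flu shot side effects", 300),
        KeywordIdea("flu symptoms", 900),
    ]
    summary = expand_queries(seeds, ideas)
    print(summary["seed_count"], summary["seed_volume"])
    print(summary["expanded_count"], summary["expanded_volume"])
```

Comparing the seed counts and volumes with the expanded ones mirrors the kind of comparison the abstract reports. Keeping the highest volume for duplicate ideas is one possible design choice; how close variants should be handled is named in the abstract as an open issue.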