Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection.

Affiliations

Department of Digital and Analytical Sciences, University of Salzburg, Salzburg, Austria.

Centre for Geographic Analysis, Harvard University, Cambridge, MA, United States of America.

Publication information

PLoS One. 2023 Mar 15;18(3):e0282942. doi: 10.1371/journal.pone.0282942. eCollection 2023.

DOI:10.1371/journal.pone.0282942
PMID:36921000
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10016707/
Abstract

Twitter location inference methods are developed with the purpose of increasing the percentage of geotagged tweets by inferring locations on a non-geotagged dataset. For validation of proposed approaches, these location inference methods are developed on a fully geotagged dataset on which the attached Global Navigation Satellite System coordinates are used as ground truth data. Whilst a substantial number of location inference methods have been developed to date, questions arise regarding the generalizability of the developed location inference models on a non-geotagged dataset. This paper proposes a high precision location inference method for inferring tweets' point of origin based on location mentions within the tweet text. We investigate the influence of data selection by comparing the model performance on two datasets. For the first dataset, we use a proportionate sample of tweet sources of a geotagged dataset. For the second dataset, we use a modelled distribution of tweet sources following a non-geotagged dataset. Our results showed that the distribution of tweet sources influences the performance of location inference models. Using the first dataset, we outperformed state-of-the-art location extraction models by inferring 61.9%, 86.1% and 92.1% of the extracted locations within 1 km, 10 km and 50 km radius values, respectively. However, using the second dataset, our precision values dropped to 45.3%, 73.1% and 81.0% for the same radius values.
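
The abstract reports precision as the share of inferred locations that fall within 1 km, 10 km and 50 km of the ground-truth coordinates. The following is a minimal sketch of how such a metric could be computed. It is not the authors' code: the haversine distance, the function names `haversine_km` and `share_within_radii`, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within_radii(inferred, truth, radii_km=(1, 10, 50)):
    """Fraction of inferred points lying within each radius of their ground truth.

    `inferred` and `truth` are equal-length lists of (lat, lon) tuples.
    """
    if len(inferred) != len(truth) or not inferred:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [haversine_km(*p, *q) for p, q in zip(inferred, truth)]
    return {r: sum(d <= r for d in distances) / len(distances) for r in radii_km}


# Hypothetical data: four tweets, all geocoded to the same inferred point,
# with ground-truth coordinates roughly 0.1 km, 4.5 km, 32 km and 250 km away.
inferred_points = [(47.8095, 13.0550)] * 4
ground_truth = [
    (47.8100, 13.0560),
    (47.8500, 13.0550),
    (48.1000, 13.0550),
    (48.2082, 16.3738),
]

result = share_within_radii(inferred_points, ground_truth)
print(result)  # {1: 0.25, 10: 0.5, 50: 0.75}
```

The thresholds are cumulative, so a point counted within 1 km is also counted within 10 km and 50 km, which matches the increasing percentages reported in the abstract.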

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/651b/10016707/4ae62077851f/pone.0282942.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/651b/10016707/0a923ed09a63/pone.0282942.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/651b/10016707/e314efe869e1/pone.0282942.g003.jpg
