A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics.

Authors

Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier

Affiliation

Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL), University of Cambridge, 9 West Road, Cambridge, CB3 9DP, UK.

Publication

Lang Resour Eval. 2020;54(3):683-712. doi: 10.1007/s10579-019-09475-3. Epub 2019 Sep 19.

DOI: 10.1007/s10579-019-09475-3
PMID: 32802011
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7406539/

Abstract

Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real-world usage, by the lack of distinction between the [text missing in source], which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained [text missing in source]. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation, including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called [dataset name missing in source] to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.
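The abstract does not say which metrics the paper reviews. As a rough illustration only, the sketch below shows two kinds of measures that are common in geoparsing evaluation generally: exact-match precision/recall/F1 for Geotagging, and error distance with accuracy under a distance threshold for Geocoding. The function names, the span representation and the 161 km (100 mile) threshold are assumptions made for this example, not details taken from the paper.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def geotagging_prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F1 over (start, end) toponym spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def geocoding_scores(gold_coords, pred_coords, threshold_km=161.0):
    """Mean error distance (km) and accuracy within threshold_km (assumed value)."""
    errors = [haversine_km(g, p) for g, p in zip(gold_coords, pred_coords)]
    mean_error = sum(errors) / len(errors)
    accuracy = sum(e <= threshold_km for e in errors) / len(errors)
    return mean_error, accuracy


if __name__ == "__main__":
    # Hypothetical character-offset spans for toponyms in one document.
    print(geotagging_prf([(0, 6), (10, 15)], [(0, 6), (20, 25)]))
    # Hypothetical gold vs. predicted coordinates for two resolved toponyms.
    print(geocoding_scores([(51.5074, -0.1278), (48.8566, 2.3522)],
                           [(51.5074, -0.1278), (40.7128, -74.0060)]))
```

Exact span matching is the strictest choice for Geotagging. A more lenient evaluation could count overlapping spans as partial matches, at the cost of a less comparable score.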

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23bf/7406539/9d5823405ae7/10579_2019_9475_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23bf/7406539/869a68d84920/10579_2019_9475_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23bf/7406539/4a321fcf3af8/10579_2019_9475_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23bf/7406539/a690c22c0b23/10579_2019_9475_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23bf/7406539/835aab09193f/10579_2019_9475_Fig5_HTML.jpg

Similar articles

1. A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics.
Lang Resour Eval. 2020;54(3):683-712. doi: 10.1007/s10579-019-09475-3. Epub 2019 Sep 19.
2. Using Twitter Data to Monitor Natural Disaster Social Dynamics: A Recurrent Neural Network Approach with Word Embeddings and Kernel Density Estimation.
Sensors (Basel). 2019 Apr 11;19(7):1746. doi: 10.3390/s19071746.
3. Spatial-temporal characteristics and causes of changes to the county-level administrative toponyms cultural landscape in the eastern plains of China.
PLoS One. 2019 May 28;14(5):e0217381. doi: 10.1371/journal.pone.0217381. eCollection 2019.
4. Evaluation of clinical named entity recognition methods for Serbian electronic health records.
Int J Med Inform. 2022 Aug;164:104805. doi: 10.1016/j.ijmedinf.2022.104805. Epub 2022 May 25.
5. Developing named entity recognition algorithms for Uzbek: Dataset insights and implementation.
Data Brief. 2024 Apr 16;54:110413. doi: 10.1016/j.dib.2024.110413. eCollection 2024 Jun.
6. What's missing in geographical parsing?
Lang Resour Eval. 2018;52(2):603-623. doi: 10.1007/s10579-017-9385-8. Epub 2017 Mar 7.
7. Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.
JMIR Med Inform. 2024 Oct 17;12:e59782. doi: 10.2196/59782.
8. FloraNER: A new dataset for species and morphological terms named entity recognition in French botanical text.
Data Brief. 2024 Aug 10;56:110824. doi: 10.1016/j.dib.2024.110824. eCollection 2024 Oct.
9. Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and Validation Study.
J Med Internet Res. 2025 Mar 18;27:e66279. doi: 10.2196/66279.
10. DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect.
Data Brief. 2023 May 12;48:109234. doi: 10.1016/j.dib.2023.109234. eCollection 2023 Jun.

Cited by

1. Methodological proposal to identify the nationality of Twitter users through random-forests.
PLoS One. 2023 Jan 31;18(1):e0277858. doi: 10.1371/journal.pone.0277858. eCollection 2023.
2. Automated Travel History Extraction From Clinical Notes for Informing the Detection of Emergent Infectious Disease Events: Algorithm Development and Validation.
JMIR Public Health Surveill. 2021 Mar 24;7(3):e26719. doi: 10.2196/26719.
3. Risk assessment strategies for early detection and prediction of infectious disease outbreaks associated with climate change.
Can Commun Dis Rep. 2019 May 2;45(5):119-126. doi: 10.14745/ccdr.v45i05a02.

References

1. We need to talk about standard splits.
Proc Conf Assoc Comput Linguist Meet. 2019 Jul;2019:2786-2791. doi: 10.18653/v1/p19-1267.
2. What's missing in geographical parsing?
Lang Resour Eval. 2018;52(2):603-623. doi: 10.1007/s10579-017-9385-8. Epub 2017 Mar 7.
3. Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services.
Trop Med Health. 2017 Oct 26;45:33. doi: 10.1186/s41182-017-0073-6. eCollection 2017.
4. Global hotspots and correlates of emerging zoonotic diseases.
Nat Commun. 2017 Oct 24;8(1):1124. doi: 10.1038/s41467-017-00923-8.
5. What does research reproducibility mean?
Sci Transl Med. 2016 Jun 1;8(341):341ps12. doi: 10.1126/scitranslmed.aaf5027.
6. Use of the Edinburgh geoparser for georeferencing digitized historical collections.
Philos Trans A Math Phys Eng Sci. 2010 Aug 28;368(1925):3875-89. doi: 10.1098/rsta.2010.0149.
7. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms.
Neural Comput. 1998 Sep 15;10(7):1895-1923. doi: 10.1162/089976698300017197.