Deep neural networks and distant supervision for geographic location mention extraction.

Affiliations

Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.

Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, AZ, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i565-i573. doi: 10.1093/bioinformatics/bty273.

DOI: 10.1093/bioinformatics/bty273
PMID: 29950020
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022665/
Abstract

MOTIVATION

Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym. To overcome the limited human-annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER.
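As a rough illustration of the two ideas in this paragraph, the sketch below labels tokens by matching them against a small gazetteer (one common form of distant supervision) and builds a fixed-size context window around each token, the kind of input a per-token feedforward classifier could consume. The abstract does not say which knowledge source or features the authors used, so the gazetteer, the window size, and the function names `distant_labels` and `context_windows` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def distant_labels(tokens: List[str],
                   gazetteer: Set[Tuple[str, ...]],
                   max_len: int = 3) -> List[int]:
    """Label each token 1 if it lies inside a gazetteer match, else 0.

    Hypothetical helper: the gazetteer stands in for whatever external
    resource supplies the noisy labels.
    """
    lowered = [t.lower() for t in tokens]
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Prefer the longest gazetteer entry starting at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


def context_windows(tokens: List[str], size: int = 2,
                    pad: str = "<PAD>") -> List[List[str]]:
    """Return, for each token, the token plus `size` neighbours on each side."""
    padded = [pad] * size + tokens + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(tokens))]


tokens = ["Samples", "were", "collected", "in", "New", "York", "and", "Turkey", "."]
gazetteer = {("new", "york"), ("turkey",)}
labels = distant_labels(tokens, gazetteer)
windows = context_windows(tokens, size=1)
```

Labels produced this way are noisy: a word such as "Turkey" is tagged as a toponym even where it refers to the bird, which is why distantly supervised samples usually supplement human-annotated data rather than replace it.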

RESULTS

Our NER achieves an F1-score of 0.910, significantly outperforming the previous state-of-the-art system. Adding the data generated through distant supervision raises the F1-score to 0.927. Our experiments also demonstrate that the NER can embed external features to further improve performance. We believe that the same methodology can be applied to recognize similar biomedical entities in scientific literature.
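For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The following minimal sketch computes it over binary token labels. It assumes token-level matching, while the paper may score at the entity level under strict or overlapping criteria, and the function name `precision_recall_f1` and the toy labels are illustrative only.

```python
from typing import List, Tuple


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for binary labels (1 = toponym)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

In this toy case there are two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.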

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8127/6022665/d369ed3140c3/bty273f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8127/6022665/32210c5113b5/bty273f1.jpg