
NEMO: Extraction and normalization of organization names from PubMed affiliations.

Authors

Jonnalagadda Siddhartha Reddy, Topham Philip

Affiliation

Lnx Research LLC, 750 The City Drive Suite 490, Orange, CA 92868, USA.

Publication

J Biomed Discov Collab. 2010 Oct 4;5:50-75.

PMID: 20922666
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2990275/
Abstract

BACKGROUND

Today, more than 18 million articles related to biomedical research are indexed in MEDLINE. Information derived from them could save government agencies much of the time and resources they spend understanding the scientific landscape, including identifying key opinion leaders and centers of excellence. Associating biomedical articles with organization names could significantly benefit the pharmaceutical marketing industry, health care funding agencies and public health officials, and could help other scientists normalize author names, create citations automatically, index articles and identify potential resources or collaborators. The large amount of extracted information also helps in disambiguating organization names with machine-learning algorithms.

RESULTS

We propose NEMO, a system for extracting organization names from affiliations and normalizing them to a canonical organization name. Our parsing process involves multi-layered rule matching with multiple dictionaries. The system achieves an f-score of more than 98% in extracting organization names. Our normalization process involves clustering based on local sequence alignment metrics and local learning based on finding connected components. Normalization also achieved high precision.
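The Results name two ingredients of normalization: a local sequence alignment metric and connected components. The following is a minimal sketch under stated assumptions, not NEMO's published code. It scores name pairs with a textbook Smith-Waterman local alignment (the match/mismatch/gap scores of 2/-1/-1 and the 0.8 similarity threshold are hypothetical choices), links every pair at or above the threshold, and treats the connected components of that graph (found with union-find) as clusters. The "local learning" step mentioned in the abstract is not reproduced.

```python
from itertools import combinations


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # DP row for the previous character of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str, match: int = 2) -> float:
    """Alignment score divided by the best possible score for the shorter string (0..1)."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return smith_waterman(a, b, match=match) / (match * min(len(a), len(b)))


def cluster_names(names, threshold=0.8):
    """Link names with similarity >= threshold; return the connected components."""
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two components

    groups = {}
    for idx, name in enumerate(names):
        groups.setdefault(find(idx), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    names = ["Harvard Medical School", "Harvard Med School", "Stanford University"]
    for group in cluster_names(names):
        # Hypothetical rule: the longest variant serves as the canonical name.
        print(max(group, key=len), "<-", group)
```

Choosing the longest member of each cluster as its canonical name is also an assumption; a frequency-based choice would fit the same structure.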

CONCLUSION

NEMO is the missing link in associating each biomedical paper and its authors with an organization name in its canonical form and with the geopolitical location of the organization. This research could potentially help in analyzing large social networks of organizations for landscaping a particular topic, improving the performance of author disambiguation, adding weak links to the co-author network, augmenting NLM's MARS system for correcting errors in OCR output of the affiliation field, and automatically indexing PubMed citations with the normalized organization name and country. Our system is available for download along with this paper as a graphical user interface.
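The Conclusion ties each paper to a canonical organization name and the organization's geopolitical location, and the Results describe multi-layered rule matching against multiple dictionaries. A toy version of that idea might look like this, assuming comma-separated affiliation strings with the country in the last segment. The dictionaries `KNOWN_ORGS`, `ORG_KEYWORDS` and `COUNTRIES` and the function `parse_affiliation` are hypothetical stand-ins, far smaller and simpler than NEMO's actual rules.

```python
import re

# Hypothetical dictionaries for illustration only; NEMO's actual dictionaries are not shown here.
KNOWN_ORGS = {  # layer 1: exact lookup of known organization variants -> canonical form
    "harvard medical school": "Harvard Medical School",
    "harvard med school": "Harvard Medical School",
}
ORG_KEYWORDS = {  # layer 2: words suggesting that a segment names an organization
    "university", "institute", "hospital", "college", "school",
    "center", "centre", "laboratory", "llc", "inc",
}
COUNTRIES = {  # country dictionary: surface form -> canonical country name
    "usa": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "china": "China",
    "germany": "Germany",
}


def parse_affiliation(affiliation: str) -> dict:
    """Split an affiliation into an organization name and a country (toy rules)."""
    segments = [s.strip() for s in affiliation.strip().rstrip(".").split(",") if s.strip()]

    # Country rule: look up the last comma-separated segment in the country dictionary.
    country = None
    if segments and segments[-1].lower() in COUNTRIES:
        country = COUNTRIES[segments[-1].lower()]
        segments = segments[:-1]

    # Layer 1: a segment that exactly matches a known organization wins.
    for seg in segments:
        canonical = KNOWN_ORGS.get(seg.lower())
        if canonical:
            return {"organization": canonical, "country": country}

    # Layer 2: otherwise take the first segment containing an organization keyword.
    for seg in segments:
        if set(re.findall(r"[a-z]+", seg.lower())) & ORG_KEYWORDS:
            return {"organization": seg, "country": country}

    return {"organization": None, "country": country}


if __name__ == "__main__":
    # The affiliation listed for this paper.
    print(parse_affiliation("Lnx Research LLC, 750 The City Drive Suite 490, Orange, CA 92868, USA."))
    # -> {'organization': 'Lnx Research LLC', 'country': 'United States'}
```

Ordering the layers from exact dictionary lookup to looser keyword rules lets a known variant such as "Harvard Med School" map straight to its canonical form, while unseen organizations still get a best-effort guess.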

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/9519011e950d/Jbiomeddiscovcollab-05-e04-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/39770b3b4c1f/Jbiomeddiscovcollab-05-e04-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/d86520280000/Jbiomeddiscovcollab-05-e04-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/1f9bafcb382e/Jbiomeddiscovcollab-05-e04-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/339eb3aea027/Jbiomeddiscovcollab-05-e04-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/0fd2d78f6fa9/Jbiomeddiscovcollab-05-e04-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/bcc683fcc66e/Jbiomeddiscovcollab-05-e04-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/fd7d8b359869/Jbiomeddiscovcollab-05-e04-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/c44d19c29c95/Jbiomeddiscovcollab-05-e04-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/ee9b6dd17a63/Jbiomeddiscovcollab-05-e04-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/89c93c3fd7dd/Jbiomeddiscovcollab-05-e04-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/f6809c74dd36/Jbiomeddiscovcollab-05-e04-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e90/2990275/18d427565952/Jbiomeddiscovcollab-05-e04-g013.jpg

Similar Articles

1
NEMO: Extraction and normalization of organization names from PubMed affiliations.
J Biomed Discov Collab. 2010 Oct 4;5:50-75.
2
A method for named entity normalization in biomedical articles: application to diseases and plants.
BMC Bioinformatics. 2017 Oct 13;18(1):451. doi: 10.1186/s12859-017-1857-8.
3
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4
Integrating various resources for gene name normalization.
PLoS One. 2012;7(9):e43558. doi: 10.1371/journal.pone.0043558. Epub 2012 Sep 12.
5
NetiNeti: discovery of scientific names from text using machine learning methods.
BMC Bioinformatics. 2012 Aug 22;13:211. doi: 10.1186/1471-2105-13-211.
6
Building a PubMed knowledge graph.
Sci Data. 2020 Jun 26;7(1):205. doi: 10.1038/s41597-020-0543-2.
7
Bridging the gap in author names: building an enhanced author name dataset for biomedical literature system.
J Am Med Inform Assoc. 2024 Aug 1;31(8):1648-1656. doi: 10.1093/jamia/ocae127.
8
Author Name Disambiguation for PubMed.
J Assoc Inf Sci Technol. 2014 Apr;65(4):765-781. doi: 10.1002/asi.23063. Epub 2013 Nov 21.
9
Recognizing names in biomedical texts: a machine learning approach.
Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
10
Author Name Disambiguation in MEDLINE.
ACM Trans Knowl Discov Data. 2009 Jul 1;3(3). doi: 10.1145/1552303.1552304.

Cited By

1
Using an Ontology-Based Approach to Handle Author Affiliations in a Large Biomedical Citation Database.
Stud Health Technol Inform. 2017;245:1338.
2
MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide.
Dlib Mag. 2015 Nov-Dec;21(11-12). doi: 10.1045/november2015-torvik.
3
An automated approach for ranking journals to help in clinician decision support.
AMIA Annu Symp Proc. 2014 Nov 14;2014:757-66. eCollection 2014.
4
Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.
J Am Med Inform Assoc. 2012 Sep-Oct;19(5):867-74. doi: 10.1136/amiajnl-2011-000766. Epub 2012 Jun 16.
5
Discovering opinion leaders for medical topics using news articles.
J Biomed Semantics. 2012 Mar 15;3(1):2. doi: 10.1186/2041-1480-3-2.
6
Determining student-faculty ratios and faculty scholarship levels/rates.
Am J Pharm Educ. 2010 Dec 15;74(10):193b; author reply 193c.