A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study

Authors

Chen Yifei, Li Xiaoying, Li Aihua, Li Yongjie, Yang Xuemei, Lin Ziluo, Yu Shirui, Tang Xiaoli

Affiliation

Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China.

Publication

JMIR Form Res. 2023 Aug 18;7:e47434. doi: 10.2196/47434.

DOI: 10.2196/47434
PMID: 37594844
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10474509/
Abstract

BACKGROUND

The normalization of institution names is of great importance for literature retrieval, statistics of academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors' writing habits, together with spelling mistakes, produce many variant forms of institution names, which complicates the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, training a deep learning-based institution name normalization model can increase the accuracy of institution name normalization at the semantic level.

OBJECTIVE

This study aimed to train a deep learning-based model for institution name normalization based on the feature fusion of affiliation data from multisource literature. The model was designed to normalize institution name variants with the help of authority files and to achieve high normalization accuracy after several rounds of training and optimization.

METHODS

In this study, an institution name normalization-oriented model was trained based on bidirectional encoder representations from transformers (BERT) and other deep learning models, comprising an institution classification model, an institutional hierarchical relation extraction model, and an institution matching and merging model. The model was trained to automatically learn institutional features through pretraining and fine-tuning, and institution names were extracted from the affiliation data of 3 databases (Dimensions, Web of Science, and Scopus) to complete the normalization process.
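The paper itself includes no code; as a rough illustration of the matching and merging step, the sketch below scores an observed name against authority-file entries by cosine similarity. The `embed` function is a character-trigram stand-in for the fine-tuned BERT encoder, and the authority entries and the 0.8 threshold are illustrative assumptions, not values from the study.

```python
import math
from collections import Counter

def embed(name: str) -> Counter:
    """Character-trigram bag: a crude stand-in for a fine-tuned BERT encoder."""
    s = f"  {name.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def match_institution(name: str, authority: dict, threshold: float = 0.8):
    """Return (institution_id, score) of the best authority-file entry, or (None, score)."""
    query = embed(name)
    best_id, best = None, 0.0
    for inst_id, canonical in authority.items():
        score = cosine(query, embed(canonical))
        if score > best:
            best_id, best = inst_id, score
    return (best_id, best) if best >= threshold else (None, best)

# Toy authority file with hypothetical IDs.
authority = {
    "I0001": "Institute of Medical Information, Chinese Academy of Medical Sciences",
    "I0002": "Peking University",
}
variant = "Inst. of Medical Information, Chinese Academy of Medical Sciences"
print(match_institution(variant, authority))
```

A real system would replace `embed` with embeddings from the fine-tuned BERT model and index the authority file rather than scanning it linearly.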

RESULTS

It was found that the trained model could achieve at least 3 functions. First, the model could identify an institution name that is consistent with the authority files and associate the name with the files through a unique institution ID. Second, it could identify nonstandard institution name variants, such as singular and plural forms and abbreviations, and update the authority files accordingly. Third, it could identify unregistered institutions and add them to the authority files, so that when such an institution appeared again, the model could recognize it as a registered institution. Moreover, the test results showed that the accuracy of the normalization model reached 93.79%, indicating promising performance for the normalization of institution names.
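The three functions above can be sketched as a small authority-file registry. The `canonicalize` rules and the abbreviation table are crude surface heuristics standing in for the model's learned semantic matching; all names and IDs below are hypothetical.

```python
import itertools
import re

# Hypothetical abbreviation table; in the paper such mappings are learned, not listed.
ABBREVIATIONS = {"inst.": "institute", "univ.": "university", "dept.": "department"}

def canonicalize(name: str) -> str:
    """Crude surface normalization: lowercase, expand abbreviations, strip plural -s."""
    tokens = re.findall(r"[\w.]+", name.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens)

class AuthorityFile:
    """Registry linking observed institution names to stable institution IDs."""

    def __init__(self):
        self.by_key = {}    # canonical key -> institution ID
        self.variants = {}  # institution ID -> set of observed name variants
        self._ids = itertools.count(1)

    def resolve(self, name: str) -> str:
        """Link a name to its ID: match registered entries (function 1), record
        new variants (function 2), and register unseen institutions (function 3)."""
        key = canonicalize(name)
        if key not in self.by_key:          # function 3: previously unregistered
            self.by_key[key] = f"I{next(self._ids):04d}"
        inst_id = self.by_key[key]
        self.variants.setdefault(inst_id, set()).add(name)  # function 2: keep variant
        return inst_id

af = AuthorityFile()
a = af.resolve("Institute of Medical Sciences")
b = af.resolve("Inst. of Medical Science")  # abbreviation + singular variant, same ID
c = af.resolve("Peking University")
print(a, b, c, sorted(af.variants[a]))
```

Once a variant or a new institution has been recorded, any later occurrence resolves to the same ID, which is the behavior the RESULTS section describes.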

CONCLUSIONS

The deep learning-based institution name normalization model trained in this study exhibited high accuracy. Therefore, it could be widely applied in the evaluation of the competitiveness of research institutions, analysis of research fields of institutions, and construction of interinstitutional cooperation networks, among others, showing high application value.

Similar Articles

1
A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study.
JMIR Form Res. 2023 Aug 18;7:e47434. doi: 10.2196/47434.
2
BERT-based Ranking for Biomedical Entity Normalization.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:269-277. eCollection 2020.
3
Application of Entity-BERT model based on neuroscience and brain-like cognition in electronic medical record entity recognition.
Front Neurosci. 2023 Sep 20;17:1259652. doi: 10.3389/fnins.2023.1259652. eCollection 2023.
4
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
5
Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning-Based Information Extraction: Development of a Natural Language Processing Algorithm.
JMIR Form Res. 2023 Jun 22;7:e44876. doi: 10.2196/44876.
6
Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.
J Med Internet Res. 2021 Jan 12;23(1):e19689. doi: 10.2196/19689.
7
BertSRC: transformer-based semantic relation classification.
BMC Med Inform Decis Mak. 2022 Sep 6;22(1):234. doi: 10.1186/s12911-022-01977-5.
8
Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation.
J Biomed Inform. 2023 Jun;142:104384. doi: 10.1016/j.jbi.2023.104384. Epub 2023 May 8.
9
Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
10
Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study.
JMIR Med Inform. 2023 Apr 25;11:e46348. doi: 10.2196/46348.
