BertSRC: transformer-based semantic relation classification.

Affiliations

Department of Library and Information Science, Yonsei University, Seoul, South Korea.

Department of Digital Analytics, Yonsei University, Seoul, South Korea.

Publication information

BMC Med Inform Decis Mak. 2022 Sep 6;22(1):234. doi: 10.1186/s12911-022-01977-5.

DOI:10.1186/s12911-022-01977-5
PMID:36068535
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9446816/
Abstract

The relationships among biomedical entities are complex, and many of them have not yet been identified. For many biomedical research areas, including drug discovery, it is of paramount importance to identify the relationships that have already been established through a comprehensive literature survey. However, manually searching the literature is difficult as the number of biomedical publications continues to increase. Therefore, the relation classification task, which automatically mines meaningful relations from the literature, has drawn attention in the field of biomedical text mining. Applying relation classification techniques to the accumulated biomedical literature makes it possible to efficiently capture existing semantic relations between biomedical entities, which can help to infer previously unknown relationships. Semantic relation classification is a type of supervised machine learning, so developing such models requires a training dataset in which biomedical experts have manually annotated the semantic relations among biomedical entities. Any advanced model must be trained on a dataset of reliable quality and meaningful scale before it can be deployed in the real world and assist biologists in their research. In addition, as the number of such public datasets increases, the performance of machine learning algorithms can be measured and compared accurately by using those datasets as benchmarks for model development and improvement. In this paper, we aim to build such a dataset. Alongside this, to validate the usability of the dataset as training data for relation classification models and to improve the performance of the relation extraction task, we built a relation classification model based on Bidirectional Encoder Representations from Transformers (BERT), trained on our dataset with our newly proposed fine-tuning methodology.
In experiments comparing several models based on different deep learning algorithms, our model with the proposed fine-tuning methodology showed the best performance. The experimental results show that the constructed training dataset is an important information resource for the development and evaluation of semantic relation extraction models. Furthermore, relation extraction performance can be improved by integrating our proposed fine-tuning methodology. These contributions can therefore promote future text mining research in the biomedical field.
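
The abstract does not describe how sentences are encoded for the BERT classifier, so the following is only a minimal sketch of one common way to frame relation classification as supervised learning: wrap the two annotated entities in marker tokens and map each relation label to an integer class id. The marker strings, the `RelationExample` type, the function names, and the sample labels are all illustrative assumptions, not the authors' fine-tuning methodology.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RelationExample:
    """One annotated sentence (hypothetical schema, not the paper's dataset format)."""
    text: str
    subj_span: Tuple[int, int]  # (start, end) character offsets of the first entity
    obj_span: Tuple[int, int]   # (start, end) character offsets of the second entity
    label: str                  # relation label assigned by an annotator


def mark_entities(
    ex: RelationExample,
    subj_tags: Tuple[str, str] = ("[E1]", "[/E1]"),
    obj_tags: Tuple[str, str] = ("[E2]", "[/E2]"),
) -> str:
    """Wrap both entity spans in marker tokens (assumed marker names)."""
    spans = [(ex.subj_span, subj_tags), (ex.obj_span, obj_tags)]
    # Insert around the rightmost span first so the earlier offsets stay valid.
    spans.sort(key=lambda item: item[0][0], reverse=True)
    text = ex.text
    for (start, end), (open_tag, close_tag) in spans:
        text = f"{text[:start]}{open_tag} {text[start:end]} {close_tag}{text[end:]}"
    return text


def build_label_map(examples: List[RelationExample]) -> Dict[str, int]:
    """Assign a stable integer id to every relation label seen in the data."""
    labels = sorted({ex.label for ex in examples})
    return {label: idx for idx, label in enumerate(labels)}


if __name__ == "__main__":
    example = RelationExample(
        text="Aspirin inhibits COX-1.",
        subj_span=(0, 7),
        obj_span=(17, 22),
        label="INHIBITS",  # illustrative label only
    )
    print(mark_entities(example))
    print(build_label_map([example]))
```

The marked sentence and its integer label would then be passed to a tokenizer and a sequence classification head; that step is omitted here because it depends on the chosen BERT implementation and on details the abstract does not give.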

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/f69b979c0b97/12911_2022_1977_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/ffeb19f05d03/12911_2022_1977_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/accb7b72cbe3/12911_2022_1977_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/58458fd4205c/12911_2022_1977_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/a79bd589ea0c/12911_2022_1977_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/2767354f7fb7/12911_2022_1977_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/21cba4800a80/12911_2022_1977_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d96d/9446816/6b36373da8ba/12911_2022_1977_Fig8_HTML.jpg
