A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

Affiliations

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, the Former Yugoslav Republic of Macedonia.

Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia.

Publication information

J Med Internet Res. 2021 Aug 9;23(8):e28229. doi: 10.2196/28229.


DOI: 10.2196/28229
PMID: 34383671
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8415558/

Abstract

BACKGROUND: Recently, food science has been garnering a lot of attention. There are many open research questions on food interactions, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on some external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources.

OBJECTIVE: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.

METHODS: We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.

RESULTS: All BERT models provided very promising results with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entity, which represents the new state-of-the-art technology in food information extraction. Considering the tasks where semantic tags are predicted, all BERT models obtained very promising results once again, with their macro F1 scores ranging from 73.39% to 78.96%.

CONCLUSIONS: FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
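
The abstract describes 15 models as the cross product of 3 pretrained BERT models and 5 tag sets, and reports results as macro F1. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the model names are placeholders (the abstract does not name the 3 pretrained models), the task identifiers are illustrative labels for the 5 tag sets, and the macro F1 here is computed per token on a toy example, which may differ from the evaluation protocol used in the paper.

```python
from itertools import product

# Placeholder names (assumption): the abstract does not name the 3 pretrained BERT models.
PRETRAINED = ["bert_variant_1", "bert_variant_2", "bert_variant_3"]

# Illustrative identifiers for the 5 tag sets listed in the abstract.
TASKS = [
    "food_vs_nonfood",
    "hansard_closest",
    "hansard_parent",
    "foodon",
    "snomed_ct",
]


def model_grid():
    """Enumerate one fine-tuning configuration per (pretrained model, task) pair."""
    return [f"{model}__{task}" for model, task in product(PRETRAINED, TASKS)]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy token-level labels for the food versus nonfood task (made-up data).
gold = ["FOOD", "O", "O", "FOOD", "O", "O"]
pred = ["FOOD", "O", "FOOD", "O", "O", "O"]

if __name__ == "__main__":
    print(len(model_grid()))     # 3 models x 5 tasks
    print(macro_f1(gold, pred))  # mean of F1(FOOD)=0.5 and F1(O)=0.75
```

Because macro F1 averages classes without weighting by frequency, a rare tag counts as much as a common one, which is one plausible reason the scores for the fine-grained semantic-tag tasks are lower than for the two-class food versus nonfood task.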

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/cd7389bd362f/jmir_v23i8e28229_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/551a52288879/jmir_v23i8e28229_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/38545bdd937c/jmir_v23i8e28229_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/924048a93208/jmir_v23i8e28229_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/eb9169946d7a/jmir_v23i8e28229_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/237402e6ed0b/jmir_v23i8e28229_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/8415558/e497816a82f1/jmir_v23i8e28229_fig7.jpg

Similar articles

[1]
A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

J Med Internet Res. 2021-8-9

[2]
Extracting comprehensive clinical information for breast cancer using deep learning methods.

Int J Med Inform. 2019-10-2

[3]
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)-Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.

JMIR Med Inform. 2019-9-12

[4]
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022-3-23

[5]
CACER: Clinical concept Annotations for Cancer Events and Relations.

J Am Med Inform Assoc. 2024-11-1

[6]
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation.

JMIR Med Inform. 2020-5-4

[7]
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.

JMIR Med Inform. 2022-4-21

[8]
Chinese Clinical Named Entity Recognition From Electronic Medical Records Based on Multisemantic Features by Using Robustly Optimized Bidirectional Encoder Representation From Transformers Pretraining Approach Whole Word Masking and Convolutional Neural Networks: Model Development and Validation.

JMIR Med Inform. 2023-5-10

[9]
Evaluation of a prototype machine learning tool to semi-automate data extraction for systematic literature reviews.

Syst Rev. 2023-10-6

[10]
BioBERT and Similar Approaches for Relation Extraction.

Methods Mol Biol. 2022

Cited by

[1]
Zero-shot evaluation of ChatGPT for food named-entity recognition and linking.

Front Nutr. 2024-8-13

[2]
Integrating machine learning and artificial intelligence in life-course epidemiology: pathways to innovative public health solutions.

BMC Med. 2024-9-2

[3]
Decoding the Foodome: Molecular Networks Connecting Diet and Health.

Annu Rev Nutr. 2024-8

[4]
A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study.

JMIR Form Res. 2023-8-18

[5]
From language models to large-scale food and biomedical knowledge graphs.

Sci Rep. 2023-5-15

[6]
CafeteriaSA corpus: scientific abstracts annotated across different food semantic resources.

Database (Oxford). 2022-12-16

[7]
CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources.

Foods. 2022-9-2
