
A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

Affiliations

Faculty of Computer Science and Engineering, Ss Cyril and Methodius University, Skopje, the Former Yugoslav Republic of Macedonia.

Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia.

Publication information

J Med Internet Res. 2021 Aug 9;23(8):e28229. doi: 10.2196/28229.

DOI:10.2196/28229
PMID:34383671
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8415558/
Abstract

BACKGROUND

Recently, food science has been garnering a lot of attention. There are many open research questions on the interactions of food, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on some external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources.

OBJECTIVE

In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.

METHODS

We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.
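The 3 x 5 design described above can be sketched as a simple grid. This is only an illustration of how 15 model variants arise: the abstract does not name the 3 pretrained BERT checkpoints, so `bert_a`, `bert_b`, and `bert_c` are placeholders, and the scheme identifiers are hypothetical labels for the 5 tag sets listed in the abstract.

```python
from itertools import product

# Placeholder names: the abstract does not say which 3 pretrained
# BERT checkpoints were fine-tuned.
PRETRAINED_MODELS = ["bert_a", "bert_b", "bert_c"]

# Hypothetical identifiers for the 5 tag sets named in the abstract.
TAG_SCHEMES = [
    "food_vs_nonfood",   # binary food versus nonfood entity
    "hansard_closest",   # closest Hansard semantic tags
    "hansard_parent",    # parent Hansard semantic tags
    "foodon",            # FoodOn semantic tags
    "snomed_ct",         # SNOMED CT food semantic tags
]


def model_grid(models, schemes):
    """Return one identifier per (pretrained model, tag scheme) pair."""
    return [f"{model}__{scheme}" for model, scheme in product(models, schemes)]


FOODNER_GRID = model_grid(PRETRAINED_MODELS, TAG_SCHEMES)
print(len(FOODNER_GRID))
```

Each identifier stands for one fine-tuning run; the training procedure itself is not described in the abstract and is not sketched here.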

RESULTS

All BERT models provided very promising results with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entity, which represents the new state-of-the-art technology in food information extraction. Considering the tasks where semantic tags are predicted, all BERT models obtained very promising results once again, with their macro F1 scores ranging from 73.39% to 78.96%.
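For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The sketch below computes it over token-level labels; whether the study scored tokens or whole entity spans is not stated in the abstract, so the token-level setup and the `FOOD`/`O` label names are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it does not belong
            fn[g] += 1  # missed an occurrence of gold label g

    labels = set(gold) | set(pred)
    # F1 = 2TP / (2TP + FP + FN); the denominator is nonzero because every
    # label in `labels` occurs at least once in gold or pred.
    scores = [
        2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    ]
    return sum(scores) / len(scores)


# Toy example with assumed labels: FOOD has F1 0.5, O has F1 0.75.
gold = ["FOOD", "FOOD", "O", "O", "O", "O"]
pred = ["FOOD", "O", "O", "O", "O", "FOOD"]
print(macro_f1(gold, pred))
```

In the toy example the plain accuracy is 4/6, while the macro F1 is the mean of 0.5 and 0.75, which shows how the minority `FOOD` label pulls the score down.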

CONCLUSIONS

FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
