CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources.

Authors

Ispirova Gordana, Cenikj Gjorgjina, Ogrinc Matevž, Valenčič Eva, Stojanov Riste, Korošec Peter, Cavalli Ermanno, Koroušić Seljak Barbara, Eftimov Tome

Affiliations

Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia.

Jožef Stefan International Postgraduate School, 1000 Ljubljana, Slovenia.

Publication information

Foods. 2022 Sep 2;11(17):2684. doi: 10.3390/foods11172684.

DOI:10.3390/foods11172684
PMID:36076868
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9455825/
Abstract

Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists performing analysis. In this paper, we present the process of annotating food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology, and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities (recipes) that includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the gold standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the aforementioned four external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further used to develop and train information extraction pipelines that use state-of-the-art NLP approaches to trace knowledge about food safety applications.
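
As an illustration of what multi-resource annotation can enable, the following minimal Python sketch represents each food mention with tags keyed by semantic resource, then pairs up tags from different resources that fall on the same mention. This is only an assumed reading of the interoperability idea, not the authors' pipeline or the corpus's actual file format. The `FoodEntity` class, the `cross_resource_links` function, and every identifier string in the sample data are hypothetical placeholders, not real codes from Hansard, FoodOn, SNOMED CT, or FoodEx2.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class FoodEntity:
    """One food mention in a recipe, with tags keyed by semantic resource (hypothetical structure)."""
    text: str
    tags: dict = field(default_factory=dict)  # resource name -> set of tag identifiers


def cross_resource_links(entities):
    """Collect tag pairs from different resources that annotate the same mention."""
    links = defaultdict(set)
    for entity in entities:
        # Sort by resource name so each resource pair is always keyed in the same order.
        for (res_a, tags_a), (res_b, tags_b) in combinations(sorted(entity.tags.items()), 2):
            for tag_a in tags_a:
                for tag_b in tags_b:
                    links[(res_a, res_b)].add((tag_a, tag_b))
    return dict(links)


# Placeholder identifiers only; these are NOT real codes from any resource.
entities = [
    FoodEntity(
        "olive oil",
        {
            "Hansard": {"HANSARD_TAG_1"},
            "FoodOn": {"FOODON_ID_1"},
            "FoodEx2": {"FOODEX2_CODE_1"},
        },
    ),
    FoodEntity(
        "salt",
        {"Hansard": {"HANSARD_TAG_2"}, "SNOMED CT": {"SNOMEDCT_ID_1"}},
    ),
]

links = cross_resource_links(entities)
for resource_pair, tag_pairs in sorted(links.items()):
    print(resource_pair, sorted(tag_pairs))
```

Keying the result by resource pair keeps the sketch independent of how many resources a given mention happens to be tagged with. A mention with tags from only one resource contributes no links.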
