

LSD600: the first corpus of biomedical abstracts annotated with lifestyle-disease relations.

Authors

Nourani Esmaeil, Makri Evangelia-Mantelena, Mao Xiqing, Pyysalo Sampo, Brunak Søren, Nastou Katerina, Jensen Lars Juhl

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark.

Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran.

Publication information

Database (Oxford). 2025 Jan 13;2025. doi: 10.1093/database/baae129.

DOI:10.1093/database/baae129
PMID:39824652
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11756709/
Abstract

Lifestyle factors (LSFs) are increasingly recognized as instrumental in both the development and control of diseases. Despite their importance, there is a lack of methods to extract relations between LSFs and diseases from the literature, a step necessary to consolidate the currently available knowledge into a structured form. As simple co-occurrence-based relation extraction (RE) approaches are unable to distinguish between the different types of LSF-disease relations, context-aware models such as transformers are required to extract and classify these relations into specific relation types. However, no comprehensive LSF-disease RE system existed, nor a corpus suitable for developing one. We present LSD600 (available at https://zenodo.org/records/13952449), the first corpus specifically designed for LSF-disease RE, comprising 600 abstracts with 1900 relations of eight distinct types between 5027 diseases and 6930 LSF entities. We evaluated LSD600's quality by training a RoBERTa model on the corpus, achieving an F-score of 68.5% for the multilabel RE task on the held-out test set. We further validated LSD600 by using the trained model on the two Nutrition-Disease and FoodDisease datasets, where it achieved F-scores of 70.7% and 80.7%, respectively. Building on these performance results, LSD600 and the RE system trained on it can be valuable resources to fill the existing gap in this area and pave the way for downstream applications. Database URL: https://zenodo.org/records/13952449.
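The abstract reports F-scores for a multilabel RE task, where a single LSF-disease pair may carry more than one relation type. The sketch below shows one common way to compute such a score, micro-averaging over (pair, relation type) decisions. It is an illustration only: the abstract does not state which averaging the authors used, and the relation-type names, pair identifiers, and the `micro_f_score` helper are hypothetical rather than taken from LSD600.

```python
from typing import Dict, Set, Tuple

# A candidate pair is keyed by (abstract id, LSF mention, disease mention).
# These identifiers are made up for illustration.
Pair = Tuple[str, str, str]


def micro_f_score(
    gold: Dict[Pair, Set[str]], pred: Dict[Pair, Set[str]]
) -> Dict[str, float]:
    """Micro-averaged precision, recall and F-score over (pair, label) decisions."""
    tp = fp = fn = 0
    for pair in gold.keys() | pred.keys():
        g = gold.get(pair, set())
        p = pred.get(pair, set())
        tp += len(g & p)   # labels predicted and annotated
        fp += len(p - g)   # labels predicted but not annotated
        fn += len(g - p)   # labels annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Hypothetical relation types; the abstract does not name the eight types.
gold = {
    ("doc1", "smoking", "lung cancer"): {"increases_risk"},
    ("doc1", "exercise", "diabetes"): {"prevents", "treats"},
    ("doc2", "coffee", "insomnia"): set(),  # co-occurring pair with no relation
}
pred = {
    ("doc1", "smoking", "lung cancer"): {"increases_risk"},
    ("doc1", "exercise", "diabetes"): {"prevents"},
    ("doc2", "coffee", "insomnia"): {"causes"},
}

scores = micro_f_score(gold, pred)
print(scores)
```

The third pair shows why co-occurrence alone is not enough: both entities appear in the same abstract, yet the gold annotation holds no relation, so any predicted type there counts as a false positive.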


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/8df8fb5355f3/baae129f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/a23b0c1deddf/baae129f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/394920963b3b/baae129f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/bed7f191a7e9/baae129f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/6b5731e4755b/baae129f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a6/11756709/e746b73782a5/baae129f6.jpg

