Annotated corpus for traditional formula-disease relationships in biomedical articles.

Authors

Yea Sangjun, Jang Ho, Kim Soyoung, Lee Sanghun, Kim Jaeuk U

Affiliations

Korean medicine data division, Korea Institute of Oriental Medicine, Daejeon, 34054, Republic of Korea.

Korean convergence medical science, University of Science and Technology, Daejeon, 34113, Republic of Korea.

Publication information

Sci Data. 2025 Jan 7;12(1):26. doi: 10.1038/s41597-025-04377-2.

DOI: 10.1038/s41597-025-04377-2
PMID: 39774689
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11707285/
Abstract

The Traditional Formula (TF), a combination of herbs prepared in accordance with traditional medicine principles, is increasingly garnering global attention as an alternative to modern medicine. Specifically, there is growing interest in exploring TF's therapeutic effects across various diseases. A significant portion of the state-of-the-art knowledge regarding the relationship between TF and disease is found in scientific publications, where manual knowledge extraction is impractical. Thus, Natural Language Processing (NLP) is being employed to efficiently and accurately search and extract crucial knowledge from unstructured literature. However, the absence of a high-quality manually annotated corpus focusing on TF-disease relationships hampers the use of NLP in the fields of traditional medicine and modern biomedical science. This article introduces the Traditional Formula-Disease Relationship (TFDR) corpus, a manually annotated corpus designed to facilitate the automatic extraction of TF-disease relationships from biomedical literature. The TFDR corpus includes information gleaned from 740 PubMed abstracts, encompassing a total of 6,211 TF mentions, 7,166 disease mentions, and 1,109 relationships between them encapsulated within 744 key sentences.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a727/11707285/5c282f031d16/41597_2025_4377_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a727/11707285/cafc52e56075/41597_2025_4377_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a727/11707285/8cb631907384/41597_2025_4377_Fig3_HTML.jpg

Cited by

1
Comparisons of pharmacokinetics of glimepiride in combination with Ojeok-san versus glimepiride alone: an open-label, one-sequence, two-treatment controlled clinical study.
Sci Rep. 2025 Jul 16;15(1):25813. doi: 10.1038/s41598-025-09317-z.
