PGxCorpus, a manually annotated corpus for pharmacogenomics.

Affiliations

Université de Lorraine, CNRS, Inria, LORIA, Nancy, France.

Sorbonne Université, INSERM, Université Paris 13, LIMICS, Paris, France.

Publication information

Sci Data. 2020 Jan 2;7(1):3. doi: 10.1038/s41597-019-0342-9.

DOI:10.1038/s41597-019-0342-9
PMID:31896797
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6940385/
Abstract

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high quality annotated corpus focusing on PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.
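The abstract describes the corpus as sentences annotated with typed entities (gene variations, genes, drugs, phenotypes) and typed relationships between them. As a minimal sketch of how such sentence-level annotations might be held in memory for a relation-extraction pipeline: the class names, the entity and relation labels, and the example sentence below are all hypothetical and are not PGxCorpus's actual file format or annotation schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A typed text span; start/end are character offsets into the sentence."""
    id: str
    type: str   # hypothetical label, e.g. "GeneVariation", "Drug"
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Relation:
    """A directed, typed link between two entities of the same sentence."""
    type: str   # hypothetical relation label
    head: str   # Entity.id of the first argument
    tail: str   # Entity.id of the second argument


@dataclass
class AnnotatedSentence:
    pmid: str
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def add_entity(self, ent_id: str, ent_type: str, surface: str) -> Entity:
        # Derive offsets from the first occurrence of the surface string.
        start = self.text.index(surface)
        ent = Entity(ent_id, ent_type, start, start + len(surface), surface)
        self.entities.append(ent)
        return ent

    def relation_triples(self) -> list[tuple[str, str, str]]:
        # Resolve entity ids to surface text: (head text, relation, tail text).
        by_id = {e.id: e for e in self.entities}
        return [(by_id[r.head].text, r.type, by_id[r.tail].text)
                for r in self.relations]


# Usage with an invented sentence (not taken from the corpus).
s = AnnotatedSentence(pmid="hypothetical",
                      text="The CYP2C9*3 allele reduces warfarin clearance.")
s.add_entity("e1", "GeneVariation", "CYP2C9*3")
s.add_entity("e2", "Drug", "warfarin")
s.relations.append(Relation("influences", "e1", "e2"))
print(s.relation_triples())  # [('CYP2C9*3', 'influences', 'warfarin')]
```

Keeping relations as references to entity ids, rather than copying spans, mirrors the usual standoff style of corpus annotation and lets one entity take part in several relations.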
