Using text to build semantic networks for pharmacogenomics.

Affiliation

Department of Medicine, 300 Pasteur Drive, Room S101, Mail Code 5110, Stanford University, Stanford, CA 94305, USA.

Publication information

J Biomed Inform. 2010 Dec;43(6):1009-19. doi: 10.1016/j.jbi.2010.08.005. Epub 2010 Aug 17.

DOI: 10.1016/j.jbi.2010.08.005
PMID: 20723615
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2991587/
Abstract

Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.
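The pipeline summarized in the abstract (look up entities in a lexicon, read relationships off the syntactic structure, map them to a common schema, and collect the result as a typed network) can be illustrated with a toy sketch. This is not the authors' implementation: the lexicon entries, the modifier head nouns, the verb-to-relation table, and the pre-extracted (subject, verb, object) triples below are all hypothetical stand-ins, and a real system would obtain the triples from a syntactic parser rather than hard-coding them.

```python
from collections import defaultdict

# Hypothetical mini-lexicon of key PGx entities: surface form -> entity type.
LEXICON = {
    "VKORC1": "Gene",
    "CYP2C9": "Gene",
    "warfarin": "Drug",
    "clotting disorder": "Phenotype",
}

# Hypothetical head nouns that turn a key entity into a modified entity,
# e.g. "VKORC1 polymorphism" or "warfarin response".
MODIFIERS = {
    "polymorphism": "Variant",
    "response": "Response",
    "dose": "Dose",
    "treatment": "Treatment",
}

# Hypothetical common schema: verb lemma -> normalized relation name.
RELATION_SCHEMA = {
    "affect": "affects",
    "influence": "affects",
    "alter": "affects",
    "treat": "treats",
}


def entity_type(phrase):
    """Return a type such as 'Gene' or 'GeneVariant', or None if unknown."""
    if phrase in LEXICON:
        return LEXICON[phrase]
    # Try "<key entity> <modifier head noun>".
    parts = phrase.rsplit(" ", 1)
    if len(parts) == 2:
        base, head = parts
        if base in LEXICON and head in MODIFIERS:
            return LEXICON[base] + MODIFIERS[head]
    return None


def build_network(triples):
    """Count typed relations from (subject, verb lemma, object) triples."""
    network = defaultdict(int)
    for subj, verb, obj in triples:
        s_type = entity_type(subj)
        o_type = entity_type(obj)
        relation = RELATION_SCHEMA.get(verb)
        # Keep only triples where both ends and the verb are recognized.
        if s_type and o_type and relation:
            network[(s_type, relation, o_type)] += 1
    return dict(network)


# Hypothetical triples, standing in for the output of a syntactic parse.
TRIPLES = [
    ("VKORC1 polymorphism", "affect", "warfarin response"),
    ("CYP2C9 polymorphism", "influence", "warfarin dose"),
    ("warfarin", "treat", "clotting disorder"),
    ("the weather", "affect", "warfarin"),  # subject not in lexicon: dropped
]

network = build_network(TRIPLES)
for (s_type, relation, o_type), count in sorted(network.items()):
    print(s_type, relation, o_type, count)
```

Normalizing "affect" and "influence" to one relation name is what lets differently worded statements land on the same edge type, so the network's size reflects distinct typed relationships rather than distinct phrasings.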

