Predicting drug-gene relations via analogy tasks with word embeddings.

Authors

Yamagiwa Hiroaki, Hashimoto Ryoma, Arakane Kiwamu, Murakami Ken, Soeda Shou, Oyama Momose, Zhu Yihua, Okada Mariko, Shimodaira Hidetoshi

Affiliations

Kyoto University, Kyoto, Japan.

Recruit Co., Ltd., Tokyo, Japan.

Publication

Sci Rep. 2025 May 18;15(1):17240. doi: 10.1038/s41598-025-01418-z.

DOI: 10.1038/s41598-025-01418-z
PMID: 40383732
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086191/

Abstract

Natural language processing is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector arithmetic. For example, subtracting the vector for man from that of king and then adding the vector for woman yields a point that lies closer to queen in the embedding space. In this study, we demonstrate that BioConceptVec embeddings, along with our own embeddings trained on PubMed abstracts, contain information about drug-gene relations and can predict target genes from a given drug through analogy computations. We also show that categorizing drugs and genes using biological pathways improves performance. Furthermore, we illustrate that vectors derived from known relations in the past can predict unknown future relations in datasets divided by year. Despite the simplicity of implementing analogy tasks as vector additions, our approach demonstrated performance comparable to that of large language models such as GPT-4 in predicting drug-gene relations.
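
The analogy computation described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a drug-to-gene relation vector is the mean of (gene - drug) offsets over known pairs, that a drug's query point is its embedding plus that relation vector, and that candidate genes are ranked by cosine similarity. The function names (`analogy_query`, `relation_vector`, `rank`) and the toy 3-dimensional vectors are hypothetical; real use would load BioConceptVec or PubMed-trained vectors instead.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy_query(a: str, b: str, c: str, emb: Embeddings) -> Vector:
    """Classic analogy a : b :: c : ?, computed as b - a + c (e.g. king - man + woman)."""
    return [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]


def relation_vector(pairs: Sequence[Tuple[str, str]], emb: Embeddings) -> Vector:
    """Mean (gene - drug) offset over known drug-gene pairs (assumed formulation)."""
    if not pairs:
        raise ValueError("need at least one known drug-gene pair")
    dim = len(emb[pairs[0][0]])
    total = [0.0] * dim
    for drug, gene in pairs:
        for i in range(dim):
            total[i] += emb[gene][i] - emb[drug][i]
    return [t / len(pairs) for t in total]


def rank(query: Vector, candidates: Sequence[str], emb: Embeddings,
         exclude: Sequence[str] = (), k: int = 3) -> List[Tuple[str, float]]:
    """Top-k candidates by cosine similarity to the query vector."""
    scored = [(name, cosine(query, emb[name]))
              for name in candidates if name not in exclude]
    # Descending score; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors made up for illustration; they are not real embeddings.
    words = {
        "man": [1.0, 0.0, 0.0], "woman": [1.0, 1.0, 0.0],
        "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0],
    }
    q = analogy_query("man", "king", "woman", words)
    print([n for n, _ in rank(q, list(words), words,
                              exclude=("man", "king", "woman"), k=1)])  # ['queen']

    # Hypothetical drugs and genes sharing a common drug->gene offset.
    toy = {
        "drugA": [1.0, 0.0, 0.0], "geneA": [1.0, 1.0, 0.0],
        "drugB": [0.0, 0.0, 1.0], "geneB": [0.0, 1.0, 1.0],
        "drugC": [1.0, 0.0, 1.0], "geneC": [1.0, 1.0, 1.0],
    }
    r = relation_vector([("drugA", "geneA"), ("drugB", "geneB")], toy)
    query = [x + y for x, y in zip(toy["drugC"], r)]
    genes = ["geneA", "geneB", "geneC"]
    print([n for n, _ in rank(query, genes, toy, k=3)])  # ['geneC', 'geneA', 'geneB']
```

Under the same assumptions, the pathway-based categorization mentioned in the abstract could be approximated by calling `relation_vector` separately on the known pairs within each pathway, and the year-split evaluation by building the relation vector only from pairs known before a cutoff year and scoring pairs reported afterward.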

Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2aba/12086191/cf531207777d/41598_2025_1418_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2aba/12086191/5505a1987525/41598_2025_1418_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2aba/12086191/216c038b9183/41598_2025_1418_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2aba/12086191/5abb485e034b/41598_2025_1418_Fig4_HTML.jpg
