Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions.

Author information

Duong Dat, Ahmad Wasi Uddin, Eskin Eleazar, Chang Kai-Wei, Li Jingyi Jessica

Affiliations

1 Department of Computer Science, University of California, Los Angeles, California.

2 Department of Human Genetics, University of California, Los Angeles, California.

Publication information

J Comput Biol. 2019 Jan;26(1):38-52. doi: 10.1089/cmb.2018.0093. Epub 2018 Oct 31.

DOI: 10.1089/cmb.2018.0093
PMID: 30383443
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6350067/
Abstract

The Gene Ontology (GO) database contains GO terms that describe the biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure; under this paradigm, the positions of two GO terms in the tree determine their similarity score. In this article, we introduce two new solutions to this problem that focus instead on the definitions of the GO terms, applying neural network-based techniques from natural language processing (NLP). The first method does not rely on the GO tree, whereas the second depends on it indirectly. In the first approach, we compare two GO definitions by treating them as two unordered sets of words, and we estimate word similarity with a word embedding model that maps words into an N-dimensional space. In the second approach, we account for word order within a sentence: we use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition is to entail another. We validate our methods in two ways. In the first experiment, we test each model's ability to distinguish a true protein-protein interaction network from a randomly generated network. In the second, we test its ability to distinguish orthologs from randomly matched genes in human, mouse, and fly. In both experiments, a hybrid of the NLP-based and GO tree-based methods achieves the best classification accuracy.
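The first approach above compares two definitions as unordered sets of words scored through a word embedding. The sketch below shows one way such a comparison could work, assuming a best-match average of cosine similarities: each word is paired with its closest word in the other definition, and the two directions are averaged so the score is symmetric. This aggregation rule, the simple tokenizer, the helper names, and the 3-dimensional vectors in the hypothetical `TOY_EMBEDDINGS` table are illustrative assumptions, not the authors' exact formula or trained model (a real model would use N-dimensional vectors learned from text). The example definitions are likewise made up for illustration.

```python
import math
import re
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical 3-dimensional vectors for illustration only; a real word
# embedding model would learn N-dimensional vectors from a large corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "transport": (1.0, 0.1, 0.0),
    "movement": (0.9, 0.2, 0.0),
    "ion": (0.0, 1.0, 0.1),
    "ions": (0.0, 0.95, 0.15),
    "membrane": (0.1, 0.0, 1.0),
    "across": (0.3, 0.3, 0.3),
    "dna": (-1.0, 0.0, 0.0),
    "replication": (-0.9, -0.1, 0.0),
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_set(definition: str, embeddings: Dict[str, Vector]) -> List[str]:
    """Lowercase, tokenize, deduplicate, and keep only in-vocabulary words."""
    tokens = re.findall(r"[a-z0-9]+", definition.lower())
    return sorted({t for t in tokens if t in embeddings})


def best_match_average(src: List[Vector], dst: List[Vector]) -> float:
    """Mean, over vectors in src, of the highest cosine similarity found in dst."""
    return sum(max(cosine(u, v) for v in dst) for u in src) / len(src)


def definition_similarity(def_a: str, def_b: str, embeddings: Dict[str, Vector]) -> float:
    """Order-insensitive, symmetric similarity of two definitions as word sets."""
    words_a = word_set(def_a, embeddings)
    words_b = word_set(def_b, embeddings)
    if not words_a or not words_b:
        return 0.0  # nothing comparable in one of the definitions
    vecs_a = [embeddings[w] for w in words_a]
    vecs_b = [embeddings[w] for w in words_b]
    # Averaging both directions makes the score symmetric in (def_a, def_b).
    return 0.5 * (best_match_average(vecs_a, vecs_b) + best_match_average(vecs_b, vecs_a))


if __name__ == "__main__":
    a = "The directed movement of ions across a membrane."
    b = "Transport of an ion across the membrane."
    c = "DNA replication."
    print(f"related:   {definition_similarity(a, b, TOY_EMBEDDINGS):.3f}")
    print(f"unrelated: {definition_similarity(a, c, TOY_EMBEDDINGS):.3f}")
```

With these toy vectors, the two transport-style definitions score higher than the transport/replication pair even though they share few exact words, because "movement" and "transport" (and "ions" and "ion") sit close together in the embedding space. A best-match rule is used here because it ignores word order and tolerates definitions of different lengths, which fits the "unordered set of words" framing; other set-to-set aggregations would be equally plausible.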
