TopoICSim: a new semantic similarity measure based on gene ontology.

Authors

Ehsani Rezvan, Drabløs Finn

Affiliations

Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway.

Department of Mathematics, University of Zabol, Zabol, Iran.

Publication information

BMC Bioinformatics. 2016 Jul 29;17(1):296. doi: 10.1186/s12859-016-1160-0.

DOI:10.1186/s12859-016-1160-0
PMID:27473391
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4966780/
Abstract

BACKGROUND

The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It is widely used in bioinformatics applications for annotating genes and for measuring their semantic similarity rather than their sequence similarity. Generally speaking, semantic similarity measures rely on the GO tree topology, the information content of GO terms, or a combination of both.
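To make the two ingredients named above concrete, the following minimal sketch computes the information content of terms in a toy ontology, using the standard definition IC(t) = -log p(t), and combines it with the topology through a Resnik-style similarity (the IC of the most informative common ancestor). The term names, the annotation counts and the choice of the Resnik measure are illustrative assumptions. This is not the TopoICSim formula, which the abstract does not state.

```python
import math

# Toy "is_a" graph: child -> parents. Term names are hypothetical, not real GO IDs.
PARENTS = {
    "root": [],
    "metabolic": ["root"],
    "signaling": ["root"],
    "lipid_metabolic": ["metabolic"],
    "sugar_metabolic": ["metabolic"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolic": 2,
    "signaling": 4,
    "lipid_metabolic": 3,
    "sugar_metabolic": 1,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the share of annotations at t or below it."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items()}


def resnik_similarity(t1, t2, ic):
    """Topology plus IC: the IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


if __name__ == "__main__":
    ic = information_content()
    print(resnik_similarity("lipid_metabolic", "sugar_metabolic", ic))
    print(resnik_similarity("lipid_metabolic", "signaling", ic))
```

In this toy example the two metabolic terms share the ancestor "metabolic" and therefore score higher than a pair whose only common ancestor is the root, which carries no information.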

RESULTS

Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity), which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or the molecular function category. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, TopoICSim was tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally, we used an online benchmarking resource that evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests.

CONCLUSIONS

The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations. An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/f5e6eea90d47/12859_2016_1160_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/17ce98b0e044/12859_2016_1160_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/f810f14d1928/12859_2016_1160_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/e4f077aa026a/12859_2016_1160_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/c7dd83bc092f/12859_2016_1160_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/8c0a3e3d2840/12859_2016_1160_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/5eabb7cb6c0d/12859_2016_1160_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b911/4966780/1c4140b96093/12859_2016_1160_Fig8_HTML.jpg
