A novel insight into Gene Ontology semantic similarity.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China.

Publication information

Genomics. 2013 Jun;101(6):368-75. doi: 10.1016/j.ygeno.2013.04.010. Epub 2013 Apr 26.

Abstract

Existing methods for computing the semantic similarity between Gene Ontology (GO) terms are often based on external datasets and, therefore, are not intrinsic to GO. Furthermore, when used to measure the similarity of proteins, they not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins. Inspired by the concepts of cellular differentiation and dedifferentiation in developmental biology, we propose a shortest semantic differentiation distance (SSDD), based on the concept of semantic totipotency, to measure the semantic similarity of GO terms and further compare the functional similarity of proteins. In evaluations using human ratings and a benchmark dataset, SSDD improved upon existing methods for computing the semantic similarity of GO terms. An in-depth analysis shows that SSDD is able to distinguish identical annotations and does not depend on annotation richness, thus producing less biased and more reliable results. Online services can be accessed at the Gene Functional Similarity Analysis Tools website (GFSAT: http://nclab.hit.edu.cn/GFSAT).
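The abstract describes SSDD only conceptually and does not give its formula. The Python sketch below is a hypothetical illustration of the general idea it names: a distance computed purely from the ontology's own structure (no external annotation data), measured through the cheapest common ancestor of two terms. The "totipotency" score used here (the share of all terms at or below a term), the cost of leaving a term, and the toy terms in TOY_PARENTS are all assumptions made for illustration; this is not the SSDD definition from the paper.

```python
import heapq
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy DAG standing in for GO: each term maps to its parent terms.
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["C"],
}


def descendant_counts(parents):
    """Number of strict descendants of every term in the DAG."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    counts = {}
    for term in parents:
        seen, stack = set(), [term]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        counts[term] = len(seen)
    return counts


def totipotency(parents):
    """Assumed score: fraction of all terms at or below each term (root = 1)."""
    counts = descendant_counts(parents)
    total = len(parents)  # assumes a single root from which every term descends
    return {t: Fraction(n + 1, total) for t, n in counts.items()}


def upward_distances(term, parents, tot):
    """Cheapest upward path cost from `term` to each ancestor (Dijkstra).

    Assumption: moving from a term to one of its parents costs the
    totipotency of the term being left; `term` itself has distance 0.
    """
    dist = {term: Fraction(0)}
    heap = [(Fraction(0), term)]
    while heap:
        d, t = heapq.heappop(heap)
        if d > dist[t]:
            continue  # stale heap entry
        for p in parents[t]:
            nd = d + tot[t]
            if p not in dist or nd < dist[p]:
                dist[p] = nd
                heapq.heappush(heap, (nd, p))
    return dist


def differentiation_distance(t1, t2, parents):
    """Minimum over common ancestors a of d(t1, a) + d(t2, a)."""
    tot = totipotency(parents)
    d1 = upward_distances(t1, parents, tot)
    d2 = upward_distances(t2, parents, tot)
    return min(d1[a] + d2[a] for a in d1.keys() & d2.keys())


print(differentiation_distance("E", "D", TOY_PARENTS))  # 2/3, via common ancestor A
print(differentiation_distance("E", "E", TOY_PARENTS))  # 0 for identical terms
```

Exact fractions are used so the toy values are easy to check by hand. In this sketch identical terms always get distance 0, and the score depends only on the DAG, which mirrors (in spirit only) the abstract's claims that the measure is intrinsic to GO and independent of annotation richness.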
