A shortest-path graph kernel for estimating gene product semantic similarity.

Authors

Alvarez Marco A, Qi Xiaojun, Yan Changhui

Affiliation

Department of Computer Science, North Dakota State University, Fargo, 58108, USA.

Publication

J Biomed Semantics. 2011 Jul 29;2:3. doi: 10.1186/2041-1480-2-3.

Abstract

BACKGROUND

Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources that are not part of the ontology. Consequently, changes in these external resources, such as biased term distributions caused by shifts in popular research topics, affect the calculated semantic similarity. One way to avoid this problem is to use semantic methods that are "intrinsic" to the ontology, i.e. independent of external knowledge.

RESULTS

We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. A shortest-path graph kernel is then used to compute the similarity between two such graphs. In a comprehensive evaluation on a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to the GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when performance is measured by resolution and by the EC similarity correlation coefficient, but not significant when it is measured by the Pfam similarity correlation coefficient.
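The idea of comparing two induced GO subgraphs through their shortest paths can be illustrated with a minimal sketch. The abstract does not give the kernel's exact form, so everything below is an assumption for illustration rather than the authors' implementation: the toy ontology `TOY_PARENTS` and its term names are hypothetical, the induced subgraph is assumed to contain the annotating terms plus all their ancestors, edges are treated as undirected, and two paths are counted as matching only when their endpoint terms and their lengths are identical (a delta kernel on labels and distances).

```python
from collections import deque
from math import sqrt

# Hypothetical toy ontology: child term -> list of parent terms (not real GO IDs).
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}


def induced_terms(annotations, parents):
    """Annotating terms plus all their ancestors (assumed subgraph construction)."""
    seen = set()
    stack = list(annotations)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents[term])
    return seen


def shortest_path_features(terms, parents):
    """Map each unordered term pair in the induced subgraph to its shortest-path length."""
    # Undirected adjacency restricted to the induced term set.
    adj = {t: set() for t in terms}
    for child in terms:
        for parent in parents[child]:
            if parent in terms:
                adj[child].add(parent)
                adj[parent].add(child)
    features = {}
    for source in terms:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if source < target:  # store each unordered pair once
                features[(source, target)] = d
    return features


def spgk(ann_a, ann_b, parents):
    """Count term pairs present in both subgraphs with equal shortest-path length."""
    fa = shortest_path_features(induced_terms(ann_a, parents), parents)
    fb = shortest_path_features(induced_terms(ann_b, parents), parents)
    return sum(1 for pair, d in fa.items() if fb.get(pair) == d)


def normalized_spgk(ann_a, ann_b, parents):
    """Cosine-style normalization so that self-similarity equals 1."""
    kaa = spgk(ann_a, ann_a, parents)
    kbb = spgk(ann_b, ann_b, parents)
    if kaa == 0 or kbb == 0:
        return 0.0
    return spgk(ann_a, ann_b, parents) / sqrt(kaa * kbb)


if __name__ == "__main__":
    print(normalized_spgk({"C"}, {"D"}, TOY_PARENTS))
```

In this sketch, one breadth-first search per term followed by a dictionary lookup per term pair keeps the cost polynomial in the number of terms in the two subgraphs.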

CONCLUSIONS

Spgk uses a graph kernel, computable in polynomial time, to exploit the structure of the GO when calculating semantic similarity between gene products. It offers an alternative, with comparable performance, both to methods that use external resources and to other "intrinsic" methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/785e/3161911/978f838a8a57/2041-1480-2-3-1.jpg
