InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.

Affiliations

College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China.

Hospital for Sick Children, Toronto, M5G 1X8, Canada.

Publication information

BMC Genomics. 2018 Jan 19;19(Suppl 1):919. doi: 10.1186/s12864-017-4338-6.

DOI:10.1186/s12864-017-4338-6

PMID:29363423

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5780854/

Abstract

BACKGROUND

Since the establishment of the first biomedical ontology, the Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. More than 300 ontologies have now been built, including the widely used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because it can reveal novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Although similarities between terms within a single ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method used a gene functional interaction network (GFIN) to explore such inter-ontology term similarities. However, it used only gene interactions and did not make full use of the connectivity among the gene nodes of the network. In addition, all existing methods were designed specifically for GO, and their performance on the wider ontology community remains unknown.

RESULTS

We propose InfAcrOnt, a method that infers similarities between terms across ontologies using the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and obtains cross-ontology term similarities by modeling the information flow within the network with a random walk. In benchmark experiments on the sub-ontologies of GO, InfAcrOnt achieves high average areas under the receiver operating characteristic curve (AUC; 0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on both the human and yeast benchmark datasets, indicating superior performance. Meanwhile, comparisons of InfAcrOnt results with prior knowledge on pairwise DO-HPO terms and pairwise DO-GO terms show high correlations.
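The general idea of scoring cross-ontology term pairs by a random walk over a term-gene-gene network can be illustrated with a minimal sketch. The abstract does not state the transition rule, the restart probability, or how a similarity is read off the walk, so everything below is an assumption made for illustration: a plain random walk with restart on an unweighted, undirected graph, a restart probability of 0.5, and a similarity defined as the symmetrized steady-state probability between two terms. The term and gene identifiers are hypothetical toy data, and the function names are not taken from the InfAcrOnt software.

```python
from collections import defaultdict


def build_graph(annotations, gene_edges):
    """Build an undirected term-gene-gene graph as an adjacency dict of sets.

    annotations: dict mapping a term id to the set of genes annotated to it.
    gene_edges:  iterable of (gene, gene) pairs from the interaction network.
    """
    adj = defaultdict(set)
    for term, genes in annotations.items():
        for gene in genes:
            adj[term].add(gene)
            adj[gene].add(term)
    for g1, g2 in gene_edges:
        adj[g1].add(g2)
        adj[g2].add(g1)
    return dict(adj)


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the approximate steady-state visiting probabilities from `seed`.

    Assumption: at each step the walker jumps back to the seed with
    probability `restart`, otherwise moves to a uniformly chosen neighbor.
    """
    if seed not in adj:
        raise KeyError(f"seed {seed!r} is not a node of the graph")
    p = {node: 0.0 for node in adj}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in adj}
        for node, mass in p.items():
            if mass == 0.0:
                continue
            # Spread the non-restarting mass evenly over the neighbors.
            share = (1.0 - restart) * mass / len(adj[node])
            for neighbor in adj[node]:
                nxt[neighbor] += share
        nxt[seed] += restart
        delta = sum(abs(nxt[node] - p[node]) for node in adj)
        p = nxt
        if delta < tol:
            break
    return p


def cross_similarity(adj, term_a, term_b, restart=0.5):
    """Symmetrized walk score between two terms (an illustrative choice)."""
    p_a = random_walk_with_restart(adj, term_a, restart)
    p_b = random_walk_with_restart(adj, term_b, restart)
    return 0.5 * (p_a[term_b] + p_b[term_a])


# Hypothetical toy data: one disease term, two phenotype terms, five genes.
annotations = {
    "DO:a": {"g1", "g2"},
    "HPO:x": {"g2", "g3"},
    "HPO:y": {"g5"},
}
gene_edges = [("g1", "g3"), ("g3", "g4"), ("g4", "g5")]

adj = build_graph(annotations, gene_edges)
sim_near = cross_similarity(adj, "DO:a", "HPO:x")
sim_far = cross_similarity(adj, "DO:a", "HPO:y")
print(f"DO:a vs HPO:x: {sim_near:.6f}")
print(f"DO:a vs HPO:y: {sim_far:.6f}")
```

In this toy graph, `HPO:x` shares a gene with `DO:a` and its other gene interacts with a `DO:a` gene, while `HPO:y` is reachable only through a chain of gene-gene edges, so the first pair receives the higher score. Because the walk traverses gene-gene edges, two terms can score above zero even when they share no annotated gene, which is the kind of network connectivity the abstract says earlier methods left unused.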

CONCLUSIONS

The experimental results show that InfAcrOnt significantly improves the inference of similarities between terms across ontologies on the benchmark sets.
