Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-Belval, Luxembourg.
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 6, Avenue de la Fonte, Esch-Belval, Luxembourg.
BMC Bioinformatics. 2018 Aug 29;19(1):308. doi: 10.1186/s12859-018-2314-z.
Biomedical knowledge is growing in complexity and is increasingly encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations, and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually.
We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance in the underlying network, and iii) ontology-based distance. We apply clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach, together with an evolutionary algorithm, for informative combination of distance metrics. We compare the enrichment of the obtained clusters across the solutions and against expert knowledge. As a measure of cluster quality, we calculate the number of Gene Ontology and Disease Ontology terms discovered by the different solutions. Our results show that combining distance metrics can improve clustering accuracy, based on comparison with expert-provided clusters. The performance of specific combinations of distance functions also depends on the clustering depth (the number of clusters). By employing the bi-level optimization approach, we evaluated the relative importance of the distance functions and found that the order in which they are combined affects clustering performance. Enrichment analysis of the clustering results then showed that both hierarchical and bi-level clustering schemes discovered more Gene Ontology and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
In this work we examined the impact of different distance functions on the clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering and may improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both the combination and the order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.
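The pairwise combination of distance metrics described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the metrics are combined, so the weighted sum of min-max normalized matrices, the average-linkage merging rule, the function names (`normalize`, `combine`, `average_linkage`) and the toy distance values are all assumptions made for illustration.

```python
from itertools import combinations


def normalize(matrix):
    """Min-max scale a square distance matrix to the range [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant matrices
    return [[(v - lo) / span for v in row] for row in matrix]


def combine(d1, d2, weight=0.5):
    """Weighted sum of two normalized distance matrices (assumed scheme)."""
    n1, n2 = normalize(d1), normalize(d2)
    size = len(d1)
    return [
        [weight * n1[i][j] + (1 - weight) * n2[i][j] for j in range(size)]
        for i in range(size)
    ]


def average_linkage(dist, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical distances between four diagram elements.
euclidean = [  # layout distance in the drawn diagram
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
network = [  # shortest path length in the underlying network
    [0, 1, 3, 4],
    [1, 0, 2, 3],
    [3, 2, 0, 1],
    [4, 3, 1, 0],
]

combined = combine(euclidean, network, weight=0.5)
clusters = average_linkage(combined, k=2)
print(clusters)
```

With these toy values the two metrics agree, so elements 0 and 1 end up in one cluster and elements 2 and 3 in the other. Varying `weight` or `k` is a simple way to see how the relative importance of each metric and the clustering depth change the grouping.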