Yip Kevin Y, Qi Peishen, Schultz Martin, Cheung David W, Cheung Kei-Hoi
Computer Science, Yale University, New Haven, Connecticut, USA.
Pac Symp Biocomput. 2006:188-99.
Clustering is a popular method for analyzing microarray data. Given the large number of clustering algorithms available, it is difficult to identify the most suitable ones for a particular task. It is also difficult to locate, download, install and run the algorithms. This paper describes a matchmaking system, SemBiosphere, which addresses both problems. It recommends clustering algorithms based on a minimal set of user requirements and on the properties of the data. An ontology was developed in OWL, an expressive ontology language, to describe what the algorithms are and how they perform, in addition to how they can be invoked. This allows machines to "understand" the algorithms and make the recommendations. The algorithms can be implemented by different groups in different languages, and run on different platforms at geographically distributed sites. Through the use of XML-based web services, they can all be invoked in the same standard way. The current clustering services were transformed from the non-semantic web services of the Biosphere system, which includes a variety of algorithms that have been applied to microarray gene expression data analysis. New algorithms can be incorporated into the system with little effort. The SemBiosphere system and the complete clustering ontology can be accessed at http://yeasthub2.gersteinlab.org/sembiosphere/.
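The matchmaking idea can be illustrated with a minimal Python sketch. It is not the SemBiosphere implementation, which relies on an OWL ontology and web services; here plain records stand in for the ontology descriptions. The class `AlgorithmProfile`, the function `recommend`, every property name, the catalog values and the `example.org` endpoints are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmProfile:
    """Hypothetical stand-in for one algorithm's ontology description."""
    name: str
    needs_cluster_count: bool     # must the user supply the number of clusters?
    handles_missing_values: bool  # illustrative property, not taken from the paper
    max_rows: int                 # assumed largest dataset the service accepts
    endpoint: str                 # placeholder for the web-service address


def recommend(profiles, knows_cluster_count, has_missing_values, n_rows):
    """Return names of algorithms whose described properties fit the request."""
    matches = []
    for p in profiles:
        if p.needs_cluster_count and not knows_cluster_count:
            continue  # user cannot provide a required input
        if has_missing_values and not p.handles_missing_values:
            continue  # data property rules this algorithm out
        if n_rows > p.max_rows:
            continue  # dataset is larger than the assumed limit
        matches.append(p)
    # Rank the survivors: most headroom for dataset size first.
    matches.sort(key=lambda p: p.max_rows, reverse=True)
    return [p.name for p in matches]


# Illustrative catalog; the values are assumptions, not measured properties.
CATALOG = [
    AlgorithmProfile("k-means", True, False, 100_000, "http://example.org/kmeans"),
    AlgorithmProfile("hierarchical", False, False, 5_000, "http://example.org/hier"),
    AlgorithmProfile("som", True, True, 50_000, "http://example.org/som"),
]

print(recommend(CATALOG, knows_cluster_count=False,
                has_missing_values=False, n_rows=2_000))
```

In this sketch a user who does not know the number of clusters is offered only the algorithm that does not require one. A real matchmaker would draw such properties from the ontology by reasoning rather than from hard-coded records.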