Blom Evert-Jan, van Hijum Sacha A F T, Hofstede Klaas J, Silvis Remko, Roerdink Jos B T M, Kuipers Oscar P
Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, The Netherlands.
BMC Bioinformatics. 2008 Dec 16;9:535. doi: 10.1186/1471-2105-9-535.
A typical step in the analysis of gene expression data is the determination of clusters of genes that exhibit similar expression patterns. Researchers are confronted with the seemingly arbitrary choice between numerous algorithms to perform cluster analysis.
We developed an exploratory application that benchmarks the results of clustering methods using functional annotations. In addition, the program integrates a de novo DNA motif discovery algorithm that identifies overrepresented DNA binding sites in the upstream DNA sequences of the genes in each cluster; such sites are indicative of transcriptional control. The performance of the program was evaluated by comparing the original results of a time course experiment with the findings of our application.
DISCLOSE assists researchers in the prokaryotic research community in systematically evaluating the results of applying a range of clustering algorithms to transcriptome data. Different performance measures allow the best-suited clustering approach for a given dataset to be determined quickly and comprehensively.
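The idea of benchmarking clusterings against functional annotations can be illustrated with a small Python sketch. It is not the DISCLOSE implementation: the abstract does not specify which performance measures are used, so the hypergeometric enrichment test, the 0.05 threshold, the function names, and the toy gene identifiers below are all assumptions chosen for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_clusters(clusters, annotations, alpha=0.05):
    """Count clusters with at least one annotation term enriched at level alpha.

    clusters:    list of sets of gene identifiers (one set per cluster)
    annotations: dict mapping an annotation term to the set of genes carrying it
    Assumption: a higher count means the clustering agrees better with the
    functional annotation. No multiple-testing correction is applied here.
    """
    universe = set().union(*clusters)
    N = len(universe)
    count = 0
    for cluster in clusters:
        best_p = 1.0
        for genes in annotations.values():
            annotated = genes & universe
            k = len(cluster & annotated)
            if k == 0:
                continue
            best_p = min(best_p, hypergeom_sf(k, N, len(annotated), len(cluster)))
        if best_p < alpha:
            count += 1
    return count


# Hypothetical toy data: eight genes, two annotation terms.
annotations = {
    "termA": {"g1", "g2", "g3", "g4"},
    "termB": {"g5", "g6", "g7", "g8"},
}
clustering_1 = [{"g1", "g2", "g3", "g4"}, {"g5", "g6", "g7", "g8"}]
clustering_2 = [{"g1", "g2", "g5", "g6"}, {"g3", "g4", "g7", "g8"}]

score_1 = enriched_clusters(clustering_1, annotations)
score_2 = enriched_clusters(clustering_2, annotations)
print(score_1, score_2)
```

In this toy case the first clustering groups genes by annotation term and the second mixes them, so the first receives the higher score. A real comparison would need a multiple-testing correction and would typically combine several such measures.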