Evaluation of normalization and pre-clustering issues in a novel clustering approach: global optimum search with enhanced positioning.

Authors

Tan Meng P, Broach James R, Floudas Christodoulos A

Affiliation

Department of Chemical Engineering, Princeton University, Princeton, NJ 08544, USA.

Publication information

J Bioinform Comput Biol. 2007 Aug;5(4):895-913. doi: 10.1142/s0219720007002941.

Abstract

We study the effects of different normalization and pre-clustering techniques on clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). Both issues matter in practice. DNA microarray experiments are informative tools for elucidating gene regulatory networks, but for gene expression levels to be comparable across microarrays, normalization procedures must be properly undertaken. The aim of pre-clustering is to use an adequate number of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and on symbolic representations such as {+, o, -}. As performance metrics, we use the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates that normalization and pre-clustering methods have a significant effect on the clustering results. Hence, the outcome of this study is relevant to fine-tuning the EP_GOS_Clust clustering approach.
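To make the two quality measures and the symbolic representation more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not give formulas, so the definitions below are assumptions: the intra-cluster error sum is taken as the total squared Euclidean distance from each expression profile to its cluster centroid, the inter-cluster error sum as the total squared distance between all pairs of cluster centroids, and a {+, o, -} profile as the sign of each successive change in expression relative to a hypothetical `threshold`. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def error_sums(data, labels):
    """Return (intra, inter) error sums under the assumed definitions.

    intra: sum of squared distances from each profile to its cluster centroid.
    inter: sum of squared distances over all pairs of cluster centroids.
    """
    clusters = defaultdict(list)
    for row, label in zip(data, labels):
        clusters[label].append(row)
    centers = {label: centroid(rows) for label, rows in clusters.items()}
    intra = sum(
        sq_dist(row, centers[label])
        for label, rows in clusters.items()
        for row in rows
    )
    inter = sum(
        sq_dist(centers[a], centers[b]) for a, b in combinations(centers, 2)
    )
    return intra, inter


def symbolic_profile(row, threshold=0.5):
    """Encode successive changes as '+', 'o', or '-' (assumed encoding).

    A change larger than `threshold` maps to '+', a change smaller than
    -threshold maps to '-', and anything in between maps to 'o'.
    """
    symbols = []
    for prev, cur in zip(row, row[1:]):
        delta = cur - prev
        if delta > threshold:
            symbols.append("+")
        elif delta < -threshold:
            symbols.append("-")
        else:
            symbols.append("o")
    return "".join(symbols)


if __name__ == "__main__":
    # Toy expression profiles (not real microarray data).
    data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    labels = [0, 0, 1, 1]
    print(error_sums(data, labels))
    print(symbolic_profile([0.0, 1.0, 1.2, 0.0]))
```

Under these assumed definitions, a tighter and better separated clustering gives a lower intra-cluster sum and a higher inter-cluster sum. Genes with identical symbolic profiles could be pre-grouped before the main clustering step, which is one plausible reading of the pre-clustering idea described above.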
