
An improved algorithm for clustering gene expression data.

Author information

Bandyopadhyay Sanghamitra, Mukhopadhyay Anirban, Maulik Ujjwal

Affiliation

Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India.

Publication information

Bioinformatics. 2007 Nov 1;23(21):2859-65. doi: 10.1093/bioinformatics/btm418. Epub 2007 Aug 25.

Abstract

MOTIVATION

Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is presented that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means (FCM) algorithm is also utilized for clustering.
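To make the idea of fuzzy membership concrete, the following is a minimal Python sketch of standard Fuzzy C-Means. It is not the iterated FCM or the two-stage genetic algorithm of the article. The function names `fcm` and `multi_membership_points`, the fuzzifier default `m=2.0`, and the flagging criterion are hypothetical choices for illustration. The criterion flags a point as having significant membership to multiple classes when the gap between its two largest memberships falls below a threshold, with 0.1 as the threshold value. It is an assumption, not taken from the abstract.

```python
import math
import random


def fcm(points, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means (sketch). Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial membership matrix U (n x c); each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = [[0.0] * d for _ in range(c)]
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik^m.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers[k] = [sum(w[i] * points[i][j] for i in range(n)) / tw
                          for j in range(d)]
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il)^(2 / (m - 1)).
        max_change = 0.0
        p = 2.0 / (m - 1.0)
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # Point coincides with a center: full membership there.
                z = dist.index(0.0)
                new = [1.0 if k == z else 0.0 for k in range(c)]
            else:
                new = [1.0 / sum((dist[k] / dist[l]) ** p for l in range(c))
                       for k in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if max_change < tol:
            break
    return centers, u


def multi_membership_points(u, threshold=0.1):
    """Hypothetical criterion: flag points whose two largest memberships
    differ by less than `threshold`."""
    flagged = []
    for i, row in enumerate(u):
        top = sorted(row, reverse=True)
        if top[0] - top[1] < threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Two toy groups plus one point halfway between them.
    pts = [[0, 0], [0, 1], [1, 0], [11, 11], [11, 10], [10, 11], [5.5, 5.5]]
    centers, u = fcm(pts, 2)
    print(multi_membership_points(u))
```

On this toy data, the last point lies symmetrically between the two groups. It should receive roughly equal memberships in both clusters and be the only point flagged. The other points have memberships close to 1 for their own group.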

RESULTS

On a variety of artificial and publicly available real-life data sets, the proposed two-stage clustering algorithm is shown to be significantly superior to three methods widely used for clustering gene expression data: the average linkage method, the Self-Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC). The biological relevance of the clustering solutions is also analyzed.
