A novel approach for discovering overlapping clusters in gene expression data.

Authors

Ma Patrick C H, Chan Keith C C

Affiliation

Department of Computing, the Hong Kong Polytechnic University, Hong Kong, China.

Publication information

IEEE Trans Biomed Eng. 2009 Jul;56(7):1803-9. doi: 10.1109/TBME.2009.2015055. Epub 2009 Feb 20.

Abstract

Many existing clustering algorithms have been used to identify coexpressed genes in gene expression data. These algorithms mainly partition the data, in the sense that each gene is allowed to belong to only one cluster. Since proteins typically interact with different groups of proteins in order to serve different biological roles, the genes that produce these proteins are expected to coexpress with more than one group of genes. In other words, some genes are expected to belong to more than one cluster. This poses a challenge to gene expression data clustering, as overlapping clusters need to be discovered in a noisy environment. For this task, we propose an effective information-theoretic approach that consists of an initial clustering phase and a second reclustering phase. The proposed approach has been tested with both simulated and real expression data. Experimental results show that it can improve the performance of existing clustering algorithms and can effectively uncover interesting patterns in noisy gene expression data so that, based on these patterns, overlapping clusters can be discovered.
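The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general two-phase idea, not the authors' method. It assumes that a hard partition from any existing clustering algorithm is supplied as `initial_labels`, that expression values are discretized around each gene's median, and that a gene joins every cluster whose majority-vote pattern shares at least `threshold` bits of mutual information with the gene's own pattern. The function names, the discretization rule, the majority-vote representative, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def discretize(profile):
    """Map each value to 1 if it is above the profile's median, else 0 (assumed rule)."""
    ordered = sorted(profile)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return [1 if v > median else 0 for v in profile]


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def recluster(profiles, initial_labels, threshold):
    """Phase 2 (hypothetical): turn a hard partition into overlapping memberships.

    Note that mutual information also scores anti-correlated patterns highly;
    this sketch makes no attempt to separate the two cases.
    """
    states = [discretize(p) for p in profiles]
    clusters = sorted(set(initial_labels))
    # Representative pattern of each initial cluster: per-condition majority vote.
    reps = {}
    for c in clusters:
        members = [s for s, lab in zip(states, initial_labels) if lab == c]
        reps[c] = [1 if 2 * sum(col) >= len(col) else 0 for col in zip(*members)]
    memberships = []
    for s, lab in zip(states, initial_labels):
        extra = {c for c in clusters if mutual_information(s, reps[c]) >= threshold}
        memberships.append({lab} | extra)  # a gene always keeps its initial cluster
    return memberships


# Toy data (made up): five genes measured under eight conditions.
profiles = [
    [5.1, 5.3, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1],  # follows pattern A
    [4.8, 5.0, 5.2, 4.7, 1.3, 0.8, 1.1, 1.0],  # follows pattern A
    [5.0, 4.9, 5.1, 1.2, 4.8, 1.0, 0.9, 1.1],  # partly A, partly B
    [5.2, 4.9, 1.0, 1.1, 5.0, 5.1, 0.9, 1.2],  # follows pattern B
    [4.7, 5.1, 1.2, 0.9, 4.9, 5.3, 1.1, 1.0],  # follows pattern B
]
initial_labels = [0, 0, 0, 1, 1]  # phase 1 output, assumed to be given
memberships = recluster(profiles, initial_labels, threshold=0.15)
print(memberships)
```

With this toy data, the third gene is assigned to both clusters while the other four keep their single initial cluster, which is the kind of overlap a strict partition cannot express.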
