Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

Authors

Yu Zhiwen, Chen Hantao, You Jane, Han Guoqiang, Li Le

Affiliations

South China University of Technology, Guangzhou and Hong Kong Polytechnic University, Hong Kong.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):657-70. doi: 10.1109/TCBB.2013.59.

Abstract

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing studies adopt single clustering algorithms to perform tumor clustering from biomolecular data, and these algorithms lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancers. HFCEF-I and HFCEF-II differ in the ensemble generator they use to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices based on the fuzzy membership function and the base samples selected by AP. HFCEF-II applies AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices by performing fuzzy c-means on the subspaces. HFCEF-III and HFCEF-IV combine the characteristics of both HFCEF-I and HFCEF-II: HFCEF-III combines them in a serial way, while HFCEF-IV integrates them in a concurrent way. All HFCEFs adopt a suitable consensus function, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results. The experiments on real data sets from the UCI Machine Learning Repository and on cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches provide more robust, stable, and accurate results than state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
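
The HFCEF-I idea of building fuzzy matrices from base samples and then summarizing them with a consensus function can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not state the fuzzy membership function, so the standard fuzzy c-means membership with fuzzifier m = 2 is used; the base samples are hard-coded stand-ins for exemplars that AP would select; and "summarizing" is approximated by concatenating the fuzzy matrices column-wise and running fuzzy c-means on the result. The names `fuzzy_memberships` and `fuzzy_c_means` and the toy data are hypothetical.

```python
import math


def fuzzy_memberships(samples, bases, m=2.0):
    """Fuzzy matrix: row i holds sample i's membership in each base sample.

    Uses the standard fuzzy c-means membership formula (an assumption;
    the abstract does not specify the function).
    """
    exponent = 2.0 / (m - 1.0)
    matrix = []
    for x in samples:
        dists = [math.dist(x, b) for b in bases]
        if 0.0 in dists:
            # The sample coincides with a base sample: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                   for dj in dists]
        matrix.append(row)
    return matrix


def fuzzy_c_means(data, init_centers, m=2.0, iters=30):
    """Plain fuzzy c-means with caller-supplied initial centers."""
    centers = [list(c) for c in init_centers]
    dim = len(data[0])
    for _ in range(iters):
        u = fuzzy_memberships(data, centers, m)
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers[j] = [sum(w * x[d] for w, x in zip(weights, data)) / total
                          for d in range(dim)]
    return fuzzy_memberships(data, centers, m)


# Toy data: two well-separated groups of samples (hypothetical values).
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

# Two sets of base samples, standing in for exemplars from two AP runs.
base_sets = [[samples[0], samples[3]], [samples[1], samples[4]]]

# Ensemble generation: one fuzzy matrix per base set, joined column-wise.
fuzzy_matrices = [fuzzy_memberships(samples, bases) for bases in base_sets]
ensemble = [sum((fm[i] for fm in fuzzy_matrices), []) for i in range(len(samples))]

# Consensus: fuzzy c-means on the concatenated memberships.
consensus = fuzzy_c_means(ensemble, init_centers=[ensemble[0], ensemble[3]])
labels = [row.index(max(row)) for row in consensus]
print(labels)
```

Each row of `consensus` is a soft assignment that sums to 1, and `labels` takes the most likely cluster per sample. Swapping the consensus step for a graph cut such as Ncut, or generating the fuzzy matrices from attribute subspaces as in HFCEF-II, would change only the corresponding stage of this pipeline.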
