Knowledge based cluster ensemble for cancer discovery from biomolecular data.

Affiliation

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.

Publication information

IEEE Trans Nanobioscience. 2011 Jun;10(2):76-85. doi: 10.1109/TNB.2011.2144997. Epub 2011 Jul 7.

Abstract

The adoption of microarray techniques in biological and medical research provides a new route to cancer diagnosis and treatment. Successful diagnosis and treatment depend on discovering and classifying cancer types correctly, and class discovery is one of the most important tasks in cancer classification using biomolecular data. Most existing work applies a single clustering algorithm to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, including a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE), which incorporates prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms the pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable, and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE outperforms most of the state-of-the-art single clustering algorithms as well as most of the state-of-the-art cluster ensemble approaches.
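The pipeline described in the abstract (pairwise constraints, a set of clustering solutions, confidence factors, a weighted consensus matrix, and a final partition) can be illustrated with a minimal pure-Python sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the confidence factor is taken to be the fraction of must-link and cannot-link constraints a solution satisfies, the consensus matrix is a confidence-weighted co-association matrix, and the final partition uses simple thresholded connected components rather than whatever partitioning method the paper uses. The clustering solutions are hard-coded toy label vectors standing in for the output of repeated spectral clustering runs, and all function names are hypothetical.

```python
def confidence_factor(labels, must_link, cannot_link):
    """Assumed definition: fraction of pairwise constraints a solution satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no prior knowledge: every solution gets full weight
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def consensus_matrix(solutions, weights):
    """Confidence-weighted co-association matrix over all clustering solutions."""
    n = len(solutions[0])
    total_weight = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    if total_weight == 0:
        return matrix
    for labels, w in zip(solutions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total_weight
    return matrix


def partition(matrix, threshold=0.5):
    """Stand-in partitioner: connected components of the graph that links
    two samples whenever their consensus value exceeds the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: four clustering solutions for six samples (hypothetical values).
solutions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
# Prior knowledge expressed as pairwise constraints on sample indices.
must_link = [(0, 1), (3, 4)]
cannot_link = [(0, 5), (2, 3)]

weights = [confidence_factor(s, must_link, cannot_link) for s in solutions]
final_labels = partition(consensus_matrix(solutions, weights))

print(weights)       # [1.0, 0.75, 1.0, 0.5]
print(final_labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy run, the fourth solution violates both must-link constraints and so contributes only half the weight of the first and third, which satisfy every constraint. Solutions that agree with the prior knowledge therefore dominate the consensus matrix, which is the intuition behind weighting the ensemble by confidence factors.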
