IEEE Trans Cybern. 2022 Oct;52(10):11027-11040. doi: 10.1109/TCYB.2021.3069434. Epub 2022 Sep 19.
Patient stratification has been widely studied as a way to tackle subtype diagnosis and thereby enable effective treatment. Owing to the curse of dimensionality and the poor interpretability of the data, constructing a stratification model with high diagnostic ability and good generalization remains a long-standing challenge. To address these problems, this article proposes two novel evolutionary multiobjective ensemble clustering algorithms (NSGA-II-ECFE and MOEA/D-ECFE) that use four cluster validity indices as the objective functions. First, an effective ensemble construction method is developed to enrich the ensemble diversity. Next, an ensemble clustering fitness evaluation (ECFE) method is proposed to evaluate the ensembles by measuring the consensus clustering under those four objective functions. To generate the consensus clustering, ECFE derives a hybrid co-association matrix from the ensembles and then dynamically selects a suitable clustering algorithm to apply to that matrix. Experiments on 55 synthetic datasets and 35 real patient stratification datasets compare the proposed algorithms with seven clustering algorithms, twelve ensemble clustering approaches, and two multiobjective clustering algorithms. The results demonstrate the competitive edge of the proposed algorithms over the compared methods. Furthermore, the proposed algorithms are applied to identify cancer subtypes from five cancer-related single-cell RNA-seq datasets.
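The consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' ECFE method: the abstract gives no details of the hybrid co-association matrix or of the dynamic algorithm selection, so this example substitutes a plain co-association matrix (the fraction of base clusterings that place two samples in the same cluster) and a simple thresholded connected-components grouping. The function names `co_association` and `consensus_by_threshold`, the threshold of 0.5, and the toy ensemble are all assumptions made for illustration.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings in which samples i and j share a cluster.
    Plain co-association only; the paper's hybrid variant is not specified here."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[c / total for c in row] for row in counts]


def consensus_by_threshold(matrix, threshold=0.5):
    """Stand-in consensus function (an assumption, not the ECFE selection step):
    link samples whose co-association exceeds the threshold and label each
    connected component as one consensus cluster."""
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical ensemble: three base clusterings of six samples.
# The third uses swapped label names, which co-association ignores.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
matrix = co_association(ensemble)
consensus = consensus_by_threshold(matrix)
print(consensus)
```

Because the matrix depends only on whether two samples share a cluster, base clusterings with different label names or different numbers of clusters can be combined directly. In a multiobjective setting such as the one the abstract describes, a consensus labeling like this would then be scored by the cluster validity indices that serve as objectives.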