IEEE Trans Cybern. 2019 May;49(5):1680-1693. doi: 10.1109/TCYB.2018.2817480. Epub 2018 Apr 2.
Patient stratification plays a major role in enabling efficient and personalized medicine. An important task in patient stratification is to discover disease subtypes for effective treatment. To achieve this goal, research on clustering algorithms for patient stratification has attracted attention from both academia and the medical community over the past decades. However, existing clustering algorithms suffer from practical limitations such as experimental noise, high dimensionality, and poor interpretability. In particular, they usually determine clustering quality using only one internal evaluation function. Unfortunately, a single internal evaluation function can hardly fit all datasets and remain robust across them. Therefore, in this paper, a novel multiobjective framework called multiobjective clustering algorithm by fast search and find of density peaks is proposed to address those limitations together. In the proposed framework, a population of parameter candidates is evolved under multiple objectives to select features and evaluate clustering densities automatically. To guide the multiobjective evolution, five cluster validity indices, namely compactness, separation, the Calinski-Harabasz index, the Davies-Bouldin index, and the Dunn index, are chosen as the objective functions, capturing multiple characteristics of the evolving clusters. A multiobjective differential evolution algorithm based on decomposition is adopted to optimize those five objective functions simultaneously.
To demonstrate its effectiveness, extensive experiments have been conducted, comparing the proposed algorithm with 45 algorithms (nine state-of-the-art clustering algorithms, five multiobjective evolutionary algorithms, and 31 baseline algorithms under different objective subsets) on 94 datasets: 35 real patient stratification datasets, 55 synthetic datasets based on a real human transcription regulation network model, and four other medical datasets. The numerical results reveal that the proposed algorithm achieves solutions that are better than or competitive with those of the other algorithms. In addition, time complexity analysis, convergence analysis, and parameter analysis are conducted to demonstrate the robustness of the proposed algorithm from different perspectives.
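To make the five objective functions concrete, the sketch below computes compactness, separation, the Calinski-Harabasz index, the Davies-Bouldin index, and the Dunn index for a single candidate labeling. The abstract does not give the exact formulas used in the paper, so the definitions here are common textbook forms and should be read as assumptions. The names `centroid` and `validity_indices` are hypothetical, and the toy points are illustrative only.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def validity_indices(points, labels):
    """Five cluster validity indices for one labeling.

    Assumes (illustrative simplifications): Euclidean distance, at least two
    clusters, more points than clusters, non-zero within-cluster scatter,
    and at least one cluster with two or more points.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    cents = [centroid(g) for g in groups]
    n, k = len(points), len(groups)
    overall = centroid(points)

    # Per-cluster scatter: mean distance of members to their own centroid.
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, cents)]

    # Compactness (assumed form): mean point-to-centroid distance; lower is better.
    compactness = sum(math.dist(p, c)
                      for g, c in zip(groups, cents) for p in g) / n

    # Separation (assumed form): smallest centroid-to-centroid distance; higher is better.
    separation = min(math.dist(a, b) for a, b in combinations(cents, 2))

    # Calinski-Harabasz: between-cluster over within-cluster dispersion; higher is better.
    within = sum(math.dist(p, c) ** 2
                 for g, c in zip(groups, cents) for p in g)
    between = sum(len(g) * math.dist(c, overall) ** 2
                  for g, c in zip(groups, cents))
    calinski_harabasz = (between / (k - 1)) / (within / (n - k))

    # Davies-Bouldin: mean worst-case similarity between clusters; lower is better.
    davies_bouldin = sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

    # Dunn: smallest inter-cluster gap over largest cluster diameter; higher is better.
    min_gap = min(math.dist(p, q)
                  for a, b in combinations(groups, 2) for p in a for q in b)
    max_diameter = max(math.dist(p, q)
                       for g in groups for p, q in combinations(g, 2))
    dunn = min_gap / max_diameter

    return {
        "compactness": compactness,
        "separation": separation,
        "calinski_harabasz": calinski_harabasz,
        "davies_bouldin": davies_bouldin,
        "dunn": dunn,
    }


# Toy data: two tight, well-separated pairs of points.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = validity_indices(toy_points, [0, 0, 1, 1])  # natural grouping
bad = validity_indices(toy_points, [0, 1, 0, 1])   # grouping that mixes the pairs
print(good)
print(bad)
```

Because the indices point in different directions (some are minimized, some maximized) and can disagree on harder data, a multiobjective optimizer would treat them as a vector of objectives rather than collapsing them into one score. On this toy data all five happen to favor the natural grouping.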