School of Software, Beijing University of Technology, China.
Comput Methods Programs Biomed. 2020 Jun;189:105337. doi: 10.1016/j.cmpb.2020.105337. Epub 2020 Jan 13.
Cancer subtype analysis, as an extension of cancer diagnosis, can be regarded as a consensus clustering problem. This analysis helps provide patients with more accurate treatment. Consensus clustering refers to the situation in which several different clusterings have been obtained for a particular data set, and the goal is to aggregate those clustering results into a better clustering solution. In this paper, we generalize the traditional consensus clustering methods in three ways: (1) we provide Bregmannian consensus clustering (BCC), in which the loss between the consensus clustering result and all the input clusterings is generalized from the traditional Euclidean distance to a general Bregman loss; (2) we generalize BCC to a weighted case, in which each input clustering is assigned its own weight, providing a better solution for the final clustering result; and (3) we propose a novel semi-supervised consensus clustering method, which adds must-link and cannot-link constraints to the first two methods. We then obtain three cancer data sets (breast, lung, and colorectal cancer) from The Cancer Genome Atlas (TCGA). Each data set has three data types (mRNA, microRNA, methylation), and each is used separately to evaluate the clustering accuracy of the proposed algorithms. The experimental results show that the highest aggregation accuracy of the weighted BCC (WBCC) on the cancer data sets is 90.2%. Moreover, although its lowest accuracy is 62.3%, this is still higher than that of the other methods on the same data set. We therefore conclude that our method is more effective than the competing methods.
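To make the weighted aggregation step concrete, the following is a minimal sketch of weighted consensus clustering under the squared Euclidean loss, which is the simplest instance of a Bregman loss. It is not the paper's BCC or WBCC algorithm: the function names, the fixed weights, the 0.5 threshold, and the connected-components read-out are all assumptions made for illustration. Each input clustering is turned into a co-association matrix, and the weighted mean of those matrices is used as the consensus, since it minimizes the weighted sum of squared distances to the inputs.

```python
import numpy as np

def coassociation(labels):
    """Binary n x n matrix: entry (i, j) is 1 when samples i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(clusterings, weights=None):
    """Weighted mean of co-association matrices.

    The weighted mean minimizes sum_k w_k * ||M - M_k||_F^2, i.e. the
    squared Euclidean special case of a Bregman loss (illustrative only).
    """
    mats = np.stack([coassociation(c) for c in clusterings])
    if weights is None:
        weights = np.ones(len(clusterings))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return np.tensordot(w, mats, axes=1)

def labels_from_consensus(consensus, threshold=0.5):
    """Read out clusters as connected components of the thresholded consensus graph."""
    n = consensus.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(consensus[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three hypothetical input clusterings of six samples; the first two describe
# the same partition with swapped label names, the third disagrees in places.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
weights = [0.4, 0.4, 0.2]  # assumed weights, not learned as in WBCC

consensus = weighted_consensus(clusterings, weights)
final_labels = labels_from_consensus(consensus)
print(final_labels)
```

Working on co-association matrices rather than raw labels makes the aggregation independent of how each input clustering names its clusters. Must-link and cannot-link constraints of the kind used in the semi-supervised variant could be imposed in such a sketch by forcing the corresponding consensus entries to 1 or 0 before the read-out step, though that too is an assumption rather than the paper's formulation.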