On an ensemble algorithm for clustering cancer patient data.

Author information

Qi Ran, Wu Dengyuan, Sheng Li, Henson Donald, Schwartz Arnold, Xu Eric, Xing Kai, Chen Dechang

Publication information

BMC Syst Biol. 2013;7 Suppl 4(Suppl 4):S9. doi: 10.1186/1752-0509-7-S4-S9. Epub 2013 Oct 23.

Abstract

BACKGROUND

The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. The TNM system should therefore be expanded to accommodate new prognostic factors and so increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Although results obtained with EACCD have been reported, the algorithm itself has not been analyzed. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches.

RESULTS

Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. However, the statistical tests employed in the learning step have minimal effect on the dendrogram for large m. In addition, omitting the dissimilarity-learning step from EACCD can degrade performance. Furthermore, clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading cluster assignments.

CONCLUSIONS

When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.
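
The abstract names the ingredients of EACCD (a dissimilarity-learning step that runs a partitioning method m times, followed by a dendrogram built with a linkage function) but does not give the details. The following is only a rough sketch of that general idea under several assumptions that are not taken from the source: the initial dissimilarities come from made-up toy scores rather than from statistical tests on survival data, the number of clusters k is drawn at random in each run, a simplified swap search stands in for the full PAM algorithm, and the learned dissimilarity is taken to be one minus the fraction of runs in which two items share a cluster. All function names are hypothetical.

```python
import random
from itertools import combinations

def k_medoids_labels(d, k, rng):
    """Simplified PAM-style swap search (an assumption, not full PAM)."""
    n = len(d)
    medoids = rng.sample(range(n), k)  # random start; full PAM has a BUILD phase

    def cost(meds):
        # Total dissimilarity of every item to its nearest medoid.
        return sum(min(d[i][m] for m in meds) for i in range(n))

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:  # keep any swap that lowers the cost
                    medoids, best, improved = trial, trial_cost, True
    # Label each item with the index of its nearest medoid.
    return [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]

def learned_dissimilarity(d0, m, seed=0):
    """Assumed definition: 1 - (fraction of the m runs placing i and j together)."""
    rng = random.Random(seed)
    n = len(d0)
    together = [[0] * n for _ in range(n)]
    for _ in range(m):
        k = rng.randint(2, n - 1)  # assumption: k chosen at random per run
        labels = k_medoids_labels(d0, k, rng)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [[0.0 if i == j else 1 - together[i][j] / m for j in range(n)]
            for i in range(n)]

def agglomerate(d, linkage=max):
    """Agglomerative clustering; linkage=max is complete, linkage=min is single."""
    def dist(a, b):
        return linkage(d[i][j] for i in a for j in b)

    clusters = [(i,) for i in range(len(d))]
    merges = []  # (cluster_a, cluster_b, merge height), in merge order
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Toy input (hypothetical): six groups described by one made-up score each.
scores = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
d0 = [[abs(a - b) for b in scores] for a in scores]

learned = learned_dissimilarity(d0, m=200)
merges = agglomerate(learned, linkage=max)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

Changing `m` or swapping `linkage=max` for `linkage=min` in this toy setting is one way to explore the two sensitivities the abstract reports: robustness of the dendrogram for large m, and its dependence on the linkage function.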

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd48/3854654/98e8c0f9446c/1752-0509-7-S4-S9-1.jpg
