Cluster stability scores for microarray data in cancer studies.

Authors

Smolkin Mark, Ghosh Debashis

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

BMC Bioinformatics. 2003 Sep 6;4:36. doi: 10.1186/1471-2105-4-36.

Abstract

BACKGROUND

A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of stability of individual-level clusters has not been addressed.

RESULTS

We address this problem by developing cluster stability scores using subsampling techniques. These scores exploit the redundancy in biologically discriminatory information on the chip. Our approach is generic and can be used with any clustering method. We propose procedures for calculating cluster stability scores for situations involving both known and unknown numbers of clusters. We also develop cluster-size adjusted stability scores. The method is illustrated by application to data from three cancer studies: one involving childhood cancers, a second involving B-cell lymphoma, and a third from a malignant melanoma study.
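
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general subsampling idea, not the authors' code. It assumes that genes (rows) are subsampled, that the number of clusters `k` is known, and that a cluster counts as "recovered" only when exactly the same set of samples reappears. The function names, the 80% gene fraction, and the naive average-linkage routine are all hypothetical choices made for illustration; any clustering function with the same signature could be passed in instead.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List

Matrix = List[List[float]]                      # rows = genes, columns = samples
ClusterFn = Callable[[Matrix, int], List[FrozenSet[int]]]


def average_linkage(data: Matrix, k: int) -> List[FrozenSet[int]]:
    """Naive agglomerative clustering of the samples (columns) into k groups."""
    samples = list(zip(*data))                  # one expression vector per sample
    n = len(samples)
    dist = [[math.dist(samples[a], samples[b]) for b in range(n)] for a in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None                             # (average distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] += clusters[j]              # merge the closest pair (j > i)
        del clusters[j]
    return [frozenset(c) for c in clusters]


def stability_scores(
    data: Matrix,
    k: int,
    cluster_fn: ClusterFn = average_linkage,
    gene_fraction: float = 0.8,                 # assumed value, not from the paper
    n_subsamples: int = 100,
    seed: int = 0,
) -> Dict[FrozenSet[int], float]:
    """Fraction of gene subsamples in which each full-data cluster reappears intact."""
    rng = random.Random(seed)
    reference = cluster_fn(data, k)             # clusters found on all genes
    hits = {cluster: 0 for cluster in reference}
    m = max(1, min(len(data), round(gene_fraction * len(data))))
    for _ in range(n_subsamples):
        rows = rng.sample(range(len(data)), m)  # random subset of genes
        found = set(cluster_fn([data[r] for r in rows], k))
        for cluster in reference:
            if cluster in found:
                hits[cluster] += 1
    return {cluster: hits[cluster] / n_subsamples for cluster in reference}


if __name__ == "__main__":
    # Toy data: 20 genes, 6 samples, two well-separated groups of 3 samples.
    toy = [[(0.0 if s < 3 else 5.0) + 0.01 * ((7 * g + 3 * s) % 5) for s in range(6)]
           for g in range(20)]
    for cluster, score in stability_scores(toy, k=2).items():
        print(sorted(cluster), score)
```

Under these assumptions, a score close to 1 means the cluster survives most gene subsamples, while a low score suggests it depends on a small number of genes. The cluster-size adjustment and the unknown-`k` procedure mentioned above are not covered by this sketch.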

AVAILABILITY

Code implementing the proposed analytic method can be obtained at the second author's website.

Similar articles

Unsupervised clustering in mRNA expression profiles.
Comput Biol Med. 2006 Oct;36(10):1126-42. doi: 10.1016/j.compbiomed.2005.09.003. Epub 2005 Oct 24.

Cited by

Cross-Study Replicability in Cluster Analysis.
Stat Sci. 2023 May;38(2):303-316. doi: 10.1214/22-sts871. Epub 2023 Feb 6.
Stability estimation for unsupervised clustering: A review.
Wiley Interdiscip Rev Comput Stat. 2022 Nov-Dec;14(6):e1575. doi: 10.1002/wics.1575. Epub 2022 Jan 9.
