Cluster stability scores for microarray data in cancer studies.

Authors

Smolkin Mark, Ghosh Debashis

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

BMC Bioinformatics. 2003 Sep 6;4:36. doi: 10.1186/1471-2105-4-36.

DOI:10.1186/1471-2105-4-36
PMID:12959646
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC200969/
Abstract

BACKGROUND

A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of stability of individual-level clusters has not been addressed.

RESULTS

We address this problem by developing cluster stability scores using subsampling techniques. These scores exploit the redundancy in biologically discriminatory information on the chip. Our approach is generic and can be used with any clustering method. We propose procedures for calculating cluster stability scores for situations involving both known and unknown numbers of clusters. We also develop cluster-size adjusted stability scores. The method is illustrated by application to data from three cancer studies: one involving childhood cancers, a second involving B-cell lymphoma, and a third from a malignant melanoma study.
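The subsampling idea can be sketched roughly as follows for the case of a known number of clusters. This is an illustrative reading of the abstract, not the authors' code: the function names (`cluster_samples`, `stability_scores`), the 80% gene fraction, the number of subsamples, the use of average-linkage clustering with Euclidean distance, and the rule that a cluster counts as recovered only when it reappears exactly are all assumptions made for this example.

```python
import random


def cluster_samples(data, genes, k):
    """Average-linkage agglomerative clustering of samples into k clusters.

    data  : list of samples, each a list of expression values (one per gene)
    genes : indices of the genes used to compute distances
    Returns a set of frozensets of sample indices.
    """
    genes = list(genes)
    n = len(data)
    # Pairwise Euclidean distances between samples, restricted to `genes`.
    dist = {
        (i, j): sum((data[i][g] - data[j][g]) ** 2 for g in genes) ** 0.5
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(n)]
    while len(clusters) > k:
        pairs = (
            (a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        )
        a, b = min(pairs, key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def stability_scores(data, k, fraction=0.8, n_subsamples=100, seed=0):
    """Score each full-data cluster by how often it reappears exactly
    when the samples are re-clustered on a random subset of genes.
    (fraction and n_subsamples are assumed values, not from the paper.)"""
    rng = random.Random(seed)
    n_genes = len(data[0])
    reference = cluster_samples(data, range(n_genes), k)
    hits = {c: 0 for c in reference}
    m = max(1, int(fraction * n_genes))
    for _ in range(n_subsamples):
        genes = rng.sample(range(n_genes), m)
        resampled = cluster_samples(data, genes, k)
        for c in reference:
            if c in resampled:
                hits[c] += 1
    return {c: hits[c] / n_subsamples for c in reference}


# Toy data (made up): six samples, ten genes, two well-separated groups.
data = [[0.1 * s + 0.01 * g for g in range(10)] for s in range(3)] + \
       [[10 + 0.1 * s + 0.01 * g for g in range(10)] for s in range(3)]
scores = stability_scores(data, k=2, n_subsamples=20)
for members, score in sorted(scores.items(), key=lambda kv: min(kv[0])):
    print(sorted(members), score)
```

Because the score is attached to each cluster rather than to the partition as a whole, a well-supported subtype can score high even when other clusters in the same dendrogram are unstable. Any clustering routine could be substituted for `cluster_samples`, in keeping with the claim that the approach is generic.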

AVAILABILITY

Code implementing the proposed analytic method can be obtained at the second author's website.
