Clustering algorithms: on learning, validation, performance, and applications to genomics.

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843-3128, USA.

Publication information

Curr Genomics. 2009 Sep;10(6):430-45. doi: 10.2174/138920209789177601.

DOI:10.2174/138920209789177601
PMID:20190957
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2766793/
Abstract

The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze microarrays. To illustrate its application to genomics, clustering applied to genes from a set of microarray data groups together those genes whose expression levels exhibit similar behavior throughout the samples, and when applied to samples it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not a trivial one, more so when applying them to high throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus computational power available. In some cases a very simple algorithm may be appropriate to tackle a problem, but many situations may require a more complex and powerful algorithm better suited for the job at hand. In this paper, we will cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
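As a rough illustration of the gene-clustering idea described in the abstract, the sketch below groups genes whose expression profiles behave similarly across samples. It is not the paper's method: the toy expression values, the gene names, the choice of k-means, the per-gene standardization, and the deterministic seeding with the first k profiles are all assumptions made for the example.

```python
import math

def standardize(profile):
    """Scale one gene's expression profile to zero mean and unit variance."""
    mean = sum(profile) / len(profile)
    var = sum((x - mean) ** 2 for x in profile) / len(profile)
    std = math.sqrt(var) or 1.0  # flat profiles would otherwise divide by zero
    return [(x - mean) / std for x in profile]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd-style k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical expression levels: rows are genes, columns are samples.
genes = {
    "geneA": [1.0, 2.0, 3.0, 4.0],      # rising across samples
    "geneB": [4.0, 3.0, 2.0, 1.0],      # falling across samples
    "geneC": [10.0, 20.0, 30.0, 40.0],  # rising, on a larger scale
    "geneD": [8.0, 6.1, 4.0, 2.0],      # falling, on a larger scale
}

profiles = [standardize(p) for p in genes.values()]
labels = kmeans(profiles, k=2)
for name, label in zip(genes, labels):
    print(name, "-> cluster", label)
```

Standardizing each gene first means the grouping reflects the shape of the expression pattern rather than its absolute level, which is one common choice among several; other algorithms and distance measures would be equally legitimate here, and the abstract stresses that this choice is not trivial.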


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/66b290829858/CG-10-430_F8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/fada2ad8c569/CG-10-430_F1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/a98b4b122eda/CG-10-430_F2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/33a62aff6707/CG-10-430_F3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/78763b5c7da8/CG-10-430_F4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/dd113f42e015/CG-10-430_F5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/d3f95fb58e00/CG-10-430_F6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83be/2766793/f7ee6f6b4a90/CG-10-430_F7.jpg
