Clustering by soft-constraint affinity propagation: applications to gene-expression data.

Authors

Leone Michele, Weigt Martin

Affiliation

Institute for Scientific Interchange, Viale Settimio Severo 65, Villa Gualino, I-10133 Torino, Italy.

Publication details

Bioinformatics. 2007 Oct 15;23(20):2708-15. doi: 10.1093/bioinformatics/btm414. Epub 2007 Sep 25.

DOI: 10.1093/bioinformatics/btm414
PMID: 17895277
Abstract

MOTIVATION

Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g. in analyzing gene expression data.
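The exemplar model described above is commonly solved with the responsibility/availability message updates of Frey and Dueck. The following is a minimal pure-Python sketch of that standard, hard-constraint AP (not the SCAP variant introduced in this paper), assuming a dense similarity matrix `S` whose diagonal holds each point's preference for being an exemplar. The function name `affinity_propagation`, the damping factor and the fixed iteration count are illustrative choices, not taken from the paper.

```python
from typing import List


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         max_iter: int = 200) -> List[int]:
    """Standard (hard-constraint) affinity propagation, illustrative sketch.

    S[i][k] is the similarity of point i to candidate exemplar k; the
    diagonal S[k][k] is the preference of k for being an exemplar.
    Returns, for every point i, the index of the exemplar it refers to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=lambda k: vals[k])
            second = max((vals[k] for k in range(n) if k != first),
                         default=float("-inf"))
            for k in range(n):
                best_other = second if k == first else vals[first]
                new_r = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new_r

        # Availability:
        #   a(k,k) = sum_{i' != k} max(0, r(i',k))
        #   a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new_a = total
                else:
                    new_a = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new_a

    # Exemplars are the points whose self-evidence r(k,k) + a(k,k) is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise ValueError("no exemplar emerged; try a higher preference")
    # Exemplars refer to themselves; every other point refers to its most
    # similar exemplar (the hard constraint discussed above).
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    # Negative squared distance off the diagonal, preference -10 on it.
    S = [[-(a - b) ** 2 if i != k else -10.0 for k, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    print(affinity_propagation(S))
```

The diagonal preference controls how many exemplars emerge. In this toy example, the value -10 sits between the within-group and between-group similarities.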

RESULTS

This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows one to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows one to extract sparse gene expression signatures for each cluster.
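One endpoint of the interpolation is the case where the constraints carry no weight and each point simply selects its most similar other point as its exemplar. The sketch below illustrates only that limiting case; it does not implement the SCAP message updates. It assumes that clusters are read off as connected components of the resulting pointer graph, which is an illustrative choice. The function name `nearest_neighbor_clusters` is hypothetical.

```python
from typing import List, Tuple


def nearest_neighbor_clusters(S: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Limiting case sketch: every point picks its most similar other point.

    Returns (pointers, labels): pointers[i] is the exemplar chosen by i, and
    labels[i] is the smallest index in i's connected component of the
    (undirected) pointer graph. Assumes at least two points.
    """
    n = len(S)
    parent = list(range(n))  # union-find forest over points

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so labels are canonical.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    pointers = []
    for i in range(n):
        # Closest neighbor = highest similarity, excluding i itself.
        j = max((k for k in range(n) if k != i), key=lambda k: S[i][k])
        pointers.append(j)
        union(i, j)
    return pointers, [find(i) for i in range(n)]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    S = [[-(a - b) ** 2 for b in xs] for a in xs]
    print(nearest_neighbor_clusters(S))
```

Unlike the hard-constraint AP sketch, no point is forced to refer to itself. Chains such as 2 → 1 → 0 are allowed, so a cluster can take an arbitrary tree shape instead of a star around a single exemplar.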

Similar articles

1
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Bioinformatics. 2007 Oct 15;23(20):2708-15. doi: 10.1093/bioinformatics/btm414. Epub 2007 Sep 25.
2
A binary variable model for affinity propagation.
Neural Comput. 2009 Jun;21(6):1589-600. doi: 10.1162/neco.2009.05-08-785.
3
Learning kernels from biological networks by maximizing entropy.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i326-33. doi: 10.1093/bioinformatics/bth906.
4
A novel approach for clustering proteomics data using Bayesian fast Fourier transform.
Bioinformatics. 2005 May 15;21(10):2210-24. doi: 10.1093/bioinformatics/bti383. Epub 2005 Mar 15.
5
Mining co-regulated gene profiles for the detection of functional associations in gene expression data.
Bioinformatics. 2007 Aug 1;23(15):1927-35. doi: 10.1093/bioinformatics/btm276. Epub 2007 May 30.
6
Identifying projected clusters from gene expression profiles.
J Biomed Inform. 2004 Oct;37(5):345-57. doi: 10.1016/j.jbi.2004.05.002.
7
Graph-based consensus clustering for class discovery from gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
8
Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data.
Bioinformatics. 2007 Sep 1;23(17):2247-55. doi: 10.1093/bioinformatics/btm320. Epub 2007 Jun 27.
9
Hierarchical tree snipping: clustering guided by prior knowledge.
Bioinformatics. 2007 Dec 15;23(24):3335-42. doi: 10.1093/bioinformatics/btm526. Epub 2007 Nov 7.
10
An improved algorithm for clustering gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2859-65. doi: 10.1093/bioinformatics/btm418. Epub 2007 Aug 25.

Cited by

1
Deep clustering of bacterial tree images.
Philos Trans R Soc Lond B Biol Sci. 2022 Oct 10;377(1861):20210231. doi: 10.1098/rstb.2021.0231. Epub 2022 Aug 22.
2
A Clustering Approach for Motif Discovery in ChIP-Seq Dataset.
Entropy (Basel). 2019 Aug 16;21(8):802. doi: 10.3390/e21080802.
3
Using affinity propagation clustering for identifying bacterial clades and subclades with whole-genome sequences of Francisella tularensis.
PLoS Negl Trop Dis. 2020 Sep 29;14(9):e0008018. doi: 10.1371/journal.pntd.0008018. eCollection 2020 Sep.
4
Trends in Alzheimer's Disease Research Based upon Machine Learning Analysis of PubMed Abstracts.
Int J Biol Sci. 2019 Aug 6;15(10):2065-2074. doi: 10.7150/ijbs.35743. eCollection 2019.
5
A hotspots analysis-relation discovery representation model for revealing diabetes mellitus and obesity.
BMC Syst Biol. 2018 Dec 14;12(Suppl 7):116. doi: 10.1186/s12918-018-0640-4.
6
Defining objective clusters for rabies virus sequences using affinity propagation clustering.
PLoS Negl Trop Dis. 2018 Jan 22;12(1):e0006182. doi: 10.1371/journal.pntd.0006182. eCollection 2018 Jan.
7
BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation.
PeerJ. 2017 Mar 8;5:e3035. doi: 10.7717/peerj.3035. eCollection 2017.
8
Clustering Algorithms: Their Application to Gene Expression Data.
Bioinform Biol Insights. 2016 Nov 30;10:237-253. doi: 10.4137/BBI.S38316. eCollection 2016.
9
Morphological Neuron Classification Using Machine Learning.
Front Neuroanat. 2016 Nov 1;10:102. doi: 10.3389/fnana.2016.00102. eCollection 2016.
10
An Affinity Propagation-Based DNA Motif Discovery Algorithm.
Biomed Res Int. 2015;2015:853461. doi: 10.1155/2015/853461. Epub 2015 Aug 10.