• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

基于算术调和割的层次聚类:复杂度与实验。

Hierarchical clustering using the arithmetic-harmonic cut: complexity and experiments.

机构信息

Dipartimento di Matematica ed Informatica, University of Udine, Udine, Italy.

出版信息

PLoS One. 2010 Dec 2;5(12):e14067. doi: 10.1371/journal.pone.0014067.

DOI:10.1371/journal.pone.0014067
PMID:21151943
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC2997101/
Abstract

Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the arithmetic-harmonic cut. We show that the problem of finding such a cut is NP-hard and APX-hard but is fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets including a cancer type dataset and a corona virus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as k-Means, Graclus and Normalized-Cut. The arithmetic-harmonic cut metric overcoming difficulties other hierarchical methods have in representing both intercluster differences and intracluster similarities.

摘要

聚类,特别是层次聚类,是一种在广泛的知识领域中理解和分析数据的重要方法,在数据可以在进化背景下分类的系统中具有显著的实用性。本文介绍了一个新的层次聚类问题,由我们称之为算术-调和切割的新目标函数定义。我们表明,找到这样一个切割的问题是 NP 难和 APX 难的,但却是固定参数可处理的,这表明尽管该问题不太可能有多项式时间算法(即使是近似的),但基于精确参数化和局部搜索的技术可能会产生可行的算法。为此,我们为该问题实现了一个遗传算法,并在包括癌症类型数据集和冠状病毒数据集在内的多个数据集上展示了算术调和切割的有效性。与 k-Means、Graclus 和 Normalized-Cut 等当前使用的层次聚类技术相比,我们显示出了有利的性能。算术调和切割度量克服了其他层次方法在表示簇间差异和簇内相似性方面的困难。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b28/2997101/1dbf2c9014da/pone.0014067.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b28/2997101/60bc4a1ad1e1/pone.0014067.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b28/2997101/1dbf2c9014da/pone.0014067.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b28/2997101/60bc4a1ad1e1/pone.0014067.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b28/2997101/1dbf2c9014da/pone.0014067.g002.jpg

相似文献

1
Hierarchical clustering using the arithmetic-harmonic cut: complexity and experiments.基于算术调和割的层次聚类:复杂度与实验。
PLoS One. 2010 Dec 2;5(12):e14067. doi: 10.1371/journal.pone.0014067.
2
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.使用一致性算法对大型DNA微阵列数据集进行稳健的多尺度聚类
Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.
3
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.聚类组学:针对异构数据集的整合上下文相关聚类
PLoS Comput Biol. 2017 Oct 16;13(10):e1005781. doi: 10.1371/journal.pcbi.1005781. eCollection 2017 Oct.
4
Biologically supervised hierarchical clustering algorithms for gene expression data.用于基因表达数据的生物监督层次聚类算法。
Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5515-8. doi: 10.1109/IEMBS.2006.260308.
5
Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles.基于双重选择的半监督聚类集成用于从基因表达谱中进行肿瘤聚类
IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):727-40. doi: 10.1109/TCBB.2014.2315996.
6
Does Determination of Initial Cluster Centroids Improve the Performance of -Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.初始聚类质心的确定是否能提高 -Means 聚类算法的性能?在应用研究中,通过遗传算法、最小生成树和层次聚类三种混合方法的比较。
Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.
7
A memetic-aided approach to hierarchical clustering from distance matrices: application to gene expression clustering and phylogeny.一种基于模因辅助的距离矩阵层次聚类方法:在基因表达聚类和系统发育中的应用。
Biosystems. 2003 Nov;72(1-2):75-97. doi: 10.1016/s0303-2647(03)00136-9.
8
Theoretical Analysis of Local Search and Simple Evolutionary Algorithms for the Generalized Travelling Salesperson Problem.广义旅行商问题的局部搜索和简单进化算法的理论分析。
Evol Comput. 2019 Fall;27(3):525-558. doi: 10.1162/evco_a_00233. Epub 2018 Jun 22.
9
Novel clustering algorithm for microarray expression data in a truncated SVD space.截断奇异值分解空间中微阵列表达数据的新型聚类算法。
Bioinformatics. 2003 Jun 12;19(9):1110-5. doi: 10.1093/bioinformatics/btg053.
10
SC(3): Triple spectral clustering-based consensus clustering framework for class discovery from cancer gene expression profiles.SC(3):基于三重谱聚类的共识聚类框架,用于从癌症基因表达谱中发现类别。
IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1751-65. doi: 10.1109/TCBB.2012.108.

引用本文的文献

1
Collective correlations, dynamics, and behavioural inconsistencies of the cryptocurrency market over time.加密货币市场随时间的集体相关性、动态变化及行为不一致性。
Nonlinear Dyn. 2022;107(4):4001-4017. doi: 10.1007/s11071-021-07166-9. Epub 2022 Jan 3.
2
Country transition index based on hierarchical clustering to predict next COVID-19 waves.基于层次聚类的国家过渡指数预测下一波 COVID-19 浪潮。
Sci Rep. 2021 Jul 27;11(1):15271. doi: 10.1038/s41598-021-94661-z.
3
Trends in COVID-19 prevalence and mortality: A year in review.2019冠状病毒病的流行趋势和死亡率:年度回顾

本文引用的文献

1
Weighted graph cuts without eigenvectors a multilevel approach.无需特征向量的加权图割:一种多级方法。
IEEE Trans Pattern Anal Mach Intell. 2007 Nov;29(11):1944-57. doi: 10.1109/TPAMI.2007.1115.
2
Open source clustering software.开源聚类软件。
Bioinformatics. 2004 Jun 12;20(9):1453-4. doi: 10.1093/bioinformatics/bth078. Epub 2004 Feb 10.
3
Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling.通过四核苷酸使用谱分析探索严重急性呼吸综合征冠状病毒与其他致病性RNA病毒的关系。
Physica D. 2021 Nov;425:132968. doi: 10.1016/j.physd.2021.132968. Epub 2021 Jun 7.
4
Association between COVID-19 cases and international equity indices.新型冠状病毒肺炎病例与国际股票指数之间的关联。
Physica D. 2021 Mar;417:132809. doi: 10.1016/j.physd.2020.132809. Epub 2020 Dec 23.
5
COVID-19 in the United States: Trajectories and second surge behavior.美国的 COVID-19:轨迹和第二波疫情行为。
Chaos. 2020 Sep;30(9):091102. doi: 10.1063/5.0024204.
6
Cluster-based dual evolution for multivariate time series: Analyzing COVID-19.基于聚类的多元时间序列对偶进化:分析 COVID-19。
Chaos. 2020 Jun;30(6):061108. doi: 10.1063/5.0013156.
BMC Bioinformatics. 2003 Sep 20;4:43. doi: 10.1186/1471-2105-4-43.
4
Large-scale analysis of the human and mouse transcriptomes.人类和小鼠转录组的大规模分析。
Proc Natl Acad Sci U S A. 2002 Apr 2;99(7):4465-70. doi: 10.1073/pnas.012025199. Epub 2002 Mar 19.
5
Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning.用于图二分法的适应度景观、文化算法和贪婪算子。
Evol Comput. 2000 Spring;8(1):61-91. doi: 10.1162/106365600568103.
6
Systematic variation in gene expression patterns in human cancer cell lines.人类癌细胞系中基因表达模式的系统性变化。
Nat Genet. 2000 Mar;24(3):227-35. doi: 10.1038/73432.
7
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.癌症的分子分类:通过基因表达监测进行类别发现和类别预测。
Science. 1999 Oct 15;286(5439):531-7. doi: 10.1126/science.286.5439.531.