optCluster: An R Package for Determining the Optimal Clustering Algorithm.

Authors

Sekula Michael, Datta Somnath, Datta Susmita

Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, 40202, USA.

Department of Biostatistics, University of Florida, Gainesville, Florida, 32611, USA.

Publication information

Bioinformation. 2017 Mar 31;13(3):101-103. doi: 10.6026/97320630013101. eCollection 2017.

DOI:10.6026/97320630013101

PMID:28584451

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5450252/

Abstract

There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a "best" option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data.
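The idea of rank aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: it assumes each validation measure yields a full ranking of the same candidate partitions, uses the plain Spearman footrule distance with one hypothetical importance weight per measure, and finds the consensus ordering by brute-force search over all permutations (practical only for a handful of candidates). The function names, measure names, and rankings below are invented for the example.

```python
from itertools import permutations


def footrule(candidate, ranked_list):
    """Spearman footrule distance: sum of absolute rank differences."""
    position = {item: rank for rank, item in enumerate(ranked_list)}
    return sum(abs(rank - position[item]) for rank, item in enumerate(candidate))


def aggregate_ranks(ranked_lists, weights=None):
    """Return the ordering that minimizes the weighted sum of footrule
    distances to all input rankings, together with that minimum cost.

    Brute force over all permutations: an illustrative stand-in for a
    stochastic search, suitable only for small candidate sets.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])
    best_order, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(w * footrule(candidate, lst)
                   for w, lst in zip(weights, ranked_lists))
        if cost < best_cost:
            best_order, best_cost = list(candidate), cost
    return best_order, best_cost


# Hypothetical rankings (best first) of four "algorithm-k" partitions,
# one ranking per validation measure.
rankings = {
    "connectivity": ["hierarchical-2", "kmeans-3", "pam-3", "kmeans-2"],
    "dunn": ["kmeans-3", "hierarchical-2", "pam-3", "kmeans-2"],
    "silhouette": ["kmeans-3", "pam-3", "hierarchical-2", "kmeans-2"],
}

best_order, cost = aggregate_ranks(list(rankings.values()))
print("Consensus ordering:", best_order)
print("Top-ranked partition:", best_order[0])
```

Passing a `weights` list (one value per measure) shifts the consensus toward the measures judged more important, which is the sense in which the aggregation is weighted in this sketch.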

AVAILABILITY

This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/optCluster/.
