Sekula Michael, Datta Somnath, Datta Susmita
Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, 40202, USA.
Department of Biostatistics, University of Florida, Gainesville, Florida, 32611, USA.
Bioinformation. 2017 Mar 31;13(3):101-103. doi: 10.6026/97320630013101. eCollection 2017.
There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a "best" option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data.
This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/optCluster/.
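The rank-aggregation idea described in the abstract can be illustrated with a small sketch. It is written in Python for illustration only and is not the optCluster implementation. It assumes each validation measure has already produced an ordered list of candidate partitions (best first), and it finds the consensus ordering by exhaustively minimizing a sum of Spearman footrule distances. The partition labels, the example rankings, and the function names `footrule` and `aggregate` are all hypothetical. The `weights` argument here is a simple per-measure importance weight, which is a simplification and may differ from the weighting scheme used by the package.

```python
from itertools import permutations


def footrule(candidate, ranking):
    """Spearman footrule distance: sum of absolute differences in rank position."""
    pos_c = {item: i for i, item in enumerate(candidate)}
    pos_r = {item: i for i, item in enumerate(ranking)}
    return sum(abs(pos_c[item] - pos_r[item]) for item in candidate)


def aggregate(rankings, weights=None):
    """Return the ordering that minimizes the weighted sum of footrule distances.

    Exhaustive search over all permutations, so only suitable for a handful
    of candidate partitions. Assumes every ranking contains the same items.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    items = sorted(rankings[0])

    def total_distance(candidate):
        return sum(w * footrule(candidate, r) for w, r in zip(weights, rankings))

    return list(min(permutations(items), key=total_distance))


# Hypothetical example: three validation measures each rank three partitions,
# where a label such as "kmeans-3" means "k-means with 3 clusters".
rankings = {
    "connectivity": ["hierarchical-2", "kmeans-3", "pam-3"],
    "dunn": ["kmeans-3", "hierarchical-2", "pam-3"],
    "silhouette": ["kmeans-3", "pam-3", "hierarchical-2"],
}

consensus = aggregate(list(rankings.values()))
print(consensus)  # first entry is the aggregated "best" partition
```

With these made-up rankings, no single measure's list is obviously the right overall answer, which is the situation the abstract describes. The aggregation step replaces visual comparison with one explicit objective. Exhaustive search grows factorially with the number of candidates, so a practical tool would need a stochastic or heuristic search instead.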