CPS analysis: self-contained validation of biomedical data clustering.

Affiliation

Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3516-3521. doi: 10.1093/bioinformatics/btaa165.

Abstract

MOTIVATION

Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.

RESULTS

We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data, and to provide a more comprehensive view of potentially interesting subgroups in the data. In three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.
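The abstract does not spell out how a covering point set is computed. As a rough illustration only, the Python sketch below builds one for a single cluster under the following assumptions:

- A CPS at coverage level 1 − α is a small set of points that contains at least a fraction 1 − α of that cluster's versions across perturbed (e.g. bootstrapped) clusterings.
- Those versions have already been matched to the reference cluster.

The function name `covering_point_set` is hypothetical. The greedy rule is a swapped-in heuristic, not the OTclust implementation, and the cluster-matching step is omitted.

```python
from math import ceil
from typing import FrozenSet, List, Sequence, Set


def covering_point_set(versions: Sequence[FrozenSet[int]], alpha: float = 0.1) -> FrozenSet[int]:
    """Hypothetical greedy sketch of a covering point set for ONE cluster.

    `versions` are the point indices of that cluster in several perturbed
    clusterings, ASSUMED to be already matched to the reference cluster.
    Returns a set S such that at least ceil((1 - alpha) * m) versions are
    subsets of S. Greedy heuristic: S is small but not guaranteed minimal.
    """
    if not versions:
        raise ValueError("need at least one perturbed version")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    m = len(versions)
    # Round before ceil so float noise (e.g. 0.9 * 10) cannot bump the count.
    needed = ceil(round((1.0 - alpha) * m, 9))
    cover: Set[int] = set()
    remaining: List[FrozenSet[int]] = list(versions)

    def n_covered() -> int:
        # Number of versions fully contained in the current cover.
        return sum(1 for v in versions if v <= cover)

    while n_covered() < needed:
        # Add the not-yet-covered version that enlarges the cover the least;
        # ties go to the earliest remaining version.
        best = min((v for v in remaining if not v <= cover),
                   key=lambda v: len(v - cover))
        cover |= best
        remaining.remove(best)
    return frozenset(cover)
```

Consider four versions {1, 2, 3}, {1, 2, 3, 4}, {1, 2} and {2, 3, 9} with α = 0.25. Three versions must be covered, and the sketch returns {1, 2, 3, 4}. Point 9 falls outside the set, flagging it as an unstable member of the cluster. A greedy rule keeps the example short. Finding the truly smallest such set is a harder combinatorial problem.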

AVAILABILITY AND IMPLEMENTATION

The method is implemented in an R package called OTclust, available on CRAN.

CONTACT

lzz46@psu.edu or jiali@psu.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
