Data Convexity and Parameter Independent Clustering for Biomedical Datasets.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):765-772. doi: 10.1109/TCBB.2020.2978188. Epub 2021 Apr 6.

Abstract

In machine learning, properties of the dataset itself, such as the convexity of its data point sets, affect which clustering algorithm will perform well. This brief paper first examines how data convexity influences clustering performance on biomedical datasets. It then addresses the main challenges of two well-known families of clustering methods, centroid-based and density-based clustering. These techniques typically require the user to supply a set of parameters before they can cluster well and identify the optimal number of clusters. Two parameter-independent clustering techniques that use unique neighborhood sets (UNSs) are introduced: Parameter Independent Convex Centroid-based Clustering (ConvexClust) for convex-dominated datasets and Parameter Independent Non-Convex Density-based Clustering (NonConvexClust) for nonconvex-dominated datasets. The ConvexClust and NonConvexClust algorithms are extensively evaluated on real-world biomedical datasets, and their performance is compared with that of other clustering algorithms using evaluation criteria such as the sum of squared errors (SSE), entropy, and purity. The results show that the proposed parameter-independent techniques perform well and that most of the biomedical datasets in the experiments tend towards convex-dominated data point sets.
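The abstract names SSE, entropy, and purity as evaluation criteria but does not give their formulas. The sketch below uses the common textbook definitions, which may differ in detail from those used in the paper; the function names `sse`, `purity`, and `entropy` and the toy data are illustrative assumptions, not taken from the source.

```python
import math
from collections import Counter


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def _class_counts(labels, classes):
    """For each cluster label, count how many members fall in each true class."""
    counts = {}
    for label, cls in zip(labels, classes):
        counts.setdefault(label, Counter())[cls] += 1
    return counts


def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    counts = _class_counts(labels, classes)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(labels, classes):
    """Size-weighted average of the class entropy (base 2) within each cluster."""
    counts = _class_counts(labels, classes)
    n = len(labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((v / size) * math.log2(v / size) for v in c.values())
        total += (size / n) * h
    return total


# Toy data (hypothetical): two well-separated clusters in 2-D.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]              # cluster assignments from some algorithm
classes = ["a", "a", "b", "b"]     # ground-truth classes

print(sse(points, labels), purity(labels, classes), entropy(labels, classes))
```

Lower SSE and entropy and higher purity indicate better clustering under these definitions. SSE needs only the points and cluster assignments, whereas purity and entropy also require ground-truth class labels.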
