FilterK: A new outlier detection method for k-means clustering of physical activity.

Author information

Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK.

School of Physics and Astronomy, University of Leicester, University Road, Leicester LE1 7RH, UK.

Publication information

J Biomed Inform. 2020 Apr;104:103397. doi: 10.1016/j.jbi.2020.103397. Epub 2020 Feb 26.

Abstract

This paper proposes FilterK, a new algorithm that improves the purity of k-means-derived physical activity clusters by reducing the influence of outliers. We applied it to physical activity data collected with body-worn accelerometers and clustered using k-means. Using the ground truth (class labels), we compared its performance with three existing outlier detection methods (Local Outlier Factor, Isolation Forest and KNN) in terms of average cluster and event purity (ACEP). FilterK provided comparable gains in ACEP (0.581 → 0.596, compared to 0.580-0.617 for the existing methods) whilst removing fewer outliers (4% of the total dataset, versus the 10% needed by the other methods to achieve this ACEP). The main aim of the method is to improve the cluster purity of physical activity accelerometer data, but we suggest it could also be applied to other types of dataset clustered with k-means. We demonstrate the method using a k-means model trained on two independent accelerometer datasets (training n = 90) and then applied to a separate independent dataset (test n = 41). Labelled physical activities include lying down, sitting, standing, household chores, walking (laboratory and non-laboratory based), stairs and running. This type of clustering algorithm could be used to help identify optimal physical activity patterns for health.
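To make the idea of purity gains concrete, the following is a minimal sketch of how removing flagged points can raise the average purity of labelled clusters. It is an illustration only: the abstract does not define ACEP or describe FilterK's internals, so the unweighted per-cluster purity used here, the toy labels, and the hand-picked `flagged` indices are all assumptions and not the authors' metric or algorithm.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, true_labels):
    """Map each cluster id to the fraction of its members that carry
    the cluster's majority ground-truth label."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {
        cid: Counter(labels).most_common(1)[0][1] / len(labels)
        for cid, labels in members.items()
    }


def average_purity(cluster_ids, true_labels):
    """Unweighted mean of per-cluster purities (an assumed stand-in,
    not the paper's ACEP definition)."""
    purities = cluster_purities(cluster_ids, true_labels)
    return sum(purities.values()) / len(purities)


# Toy data (hypothetical): two clusters, each with one off-label member.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
true_labels = ["sitting", "sitting", "sitting", "walking",
               "walking", "walking", "walking", "running"]

before = average_purity(cluster_ids, true_labels)

# Indices that some outlier detector might flag; chosen by hand here.
flagged = {3, 7}
kept = [i for i in range(len(cluster_ids)) if i not in flagged]

after = average_purity([cluster_ids[i] for i in kept],
                       [true_labels[i] for i in kept])

print(f"average purity before: {before:.3f}, after: {after:.3f}")
```

In practice the flagged set would come from an outlier detector applied to the clustered data, and the trade-off reported in the abstract is between the purity gained and the share of the dataset removed.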
