FilterK: A new outlier detection method for k-means clustering of physical activity.

Author information

Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK.

School of Physics and Astronomy, University of Leicester, University Road, Leicester LE1 7RH, UK.

Publication information

J Biomed Inform. 2020 Apr;104:103397. doi: 10.1016/j.jbi.2020.103397. Epub 2020 Feb 26.

DOI:10.1016/j.jbi.2020.103397

PMID:32113005

Abstract

In this paper, a new algorithm denoted as FilterK is proposed for improving the purity of k-means derived physical activity clusters by reducing outlier influence. We applied it to physical activity data obtained with body-worn accelerometers and clustered using k-means. We compared its performance with three existing outlier detection methods: Local Outlier Factor, Isolation Forests and KNN using the ground truth (class labels), average cluster and event purity (ACEP). FilterK provided comparable gains in ACEP (0.581 → 0.596 compared to 0.580-0.617) whilst removing a lower number of outliers than the other methods (4% total dataset size vs 10% to achieve this ACEP). The main focus of our new outlier detection method is to improve the cluster purities of physical activity accelerometer data, but we also suggest it may be potentially applied to other types of dataset captured by k-means clustering. We demonstrate our method using a k-means model trained on two independent accelerometer datasets (training n = 90) and re-applied to an independent dataset (test n = 41). Labelled physical activities include lying down, sitting, standing, household chores, walking (laboratory and non-laboratory based), stairs and running. This type of clustering algorithm could be used to assist with identifying optimal physical activity patterns for health.
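To make the evaluation idea concrete, the following is a minimal sketch of how cluster purity can be measured against ground-truth labels before and after outliers are removed. It rests on several assumptions: the abstract does not define ACEP precisely, so the code computes plain per-cluster purity only (the event-purity component is omitted); `drop_farthest` is a generic distance-to-centroid filter used as a stand-in and is not FilterK, Local Outlier Factor, Isolation Forest, or KNN; and the points, labels, and centroids are invented toy values, not data from the study.

```python
from collections import Counter, defaultdict
import math


def cluster_purities(cluster_ids, labels):
    """Return {cluster_id: purity}, where purity is the share of a
    cluster's points that carry its most common ground-truth label."""
    members = defaultdict(list)
    for cid, lab in zip(cluster_ids, labels):
        members[cid].append(lab)
    return {
        cid: Counter(labs).most_common(1)[0][1] / len(labs)
        for cid, labs in members.items()
    }


def average_cluster_purity(cluster_ids, labels):
    """Unweighted mean of the per-cluster purities."""
    purities = cluster_purities(cluster_ids, labels)
    return sum(purities.values()) / len(purities)


def drop_farthest(points, cluster_ids, centroids, fraction):
    """Hypothetical stand-in filter (NOT FilterK): return the sorted indices
    kept after dropping the `fraction` of points that lie farthest from
    their assigned centroid."""
    dists = [math.dist(p, centroids[c]) for p, c in zip(points, cluster_ids)]
    n_drop = int(len(points) * fraction)
    order = sorted(range(len(points)), key=lambda i: dists[i])
    return sorted(order[: len(points) - n_drop])


# Toy data (assumed for illustration): two clusters, one stray point in each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (2.0, 2.0),
          (5.0, 5.0), (5.1, 5.0), (5.0, 4.9), (3.0, 3.0)]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["sitting", "sitting", "sitting", "walking",
          "walking", "walking", "walking", "sitting"]
centroids = {0: (0.0, 0.0), 1: (5.0, 5.0)}  # illustrative, not fitted

purity_before = average_cluster_purity(cluster_ids, labels)

keep = drop_farthest(points, cluster_ids, centroids, fraction=0.25)
purity_after = average_cluster_purity(
    [cluster_ids[i] for i in keep],
    [labels[i] for i in keep],
)

print(f"kept indices: {keep}")
print(f"average purity before: {purity_before:.3f}, after: {purity_after:.3f}")
```

The trade-off the abstract reports (purity gained versus share of the dataset removed) corresponds here to comparing `purity_after - purity_before` against `fraction`; a method that reaches the same purity with a smaller `fraction` discards less data.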
