Feature selection for unsupervised machine learning of accelerometer data physical activity clusters - A systematic review.

Author information

Jones Petra J, Catt Mike, Davies Melanie J, Edwardson Charlotte L, Mirkes Evgeny M, Khunti Kamlesh, Yates Tom, Rowlands Alex V

Affiliations

Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK.

Population Health Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.

Publication information

Gait Posture. 2021 Oct;90:120-128. doi: 10.1016/j.gaitpost.2021.08.007. Epub 2021 Aug 13.

Abstract

BACKGROUND

Identifying clusters of physical activity (PA) from accelerometer data is important for quantifying levels of sedentary behaviour and physical activity associated with the risk of serious health conditions, and for quantifying time spent in healthy PA. Unsupervised machine learning models can capture PA during everyday free-living activity without the need for labelled data. However, little research has addressed the selection of features from accelerometer data. The aim of this systematic review is to summarise the feature selection techniques applied in studies of unsupervised machine learning of physical activity measured with accelerometer-based devices, and to identify the features most commonly selected by these techniques. Feature selection methods can reduce the complexity and computational burden of these models by removing less important features, and can help clarify the relative importance of feature sets and individual features in clustering.

METHOD

We conducted a systematic search of the PubMed, Medline, Google Scholar, Scopus, arXiv and Web of Science databases to identify studies published before January 2021 that used feature selection methods to derive PA clusters with unsupervised machine learning models.

RESULTS

A total of 13 studies were eligible for inclusion in the review. The most popular feature selection techniques were Principal Component Analysis (PCA) and correlation-based methods, and k-means was frequently used to cluster accelerometer data. Cluster quality evaluation methods were diverse, including both external measures (e.g. cluster purity) and internal measures (most frequently the silhouette score). Only four of the 13 studies had more than 25 participants, and only four studies included two or more datasets.
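To make the two families of cluster-quality measures concrete, here is a minimal NumPy sketch. It is not code from any of the reviewed studies. It shows k-means (Lloyd's algorithm, here with a farthest-first initialisation chosen only so the toy example is deterministic), an internal measure (mean silhouette score) and an external measure (cluster purity, which requires ground-truth activity labels). The function names, the synthetic two-blob data and the parameters `k`, `n_iter` and `seed` are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means; farthest-first initialisation is an illustrative choice."""
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Distance from every point to its nearest existing centre.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

def silhouette(X, labels):
    """Internal measure: mean of (b - a) / max(a, b); needs at least two clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def purity(labels, truth):
    """External measure: share of points belonging to their cluster's majority true class."""
    total = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        total += counts.max()
    return total / len(labels)

# Synthetic, well-separated data standing in for two activity types.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
truth = np.array([0] * 50 + [1] * 50)
labels, _ = kmeans(X, k=2)
print(round(silhouette(X, labels), 3), purity(labels, truth))
```

Purity can only be computed when labelled activities are available. The silhouette score needs no labels, so it can also be applied to unlabelled free-living data.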

CONCLUSION

There is a need to assess multiple feature selection methods on large cohort data comprising multiple (three or more) PA datasets. The cut-off criteria (e.g. the number of components, the pairwise correlation value or the explained variance ratio for PCA) should be stated explicitly, along with any hyperparameters used in clustering.
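The cut-off criteria that the review asks authors to report can be made explicit in code. The following is a hedged sketch, assuming NumPy, a greedy pairwise-correlation filter and PCA on standardised features. The two steps are chained here only for illustration. The thresholds `CORR_CUTOFF = 0.9` and `VARIANCE_CUTOFF = 0.95`, the function names and the feature names are hypothetical. They are not values recommended by the review.

```python
import numpy as np

# Hypothetical cut-offs: whatever values are used should be reported explicitly.
CORR_CUTOFF = 0.9       # max |Pearson r| allowed between two retained features
VARIANCE_CUTOFF = 0.95  # cumulative explained-variance ratio for PCA

def drop_correlated(X, names, cutoff=CORR_CUTOFF):
    """Greedy correlation filter: keep feature j only if |r| < cutoff with every
    feature already kept (column order decides which of a correlated pair survives)."""
    R = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(R[j, i]) < cutoff for i in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

def pca_n_components(X, cutoff=VARIANCE_CUTOFF):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `cutoff` (features standardised first)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    s = np.linalg.svd(Xs, compute_uv=False)    # singular values
    ratio = s**2 / np.sum(s**2)                # explained-variance ratio per component
    n = int(np.searchsorted(np.cumsum(ratio), cutoff)) + 1
    return min(n, len(ratio)), ratio

# Synthetic example with hypothetical feature names.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
names = ["feat_a", "feat_a_scaled", "feat_b"]

X_kept, kept = drop_correlated(X, names)
n, ratio = pca_n_components(X_kept)
print(kept, n, np.round(ratio, 2))
```

Declaring the thresholds as named constants makes them easy to report alongside clustering hyperparameters such as the number of clusters k.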