
Clustering multivariate time series using Hidden Markov Models.

Authors

Ghassempour Shima, Girosi Federico, Maeder Anthony

Affiliations

School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, NSW 2751, Australia.

Centre for Health Research, University of Western Sydney, Campbelltown, NSW 2751, Australia.

Publication information

Int J Environ Res Public Health. 2014 Mar 6;11(3):2741-63. doi: 10.3390/ijerph110302741.

DOI:10.3390/ijerph110302741
PMID:24662996
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3968966/

Abstract

In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers.
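
The three steps named in the abstract (one HMM per trajectory, a distance between HMMs, clustering from a distance matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the abstract does not specify the distance, so a symmetrized Monte Carlo log-likelihood gap is swapped in; the HMM fitting step (for example Baum-Welch) is omitted and the four models are hand-written stand-ins for fitted ones; observations are a single categorical variable rather than mixed categorical and continuous values; and average-linkage agglomeration stands in for whichever distance-matrix method the paper uses. All function and variable names are hypothetical.

```python
import math
import random

def log_likelihood(hmm, obs):
    """Scaled forward algorithm: log P(obs | hmm) for a discrete-emission HMM."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # probability of obs[t] given the earlier observations
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def sample(hmm, length, rng):
    """Draw one observation sequence of the given length from the HMM."""
    pi, A, B = hmm
    state = rng.choices(range(len(pi)), weights=pi)[0]
    obs = []
    for _ in range(length):
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return obs

def hmm_distance(h1, h2, length=500, seed=0):
    """Assumed distance: symmetrized per-step log-likelihood gap on sampled data."""
    rng = random.Random(seed)
    o1, o2 = sample(h1, length, rng), sample(h2, length, rng)
    d12 = (log_likelihood(h1, o1) - log_likelihood(h2, o1)) / length
    d21 = (log_likelihood(h2, o2) - log_likelihood(h1, o2)) / length
    return 0.5 * (d12 + d21)

def average_linkage(dist, k):
    """Agglomerative clustering on a distance matrix, stopping at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters

# Hypothetical "fitted" HMMs (initial probs, transitions, emissions):
# two with persistent hidden states, two that switch state almost every step.
emit = [[0.9, 0.1], [0.1, 0.9]]
models = [
    ([0.5, 0.5], [[0.90, 0.10], [0.10, 0.90]], emit),
    ([0.5, 0.5], [[0.85, 0.15], [0.15, 0.85]], emit),
    ([0.5, 0.5], [[0.10, 0.90], [0.90, 0.10]], emit),
    ([0.5, 0.5], [[0.15, 0.85], [0.85, 0.15]], emit),
]

# Fill the upper triangle and mirror it so the matrix is symmetric.
n = len(models)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = hmm_distance(models[i], models[j])

clusters = average_linkage(dist, k=2)
print(clusters)
```

Because the distance is computed between models rather than between raw trajectories, the categorical values never need a direct metric of their own, which is the difficulty the abstract points to. With real data, each model in `models` would instead be estimated from one individual's trajectory.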

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c56/3968966/a9088e35cbbe/ijerph-11-02741-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c56/3968966/d0820a176b0a/ijerph-11-02741-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c56/3968966/66cdded28ec9/ijerph-11-02741-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c56/3968966/3b0cbc9eb1e8/ijerph-11-02741-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c56/3968966/13bd8524d81d/ijerph-11-02741-g005.jpg
