clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data.

Authors

Zhou Junyi, Zhang Ying, Tu Wanzhu

Affiliations

Department of Biostatistics and Health Data Science, Indiana University.

Department of Biostatistics, University of Nebraska Medical Center.

Publication information

J Comput Graph Stat. 2023;32(3):1131-1144. doi: 10.1080/10618600.2022.2149540. Epub 2023 Jan 12.

DOI:10.1080/10618600.2022.2149540

PMID:37859643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10584088/

Abstract

Longitudinal data clustering is challenging because the grouping has to account for the similarity of individual trajectories in the presence of sparse and irregular times of observation. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, which are depicted by B-splines for the repeatedly measured data. Extensive simulations show that the proposed method has superior performance in determining the number of clusters, classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is not only suitable for clustering multivariate longitudinal data with sparse and irregular measurements but also for intensely measured functional data. Towards this end, we provide an R package for the implementation of such analyses. To illustrate the use of the proposed clustering method, two large clinical data sets from real-world clinical studies are analyzed.
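
The idea of a "cost of merging" two groups of spline-described curves can be illustrated with a minimal Python sketch. It is not the clusterMLD R package and does not reproduce the paper's dissimilarity metric. As a stand-in it uses a Ward-style cost: the increase in residual sum of squares when one least-squares B-spline replaces two separate group fits. The function names, the knot placement, the simulated mean curves, and the restriction to a single outcome are all assumptions made for illustration.

```python
import numpy as np

def bspline_basis(t, knots, degree=3):
    """Evaluate all B-spline basis functions at times t (Cox-de Boor recursion)."""
    knots = np.asarray(knots, dtype=float)
    # Keep t inside the half-open knot span so the right endpoint is covered.
    t = np.clip(np.asarray(t, dtype=float), knots[0], np.nextafter(knots[-1], knots[0]))
    # Degree 0: indicator of each knot interval [k_i, k_{i+1}).
    B = np.array([(knots[i] <= t) & (t < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        new = np.zeros((len(t), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - t) / right * B[:, i + 1]
        B = new
    return B

def rss(t, y, knots):
    """Residual sum of squares of one least-squares spline fitted to pooled data."""
    X = bspline_basis(t, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def pool(a, b, knots):
    """Pool the observations of two clusters and refit a single spline."""
    t = np.concatenate([a["t"], b["t"]])
    y = np.concatenate([a["y"], b["y"]])
    return {"ids": a["ids"] + b["ids"], "t": t, "y": y, "rss": rss(t, y, knots)}

def merge_cost(a, b, knots):
    """Illustrative cost: extra lack of fit from using one curve for both groups."""
    return pool(a, b, knots)["rss"] - a["rss"] - b["rss"]

def agglomerate(subjects, knots, n_clusters):
    """Naive agglomeration: repeatedly merge the pair with the smallest cost."""
    clusters = [{"ids": [i], "t": t, "y": y, "rss": rss(t, y, knots)}
                for i, (t, y) in enumerate(subjects)]
    while len(clusters) > n_clusters:
        _, i, j = min((merge_cost(clusters[i], clusters[j], knots), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = pool(clusters[i], clusters[j], knots)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = np.empty(len(subjects), dtype=int)
    for label, c in enumerate(clusters):
        labels[c["ids"]] = label
    return labels

# Simulated data (hypothetical): two groups with irregular observation times.
rng = np.random.default_rng(0)
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)  # clamped cubic knots
means = [lambda t: np.sin(2 * np.pi * t), lambda t: 3 - 2 * t]
subjects = []
for g in (0, 1):
    for _ in range(10):
        t = np.sort(rng.uniform(0, 1, size=rng.integers(6, 11)))
        subjects.append((t, means[g](t) + rng.normal(0, 0.1, size=t.size)))

labels = agglomerate(subjects, knots, n_clusters=2)
print(labels)
```

Because each subject contributes its own observation times to the pooled fit, no common time grid is required, which is why a spline-based cost of this kind suits sparse and irregular measurements. For several outcomes, one plausible extension (again an assumption) is to sum the per-outcome costs. The naive pairwise search recomputes every cost after each merge and is meant for clarity rather than efficiency.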
