
Unsupervised clustering of longitudinal clinical measurements in electronic health records.

Authors

Mariam Arshiya, Javidi Hamed, Zabor Emily C, Zhao Ran, Radivoyevitch Tomas, Rotroff Daniel M

Affiliations

Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America.

Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, Ohio, United States of America.

Publication information

PLOS Digit Health. 2024 Oct 15;3(10):e0000628. doi: 10.1371/journal.pdig.0000628. eCollection 2024 Oct.

DOI: 10.1371/journal.pdig.0000628
PMID: 39405315
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11478862/

Abstract

Longitudinal electronic health records (EHR) can be utilized to identify patterns of disease development and progression in real-world settings. Unsupervised temporal matching algorithms are being repurposed to EHR from signal processing- and protein-sequence alignment tasks where they have shown immense promise for gaining insight into disease. The robustness of these algorithms for classifying EHR clinical data remains to be determined. Timeseries compiled from clinical measurements, such as blood pressure, have far more irregularity in sampling and missingness than the data for which these algorithms were developed, necessitating a systematic evaluation of these methods. We applied 30 state-of-the-art unsupervised machine learning algorithms to 6,912 systematically generated simulated clinical datasets across five parameters. These algorithms included eight temporal matching algorithms with fourteen partitional and eight fuzzy clustering methods. Nemenyi tests were used to determine differences in accuracy using the Adjusted Rand Index (ARI). Dynamic time warping and its lower-bound variants had the highest accuracies across all cohorts (median ARI>0.70). All 30 methods were better at discriminating classes with differences in magnitude compared to differences in trajectory shapes. Missingness impacted accuracies only when classes were different by trajectory shape. The method with the highest ARI was then used to cluster a large pediatric metabolic syndrome (MetS) cohort (N = 43,426). We identified three unique childhood BMI patterns with high average cluster consensus (>70%). The algorithm identified a cluster with consistently high BMI which had the greatest risk of MetS, consistent with prior literature (OR = 4.87, 95% CI: 3.93-6.12). 
While these algorithms have been shown to have similar accuracies for regular timeseries, their accuracies in clinical applications vary substantially in discriminating differences in shape and especially with moderate to high missingness (>10%). This systematic assessment also shows that the most robust algorithms tested here can derive meaningful insights from longitudinal clinical data.
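
The abstract names two building blocks: dynamic time warping (DTW) as the temporal matching measure and the Adjusted Rand Index (ARI) as the accuracy score. The sketch below shows how they fit together with a simple partitional (k-medoids) clusterer. It is only an illustration under stated assumptions, not the authors' pipeline: the toy BMI-like trajectories, the function names, the farthest-first initialisation, and the choice of k-medoids are all hypothetical. The series have different lengths to mimic missed visits, which DTW tolerates without imputation.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency counts of two labelings."""
    n = len(labels_true)
    sum_cells = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / math.comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def k_medoids_dtw(series, k=2, max_iter=20):
    """Small k-medoids clusterer on a precomputed DTW distance matrix."""
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    # Deterministic farthest-first initialisation (an illustrative assumption).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign every series to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


# Hypothetical toy trajectories: two classes that differ in magnitude,
# with some visits missing (shorter series).
series = [
    [18, 19, 20, 21, 22],
    [18, 20, 22],
    [19, 20, 21, 22],
    [30, 31, 32, 33, 34],
    [30, 32, 34],
    [31, 32, 33, 35],
]
truth = [0, 0, 0, 1, 1, 1]

labels = k_medoids_dtw(series, k=2)
ari = adjusted_rand_index(truth, labels)
print("cluster labels:", labels)
print("ARI:", ari)
```

Because the two toy classes differ in magnitude, they are easy to separate, which mirrors the abstract's finding that magnitude differences are easier to discriminate than shape differences. Classes that differ only in trajectory shape, combined with heavier missingness, would be the harder case the abstract describes.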


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/11478862/121bac0dcd9f/pdig.0000628.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/11478862/b30816790773/pdig.0000628.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/11478862/93a49ab89322/pdig.0000628.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/11478862/fb644afb5a16/pdig.0000628.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/11478862/dab7c25dcac1/pdig.0000628.g005.jpg

