
Latent Structure in EHR Data: Reconstruction of Diabetes Markers with Sparse NMF

Authors

Elhussein Ahmed, Hripcsak George

Affiliations

Department of Biomedical Informatics, Columbia University, NY, USA.

New York Genome Center, NY, USA.

Publication information

medRxiv. 2025 Apr 1:2025.03.31.25324972. doi: 10.1101/2025.03.31.25324972.

DOI:10.1101/2025.03.31.25324972
PMID:40236392
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11998802/
Abstract

The dimensionality of electronic health record (EHR) data continues to grow as more clinical variables are recorded, often resulting in redundancy, sparsity, and analytical intractability. In this study, we apply non-negative matrix factorization (NMF) to a high-dimensional laboratory dataset of patients with type II diabetes to estimate the minimum latent dimensionality required to preserve clinically meaningful information. Using both within-patient imputation and across-patient generalization tasks, we evaluate the ability of the learned representations to reconstruct two key clinical lab values: blood glucose and HbA1c. Our findings show that clinically acceptable accuracy can be achieved with a dimensionality reduction of up to 80% and a dimensionality of 230 to 300, supporting the presence of a compact, low-dimensional latent structure underlying high-dimensional clinical data.
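The within-patient imputation task described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses synthetic non-negative data in place of real laboratory values, a plain masked NMF fitted with multiplicative updates, and an optional L1 penalty as a stand-in for the sparsity constraint. All function names, parameters, matrix sizes, and the choice of column 0 as the "target lab" are assumptions made for illustration.

```python
import numpy as np

def masked_nmf(X, observed, rank, n_iter=500, l1=0.0, seed=0, eps=1e-9):
    """Factor X ~ W @ H using only the entries where `observed` is True.

    Multiplicative updates for the masked squared-error loss. `l1` adds an
    L1 penalty on both factors (an assumed, simple form of sparsity).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    M = observed.astype(float)
    MX = M * X  # hidden entries contribute nothing to the fit
    for _ in range(n_iter):
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + l1 + eps)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + l1 + eps)
    return W, H

def make_toy_labs(n_patients=200, n_labs=50, true_rank=5, seed=1):
    """Synthetic non-negative 'patients x labs' matrix with low-rank structure."""
    rng = np.random.default_rng(seed)
    W0 = rng.exponential(1.0, (n_patients, true_rank))
    H0 = rng.random((true_rank, n_labs))
    return W0 @ H0

def held_out_rmse(X, rank, target_col=0, hide_frac=0.2, seed=2):
    """Hide part of one column, fit NMF on the rest, score the hidden entries.

    Returns (RMSE of the NMF reconstruction, RMSE of a column-mean baseline).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    hidden = rng.choice(n, int(hide_frac * n), replace=False)
    observed = np.ones(X.shape, dtype=bool)
    observed[hidden, target_col] = False

    W, H = masked_nmf(X, observed, rank)
    truth = X[hidden, target_col]
    nmf_err = (W @ H)[hidden, target_col] - truth

    # Baseline: predict the mean of the still-observed target values.
    mean_err = X[observed[:, target_col], target_col].mean() - truth
    return np.sqrt(np.mean(nmf_err ** 2)), np.sqrt(np.mean(mean_err ** 2))

X = make_toy_labs()
for rank in (1, 2, 5, 10):
    nmf_rmse, mean_rmse = held_out_rmse(X, rank)
    print(f"rank={rank:2d}  NMF RMSE={nmf_rmse:.3f}  mean-baseline RMSE={mean_rmse:.3f}")
```

Sweeping `rank` and watching where the held-out error stops improving is one simple way to estimate a minimum useful latent dimensionality. The across-patient generalization task would instead fit `H` on one group of patients and solve only for `W` on new ones.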


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1785/11998802/1d2c41e632f0/nihpp-2025.03.31.25324972v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1785/11998802/002d57b95dd2/nihpp-2025.03.31.25324972v1-f0001.jpg

