Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.

Authors

Wang Yichen, Chen Robert, Ghosh Joydeep, Denny Joshua C, Kho Abel, Chen You, Malin Bradley A, Sun Jimeng

Affiliations

Georgia Institute of Technology.

University of Texas, Austin.

Publication information

KDD. 2015 Aug;2015:1265-1274. doi: 10.1145/2783258.2783395.

DOI: 10.1145/2783258.2783395
PMID: 31452969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6709413/
Abstract

Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of unlabeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data. We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We use the Alternating Direction Method of Multipliers (ADMM) framework for tensor factorization and completion, which can be easily scaled through parallel computing. We evaluate Rubik on two EHR datasets: one contains 647,118 records for 7,744 patients from an outpatient clinic, and the other is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.
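The abstract names the core idea, non-negative tensor factorization with built-in completion, without showing what it looks like. The sketch below is a minimal, hypothetical illustration of that idea only and is not Rubik's algorithm: it fits a non-negative CP (PARAFAC) model to the observed cells of a small 3-way tensor and reads predictions for the missing cells off the fitted model. Several parts are assumptions or substitutions: the patient × diagnosis × medication layout is illustrative, the names `cp_value` and `masked_nonneg_cp` are made up, and the fit uses simple multiplicative updates (a swapped-in technique) rather than the ADMM framework the paper describes. The guidance and pairwise constraints are omitted entirely.

```python
import random


def cp_value(A, B, C, i, j, k):
    """Value of the CP model at cell (i, j, k): sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(len(A[0])))


def masked_nonneg_cp(X, mask, rank, iters=200, seed=0, eps=1e-12):
    """Fit a non-negative rank-`rank` CP model to the observed cells of X.

    Hypothetical helper, not Rubik's ADMM solver (no guidance/pairwise constraints).
    X    -- nested list X[i][j][k], e.g. patient x diagnosis x medication counts
    mask -- same shape; 1 marks an observed cell, 0 a missing one
    Returns factor matrices A (I x R), B (J x R), C (K x R) with entries >= 0.
    """
    rng = random.Random(seed)
    I, J, K = len(X), len(X[0]), len(X[0][0])
    # Positive random start; multiplicative updates keep every entry non-negative.
    A = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(I)]
    B = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(J)]
    C = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(K)]
    observed = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)
                if mask[i][j][k]]

    def update(F, mode):
        # Accumulate numerator/denominator over observed cells only, so missing
        # entries never influence the fit (this is the "completion" part).
        num = [[0.0] * rank for _ in F]
        den = [[0.0] * rank for _ in F]
        for i, j, k in observed:
            x_hat = cp_value(A, B, C, i, j, k)
            row = (i, j, k)[mode]
            for r in range(rank):
                # Product of the other two factors' entries for component r.
                if mode == 0:
                    other = B[j][r] * C[k][r]
                elif mode == 1:
                    other = A[i][r] * C[k][r]
                else:
                    other = A[i][r] * B[j][r]
                num[row][r] += X[i][j][k] * other
                den[row][r] += x_hat * other
        # Multiplicative step: a ratio of non-negative sums keeps F non-negative.
        for p in range(len(F)):
            for r in range(rank):
                F[p][r] *= num[p][r] / (den[p][r] + eps)

    for _ in range(iters):
        update(A, 0)  # patient-mode factor
        update(B, 1)  # diagnosis-mode factor
        update(C, 2)  # medication-mode factor
    return A, B, C


if __name__ == "__main__":
    # Toy rank-1 tensor X[i][j][k] = a[i] * b[j] * c[k] with two cells hidden.
    a, b, c = [1.0, 2.0, 3.0], [1.0, 0.5, 2.0, 1.0], [2.0, 1.0, 3.0]
    X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
    mask = [[[1] * len(c) for _ in b] for _ in a]
    mask[0][0][0] = 0  # true value 2.0, hidden from the fit
    mask[2][3][1] = 0  # true value 3.0, hidden from the fit
    A, B, C = masked_nonneg_cp(X, mask, rank=1)
    print(round(cp_value(A, B, C, 0, 0, 0), 3), round(cp_value(A, B, C, 2, 3, 1), 3))
```

In the toy run, two cells of an exact rank-1 tensor are hidden from the fit; because the factorization sees only observed cells, the values it predicts for the hidden cells are the completed entries. For rank 1, each multiplicative update reduces to a masked alternating-least-squares step, which is why this toy case converges quickly; higher ranks converge more slowly and depend on initialization.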



Similar articles

1
COPA: Constrained PARAFAC2 for Sparse & Large Datasets.
Proc ACM Int Conf Inf Knowl Manag. 2018 Oct;2018:793-802. doi: 10.1145/3269206.3271775.
2
Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis.
Proc ACM Int Conf Inf Knowl Manag. 2019 Nov;2019:1291-1300. doi: 10.1145/3357384.3357878.
3
Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization.
J Biomed Inform. 2019 May;93:103125. doi: 10.1016/j.jbi.2019.103125. Epub 2019 Feb 8.
4
Federated Tensor Factorization for Computational Phenotyping.
KDD. 2017 Aug;2017:887-895. doi: 10.1145/3097983.3098118.
5
Communication Efficient Tensor Factorization for Decentralized Healthcare Networks.
Proc IEEE Int Conf Data Min. 2021 Dec;2021:1216-1221. doi: 10.1109/icdm51629.2021.00147. Epub 2022 Jan 24.
6
TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.
Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:193-203. doi: 10.1145/3368555.3384464.
7
Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.
Proc Int World Wide Web Conf. 2021 Apr;2021:171-182. doi: 10.1145/3442381.3449832.
8
Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: Cardiovascular disease case study.
J Biomed Inform. 2019 Oct;98:103270. doi: 10.1016/j.jbi.2019.103270. Epub 2019 Aug 22.
9
SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.
KDD. 2018 Jul;2018:2080-2089. doi: 10.1145/3219819.3219999.

Cited by

1
MULTIPAR: Supervised Irregular Tensor Factorization with Multi-task Learning for Computational Phenotyping.
Proc Mach Learn Res. 2023 Dec;225:498-511.
2
MASMDDI: multi-layer adaptive soft-mask graph neural network for drug-drug interaction prediction.
Front Pharmacol. 2024 May 20;15:1369403. doi: 10.3389/fphar.2024.1369403. eCollection 2024.
3
Creating High-Quality Synthetic Health Data: Framework for Model Development and Validation.
JMIR Form Res. 2024 Apr 22;8:e53241. doi: 10.2196/53241.
4
Communication Efficient Tensor Factorization for Decentralized Healthcare Networks.
Proc IEEE Int Conf Data Min. 2021 Dec;2021:1216-1221. doi: 10.1109/icdm51629.2021.00147. Epub 2022 Jan 24.
5
CCPE: cell cycle pseudotime estimation for single cell RNA-seq data.
Nucleic Acids Res. 2022 Jan 25;50(2):704-716. doi: 10.1093/nar/gkab1236.
6
Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records.
Patterns (N Y). 2021 Sep 2;2(9):100337. doi: 10.1016/j.patter.2021.100337. eCollection 2021 Sep 10.
7
Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.
Proc Int World Wide Web Conf. 2021 Apr;2021:171-182. doi: 10.1145/3442381.3449832.
8
SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.
KDD. 2018 Jul;2018:2080-2089. doi: 10.1145/3219819.3219999.
9
COPA: Constrained PARAFAC2 for Sparse & Large Datasets.
Proc ACM Int Conf Inf Knowl Manag. 2018 Oct;2018:793-802. doi: 10.1145/3269206.3271775.
10
Untangling the complexity of multimorbidity with machine learning.
Mech Ageing Dev. 2020 Sep;190:111325. doi: 10.1016/j.mad.2020.111325. Epub 2020 Aug 6.