Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.

Authors

Wang Yichen, Chen Robert, Ghosh Joydeep, Denny Joshua C, Kho Abel, Chen You, Malin Bradley A, Sun Jimeng

Affiliations

Georgia Institute of Technology.

University of Texas, Austin.

Publication information

KDD. 2015 Aug;2015:1265-1274. doi: 10.1145/2783258.2783395.

Abstract

Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of unlabeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data. We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We apply the Alternating Direction Method of Multipliers (ADMM) framework to tensor factorization and completion, which can be easily scaled through parallel computing. We evaluate Rubik on two EHR datasets: one contains 647,118 records for 7,744 patients from an outpatient clinic, and the other is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.
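To make the idea of non-negative tensor factorization with built-in completion more concrete, the following is a minimal sketch, not Rubik itself. It assumes a hypothetical 3-way count tensor (for instance patient × diagnosis × medication; the abstract does not name the modes), and it swaps in plain multiplicative updates with EM-style imputation of missing entries in place of the ADMM solver. The guidance and pairwise constraints are omitted, and all function and variable names are illustrative.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * C.shape[0] + k."""
    rank = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, rank)

def unfold(X, mode):
    """Mode-n unfolding, remaining axes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def reconstruct(factors):
    """Rebuild the full tensor from three CP factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def masked_nonneg_cp(X, mask, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-way tensor with missing entries.

    X    : non-negative array; values where mask is False are ignored
    mask : boolean array, True where an entry of X was observed
    NOTE: multiplicative updates, NOT the ADMM scheme named in the abstract.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        # Completion step: keep observed entries, impute the rest
        # from the current low-rank model.
        filled = np.where(mask, X, reconstruct(factors))
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(filled, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Multiplicative update keeps every entry non-negative.
            factors[mode] *= numer / denom
    return factors

# Synthetic demo: hide 30% of a rank-2 tensor, then fit on the rest.
rng = np.random.default_rng(1)
truth = [rng.random((n, 2)) for n in (10, 8, 6)]
X = reconstruct(truth)
mask = rng.random(X.shape) < 0.7
factors = masked_nonneg_cp(X, mask, rank=2)
held_out = np.linalg.norm((X - reconstruct(factors))[~mask])
print("error on held-out entries:", held_out)
```

Each column triple of the returned factor matrices plays the role of one candidate phenotype in this toy setting. A faithful implementation would add the knowledge-guidance and pairwise terms to the objective and solve it with ADMM as the abstract describes.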
