Wang Yuening, Benavides Rodrigo, Diatchenko Luda, Grant Audrey V, Li Yue
School of Computer Science, McGill University, Canada.
Department of Anesthesiology, Centro Nacional de Rehabilitación, San Jose, Costa Rica.
iScience. 2022 May 12;25(6):104390. doi: 10.1016/j.isci.2022.104390. eCollection 2022 Jun 17.
Large biobank repositories of clinical condition and medication data open opportunities to investigate the phenotypic disease network. We present a graph embedded topic model (GETM) that integrates existing biomedical knowledge graph information, in the form of pre-trained graph embeddings, into the embedded topic model. Using a variational autoencoder framework, we infer each patient's phenotypic mixture by modeling multi-modal discrete patient medical records. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contain 443 self-reported medical conditions and 802 medications for 457,461 individuals. Compared to existing methods, GETM demonstrates good imputation performance. In a more focused application to characterizing pain phenotypes, we observe that GETM-inferred phenotypes not only accurately predict the status of chronic musculoskeletal (CMK) pain but also reveal known pain-related topics. Intriguingly, medications and conditions in the cardiovascular category are enriched among the topics most predictive of chronic pain.
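The abstract does not spell out the model equations, so the following is only a minimal sketch of the generic embedded-topic-model decoder it alludes to, under stated assumptions: each modality (conditions, medications) has a fixed feature-embedding matrix, topics live in the same embedding space, and a patient's topic mixture is a softmax of a reparameterized Gaussian sample, as in a standard variational autoencoder. All names (`decode`, `log_likelihood`, `rho`, `alpha`, `theta`) and sizes are hypothetical, and random matrices stand in for the pre-trained knowledge-graph embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode(theta, alpha, rho_by_modality):
    """Per-modality feature probabilities for each patient.

    theta: (patients, topics) topic mixtures
    alpha: (topics, dim) topic embeddings
    rho_by_modality: modality name -> (features, dim) feature embeddings
    """
    probs = {}
    for name, rho in rho_by_modality.items():
        # Topic-over-feature distribution from embedding inner products.
        beta = softmax(alpha @ rho.T, axis=1)   # (topics, features)
        probs[name] = theta @ beta              # (patients, features)
    return probs

def log_likelihood(counts_by_modality, probs_by_modality):
    """Sum of multinomial log-likelihood terms over all modalities."""
    return sum(
        float((counts_by_modality[m] * np.log(probs_by_modality[m] + 1e-12)).sum())
        for m in counts_by_modality
    )

rng = np.random.default_rng(0)
n_patients, n_topics, dim = 4, 3, 8

# Assumption: random stand-ins for pre-trained graph embeddings, kept fixed.
rho = {
    "conditions": rng.normal(size=(6, dim)),
    "medications": rng.normal(size=(5, dim)),
}
alpha = rng.normal(size=(n_topics, dim))

# Assumption: an encoder would output mu and log_sigma; here they are made up.
mu = rng.normal(size=(n_patients, n_topics))
log_sigma = np.full((n_patients, n_topics), -1.0)
eps = rng.normal(size=mu.shape)
theta = softmax(mu + np.exp(log_sigma) * eps)   # reparameterized topic mixture

probs = decode(theta, alpha, rho)
```

In this sketch the two modalities share `theta` and `alpha`, so one patient-level mixture explains both conditions and medications; whether GETM ties its parameters in exactly this way is not stated in the abstract.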