Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO, USA.
Health Informatics and Analytics, Centers for Health Metrics and Evaluation, American Heart Association, Dallas, TX, USA.
Sci Rep. 2021 Oct 25;11(1):20969. doi: 10.1038/s41598-021-00345-z.
Certain diseases show strong comorbidity and co-occurrence with others. Understanding disease-disease associations can increase awareness among healthcare providers of co-occurring conditions and facilitate earlier diagnosis, prevention and treatment of patients. In this study, we used The Guideline Advantage (TGA), a large longitudinal electronic health record dataset from 70 outpatient clinics across the United States, to investigate potential disease-disease associations. Specifically, the 50 most prevalent disease diagnoses were manually identified among 165,732 unique patients. To investigate the co-occurrence or dependency associations among the 50 diseases, the categorical disease terms were first mapped into numerical vectors based on disease co-occurrence frequency in individual patients using the Word2Vec approach. Novel disease association clusters were then identified using correlation and clustering analyses in the numerical space. Moreover, the distribution of the time delay (Δt) between pairs of strongly associated diseases (correlation coefficient ≥ 0.5) was calculated to show the dependency among the diseases. The results can indicate the risk of disease comorbidity and complications, and facilitate disease prevention and optimal treatment decision-making.
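The pipeline described in the abstract (vectorize diseases, correlate the vectors, keep pairs with correlation ≥ 0.5, then examine Δt) can be illustrated with a minimal sketch. This is not the authors' implementation: plain per-patient co-occurrence count vectors stand in for the Word2Vec embeddings, the clustering step is omitted, and the toy records and all function names (`first_dates`, `cooccurrence_vectors`, `strong_pairs`, `time_delays`) are hypothetical.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations
from math import sqrt

# Hypothetical toy records: (patient_id, diagnosis, diagnosis date).
RECORDS = [
    ("p1", "hypertension", date(2015, 1, 10)),
    ("p1", "diabetes", date(2015, 6, 10)),
    ("p2", "hypertension", date(2016, 3, 1)),
    ("p2", "diabetes", date(2016, 3, 31)),
    ("p3", "asthma", date(2014, 5, 1)),
    ("p3", "allergic_rhinitis", date(2014, 9, 1)),
    ("p4", "asthma", date(2017, 2, 1)),
    ("p4", "allergic_rhinitis", date(2017, 2, 15)),
    ("p4", "hypertension", date(2018, 1, 1)),
]


def first_dates(records):
    """Map patient -> {disease: earliest diagnosis date}."""
    patients = defaultdict(dict)
    for pid, disease, when in records:
        prev = patients[pid].get(disease)
        if prev is None or when < prev:
            patients[pid][disease] = when
    return patients


def cooccurrence_vectors(patients):
    """One count vector per disease (assumption: a simple stand-in for
    Word2Vec embeddings). Entry j counts patients who have both diseases;
    the diagonal therefore holds the disease's own prevalence."""
    diseases = sorted({d for dx in patients.values() for d in dx})
    index = {d: i for i, d in enumerate(diseases)}
    vectors = {d: [0] * len(diseases) for d in diseases}
    for dx in patients.values():
        for a in dx:
            for b in dx:
                vectors[a][index[b]] += 1
    return vectors


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    var_u = sum((x - mu) ** 2 for x in u)
    var_v = sum((y - mv) ** 2 for y in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def strong_pairs(vectors, threshold=0.5):
    """Disease pairs whose vectors correlate at or above the threshold."""
    pairs = {}
    for a, b in combinations(sorted(vectors), 2):
        r = pearson(vectors[a], vectors[b])
        if r >= threshold:
            pairs[(a, b)] = r
    return pairs


def time_delays(patients, a, b):
    """Delta-t in days (date of b minus date of a) for patients with both."""
    delays = []
    for pid in sorted(patients):
        dx = patients[pid]
        if a in dx and b in dx:
            delays.append((dx[b] - dx[a]).days)
    return delays
```

Calling `strong_pairs(cooccurrence_vectors(first_dates(RECORDS)))` returns the pairs that pass the 0.5 threshold, and `time_delays` then gives the per-patient Δt values for any such pair. A real analysis would substitute trained embeddings for the count vectors and summarize the Δt distribution rather than listing raw values.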