Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
Department of Statistics, University of Oxford, Oxford, UK.
Nat Genet. 2023 Nov;55(11):1854-1865. doi: 10.1038/s41588-023-01522-8. Epub 2023 Oct 9.
The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information. Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses of 211,908 All of Us samples produced concordant results. We defined subtypes of the 52 heterogeneous diseases based on their comorbidity profiles and compared genetic risk across disease subtypes using polygenic risk scores (PRSs), identifying 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease. We further identified specific genetic variants with subtype-dependent effects on disease risk. In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles.
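The abstract does not say which statistical test was used to decide that a subtype's PRS "differed significantly" from the other subtypes of the same disease. The sketch below is illustrative only and is not the authors' procedure. It compares the mean PRS in one subtype against the pooled remaining subtypes using a Welch two-sample statistic. The function names `subtype_prs_test` and `compare_subtypes` are hypothetical. The p-value uses a normal approximation, which is an assumption that only holds for biobank-scale group sizes, and no multiple-testing correction is applied.

```python
import math
import statistics


def subtype_prs_test(prs_subtype, prs_other):
    """Welch-style comparison of mean PRS: one subtype vs. the other subtypes.

    Returns (difference in means, z statistic, two-sided p-value).
    Assumption: the p-value uses a normal approximation to the Welch t
    distribution, which is reasonable only when both groups are large.
    """
    m1, m2 = statistics.fmean(prs_subtype), statistics.fmean(prs_other)
    # Sample variances (n - 1 denominator) for each group.
    v1, v2 = statistics.variance(prs_subtype), statistics.variance(prs_other)
    # Unpooled standard error of the difference in means (Welch).
    se = math.sqrt(v1 / len(prs_subtype) + v2 / len(prs_other))
    z = (m1 - m2) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return m1 - m2, z, p


def compare_subtypes(prs_by_subtype):
    """Test each subtype of one disease against the pooled remaining subtypes.

    prs_by_subtype: hypothetical mapping {subtype name: list of PRS values}.
    """
    results = {}
    for name, prs in prs_by_subtype.items():
        rest = [x for other, vals in prs_by_subtype.items() if other != name for x in vals]
        results[name] = subtype_prs_test(prs, rest)
    return results
```

For example, `subtype_prs_test([1.0, 1.2, 0.8, 1.1, 0.9], [0.0, 0.2, -0.2, 0.1, -0.1])` gives a mean difference of about 1.0 and a z statistic of about 10. These are made-up values, not data from the study. In practice, testing many subtypes across 52 diseases would call for a multiple-testing correction on the resulting p-values.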