Zhang Yidong, Jiang Xilin, Mentzer Alexander J, McVean Gil, Lunter Gerton
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK.
Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK.
Cell Genom. 2023 Aug 1;3(8):100371. doi: 10.1016/j.xgen.2023.100371. eCollection 2023 Aug 9.
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from a medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. Using topic weights as traits in genome-wide association studies (GWASs), we identify loci, validated with a range of approaches, that only partially overlap with loci from GWASs on the constituent single diseases. We also show that treeLFA improves upon existing methods such as latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be obtained from the analysis of constituent traits alone.