School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America.
Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, United States of America.
PLoS Comput Biol. 2023 Jul 7;19(7):e1011211. doi: 10.1371/journal.pcbi.1011211. eCollection 2023 Jul.
Many complex diseases share common genetic determinants and are comorbid in a population. We hypothesized that the co-occurrence of diseases and their overlapping genetic etiology can be exploited to simultaneously improve the polygenic risk scores (PRS) of multiple diseases. We tested this hypothesis using a multi-task learning (MTL) approach based on an explainable neural network architecture. We found that parallel estimation of the PRS for 17 prevalent cancers in a pan-cancer MTL model was generally more accurate than independent estimation for individual cancers in comparable single-task learning (STL) models. This performance improvement, conferred by positive transfer learning, was also observed consistently for 60 prevalent non-cancer diseases in a pan-disease MTL model. Interpretation of the MTL models revealed significant genetic correlations between the important sets of single nucleotide polymorphisms used by the neural network for PRS estimation. These correlations suggest a well-connected network of diseases with a shared genetic basis.
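The abstract contrasts one pan-cancer MTL model with separate STL models. A minimal sketch of generic hard parameter sharing is shown here, assuming a single shared ReLU hidden layer over single nucleotide polymorphism (SNP) allele dosages (0, 1 or 2) and one sigmoid output head per disease. This is not the authors' explainable architecture: the class `SharedTrunkMTL`, the helper `count_params`, the layer sizes and the random weights are all hypothetical, and training is omitted. The point it illustrates is that every disease head reads the same shared representation, which is where positive transfer between diseases can arise once the trunk is trained on all tasks jointly.

```python
import math
import random
from typing import List


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def count_params(n_snps: int, n_hidden: int, n_tasks: int) -> int:
    """Weights + biases of a shared trunk plus n_tasks linear heads."""
    trunk = n_hidden * n_snps + n_hidden
    heads = n_tasks * n_hidden + n_tasks
    return trunk + heads


class SharedTrunkMTL:
    """Hypothetical hard-parameter-sharing MTL model (forward pass only)."""

    def __init__(self, n_snps: int, n_hidden: int, n_tasks: int, seed: int = 0):
        rng = random.Random(seed)
        # Shared trunk: one weight row per hidden unit, one column per SNP.
        self.w_shared = [[rng.gauss(0.0, 0.1) for _ in range(n_snps)]
                         for _ in range(n_hidden)]
        self.b_shared = [0.0] * n_hidden
        # Task-specific heads: one weight row per disease.
        self.w_heads = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                        for _ in range(n_tasks)]
        self.b_heads = [0.0] * n_tasks

    def forward(self, dosages: List[int]) -> List[float]:
        """Map one genotype (allele dosages 0/1/2) to one risk score per disease."""
        # Shared representation computed once and reused by every head.
        hidden = [relu(sum(w * x for w, x in zip(row, dosages)) + b)
                  for row, b in zip(self.w_shared, self.b_shared)]
        # Each head turns the shared representation into a score in (0, 1).
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(self.w_heads, self.b_heads)]

    def n_params(self) -> int:
        trunk = sum(len(row) for row in self.w_shared) + len(self.b_shared)
        heads = sum(len(row) for row in self.w_heads) + len(self.b_heads)
        return trunk + heads


# Toy usage: 5 SNPs, 4 shared hidden units, 3 diseases (all sizes hypothetical).
model = SharedTrunkMTL(n_snps=5, n_hidden=4, n_tasks=3, seed=42)
scores = model.forward([0, 1, 2, 1, 0])
print([round(s, 3) for s in scores])
```

With hypothetical sizes of 1,000 SNPs and 64 hidden units, `count_params(1000, 64, 17)` gives 65,169 parameters for one shared model covering 17 cancers, versus 17 × 64,129 = 1,090,193 for 17 separate single-task models of the same shape. The gain reported in the abstract is in accuracy, not model size; the count only illustrates that the MTL heads share a single trunk.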