Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, 19104, United States.
The Center for Health AI and Synthesis of Evidence (CHASE), The University of Pennsylvania, Philadelphia, PA, 19104, United States.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae096.
Gaussian graphical models (GGMs) are useful for understanding the complex relationships between biological entities. Transfer learning can improve the estimation of GGMs in a target dataset by incorporating relevant information from related source studies. However, biomedical research often involves intrinsic and latent heterogeneity within a study, such as heterogeneous subpopulations. This heterogeneity can make it difficult to identify informative source studies, and it can lead to negative transfer if a source study is used improperly. To address this challenge, we developed a heterogeneous latent transfer learning (Latent-TL) approach that accounts for both within-sample and between-sample heterogeneity. The idea behind this approach is to "learn from the alike" by leveraging the similarities between source and target GGMs within each subpopulation. The Latent-TL algorithm simultaneously identifies common subpopulation structures among samples and facilitates the learning of target GGMs using source samples from the same subpopulation. Through extensive simulations and a real data application, we have shown that the proposed method outperforms both single-site learning and standard transfer learning that ignores the latent structures. We have also demonstrated the applicability of the proposed algorithm in characterizing gene co-expression networks in breast cancer patients, where the inferred genetic networks identified many biologically meaningful gene-gene interactions.
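The "learn from the alike" idea can be illustrated with a small sketch. It is not the Latent-TL algorithm itself, whose estimators the abstract does not specify. It assumes that subpopulations can be recovered by a plain k-means step on the pooled samples, and that each subpopulation's precision matrix can be approximated by inverting a ridge-regularized covariance of the target and source samples assigned to it. All function names, the toy data, and the parameters `k` and `lam` are hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with farthest-first initialization.

    Assumption: a stand-in for latent subpopulation discovery,
    not the clustering step used by Latent-TL.
    """
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the sample farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def ridge_precision(X, lam=0.1):
    """Inverse of a ridge-regularized sample covariance.

    Assumption: a simple dense estimator, not a sparse GGM estimator.
    """
    S = np.cov(X, rowvar=False)
    return np.linalg.inv(S + lam * np.eye(X.shape[1]))


def learn_from_the_alike(X_target, X_source, k=2, lam=0.1):
    """Cluster pooled samples, then estimate one precision matrix per
    subpopulation from the target and source samples assigned to it."""
    pooled = np.vstack([X_target, X_source])
    labels = kmeans(pooled, k)
    target_labels = labels[: len(X_target)]
    precisions = {}
    for j in range(k):
        members = pooled[labels == j]
        if len(members) > 1:  # a covariance needs at least two samples
            precisions[j] = ridge_precision(members, lam)
    return target_labels, precisions


# Toy data: two well-separated subpopulations, few target and many source samples.
rng = np.random.default_rng(1)
p = 4
X_target = np.vstack([rng.normal(size=(15, p)) - 5.0,
                      rng.normal(size=(15, p)) + 5.0])
X_source = np.vstack([rng.normal(size=(200, p)) - 5.0,
                      rng.normal(size=(200, p)) + 5.0])

target_labels, precisions = learn_from_the_alike(X_target, X_source, k=2)
```

In this sketch, source samples contribute only to the subpopulation they are assigned to, which is the part of the idea the abstract emphasizes. Pooling target and source samples with equal weight is a simplification; a method that guards against negative transfer would need to control how much each source sample contributes.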