Maciejewski Emily, Horvath Steve, Ernst Jason
Computer Science Department, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
Genome Biol. 2025 May 20;26(1):133. doi: 10.1186/s13059-025-03561-2.
The large-scale application of the mammalian methylation array has substantially expanded the availability of DNA methylation data across mammalian species. However, these data capture only a small portion of species-tissue combinations. To address this, we develop CMImpute (Cross-species Methylation Imputation), a method based on a conditional variational autoencoder, to impute DNA methylation values representing species-tissue combinations that lack measured data. We demonstrate that CMImpute achieves strong sample-wise correlation between imputed and observed values. Using CMImpute and data from 348 species and 59 tissue types, we impute methylation data for 19,786 new species-tissue combinations. We expect CMImpute will be a useful resource for DNA methylation analyses.
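As an illustration of what a sample-wise correlation between imputed and observed values measures, the following is a minimal sketch in plain Python. It assumes that each species-tissue combination is represented by one list of methylation values (one per probe) and that agreement is scored with the Pearson correlation. The function names, the dictionary layout, and the toy values are hypothetical and are not taken from CMImpute; the abstract does not specify the exact evaluation protocol.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def sample_wise_correlation(imputed, observed):
    """Correlate each imputed profile with its observed counterpart across probes.

    Both arguments map a hypothetical (species, tissue) key to a list of
    methylation values; only keys present in both mappings are scored.
    """
    shared = imputed.keys() & observed.keys()
    return {key: pearson(imputed[key], observed[key]) for key in shared}


# Toy values for illustration only; these are not real measurements.
observed = {
    ("mouse", "liver"): [0.10, 0.80, 0.45, 0.95],
    ("rat", "blood"): [0.20, 0.70, 0.50, 0.90],
}
imputed = {
    ("mouse", "liver"): [0.12, 0.75, 0.50, 0.90],
    ("rat", "blood"): [0.25, 0.65, 0.55, 0.85],
}

scores = sample_wise_correlation(imputed, observed)
for key in sorted(scores):
    print(key, round(scores[key], 3))
```

Correlating across probes within one sample, rather than across samples for one probe, asks whether the overall shape of an imputed methylation profile matches the observed one.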