Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.
Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA.
Genome Biol. 2019 Jul 12;20(1):138. doi: 10.1186/s13059-019-1743-y.
Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package, as detailed at https://github.com/cozygene/CONFINED.
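The abstract names sparse canonical correlation analysis (CCA) as the basis of CONFINED but does not describe it. Purely as an illustration of the general technique, the sketch below computes one pair of sparse canonical weight vectors for two matrices `X` and `Y` that share their rows. It alternates soft-thresholded power iterations on the cross-correlation matrix and treats the within-set covariances as the identity, a simplification in the spirit of the penalized matrix decomposition of Witten, Tibshirani and Hastie. This is not CONFINED's implementation, and it makes no claim about how CONFINED arranges methylation data into `X` and `Y`. The function names and the fixed penalties `lam_u`/`lam_v` are hypothetical; the original penalized approach bounds the L1 norm of the weights rather than using fixed thresholds.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def normalize(a):
    """Scale a vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """Hypothetical helper: one pair of sparse canonical weights (u, v).

    Assumes X (n x p) and Y (n x q) have matched rows, and approximates the
    within-set covariances by the identity ("diagonal" penalized CCA).
    Returns u, v and the correlation between the canonical variates X@u, Y@v.
    """
    # Standardize columns so the cross-product below is a correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]  # p x q cross-correlation matrix

    # Initialize v with the leading right singular vector of C.
    v = np.linalg.svd(C)[2][0]
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        # Alternate: update u given v, then v given u; thresholding zeroes
        # out weakly contributing columns, which is what makes u and v sparse.
        u_new = normalize(soft_threshold(C @ v, lam_u))
        v_new = normalize(soft_threshold(C.T @ u_new, lam_v))
        converged = (np.linalg.norm(u_new - u) < tol
                     and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if converged:
            break

    # Correlation of the canonical variates across the shared rows.
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr
```

For example, if only a few columns of `X` and `Y` are driven by a common latent signal, sufficiently large thresholds leave nonzero weights only on those columns. In practice the penalties would need tuning (for instance by permutation or cross-validation) rather than being fixed by hand as assumed here.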