California Pacific Medical Center Research Institute, Sutter Health, San Francisco, CA 94143, United States.
Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, 19104, United States.
Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae423.
Infinium DNA methylation BeadChips are widely used for genome-wide DNA methylation profiling at the population scale. Recent updates to probe content and naming conventions in the EPIC version 2 (EPICv2) arrays have complicated the integration of new data with data from previous Infinium array platforms, such as the MethylationEPIC (EPIC) and the HumanMethylation450 (HM450) BeadChips.
We present mLiftOver, a user-friendly tool that harmonizes probe ID, methylation level, and signal intensity data across different Infinium platforms. It manages probe replicates, missing data imputation, and platform-specific bias for accurate data conversion. We validated the tool by applying HM450-based cancer classifiers to EPICv2 cancer data, achieving high accuracy. Additionally, we successfully integrated EPICv2 healthy tissue data with legacy HM450 data for tissue identity analysis and produced consistent copy number profiles in cancer cells.
mLiftOver is implemented in R and available in the Bioconductor package SeSAMe (version 1.21.13+): https://bioconductor.org/packages/release/bioc/html/sesame.html. The analysis of EPIC and EPICv2 platform-specific bias and the high-confidence mapping are available at https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/EPICv2/EPICv2ToEPIC_conversion.tsv.gz. The source code is available at https://github.com/zwdzwd/sesame/blob/devel/R/mLiftOver.R under the MIT license.
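The probe ID harmonization and replicate handling described above can be illustrated with a minimal Python sketch. It is not the mLiftOver implementation. It assumes that EPICv2 probe IDs carry a design suffix after an underscore (for example `cg00000029_TC21`), that replicate probes are collapsed by their mean, and that missing target probes are filled from a caller-supplied fallback table. The function names, the mean-collapsing rule, and the fallback scheme are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def legacy_probe_id(epicv2_id):
    """Strip the assumed EPICv2 suffix: 'cg00000029_TC21' -> 'cg00000029'."""
    return epicv2_id.split("_", 1)[0]


def collapse_replicates(betas):
    """Average beta values of replicate probes that share a legacy ID.

    Missing measurements (None or NaN) are ignored; a legacy ID with no
    usable measurement is left out of the result.
    """
    groups = defaultdict(list)
    for probe_id, beta in betas.items():
        if beta is not None and not math.isnan(beta):
            groups[legacy_probe_id(probe_id)].append(beta)
    return {pid: mean(values) for pid, values in groups.items()}


def lift_to_target(betas, target_probes, fallback=None):
    """Return one beta value per target-platform probe ID.

    Probes absent from the source data take the value in `fallback`
    (a hypothetical stand-in for imputation), or NaN if none is given.
    Source probes that are not on the target platform are dropped.
    """
    collapsed = collapse_replicates(betas)
    fallback = fallback or {}
    return {
        pid: collapsed.get(pid, fallback.get(pid, math.nan))
        for pid in target_probes
    }


if __name__ == "__main__":
    epicv2 = {
        "cg00000029_TC21": 0.80,
        "cg00000029_TC22": 0.90,          # replicate of the probe above
        "cg00000109_TC21": float("nan"),  # failed measurement
    }
    target = ["cg00000029", "cg00000109", "cg00000165"]
    print(lift_to_target(epicv2, target, fallback={"cg00000109": 0.5}))
```

A real conversion would also need to account for the platform-specific bias mentioned above, which this sketch ignores.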