Significant variation in the performance of DNA methylation predictors across data preprocessing and normalization strategies.

Author information

Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA, 90095-176, USA.

Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.

Publication information

Genome Biol. 2022 Oct 24;23(1):225. doi: 10.1186/s13059-022-02793-w.

DOI:10.1186/s13059-022-02793-w

PMID:36280888

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9590227/

Abstract

BACKGROUND

DNA methylation (DNAm)-based predictors hold great promise to serve as clinical tools for health interventions and disease management. While these algorithms often have high prediction accuracy, the consistency of their performance remains to be determined. We therefore conduct a systematic evaluation across 101 different DNAm data preprocessing and normalization strategies and assess how each analytical strategy affects the consistency of 41 DNAm-based predictors.

RESULTS

Our analyses are conducted in a large EPIC DNAm array dataset from the Jackson Heart Study (N = 2053) that included 146 pairs of technical replicate samples. By estimating the average absolute agreement between replicate pairs, we show that 32 out of 41 predictors (78%) demonstrate excellent consistency when appropriate data processing and normalization steps are implemented. Across all pairs of predictors, we find a moderate correlation in performance across analytical strategies (mean rho = 0.40, SD = 0.27), highlighting significant heterogeneity in performance across algorithms. Successful or unsuccessful removal of technical variation furthermore significantly impacts downstream phenotypic association analysis, such as all-cause mortality risk associations.
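Agreement between technical replicates can be illustrated with a small sketch. The abstract does not name the statistic; the code below assumes a two-way intraclass correlation coefficient (ICC) for absolute agreement, in the McGraw and Wong formulation, with both the single-measure and average-measure forms. The function name and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean

def icc_absolute_agreement(pairs, average=False):
    """Two-way ICC for absolute agreement (assumed statistic, not confirmed by the abstract).

    pairs: list of rows, one per sample, each row holding k replicate scores.
    average=False gives the single-measure form, average=True the average-measure form.
    """
    n = len(pairs)          # number of samples
    k = len(pairs[0])       # number of replicates per sample (2 for replicate pairs)
    grand = mean(v for row in pairs for v in row)
    row_means = [mean(row) for row in pairs]
    col_means = [mean(row[j] for row in pairs) for j in range(k)]

    # Mean squares: between samples, between replicate positions, residual.
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    mse = sum(
        (pairs[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))

    if average:
        return (msr - mse) / (msr + (msc - mse) / n)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Made-up predictor scores for three replicate pairs (illustration only).
identical = [(1, 1), (2, 2), (3, 3)]
shifted = [(1, 2), (2, 3), (3, 4)]   # second replicate is always 1 unit higher

print(icc_absolute_agreement(identical))            # 1.0
print(round(icc_absolute_agreement(shifted), 3))    # 0.667
```

The shifted example shows why absolute agreement is a stricter criterion than correlation: the two replicates are perfectly correlated, yet a constant offset between them lowers the score.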

CONCLUSIONS

We show that DNAm-based algorithms are sensitive to technical variation. The right choice of data processing strategy is important to achieve reproducible estimates and improve prediction accuracy in downstream phenotypic association analyses. For each of the 41 DNAm predictors, we report its degree of consistency and provide the best performing analytical strategy as a guideline for the research community. As DNAm-based predictors become more and more widely used, our work helps improve their performance and standardize their implementation.
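The two summaries described above, a best-performing strategy per predictor and the correlation of performance profiles between predictors, can be sketched as follows. This is a minimal illustration under stated assumptions: rho is taken to be a Spearman rank correlation (the abstract does not specify), and the predictor names, strategy names, and scores are invented for the example.

```python
from itertools import combinations
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            out[idx] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical consistency scores: predictor -> {strategy: score}.
scores = {
    "clock_A": {"raw": 0.70, "norm_1": 0.90, "norm_2": 0.95},
    "clock_B": {"raw": 0.60, "norm_1": 0.85, "norm_2": 0.80},
    "clock_C": {"raw": 0.92, "norm_1": 0.75, "norm_2": 0.88},
}
strategies = ["raw", "norm_1", "norm_2"]

# Best-performing strategy for each predictor.
best = {p: max(s, key=s.get) for p, s in scores.items()}

# Mean pairwise rho between predictors' performance profiles across strategies.
pairwise = [
    spearman([scores[p][s] for s in strategies], [scores[q][s] for s in strategies])
    for p, q in combinations(scores, 2)
]
mean_rho = mean(pairwise)

print(best)
print(round(mean_rho, 3))
```

A low or negative mean rho in such a table means that a strategy that works well for one predictor need not work well for another, which is why a per-predictor recommendation is more informative than a single global default.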

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a81/9590227/794acb92ef59/13059_2022_2793_Fig1_HTML.jpg
