Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran.
Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran.
BMC Bioinformatics. 2024 Jan 23;25(1):37. doi: 10.1186/s12859-024-05658-0.
DNA methylation is a major epigenetic modification involved in many physiological processes. Normal methylation patterns are disrupted in many diseases, and methylation-based biomarkers have shown promise in several contexts. Marker discovery typically involves the analysis of publicly available DNA methylation data from high-throughput assays. Numerous methods for identifying differentially methylated biomarkers have been developed, creating a pressing need for best-practice guidelines and context-specific analysis workflows. To this end, we propose TASA, a novel method for simulating methylation array data in various scenarios. We then comprehensively assess different data analysis workflows using real and simulated data and suggest optimal start-to-finish analysis workflows. Our study demonstrates that the choice of analysis pipeline for DNA methylation-based marker discovery is crucial and that the optimal choice differs across contexts.