Institute of Machine Learning and Systems Biology, College of Electronic and Information Engineering, Tongji University, Shanghai, 201804, P.R. China.
Sci Rep. 2019 Apr 3;9(1):5601. doi: 10.1038/s41598-019-42010-6.
Aberrant DNA methylation may contribute to the development of cancer. However, understanding the associations between DNA methylation and cancer remains a challenge, both because the mechanisms involved are complex and because sample sizes are insufficient. The unprecedented wealth of DNA methylation, gene expression and disease status data gives us a new opportunity to design machine learning methods to investigate the underlying mechanisms of association. In this paper, we propose a network-guided association mapping approach from DNA methylation to disease (NAMDD). Compared with existing methods, NAMDD finds methylation-disease path associations by integrating the analysis of multiple data types with a stability selection strategy, thereby extracting more information from the datasets and improving the quality of the resulting methylation sites. Experimental results on both synthetic and real ovarian cancer data show that NAMDD substantially outperforms earlier methods for identifying disease-related methylation sites (including NsRRR and PCLOGIT) under false positive control. Furthermore, we applied NAMDD to ovarian cancer data, identified significant path associations, and proposed hypothetical biological path associations to explain our findings.
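The abstract names a stability selection strategy but does not give its details. As a rough illustration of the general idea only, the sketch below repeatedly draws half of the samples, runs a base feature selector on each subsample, and keeps the features that are selected in a large fraction of rounds. The base selector here (ranking features by absolute correlation with the disease label) is a simple stand-in chosen for brevity and is not NAMDD's model; the function names, the toy data generator, and the parameters `k`, `n_rounds` and `threshold` are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return abs(cov / (sx * sy))


def top_k_features(X, y, k):
    """Stand-in base selector (an assumption): the k features most correlated with y."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability_selection(X, y, k=2, n_rounds=100, threshold=0.8, seed=0):
    """Return (stable feature indices, per-feature selection frequencies)."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(n_rounds):
        # Subsample half of the samples without replacement.
        idx = rng.sample(range(n), n // 2)
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in top_k_features(X_sub, y_sub, k):
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


def make_toy_data(n=60, n_features=6, seed=1):
    """Hypothetical toy data: features 0 and 1 shift with the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4.0 * label
        row[1] -= 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    stable, freqs = stability_selection(X, y)
    print("stable features:", stable)
    print("selection frequencies:", freqs)
```

Selecting by frequency across subsamples, rather than from a single fit, is what gives this family of methods its control over false positives: a noise feature may be picked on one subsample by chance but is unlikely to be picked on most of them.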