Tian Qi, Zou Jianxiao, Fang Yuan, Yu Zhongli, Tang Jianxiong, Song Ying, Fan Shicai
School of Automation Engineering, University of Electronic Science and Technology of China.
Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
Front Genet. 2019 Sep 5;10:774. doi: 10.3389/fgene.2019.00774. eCollection 2019.
DNA methylation is a widely investigated epigenetic mark that plays a vital role in tumorigenesis. Advances in high-throughput assays, such as the Infinium 450K platform, provide genome-scale DNA methylation landscapes at single-CpG resolution, and the identification of differentially methylated loci (DML) has become an informative way to deepen our understanding of cancer. However, the extreme imbalance between the number of samples and the number of loci (approximately 1:1,000) makes it difficult to identify differential methylation between diseased and normal samples. In this article, we propose HyDML, a hybrid approach for identifying DML based on ensemble feature selection, which incorporates instance perturbation and multiple function models. Experiments on data from The Cancer Genome Atlas showed that HyDML not only identified DML effectively, but also outperformed single feature selection methods in terms of classification performance and the robustness of feature selection. Further analysis of the DML indicated that different cancer types share common patterns, and that the stable DML shared across cancer types have great potential as biomarkers, which may strengthen the confidence of domain experts in carrying out biological validation.
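The abstract does not specify which function models, resampling scheme, or aggregation rule HyDML uses, so the following is only a minimal sketch of the general idea of ensemble feature selection with instance perturbation and more than one scoring function. Everything in it is an assumption made for illustration: the toy data, the two scoring functions (`mean_diff_score`, `welch_t_score`), the stratified bootstrap, the mean-rank aggregation, and all function and parameter names are hypothetical and are not taken from the paper.

```python
import random
import statistics

def mean_diff_score(case, control):
    """Absolute difference in mean beta value between the two groups."""
    return abs(statistics.fmean(case) - statistics.fmean(control))

def welch_t_score(case, control):
    """Absolute Welch t statistic; a small epsilon avoids division by zero."""
    se2 = statistics.variance(case) / len(case) + statistics.variance(control) / len(control)
    return abs(statistics.fmean(case) - statistics.fmean(control)) / (se2 ** 0.5 + 1e-12)

def rank_loci(scores):
    """Rank of each locus, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks

def ensemble_select(X, y, scorers, n_rounds=20, top_k=2, seed=0):
    """Average locus ranks over bootstrap resamples and scoring functions."""
    rng = random.Random(seed)
    case_idx = [i for i, label in enumerate(y) if label == 1]
    control_idx = [i for i, label in enumerate(y) if label == 0]
    n_loci = len(X[0])
    rank_sum = [0.0] * n_loci
    for _ in range(n_rounds):
        # Instance perturbation (assumed here to be a stratified bootstrap).
        case_s = rng.choices(case_idx, k=len(case_idx))
        control_s = rng.choices(control_idx, k=len(control_idx))
        # Multiple scoring functions applied to the same resample.
        for scorer in scorers:
            scores = [
                scorer([X[i][j] for i in case_s], [X[i][j] for i in control_s])
                for j in range(n_loci)
            ]
            for j, r in enumerate(rank_loci(scores)):
                rank_sum[j] += r
    n_lists = n_rounds * len(scorers)
    mean_rank = [s / n_lists for s in rank_sum]
    selected = sorted(range(n_loci), key=lambda j: mean_rank[j])[:top_k]
    return selected, mean_rank

def make_toy_data(n_per_group=8, n_loci=30, planted=(3, 17), seed=1):
    """Synthetic beta values in [0, 1]; planted loci are shifted in the case group."""
    rng = random.Random(seed)
    clip = lambda v: min(1.0, max(0.0, v))
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [clip(rng.gauss(0.5, 0.05)) for _ in range(n_loci)]
            if label == 1:
                for j in planted:
                    row[j] = clip(rng.gauss(0.85, 0.05))
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
selected, mean_rank = ensemble_select(X, y, [mean_diff_score, welch_t_score])
print(sorted(selected))
```

On this toy data the two planted loci are separated from the rest by a large margin, so they should receive the lowest mean ranks. Averaging ranks over many resamples and scorers is one simple way to make a selection less sensitive to any single sample subset or scoring function, which is the kind of robustness the abstract refers to; the real data setting, with roughly 1,000 loci per sample, is far harder than this example.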