Torabi Moghadam Behrooz, Dabrowski Michal, Kaminska Bozena, Grabherr Manfred G, Komorowski Jan
Department of Cell and Molecular Biology, Computational and Systems Biology, Uppsala University, Uppsala, Sweden.
Laboratory of Bioinformatics, Neurobiology Center, Nencki Institute of Experimental Biology of Polish Academy of Sciences, Warsaw, Poland.
BMC Bioinformatics. 2016 Sep 23;17(1):393. doi: 10.1186/s12859-016-1259-3.
DNA methylation plays a key role in developmental processes, which is reflected in changing methylation patterns at specific CpG sites over the lifetime of an individual. The underlying mechanisms are complex and possibly affect multiple genes or entire pathways.
We applied a multivariate approach to identify combinations of CpG sites that are modified during transitions between developmental stages. Monte Carlo feature selection produced a ranked list of statistically significant CpG sites, and rule-based models identified the specific methylation changes at these sites. Our rule-based classifier reports combinations of CpG sites, together with changes in their methylation status, as easy-to-read IF-THEN rules, which makes it possible to identify the genes associated with the underlying sites.
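The idea of ranking CpG sites by repeated random resampling can be illustrated with a small, standard-library-only sketch. This is a simplified stand-in, not the Monte Carlo feature selection procedure used in the study: each sampled site is scored with a single-threshold rule over many random train/test splits. The function names, the two-class restriction, and the toy beta values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stump_accuracy(train, test, feature):
    """Fit a one-threshold rule for `feature` on `train`; return its accuracy on `test`."""
    values_by_class = defaultdict(list)
    for row, label in train:
        values_by_class[label].append(row[feature])
    if len(values_by_class) != 2:  # assumption: this sketch handles two age classes only
        return 0.0
    # Order the two classes by their mean methylation at this site.
    means = sorted((sum(v) / len(v), label) for label, v in values_by_class.items())
    (low_mean, low_label), (high_mean, high_label) = means
    threshold = (low_mean + high_mean) / 2
    hits = sum(
        (high_label if row[feature] > threshold else low_label) == label
        for row, label in test
    )
    return hits / len(test)


def monte_carlo_ranking(rows, labels, n_subsets=200, subset_size=2, seed=0):
    """Rank features by mean held-out accuracy over random feature subsets and splits."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    data = list(zip(rows, labels))
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_subsets):
        subset = rng.sample(range(n_features), subset_size)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = (2 * len(shuffled)) // 3  # two thirds for training, the rest held out
        train, test = shuffled[:cut], shuffled[cut:]
        for f in subset:
            totals[f] += stump_accuracy(train, test, f)
            counts[f] += 1
    scores = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    ranking = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranking, scores


# Hypothetical toy data: 12 samples x 3 CpG sites (beta values in [0, 1]).
# Only site 0 differs between the two age classes.
labels = ["young"] * 6 + ["old"] * 6
rows = [
    [0.2 if lab == "young" else 0.8, 0.5, 0.4 if i % 2 == 0 else 0.6]
    for i, lab in enumerate(labels)
]
ranking, scores = monte_carlo_ranking(rows, labels)
```

A top-ranked site could then be phrased as a rule of the form IF site 0 is above the threshold THEN age class is "old". Assessing statistical significance, as the study does, would need an additional step such as comparing scores against label-permuted data.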
We used machine learning and statistical methods to discretize the decision class (age) and obtain a general pattern of methylation changes over the lifespan. The CpG sites present in the significant rules were annotated to genes involved in brain formation and general development, as well as to genes linked to cancer and Alzheimer's disease.
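Discretizing a continuous decision attribute such as age can be sketched as follows. The abstract does not state how the cut points were chosen, so equal-frequency binning is used here purely as an assumption, and the function names and example ages are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_cuts(ages, n_bins):
    """Return n_bins - 1 cut points splitting `ages` into roughly equal-sized groups."""
    ordered = sorted(ages)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def age_class(age, cuts):
    """Return the index of the half-open interval [cut_k, cut_k+1) that contains `age`."""
    return bisect_right(cuts, age)


# Hypothetical ages in years, for illustration only.
ages = [0, 1, 2, 5, 12, 18, 30, 45, 60, 80]
cuts = equal_frequency_cuts(ages, 3)
classes = [age_class(a, cuts) for a in ages]
```

Each resulting class index can serve as the decision value of a rule-based classifier, so that a rule predicts an age interval instead of an exact age.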