MapReduce-Based Parallel Genetic Algorithm for CpG-Site Selection in Age Prediction.

Affiliations

Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran P.O. Box 14115-143, Iran.

Institute for Research in Fundamental Sciences (IPM), School of Computer Science, Tehran P.O. Box 14115-143, Iran.

Publication details

Genes (Basel). 2019 Nov 25;10(12):969. doi: 10.3390/genes10120969.

DOI: 10.3390/genes10120969

PMID: 31775313

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6947642/

Abstract

Genomic biomarkers such as DNA methylation (DNAm) are employed for age prediction. In recent years, several studies have reported an association between changes in DNAm and human age. The high-dimensional nature of this type of data significantly increases the execution time of modeling algorithms. To mitigate this problem, we propose a two-stage parallel algorithm for selecting age-related CpG-sites. The algorithm first clusters the data into groups with similar age ranges. In the second stage, a parallel genetic algorithm (PGA) based on the MapReduce paradigm (MR-based PGA) selects the age-related features of each age range. In the proposed method, a novel parallel framework performs the execution of the algorithm for each age range (data parallel), the evaluation of chromosomes (task parallel), and the calculation of the fitness function (data parallel). In this paper, we consider 16 healthy DNAm datasets from human blood tissue that include age information. These datasets are combined into a single unioned set, which is then randomly divided into training and test sets at a ratio of 7:3. We build a Gradient Boosting Regressor (GBR) model on the CpG-sites selected from the training set. To evaluate model accuracy, we compared our results with state-of-the-art approaches that used these datasets and observed that our method performs better on the unseen test set, with a Mean Absolute Deviation (MAD) of 3.62 years and a correlation (R) of 95.96% between age and DNAm. On the training data, the MAD and R are 1.27 years and 99.27%, respectively. Finally, we evaluate the effect of parallelization on computation time. Without parallelization, the algorithm requires 4123 min to complete, whereas parallel execution on 3 computing machines with 32 processing cores each takes only 58 min in total. This shows that our proposed algorithm is both efficient and scalable.
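In the MR-based PGA described above, each generation can be viewed as a map step, in which every chromosome is scored independently (the task-parallel level), followed by a reduce step, in which all scores are gathered and the next generation is bred. The Python sketch below illustrates only that structure, under assumptions that are not from the paper: a chromosome is a 0/1 mask over CpG sites; the fitness is a hypothetical filter score (sum of absolute Pearson correlations with age, minus a fixed penalty of 0.5 per selected site) rather than the authors' fitness function; a thread pool stands in for MapReduce workers; and the data are synthetic. The function names (`fitness`, `map_phase`, `reduce_phase`, `run_ga`) are hypothetical. The age-clustering stage and the data-parallel fitness calculation are not shown; in the described method, a GA like this would run once per age range.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fitness(chrom, X, ages, penalty=0.5):
    """Hypothetical filter fitness: sum over selected sites of (|r with age| - penalty)."""
    return sum(abs(pearson([row[j] for row in X], ages)) - penalty
               for j, bit in enumerate(chrom) if bit)


def map_phase(population, X, ages, workers=4):
    """Map: score every chromosome independently (task parallel)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: fitness(c, X, ages), population))
    return list(zip(scores, population))


def reduce_phase(scored, rng, n_elite=2, p_mut=0.05):
    """Reduce: rank all (score, chromosome) pairs, keep elites, breed the rest."""
    ranked = [c for _, c in sorted(scored, key=lambda t: t[0], reverse=True)]
    parents = ranked[: max(2, len(ranked) // 2)]  # truncation selection
    next_gen = ranked[:n_elite]  # elitism: best chromosomes survive unchanged
    while len(next_gen) < len(ranked):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # one-point crossover
        child = a[:cut] + b[cut:]
        # Bit-flip mutation: each gene flips with probability p_mut.
        child = tuple(1 - g if rng.random() < p_mut else g for g in child)
        next_gen.append(child)
    return next_gen


def run_ga(X, ages, pop_size=20, generations=30, seed=0):
    """Evolve 0/1 masks over CpG sites; return (selected site indices, best fitness)."""
    rng = random.Random(seed)
    n_sites = len(X[0])
    population = [tuple(rng.randint(0, 1) for _ in range(n_sites))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = reduce_phase(map_phase(population, X, ages), rng)
    best_score, best = max(map_phase(population, X, ages))
    return [j for j, bit in enumerate(best) if bit], best_score


# Synthetic toy data (illustration only): sites 0 and 1 track age, sites 2-7 are noise.
data_rng = random.Random(1)
ages = [data_rng.uniform(20, 80) for _ in range(60)]
X = [[a / 100 + data_rng.gauss(0, 0.02), 1 - a / 100 + data_rng.gauss(0, 0.02)]
     + [data_rng.random() for _ in range(6)] for a in ages]
sites, best_score = run_ga(X, ages)
print("selected sites:", sites, "fitness:", round(best_score, 3))
```

On real MapReduce infrastructure, the `map_phase` calls would be distributed across cluster nodes instead of local threads; the elitism in `reduce_phase` guarantees that the best fitness never decreases from one generation to the next.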
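The abstract reports accuracy as MAD and R, and efficiency as a drop from 4123 min to 58 min on 3 × 32 = 96 cores. The sketch below shows how these figures are commonly computed. It assumes MAD is the mean absolute difference between predicted and chronological age and R is the Pearson correlation between them, which are the usual definitions in this literature rather than definitions quoted from the paper. Speedup and parallel efficiency use the standard formulas S = T_serial / T_parallel and E = S / p; with the abstract's numbers this gives a speedup of about 71× and an efficiency of about 0.74 on 96 cores. The function names are hypothetical.

```python
import statistics


def mad(predicted, actual):
    """Mean absolute deviation between predicted and chronological ages (years)."""
    return statistics.fmean(abs(p - a) for p, a in zip(predicted, actual))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def speedup(t_serial, t_parallel):
    """Classic speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency E = S / p; 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / n_cores


# Figures from the abstract: 4123 min serial vs 58 min on 3 machines x 32 cores.
cores = 3 * 32
print(f"speedup    = {speedup(4123, 58):.1f}")            # → speedup    = 71.1
print(f"efficiency = {efficiency(4123, 58, cores):.2f}")  # → efficiency = 0.74
```

An efficiency below 1.0 is expected in practice, since communication between the map and reduce stages and uneven work across age ranges add overhead that does not shrink with more cores.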

Similar articles

Chronological age prediction based on DNA methylation: Massive parallel sequencing and random forest regression.

Forensic Sci Int Genet. 2017 Nov;31:19-28. doi: 10.1016/j.fsigen.2017.07.015. Epub 2017 Aug 1.

Identifying blood-specific age-related DNA methylation markers on the Illumina MethylationEPIC® BeadChip.

Forensic Sci Int. 2019 Oct;303:109944. doi: 10.1016/j.forsciint.2019.109944. Epub 2019 Sep 12.

Systematic feature selection improves accuracy of methylation-based forensic age estimation in Han Chinese males.

Forensic Sci Int Genet. 2018 Jul;35:38-45. doi: 10.1016/j.fsigen.2018.03.009. Epub 2018 Mar 23.

Age Prediction of Human Based on DNA Methylation by Blood Tissues.

Genes (Basel). 2021 Jun 6;12(6):870. doi: 10.3390/genes12060870.

Human age prediction based on DNA methylation of non-blood tissues.

Comput Methods Programs Biomed. 2019 Apr;171:11-18. doi: 10.1016/j.cmpb.2019.02.010. Epub 2019 Feb 19.

Individual CpG sites that are associated with age and life expectancy become hypomethylated upon aging.

Clin Epigenetics. 2017 Feb 2;9:9. doi: 10.1186/s13148-017-0315-9. eCollection 2017.

Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models.

Evol Comput. 2018 Winter;26(4):535-567. doi: 10.1162/evco_a_00213. Epub 2017 Jun 29.

Detection and evaluation of DNA methylation markers found at SCGN and KLF14 loci to estimate human age.

Forensic Sci Int Genet. 2017 Nov;31:81-88. doi: 10.1016/j.fsigen.2017.07.011. Epub 2017 Aug 7.

A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging.

PLoS Comput Biol. 2016 Nov 11;12(11):e1005183. doi: 10.1371/journal.pcbi.1005183. eCollection 2016 Nov.

