Nguyen Thi Mai, Le Hoang Long, Hwang Kyu-Baek, Hong Yun-Chul, Kim Jin Hee
Department of Integrative Bioscience & Biotechnology, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea.
Department of Computer Science & Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea.
Biomedicines. 2022 Jun 14;10(6):1406. doi: 10.3390/biomedicines10061406.
DNA methylation plays a vital role in the pathophysiology of high blood pressure (BP). We applied three machine learning (ML) algorithms, deep learning (DL), support vector machine, and random forest, to detect high BP from DNA methylome data. Peripheral blood samples were collected from 50 elderly individuals at each of three visits for DNA methylome profiling. Participants with a history of hypertension and/or a current high BP measurement were considered to have high BP. The whole dataset was randomly divided to conduct a nested five-group cross-validation for evaluating prediction performance. Data in each outer training set were independently normalized with a min-max scaler, reduced in dimensionality by principal component analysis, and then fed into the three predictive algorithms. Of the three ML algorithms, DL achieved the best performance (area under the precision-recall curve [AUPRC] = 0.65, area under the receiver operating characteristic curve [AUROC] = 0.73, accuracy = 0.69, and F1-score = 0.73). To confirm the reliability of the DNA methylome as a biomarker for high BP, we constructed mixed-effects models and found that 61,694 methylation sites located in 15,523 intragenic regions and 16,754 intergenic regions were significantly associated with BP measures. Our proposed models pioneer a methodology for applying ML to DNA methylome data for early detection of high BP in clinical practice.
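The per-fold normalization step described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and shows one outer loop of a five-group split in which the min-max parameters are learned from the training groups alone. The helper names (`five_group_splits`, `fit_min_max`, `apply_min_max`) and the toy data are hypothetical. Applying the training-derived parameters to the held-out group is an assumption, since the abstract does not say how held-out data were scaled. The inner cross-validation loop, principal component analysis, and the classifiers are omitted.

```python
import random

def five_group_splits(n_samples, n_groups=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a random split into n_groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    groups = [idx[g::n_groups] for g in range(n_groups)]
    for g in range(n_groups):
        test = sorted(groups[g])
        train = sorted(i for h in range(n_groups) if h != g for i in groups[h])
        yield train, test

def fit_min_max(rows):
    """Learn the per-feature minimum and range from the given rows."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return mins, spans

def apply_min_max(rows, mins, spans):
    """Scale rows with previously learned parameters."""
    return [[(v - m) / s for v, m, s in zip(row, mins, spans)] for row in rows]

# Toy stand-in for a samples-by-sites methylation matrix; values are made up.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(20)]

for train_idx, test_idx in five_group_splits(len(X)):
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    # Scaling parameters come from the training groups only (assumption:
    # the held-out group reuses them rather than being scaled on its own).
    mins, spans = fit_min_max(X_train)
    X_train_scaled = apply_min_max(X_train, mins, spans)
    X_test_scaled = apply_min_max(X_test, mins, spans)
    # Dimensionality reduction and model fitting would follow here.
```

Fitting the scaler inside the loop keeps information from the held-out group out of the preprocessing, so the performance estimate is not inflated by leakage. Held-out values may fall slightly outside the [0, 1] range as a result.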