Dogan Meeshanthini V, Grumbach Isabella M, Michaelson Jacob J, Philibert Robert A
Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America.
Department of Psychiatry, University of Iowa, Iowa City, Iowa, United States of America.
PLoS One. 2018 Jan 2;13(1):e0190549. doi: 10.1371/journal.pone.0190549. eCollection 2018.
An improved method for detecting coronary heart disease (CHD) could have substantial clinical impact. Building on the idea that the systemic effects of CHD risk factors reflect a combination of genetic and environmental factors, we used machine learning techniques to integrate genetic, epigenetic and phenotype data from the Framingham Heart Study, and built and tested a Random Forest classification model for symptomatic CHD. Our classifier was trained on n = 1,545 individuals and consisted of four DNA methylation sites, two SNPs, age and gender. The methylation sites and SNPs were selected during the training phase. The final trained model was then tested on n = 142 individuals. The test data comprised individuals who had been removed on the basis of their relatedness to those in the training dataset. This integrated classifier classified the symptomatic CHD status of those in the test set with an accuracy, sensitivity and specificity of 78%, 0.75 and 0.80, respectively. In contrast, a model using only conventional CHD risk factors as predictors had an accuracy and sensitivity of only 65% and 0.42, respectively, but a specificity of 0.89 in the test set. Regression analyses of the methylation signatures show that these signatures can be mapped to known risk factors in CHD pathogenesis. These results demonstrate that an integrated approach can effectively model symptomatic CHD status. They also suggest that future studies of biomaterial collected from longitudinally informative cohorts that are specifically characterized for cardiac disease at follow-up could lead to the introduction of sensitive, readily employable integrated genetic-epigenetic algorithms for predicting the onset of future symptomatic CHD.
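The accuracy, sensitivity and specificity reported above are all derived from the counts of true and false positives and negatives in the test set. The following is a minimal sketch of that calculation, not the authors' code. The function name `classification_metrics`, the `Metrics` tuple and the ten example labels are hypothetical and chosen only for illustration; the abstract does not report the underlying case and control counts.

```python
from typing import NamedTuple, Sequence


class Metrics(NamedTuple):
    accuracy: float
    sensitivity: float
    specificity: float


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute accuracy, sensitivity and specificity for binary labels.

    Labels are assumed to be coded 1 = symptomatic CHD, 0 = no CHD
    (a coding chosen for this sketch, not taken from the paper).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")

    return Metrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=tp / (tp + fn),  # fraction of true cases detected
        specificity=tn / (tn + fp),  # fraction of true controls cleared
    )


# Hypothetical labels for ten individuals (illustrative only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because sensitivity depends only on the cases and specificity only on the controls, a model can score high on one and low on the other, which is the pattern described for the conventional risk factor model (specificity 0.89, sensitivity 0.42).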