Xia Xiaoxuan, Weng Haoyi, Men Ruoting, Sun Rui, Zee Benny Chung Ying, Chong Ka Chun, Wang Maggie Haitian
Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China.
CUHK Shenzhen Research Institute, Shenzhen, China.
BMC Genet. 2018 Sep 17;19(Suppl 1):78. doi: 10.1186/s12863-018-0644-5.
Accumulating evidence has revealed the important role of epigenetic factors in the etiopathogenesis of human diseases. Several empirical studies have successfully incorporated methylation data into models for disease prediction. However, integrating different types of omics data into prediction models remains a challenge, and the contribution of methylation information to prediction has yet to be fully clarified.
A stratified drug-response prediction model was built based on an artificial neural network to predict the change in the circulating triglyceride level after fenofibrate intervention. Associated single-nucleotide polymorphisms (SNPs), methylation of selected cytosine-phosphate-guanine (CpG) sites, age, sex, and smoking status were included as predictors. The model with selected SNPs achieved a mean 5-fold cross-validation prediction error rate of 43.65%. After methylation information was added to the model, the error rate dropped to 41.92%. The combination of significant SNPs, CpG sites, age, sex, and smoking status achieved the lowest prediction error rate, 41.54%.
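For readers unfamiliar with the metric, the following is a minimal sketch of how a mean 5-fold cross-validation error rate can be computed. It is not the authors' pipeline: the labels are synthetic, the helper names (`k_fold_indices`, `majority_class`, `mean_cv_error`) are hypothetical, and a trivial majority-class predictor stands in for the artificial neural network so that only the fold-averaging step is illustrated.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def majority_class(labels):
    """Placeholder model (assumption): always predict the most common label."""
    return max(set(labels), key=labels.count)

def mean_cv_error(labels, k=5, seed=0):
    """Average the misclassification rate over the k held-out folds."""
    folds = k_fold_indices(len(labels), k, seed)
    fold_errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        prediction = majority_class(train)  # a real model would be fitted here
        wrong = sum(1 for i in held_out if labels[i] != prediction)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k

# Synthetic labels (1 = responder, 0 = non-responder); not study data.
labels = [1] * 60 + [0] * 40
error = mean_cv_error(labels)
print(f"mean 5-fold CV error rate: {error:.2%}")
```

Each sample is held out exactly once, so the reported figure reflects performance on data the model did not see during fitting.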
Compared with using SNP data alone, adding methylation data to the prediction model slightly reduced the prediction error rate (from 43.65% to 41.92%); combining genetic, methylation, and environmental factors reduced it further, to 41.54%.
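To put the size of these improvements in perspective, the short calculation below derives the absolute and relative error reductions from the three error rates reported in the abstract. The `reduction` helper is a hypothetical name introduced for illustration, and the derived figures are simple arithmetic on the reported rates, not additional results from the study.

```python
# Error rates (%) as reported in the abstract.
snp_only = 43.65
snp_plus_methylation = 41.92
full_model = 41.54  # SNPs + CpG sites + age, sex, smoking status

def reduction(before, after):
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before - after
    relative = absolute / before * 100
    return absolute, relative

for name, rate in [("SNPs + methylation", snp_plus_methylation),
                   ("full model", full_model)]:
    absolute, relative = reduction(snp_only, rate)
    print(f"{name}: -{absolute:.2f} points ({relative:.2f}% relative) vs SNPs only")
```

Relative to the SNP-only model, the reductions amount to roughly 1.7 and 2.1 percentage points, which is consistent with the authors' description of the gain as slight.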