Fan Shicai, Li Chengzhe, Ai Rizi, Wang Mengchi, Firestein Gary S, Wang Wei
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Chemistry and Biochemistry.
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Bioinformatics. 2016 Jun 15;32(12):1773-8. doi: 10.1093/bioinformatics/btw089. Epub 2016 Feb 15.
DNA methylation signatures in rheumatoid arthritis (RA) have been identified in fibroblast-like synoviocytes (FLS) using the Illumina HumanMethylation450 (450K) array. Because the 450K array covers <2% of CpG sites and whole-genome bisulfite sequencing is still too expensive for many samples, computationally predicting DNA methylation levels from 450K data would be valuable for discovering more RA-related genes.
We developed a computational model trained on 14 tissues that have both whole-genome bisulfite sequencing and 450K array data. The model integrates information derived from the similarity of local methylation patterns between tissues, the methylation levels of flanking CpG sites and the methylation tendency of flanking DNA sequences. The predicted and measured methylation values were highly correlated, with a Pearson correlation coefficient of 0.9 in leave-one-tissue-out cross-validation. Importantly, the majority (76%) of the top 10% differentially methylated loci among the 14 tissues were correctly detected using the predicted methylation values. Applying this model to 450K data from RA, osteoarthritis and normal FLS, we expanded the coverage of CpG sites 18.5-fold, to about 30% of all CpGs in the human genome. Through an integrative omics study, we identified genes and pathways tightly related to RA pathogenesis, among which 12 genes were supported by three lines of evidence, including 6 genes already known to play specific roles in RA and 6 genes that are potential new therapeutic targets.
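The leave-one-tissue-out evaluation described above can be illustrated with a minimal sketch. It is not the authors' model: the data are synthetic, the single feature (a noisy stand-in for flanking-CpG methylation) and the one-variable least-squares fit are assumptions chosen for brevity, and the names `make_synthetic`, `fit_line` and `leave_one_tissue_out` are hypothetical. It only shows the shape of the protocol: fit on 13 tissues, predict the held-out tissue, and score the prediction with the Pearson correlation coefficient.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def leave_one_tissue_out(data):
    """data maps tissue name -> (feature list, measured methylation list).

    Returns tissue name -> Pearson r between predicted and measured
    methylation when that tissue is excluded from training.
    """
    scores = {}
    for held_out in data:
        # Pool every site from all tissues except the held-out one.
        train_x = [x for t, (xs, _) in data.items() if t != held_out for x in xs]
        train_y = [y for t, (_, ys) in data.items() if t != held_out for y in ys]
        a, b = fit_line(train_x, train_y)
        test_x, test_y = data[held_out]
        # Methylation levels are fractions, so clip predictions to [0, 1].
        pred = [min(1.0, max(0.0, a * x + b)) for x in test_x]
        scores[held_out] = pearson(pred, test_y)
    return scores


def make_synthetic(n_tissues=14, n_sites=300, seed=0):
    """Synthetic stand-in data (assumption): the feature is truth plus noise."""
    rng = random.Random(seed)
    data = {}
    for i in range(n_tissues):
        truth = [rng.random() for _ in range(n_sites)]
        feature = [m + rng.gauss(0.0, 0.1) for m in truth]
        data[f"tissue_{i}"] = (feature, truth)
    return data


data = make_synthetic()
scores = leave_one_tissue_out(data)
mean_r = sum(scores.values()) / len(scores)
print(f"mean held-out Pearson r over {len(scores)} tissues: {mean_r:.3f}")
```

Holding out a whole tissue, rather than random sites, tests whether the model generalizes to a tissue it has never seen, which is the situation when predictions are made for new 450K samples.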
The source code, the data required for prediction, and demo data for testing are freely available at: http://wanglab.ucsd.edu/star/LR450K/
Contact: wei-wang@ucsd.edu or gfirestein@ucsd.edu
Supplementary data are available at Bioinformatics online.